cltl / opendutchwordnet
This repo provides a Python module to work with Open Dutch WordNet. It was created using Python 3.4.
License: Other
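from OpenDutchWordnet import Wn_grid_parser
instance = Wn_grid_parser(Wn_grid_parser.odwn)
# collect every relation type that occurs in any synset
s = set()
for synset in instance.synsets_get_generator():
    for relation in synset.get_all_relations():
        s.add(relation.get_reltype())
print(s)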
{'has_hyperonym'}
I would like to see a proper install script (i.e. a setup.py) so that I can use the wordnet in my own programs without having to explicitly copy the sources somewhere. Is there a reason this wasn't done for this project?
Some words yield more than 8000 synonyms. Try:
from OpenDutchWordnet import Wn_grid_parser
instance = Wn_grid_parser(Wn_grid_parser.odwn)
syns = instance.les_lemma_synonyms('generiek')
print(len(syns), list(syns)[:20], '...')
8812 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
syns = instance.les_lemma_synonyms('kalender')
print(len(syns), list(syns)[:20], '...')
8817 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
syns = instance.les_lemma_synonyms('post')
print(len(syns), list(syns)[:20], '...')
8834 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
Most words work fine (in a list of 90 words, 10 yield 8000+ synonyms and the others do not, without any apparent logic). The xml entries in odwn_orbn_gwg-LMF_1.3.xml.gz look fine (to me) too. Is this a bug in les_lemma_synonyms?
Regards,
Marc
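A rough way to narrow this down is to flag every lemma whose synonym list is implausibly large and then check whether those lists are largely the same set, as the three identical 20-word prefixes above suggest. The sketch below is only a diagnostic aid, not a fix: the helper names and the threshold of 1000 are made up for illustration, and the lookup is passed in as a plain callable so nothing beyond les_lemma_synonyms from the report is assumed about the library.

```python
from typing import Callable, Dict, Iterable, Set


def oversized_synonym_sets(
    lemmas: Iterable[str],
    lookup: Callable[[str], Iterable[str]],
    threshold: int = 1000,  # hypothetical cut-off, tune to taste
) -> Dict[str, int]:
    """Return {lemma: synonym count} for lemmas whose count exceeds threshold."""
    suspicious = {}
    for lemma in lemmas:
        count = len(set(lookup(lemma)))
        if count > threshold:
            suspicious[lemma] = count
    return suspicious


def shared_synonyms(results: Iterable[Iterable[str]]) -> Set[str]:
    """Return the words that occur in every one of the given synonym lists."""
    sets = [set(result) for result in results]
    if not sets:
        return set()
    return set.intersection(*sets)
```

With the instance from the report, this could be called as oversized_synonym_sets(['generiek', 'kalender', 'post'], instance.les_lemma_synonyms). If shared_synonyms over the flagged results is nearly as large as each individual result, the affected lemmas are probably all being attached to one and the same oversized group.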
Using the Wordnet for English you can look up a word, e.g., car, and it will tell you that it is a noun. Is there a similar function in this module?
It would be handy if the database was provided in the same format as other wordnet databases. If this were the case, already existing interfaces, such as the wordnet-cli/wn interfaces provided by the Princeton WordNet project, could be used together with this Dutch database. This would be really beneficial for the scripting community at large; e.g. to my knowledge, there is no good offline traditional Dutch dictionary database (except maybe this one), but it's quite hard to properly integrate the Dutch database as it exists now into existing dictionary lookup programs. The same goes for other purposes you might think of. As it stands now, the current database, at a glance at least, seems kinda scattered and made up of different kinds of files in different formats.
For example, the Princeton wordnet program contains a directory structure like this (and theoretically already allows you to use different "dictionaries"):
...
/usr/share/wordnet/dict/adj.exc
/usr/share/wordnet/dict/adv.exc
/usr/share/wordnet/dict/cntlist
/usr/share/wordnet/dict/data.adj
/usr/share/wordnet/dict/data.adv
/usr/share/wordnet/dict/data.noun
/usr/share/wordnet/dict/data.verb
/usr/share/wordnet/dict/noun.exc
...
Currently, the directory structure of the project is set up in such a way that I have to explicitly do cd .. to use the examples in the README.MD file. This could be solved by moving the sources to another directory (typically the name of the project, i.e. opendutchwordnet for this project). This would also solve the docs and the sources being jumbled together.
Compare:
>>> x = instance.synsets_get_generator()
>>> s = next(x)
>>> s.get_pos()
'n'
With:
>>> le_el = instance.les_find_le("havenplaats-n-1")
>>> le_el.get_pos()
'noun'
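Until the two calls agree, client code can normalise the tags itself before comparing them. This is a minimal sketch: only the pair 'n' / 'noun' is confirmed by the session above; the 'v' and 'a' entries and both function names are assumptions added for illustration.

```python
# Only 'n' -> 'noun' is taken from the session above; the other
# two entries are assumptions and may not match the actual data.
SHORT_TO_LONG = {
    "n": "noun",
    "v": "verb",       # assumed
    "a": "adjective",  # assumed
}


def normalize_pos(tag: str) -> str:
    """Map a short part-of-speech tag to its long form; leave other tags as-is."""
    tag = tag.strip().lower()
    return SHORT_TO_LONG.get(tag, tag)


def same_pos(first: str, second: str) -> bool:
    """Compare two part-of-speech tags regardless of short or long spelling."""
    return normalize_pos(first) == normalize_pos(second)
```

For example, same_pos(s.get_pos(), le_el.get_pos()) treats the 'n' and 'noun' values shown above as equal.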
I tried installing OpenDutchWordnet using install.sh (or actually create_virtual_env.sh). The script succeeds, but if I run the example code, >>> from OpenDutchWordnet import Wn_grid_parser, I get ImportError: No module named 'OpenDutchWordnet'.
If I look at the script, it seems no module is installed; it just creates a virtualenv and installs lxml from requirements.txt. Also, there is no setup.py.
What am I missing?
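A possible stopgap, given that there is no setup.py, is to put the directory that contains the clone on sys.path before importing. The sketch below assumes the clone directory itself is the importable package and is named OpenDutchWordnet, which this page does not confirm; the function name is made up for illustration.

```python
import sys
from pathlib import Path


def make_clone_importable(clone_dir: str) -> str:
    """Put the parent directory of a cloned repository on sys.path.

    Assumes the clone directory is itself the package, so its parent
    is what the import system needs to search.
    """
    parent = str(Path(clone_dir).resolve().parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent
```

After calling make_clone_importable with the path of the clone, from OpenDutchWordnet import Wn_grid_parser should resolve from any working directory, provided the assumption about the directory name holds.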
Hi,
I am not able to find hypernyms of a certain lemma.
With Wordnet, it's possible to do
dog = wn.synset('dog.n.01')
dog.hypernyms()
Resulting in:
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
Is there a similar method for the Dutch Wordnet?
Thank you in advance!
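One way this might be approached is to filter a synset's relations on the has_hyperonym type that appears in the output earlier on this page. The sketch below uses get_all_relations() and get_reltype() as seen in that snippet; get_target() on a relation is an assumption not confirmed anywhere on this page, and hypernym_targets is a made-up helper name.

```python
from typing import List

# Relation type as printed in the output earlier on this page.
HYPERNYM_RELTYPE = "has_hyperonym"


def hypernym_targets(synset) -> List[str]:
    """Collect the targets of all has_hyperonym relations of one synset.

    Works on any object with get_all_relations(), whose items offer
    get_reltype() and get_target(). get_target() is an assumed accessor.
    """
    return [
        relation.get_target()
        for relation in synset.get_all_relations()
        if relation.get_reltype() == HYPERNYM_RELTYPE
    ]
```

Applied to each synset from instance.synsets_get_generator(), this would return the hypernym targets per synset, if the assumed accessor exists under that name.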