opendutchwordnet's Issues

Can only find hyperonym relations

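from OpenDutchWordnet import Wn_grid_parser

instance = Wn_grid_parser(Wn_grid_parser.odwn)

# Collect every relation type that occurs in any synset.
s = set()
for synset in instance.synsets_get_generator():
    for relation in synset.get_all_relations():
        s.add(relation.get_reltype())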

print(s)
{'has_hyperonym'}
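
To see how often each relation type occurs, and not only which types exist, the same loop can feed a tally. This is a minimal sketch: `count_relation_types` is a hypothetical helper, not part of OpenDutchWordnet, and it relies only on the `get_all_relations()` and `get_reltype()` methods used in this report.

```python
from collections import Counter


def count_relation_types(synsets):
    """Tally relation types over an iterable of synset-like objects.

    Assumes each synset offers get_all_relations() and each relation
    offers get_reltype(). Hypothetical helper for illustration only.
    """
    counts = Counter()
    for synset in synsets:
        for relation in synset.get_all_relations():
            counts[relation.get_reltype()] += 1
    return counts
```

It could be called as `count_relation_types(instance.synsets_get_generator())`. A single key in the result would confirm that only one relation type is exposed.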

Add proper install

I would like to see a proper install script (i.e. a setup.py) so that I can use the wordnet in my own programs without having to explicitly copy the sources somewhere. Is there a reason this wasn't done for this project?

8000+ synonyms for some words

Some words yield more than 8000 synonyms. Try:

from OpenDutchWordnet import Wn_grid_parser
instance = Wn_grid_parser(Wn_grid_parser.odwn)
syns = instance.les_lemma_synonyms('generiek')
print(len(syns), list(syns)[:20], '...')
8812 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
syns = instance.les_lemma_synonyms('kalender')
print(len(syns), list(syns)[:20], '...')
8817 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
syns = instance.les_lemma_synonyms('post')
print(len(syns), list(syns)[:20], '...')
8834 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...

Most words work fine: in a list of 90 words, 10 yield 8000+ synonyms and the others do not, without any apparent pattern. The XML entries in odwn_orbn_gwg-LMF_1.3.xml.gz look fine to me too. Is this a bug in les_lemma_synonyms?
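
To pin down which lemmas are affected, a small check over a word list may help. This is a sketch under assumptions: `find_oversized_synonym_sets` is a hypothetical helper, the threshold of 1000 is arbitrary, and `get_synonyms` stands for any callable such as `instance.les_lemma_synonyms`.

```python
def find_oversized_synonym_sets(lemmas, get_synonyms, threshold=1000):
    """Return {lemma: size} for lemmas whose synonym set exceeds threshold.

    get_synonyms is any callable mapping a lemma to a collection of
    synonyms. The default threshold is an assumption chosen for
    illustration, not a value taken from the project.
    """
    oversized = {}
    for lemma in lemmas:
        size = len(get_synonyms(lemma))
        if size > threshold:
            oversized[lemma] = size
    return oversized
```

For example, `find_oversized_synonym_sets(['generiek', 'kalender', 'post'], instance.les_lemma_synonyms)` should list the three words shown above together with their counts.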

Regards,

Marc

Compatibility with Princeton wordnet

It would be handy if the database were provided in the same format as other wordnet databases. Existing interfaces, such as the wordnet-cli/wn interfaces provided by the Princeton WordNet project, could then be used together with this Dutch database. This would be really beneficial for the scripting community at large: to my knowledge, there is no good offline traditional Dutch dictionary database (except maybe this one), but it is quite hard to properly integrate the Dutch database as it exists now into existing dictionary lookup programs. The same goes for other purposes you might think of. As it stands now, the current database seems, at a glance at least, rather scattered and made up of different kinds of files in different formats.

For example, the Princeton wordnet program contains a directory structure like this (and theoretically already allows you to use different "dictionaries"):

...
/usr/share/wordnet/dict/adj.exc
/usr/share/wordnet/dict/adv.exc
/usr/share/wordnet/dict/cntlist
/usr/share/wordnet/dict/data.adj
/usr/share/wordnet/dict/data.adv
/usr/share/wordnet/dict/data.noun
/usr/share/wordnet/dict/data.verb
/usr/share/wordnet/dict/noun.exc
...

Change directory structure

Currently, the directory structure of the project is set up in such a way that I have to explicitly do cd .. to use the examples in the README.MD file. This could be solved by moving the sources to another directory (typically named after the project, i.e. opendutchwordnet here). This would also keep the docs and the sources from being jumbled together.

Inconsistent parts of speech

Compare:

>>> x = instance.synsets_get_generator()
>>> s = next(x)
>>> s.get_pos()
'n'

With:

>>> le_el = instance.les_find_le("havenplaats-n-1")
>>> le_el.get_pos()
'noun'
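
Until the two methods agree, callers can normalize the tags themselves. The sketch below is hypothetical: only the 'noun' and 'n' pair is confirmed by the output above, and the other entries are assumptions about what the remaining tags might look like.

```python
# Only 'noun' -> 'n' is confirmed by the report; the other pairs are
# assumptions added for illustration.
_LONG_TO_SHORT_POS = {
    "noun": "n",
    "verb": "v",        # assumed
    "adjective": "a",   # assumed
    "adverb": "r",      # assumed
}


def normalize_pos(pos):
    """Map a long part-of-speech tag to its short form.

    Tags that are not in the table (including short tags such as 'n')
    are returned unchanged.
    """
    return _LONG_TO_SHORT_POS.get(pos, pos)
```

With this, `normalize_pos(s.get_pos())` and `normalize_pos(le_el.get_pos())` can be compared directly.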

Problem installing/using

I tried installing OpenDutchWordnet, using the install.sh (or actually create_virtual_env.sh). The script succeeds, but if I run the example code, >>> from OpenDutchWordnet import Wn_grid_parser I get ImportError: No module named 'OpenDutchWordnet'.

If I look at the script, it seems no module is installed, it just creates a virtualenv and installs lxml from requirements.txt. Also, there is no setup.py.

What am I missing?
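
One possible workaround while there is no setup.py is to put the directory that contains the clone on sys.path before importing. This is a sketch under assumptions: `add_parent_to_sys_path` is a hypothetical helper, and it assumes the cloned directory is itself named OpenDutchWordnet and can be imported as a package from its parent directory.

```python
import sys
from pathlib import Path


def add_parent_to_sys_path(clone_dir):
    """Put the parent directory of clone_dir at the front of sys.path.

    Hypothetical helper. Assumes clone_dir is the checked-out repository
    and that Python can import it as a package from its parent directory.
    Returns the path that was added (or was already present).
    """
    parent = str(Path(clone_dir).resolve().parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent
```

After calling it with the location of the clone, `from OpenDutchWordnet import Wn_grid_parser` may work from any working directory, provided the assumption about the directory name holds.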

Get hypernyms

Hi,

I am not able to find hypernyms of a certain lemma.
With Wordnet, it's possible to do

from nltk.corpus import wordnet as wn
dog = wn.synset('dog.n.01')
dog.hypernyms()

Resulting in:

[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

Is there a similar method for the Dutch Wordnet?

Thank you in advance!
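
Judging from the 'has_hyperonym' relation type reported in another issue on this page, hypernyms might be reachable by filtering a synset's relations. The following is a sketch, not a confirmed API: `hypernym_targets` is a hypothetical helper, and `get_target()` is an assumption about the relation object. Only `get_all_relations()` and `get_reltype()` appear elsewhere on this page.

```python
def hypernym_targets(synset, reltype="has_hyperonym"):
    """Return the targets of a synset's hypernym relations.

    get_all_relations() and get_reltype() are taken from another issue
    on this page; get_target() is an assumed method name for reading
    the synset a relation points to.
    """
    return [
        relation.get_target()
        for relation in synset.get_all_relations()
        if relation.get_reltype() == reltype
    ]
```

Getting from a lemma to its synset first is left out here, since the method for that step is not shown on this page.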
