cltl / opendutchwordnet

This repo provides a Python module for working with Open Dutch WordNet. It was created using Python 3.4.

License: Other

Python 10.17% Shell 0.28% HTML 83.41% CSS 1.48% JavaScript 0.98% TeX 0.03% Jupyter Notebook 3.65%

opendutchwordnet's Issues

Can only find hyperonym relations

s = set()
for synset in instance.synsets_get_generator():
    for relation in synset.get_all_relations():
        # Collect every distinct relation type.
        s.add(relation.get_reltype())

print(s)
{'has_hyperonym'}

Add proper install

I would like to see a proper install script (i.e. a setup.py) so that I can use the wordnet in my own programs without having to explicitly copy the sources somewhere. Is there a reason this wasn't done for this project?

How do I obtain synonyms?

8000+ synonyms for some words

Some words yield more than 8000 synonyms. Try:

from OpenDutchWordnet import Wn_grid_parser
instance = Wn_grid_parser(Wn_grid_parser.odwn)
syns = instance.les_lemma_synonyms('generiek')
print(len(syns), list(syns)[:20], '...')
8812 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
syns = instance.les_lemma_synonyms('kalender')
print(len(syns), list(syns)[:20], '...')
8817 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...
syns = instance.les_lemma_synonyms('post')
print(len(syns), list(syns)[:20], '...')
8834 ['wildkamperen', 'indiscretie', 'inbegrip', 'Noordzee', 'borstwijdte', 'Turkmeense', 'Moskou', 'onvruchtbaarheid', 'augurk', 'expresweg', 'saloondeuren', 'oud-minister', 'raclette', 'samenpakken', 'Armenië', 'werkloosheidsprobleem', 'Zevengebergte', 'luchtledige', 'spellingwijziging', 'treinstaking'] ...

Most words work fine: in a list of 90 words, 10 yield 8000+ synonyms and the others do not, without any apparent pattern. The XML entries in odwn_orbn_gwg-LMF_1.3.xml.gz look fine to me as well. Is this a bug in les_lemma_synonyms?

Regards,

Marc
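
To find out which lemmas in a word list are affected, the check can be automated. This is a minimal sketch, not a fix for the underlying problem: `flag_oversized_synonym_sets` is a hypothetical helper, the threshold of 1000 is an arbitrary assumption, and `fake_lookup` stands in for `instance.les_lemma_synonyms` with made-up sizes (apart from the 8812 reported above).

```python
def flag_oversized_synonym_sets(lookup, lemmas, threshold=1000):
    """Return {lemma: size} for lemmas whose synonym set exceeds threshold.

    `lookup` is any callable that maps a lemma to a collection of synonyms,
    for example instance.les_lemma_synonyms. The threshold is arbitrary.
    """
    flagged = {}
    for lemma in lemmas:
        size = len(lookup(lemma))
        if size > threshold:
            flagged[lemma] = size
    return flagged


# Hypothetical lookup standing in for instance.les_lemma_synonyms.
def fake_lookup(lemma):
    sizes = {"generiek": 8812, "augurk": 3}  # 'augurk' size is made up
    return set(range(sizes.get(lemma, 0)))


flagged = flag_oversized_synonym_sets(fake_lookup, ["generiek", "augurk", "fiets"])
print(flagged)
```

Against the real parser the call would presumably be `flag_oversized_synonym_sets(instance.les_lemma_synonyms, my_words)`, where `my_words` is the list of 90 words.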

Check if a word is a noun or not

With the English WordNet you can look up a word, e.g. car, and it will tell you that it is a noun. Is there a similar function in this module?

Compatibility with Princeton wordnet

It would be handy if the database were provided in the same format as other wordnet databases. In that case, existing interfaces, such as the wordnet-cli/wn interfaces provided by the Princeton WordNet project, could be used together with this Dutch database. This would be really beneficial for the scripting community at large: to my knowledge, there is no good offline traditional Dutch dictionary database (except maybe this one), but it is quite hard to integrate the Dutch database as it exists now into existing dictionary lookup programs. The same goes for other purposes you might think of. As it stands, the current database, at a glance at least, seems rather scattered and made up of different kinds of files in different formats.

For example, the Princeton wordnet program contains a directory structure like this (and theoretically already allows you to use different "dictionaries"):

...
/usr/share/wordnet/dict/adj.exc
/usr/share/wordnet/dict/adv.exc
/usr/share/wordnet/dict/cntlist
/usr/share/wordnet/dict/data.adj
/usr/share/wordnet/dict/data.adv
/usr/share/wordnet/dict/data.noun
/usr/share/wordnet/dict/data.verb
/usr/share/wordnet/dict/noun.exc
...

Can you also use ODWN within a Java program?

Change directory structure

Currently, the directory structure of the project is set up in such a way that I have to explicitly do cd .. to use the examples in the README.MD file. This could be solved by moving the sources to another directory (typically the name of the project, i.e. opendutchwordnet for this project). This would also solve the docs and the sources being jumbled together.

'grondtoon' wrongly marked as hyponym of 'kleur'

Inconsistent parts of speech

Compare:

>>> x = instance.synsets_get_generator()
>>> s = next(x)
>>> s.get_pos()
'n'

With:

>>> le_el = instance.les_find_le("havenplaats-n-1")
>>> le_el.get_pos()
'noun'
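
Until the two notations are unified, calling code can normalize the tags before comparing them. The sketch below is one possible workaround, not part of the module: only 'n' and 'noun' are confirmed by the output above, and the other long/short pairs in the table are assumptions about what the data might contain.

```python
# Only 'noun' -> 'n' is confirmed by the issue; the other pairs are assumptions.
_LONG_TO_SHORT = {
    "noun": "n",
    "verb": "v",
    "adjective": "a",
    "adverb": "r",
}


def normalize_pos(pos):
    """Map a long part-of-speech name to its short tag; pass others through."""
    tag = pos.strip().lower()
    return _LONG_TO_SHORT.get(tag, tag)


def same_pos(first, second):
    """Compare two part-of-speech tags regardless of notation."""
    return normalize_pos(first) == normalize_pos(second)


print(same_pos("n", "noun"))
```

For example, `same_pos(s.get_pos(), le_el.get_pos())` would compare the synset tag and the lexical-entry tag from the two sessions above.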

Problem installing/using

I tried installing OpenDutchWordnet using install.sh (or actually create_virtual_env.sh). The script succeeds, but when I run the example code

>>> from OpenDutchWordnet import Wn_grid_parser

I get ImportError: No module named 'OpenDutchWordnet'.

Looking at the script, it seems that no module is installed; it just creates a virtualenv and installs lxml from requirements.txt. Also, there is no setup.py.

What am I missing?
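
One possible workaround, in the absence of a setup.py, is to put the parent directory of the checkout on `sys.path` before importing. This is a sketch under the assumption that the cloned directory is itself the package and is named `OpenDutchWordnet`; `make_checkout_importable` is a hypothetical helper and the path below is only a placeholder.

```python
import sys
from pathlib import Path


def make_checkout_importable(checkout_dir):
    """Put the parent of a repository checkout on sys.path.

    Assumes the checkout directory is itself the package, so that
    `import <directory name>` works from its parent directory.
    """
    parent = str(Path(checkout_dir).resolve().parent)
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Placeholder location of the clone; adjust to the actual path.
added = make_checkout_importable("/tmp/example/OpenDutchWordnet")
print(added)
```

After this call, `from OpenDutchWordnet import Wn_grid_parser` should resolve if the assumption about the directory layout holds.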

Get hypernyms

Hi,

I am not able to find the hypernyms of a given lemma.
With WordNet, it is possible to do:
dog = wn.synset('dog.n.01')
dog.hypernyms()
Resulting in:
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]

Is there a similar method for the Dutch Wordnet?

Thank you in advance!
