alvations / pywsd
Python Implementations of Word Sense Disambiguation (WSD) Technologies.
License: MIT License
I am just looking into pywsd; it looks very interesting.
I tried the Lesk sample as per the Usage section, and I get:
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pywsd.lesk import simple_lesk
>>> sent = 'I went to the bank to deposit my money'
>>> ambiguous = 'bank'
>>> answer = simple_lesk(sent, ambiguous, nbest=True, keepscore=True)
>>> print answer
[(2, Synset('deposit.v.02')), (2, Synset('bank.n.09')), .......
>>> print answer[0][1].definition()
put into a bank account
>>> print answer[1][1].definition()
a building in which the business of banking transacted
instead of
>>> print answer.definition()
a financial institution that accepts deposits and channels the money into lending activities
I just downloaded and installed NLTK this morning:
>>> import nltk
>>> print nltk.__version__
3.0.5
Any idea what could be going on?
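This looks like expected behaviour rather than a bug: with nbest=True and keepscore=True, simple_lesk returns a ranked list of (score, synset) pairs instead of a single synset. A pywsd-free sketch of the shape difference (the strings below are placeholders standing in for real Synset objects):

```python
# With nbest=True and keepscore=True, simple_lesk returns something like a
# ranked list of (score, synset) pairs; the default call returns only the
# top-ranked synset, i.e. the equivalent of ranked[0][1] below.
ranked = [(2, "Synset('deposit.v.02')"),
          (2, "Synset('bank.n.09')"),
          (1, "Synset('depository_financial_institution.n.01')")]

top_synset = ranked[0][1]   # what the default (nbest=False) call would give
print(top_synset)           # Synset('deposit.v.02')
```

So calling `.definition()` directly on the answer only works when the keyword arguments are left at their defaults.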
Right now in baseline.py, the seed is set on the global Random() instance. The seed is therefore shared with every other import of random, which will cause issues for pywsd users who are not aware of it.
Line 11 in d650c4e
Python random Documentation - https://docs.python.org/2/library/random.html
"The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don't share state."
import random
custom_random = random.Random(0)
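A minimal sketch of the isolation this buys, assuming baseline.py switches from `random.seed(0)` to a module-private generator as suggested above:

```python
import random

# Give the module its own RNG instead of seeding the global one, so that
# importing it does not silently reseed every other user of `random`.
_rng = random.Random(0)          # module-private generator

random.seed(12345)               # a downstream user seeds the global RNG...
before = random.random()
_rng.random()                    # ...the module draws from its own stream...
random.seed(12345)
after = random.random()
print(before == after)           # True: the global stream is unaffected
```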
s = "would sentiment"
disambiguate(s, algorithm=maxsim, similarity_option='path', keepLemmas=True)
IndexError Traceback (most recent call last)
in ()
1 s = "would sentiment"
----> 2 disambiguate(s, algorithm=maxsim, similarity_option='path', keepLemmas=True)
1 frames
/usr/local/lib/python3.6/dist-packages/pywsd/allwords_wsd.py in disambiguate(sentence, algorithm, context_is_lemmatized, similarity_option, keepLemmas, prefersNone, from_cache, tokenizer)
43 synset = algorithm(lemma_sentence, lemma, from_cache=from_cache)
44 elif algorithm == max_similarity:
---> 45 synset = algorithm(lemma_sentence, lemma, pos=pos, option=similarity_option)
46 else:
47 synset = algorithm(lemma_sentence, lemma, pos=pos, context_is_lemmatized=True,
/usr/local/lib/python3.6/dist-packages/pywsd/similarity.py in max_similarity(context_sentence, ambiguous_word, option, lemma, context_is_lemmatized, pos, best)
125 result = sorted([(v,k) for k,v in result.items()],reverse=True)
126
--> 127 return result[0][1] if best else result
IndexError: list index out of range
Hi,
I was wondering if here https://github.com/alvations/pywsd/blob/master/pywsd/similarity.py#L22-L23
the senses should be flipped in the second argument of the max function. I.e.:
return max(wn.path_similarity(sense1,sense2), wn.path_similarity(sense2,sense1))
Also, when using Python 3 this call tends to crash if no path exists: wn.path_similarity then returns None, and max(None, 1) raises a TypeError in Python 3.
This is e.g. the case for Synset('bank.n.01') and Synset('one.s.01') which is checked when running
max_similarity('I went to the bank to deposit my money', 'bank', 'path', pos='n')
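Both problems could be handled in one small wrapper. The sketch below is only an illustration: the `sim` parameter is a stand-in for wn.path_similarity, stubbed here with a dict so the example runs without WordNet data.

```python
def symmetric_path_similarity(sense1, sense2, sim):
    """Max of path similarity taken in both directions, treating None
    (no connecting path) as 0 so max() never compares None with a number."""
    forward = sim(sense1, sense2) or 0
    backward = sim(sense2, sense1) or 0
    return max(forward, backward)

# Stubbed demo: an asymmetric scorer that finds no path in one direction,
# as reported for Synset('bank.n.01') vs Synset('one.s.01'). With a plain
# max(sim(a, b), sim(b, a)), Python 3 would raise TypeError on the None.
scores = {("bank.n.01", "one.s.01"): None, ("one.s.01", "bank.n.01"): 0.25}
result = symmetric_path_similarity("bank.n.01", "one.s.01",
                                   sim=lambda a, b: scores[(a, b)])
print(result)  # 0.25
```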
context_sentence = lemmatize_sentence(sentence)
NameError: global name 'sentence' is not defined
Check lines 196 and 218 in lesk.py: replace "".split() with nltk.word_tokenize().
Hi,
It seems pywsd
is not compatible with Python 3.
If I run from pywsd.lesk import simple_lesk
in Python 3, the following error is given:
Traceback (most recent call last):
File "wsd.py", line 28, in
from pywsd.lesk import simple_lesk
File "/home/coiby/nlp/pywsd/pywsd/__init__.py", line 9, in
import lesk
ImportError: No module named 'lesk'
I manually add pywsd to path
(sys.path.append(os.path.join(os.path.dirname(__file__), 'pywsd'))
), another issue occurs:
Traceback (most recent call last):
File "wsd.py", line 50, in
answer = simple_lesk(sent, ambiguous)
File "/home/coiby/nlp/pywsd/pywsd/lesk.py", line 151, in simple_lesk
context_sentence = lemmatize_sentence(context_sentence)
File "pywsd/utils.py", line 104, in lemmatize_sentence
for word, pos in postagger(tokenizer(sentence)):
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/__init__.py", line 104, in word_tokenize
return [token for sent in sent_tokenize(text, language)
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/__init__.py", line 89, in sent_tokenize
return tokenizer.tokenize(text)
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 1226, in tokenize
return list(self.sentences_from_text(text, realign_boundaries))
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 1274, in sentences_from_text
return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 1265, in span_tokenize
return [(sl.start, sl.stop) for sl in slices]
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 1265, in
return [(sl.start, sl.stop) for sl in slices]
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 1304, in _realign_boundaries
for sl1, sl2 in _pair_iter(slices):
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 310, in _pair_iter
prev = next(it)
File "/usr/local/lib/python3.4/dist-packages/nltk/tokenize/punkt.py", line 1278, in _slices_from_text
for match in self._lang_vars.period_context_re().finditer(text):
TypeError: can't use a string pattern on a bytes-like object
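The first error is the Python 2/3 import difference: `import lesk` inside a package's `__init__.py` is an implicit relative import, which Python 3 removed; it has to be written as `from . import lesk` (or `import pywsd.lesk`). A runnable sketch of the fix, using a throwaway `mypkg`/`lesk` stand-in for the pywsd layout:

```python
import os
import sys
import tempfile

# Build a tiny package on disk whose __init__.py uses the explicit relative
# import that Python 3 requires (Python 2's bare `import lesk` would fail).
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "mypkg"))
with open(os.path.join(root, "mypkg", "__init__.py"), "w") as f:
    f.write("from . import lesk\n")        # explicit relative import
with open(os.path.join(root, "mypkg", "lesk.py"), "w") as f:
    f.write("def simple_lesk(sent, word):\n    return None\n")

sys.path.insert(0, root)
import mypkg                               # works on both Python 2 and 3
print(callable(mypkg.lesk.simple_lesk))    # True
```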
Have there been any studies quantifying the most accurate WSD algorithm over generalized content (any genre)? I've ruled out any information-content approaches since they most likely only work well on input similar to the corpus on which they were trained. Therefore I'd be interested in a comparison between any of the lesk algorithms and any of the max-path-similarity algorithms.
For a general thesaurus plugin which algorithm do you think I should use?
max_similarity(context_sentence="art entertainment hobby creative art",
ambiguous_word='creative', option="path", lemma=True,
context_is_lemmatized=True, pos='n', best=True)
The above function call throws an exception
IndexError Traceback (most recent call last)
<ipython-input-63-3333bb3d5eca> in <module>()
----> 1 max_similarity(context_sentence="art entertainment hobby creative art", ambiguous_word='creative', option="path", lemma=True, context_is_lemmatized=True, pos='n', best=True)
/root/anaconda2/lib/python2.7/site-packages/pywsd/similarity.pyc in max_similarity(context_sentence, ambiguous_word, option, lemma, context_is_lemmatized, pos, best)
106 result = sorted([(v,k) for k,v in result.items()],reverse=True)
107 ##print result
--> 108 if best: return result[0][1];
109 return result
110
IndexError: list index out of range
It works when I remove the pos argument.
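That matches the empty-candidate failure mode: WordNet has no noun sense for 'creative', so with pos='n' the candidate list is empty and result[0][1] raises IndexError. A toy sketch of a defensive fallback (the lists stand in for the corresponding wn.synsets(...) calls):

```python
# Stand-ins for the two WordNet lookups:
senses_with_pos = []                 # wn.synsets('creative', pos='n') -> []
all_senses = ["creative.a.01"]       # wn.synsets('creative') is non-empty

# Use the POS-filtered senses when available; otherwise fall back to all
# senses instead of indexing into an empty list.
candidates = senses_with_pos or all_senses
print(candidates[0])
```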
Hi,
How do I install pywsd for Python 3 via pip? I get an error when I run from pywsd.lesk import simple_lesk:
ImportError: No module named 'lesk'
Good afternoon,
I was wondering if it would be possible to adapt this tool to other languages such as French or Spanish.
If it is feasible, could you give me some indications on how to do these modifications?
Thank you very much!
Hi,
I installed pywsd using pip in an Anaconda environment. Now I am trying to use the lesk algorithm using signatures computed with WordNet 3.0. To do this I used Precompute Signatures.ipynb where I specified the wordnet_30_dir parameter to generate the signatures. I then copy the generated signatures.pkl file into the lib directory of the installed pywsd and replace the default file that came with the installation.
My code then fails at this line: from pywsd import disambiguate
With the following error: KeyError: 'simple'
Re-running the same line goes through the previous error and produces the following error:
AttributeError: module 'pywsd' has no attribute 'lesk'
I'm trying to start using pywsd, but when running test_wsd.py or any simple example I get errors like:
File "lesk.py", line 116, in simple_signature
signature += list(chain(*[i.lemma_names() for i in ss_hypohypernyms]))
TypeError: 'list' object is not callable
and others. I don't know whether I missed something or whether it is a bug.
There's a need to check whether disambiguate is doing the same thing as calling the lesk functions on each word individually.
As you recommend, I tried to install
'averaged_perceptron_tagger'
but my Windows 10 computer gives an error:
Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.
e:\nltk_data>python -m nltk.downloader
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
e:\nltk_data>python -m nltk.downloader 'averaged_perceptron_tagger'
[nltk_data] Error loading 'averaged_perceptron_tagger': Package
[nltk_data] "'averaged_perceptron_tagger'" not found in index
Error installing package. Retry? [n/y/e]
y
Traceback (most recent call last):
File "C:\Users\cde3\Anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "C:\Users\cde3\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\cde3\Anaconda3\lib\site-packages\nltk\downloader.py", line 2268, in
halt_on_error=options.halt_on_error)
File "C:\Users\cde3\Anaconda3\lib\site-packages\nltk\downloader.py", line 677, in download
if not self.download(msg.package.id, download_dir,
AttributeError: 'NoneType' object has no attribute 'id'
e:\nltk_data>
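The quotes are the likely culprit: unlike POSIX shells, Windows cmd.exe does not strip single quotes from arguments, so the downloader receives the literal string `'averaged_perceptron_tagger'` (quotes included) and cannot find it in the package index. A tiny demonstration of the mismatch:

```python
# What cmd.exe actually passed to the downloader (quotes kept literally):
received_by_downloader = "'averaged_perceptron_tagger'"
# What the nltk_data index actually contains:
real_package_id = "averaged_perceptron_tagger"

print(received_by_downloader == real_package_id)             # False -> "not found in index"
print(received_by_downloader.strip("'") == real_package_id)  # True
```

On Windows, run the command without quotes, `python -m nltk.downloader averaged_perceptron_tagger`, or call `nltk.download("averaged_perceptron_tagger")` from within Python.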
Since you explicitly use nltk.wordnet calls I have no idea why your code does not work, but here you go:
>>> from pywsd.similarity import similarity_by_infocontent as sim
>>> sim(syn, syn1, 'res')
0
>>> from nltk.corpus import wordnet as wn
>>> from nltk.corpus import wordnet_ic as wic
>>> resnik = wic.ic('ic-bnc-resnik.dat')
>>> wn.res_similarity(syn, syn1, resnik)
1.5972986298343528
Unfortunately that's true not only for Resnik but for every other similarity method as well.
Right now pywsd does WSD for each word given a context sentence. To do all-words WSD, one has to do something like test_allwords_wsd.py.
Is it possible to automate this process such that users can do:
>>> from nltk.corpus import brown
>>> from pywsd.allwords_wsd import wsd
>>> for sent in brown.sents():
...     print wsd(sent)
...     break
[(u'the', '#STOPWORD/PUNCTUATION#'), (u'fulton', Synset('fulton.n.01')), (u'county', Synset('county.n.02')), (u'grand', Synset('thousand.n.01')), (u'jury', Synset('jury.n.01')), (u'said', Synset('state.v.01')), (u'friday', Synset('friday.n.01')), (u'an', '#STOPWORD/PUNCTUATION#'), (u'investigation', Synset('probe.n.01')), (u'of', '#STOPWORD/PUNCTUATION#'), (u'atlanta', Synset('atlanta.n.02')), (u"'s", '#NOT_IN_WN#'), (u'recent', Synset('recent.s.01')), (u'primary', Synset('primary.n.01')), (u'election', Synset('election.n.01')), (u'produced', Synset('produce.v.04')), (u'``', '#NOT_IN_WN#'), (u'no', '#STOPWORD/PUNCTUATION#'), (u'evidence', Synset('testify.v.02')), (u"''", '#NOT_IN_WN#'), (u'that', '#STOPWORD/PUNCTUATION#'), (u'any', '#STOPWORD/PUNCTUATION#'), (u'irregularity', Synset('irregularity.n.03')), (u'took', Synset('take.v.41')), (u'place', Synset('stead.n.01')), (u'.', '#STOPWORD/PUNCTUATION#')]
Code-sprint: 28 Sept - 2 Oct 2014
Scheduled release with SemEval data + supervised methods: 3 Oct 2014
sklearn
The End Of Life of Python 2.7 is 2020. https://pythonclock.org/
We're going to go ahead and peel the band-aid off fast... So we're going to fast-forward and drop Python 2.7 support from March onwards, and we'll see our CI tests passing again =)
From 1.2 onwards all Python 2.7 compatible code will be wiped out! ETA: 15 Mar 2019
Anyone still depending on Python 2.7 will be stuck with the last stable version, 1.1.7
I propose using Morphy instead of PorterStemmer.
It accepts an optional POS, which can easily be derived by running nltk's pos_tag tagger on the context sentence. I'm not sure how it works without the POS, especially since, off the top of my head, I'd be hard-pressed to find a word with different lemmas depending on the POS.
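For the record, "saw" is one such word: as a verb it lemmatizes to "see", as a noun it stays "saw". A toy illustration of why the POS matters (these are made-up tables, nothing like WordNet's actual Morphy exception lists or detachment rules):

```python
EXCEPTIONS = {("saw", "v"): "see"}                 # irregular forms per POS
RULES = {"n": [("s", "")], "v": [("ed", ""), ("s", "")]}
VOCAB = {"n": {"saw", "dog"}, "v": {"see", "walk"}}

def toy_morphy(word, pos):
    """Morphy-style lemmatizer sketch: exceptions first, then suffix
    detachment, accepting a candidate only if it exists for that POS."""
    if (word, pos) in EXCEPTIONS:
        return EXCEPTIONS[(word, pos)]
    if word in VOCAB[pos]:
        return word
    for suffix, replacement in RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB[pos]:
                return candidate
    return None

print(toy_morphy("saw", "v"))   # 'see' : verb reading lemmatizes to 'see'
print(toy_morphy("saw", "n"))   # 'saw' : noun reading is already a lemma
```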
The test_*.py scripts are nice examples of how to use pywsd, but it's time to document the toolkit.
Love the tool! Super helpful. However, it bugs out if you try to run the maxsim disambiguation on a sentence where the wn.synset POS doesn't match the NLTK-tagged POS.
Try running
sen = 'these potato chips are great'
disambiguate(sen, algorithm=maxsim)
and you get an index-out-of-range error because result in max_similarity in similarity.py is [], because wn.synsets(ambiguous_word, pos=pos) is empty: NLTK has (incorrectly) decided the part of speech of 'potato' is an adjective, and there is no synset for that.
A very simple fix: change line 114 from
for i in wn.synsets(ambiguous_word, pos=pos):
to
for i in wn.synsets(ambiguous_word, pos=pos) or wn.synsets(ambiguous_word):
to provide a fallback option
def similarity_by_path(sense1, sense2, option="path"):
    """ Returns maximum path similarity between two senses. """
    if option.lower() in ["path", "path_similarity"]: # Path similarities
        return max(wn.path_similarity(sense1,sense2),
                   wn.path_similarity(sense1,sense2))
Error: max(wn.path_similarity(sense1,sense2), wn.path_similarity(sense1,sense2)) takes the maximum of two identical calls; presumably the second argument was meant to be wn.path_similarity(sense2,sense1).
NLTK changed Synset.definition to a function (i.e. Synset.definition()). This breaks original_lesk, and simple_lesk with an explicit pos crashes instead of returning None:
>>> from pywsd.lesk import simple_lesk, original_lesk
>>> sent = "people should be able to marry a person of their choice"
>>> original_lesk(sent, 'able')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pywsd/lesk.py", line 118, in original_lesk
in wn.synsets(ambiguous_word)}
File "pywsd/lesk.py", line 117, in <dictcomp>
dictionary = {ss:ss.definition.split() for ss \
AttributeError: 'function' object has no attribute 'split'
>>> simple_lesk(sent, 'able')
Synset('able.s.03')
>>> simple_lesk(sent, 'able', pos='s')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pywsd/lesk.py", line 181, in simple_lesk
normalizescore=normalizescore)
File "pywsd/lesk.py", line 107, in compare_overlaps
return ranked_synsets[0]
IndexError: list index out of range
>>> simple_lesk(sent, 'able', pos='a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pywsd/lesk.py", line 181, in simple_lesk
normalizescore=normalizescore)
File "pywsd/lesk.py", line 107, in compare_overlaps
return ranked_synsets[0]
IndexError: list index out of range
This is an amazing library, not only because it agglomerates many disparate methods, but because it's easy to read. For someone without any NLP experience it's a great way to learn more about WSD algorithms. In any case, I like having all these different methods in a single library. However, my application needs ranked synsets, so it would be great if pywsd returned the ranking, leaving the burden of selecting the most appropriate sense to the user (for all algorithms that produce rankings). Something like:
>>> answer = simple_lesk(sent, ambiguous)
>>> print answer
{Synset('...'): 0.1, Synset('...'): 0.5, Synset('...'): 0.3, Synset('...'): 0.7}
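A sketch of how the requested shape could be built from what pywsd already produces: the (score, synset) list that nbest=True, keepscore=True yields can be normalized into a {synset: score} mapping, leaving sense selection to the caller. The strings below stand in for Synset objects, and the normalization scheme is just one illustrative choice:

```python
# Hypothetical ranked output (score, synset) as simple_lesk's nbest mode
# returns it, with placeholder strings instead of real Synset objects:
ranked = [(5, "depository_financial_institution.n.01"),
          (2, "bank.n.09"),
          (1, "deposit.v.02")]

total = sum(score for score, _ in ranked)
normalized = {synset: score / total for score, synset in ranked}
print(normalized["depository_financial_institution.n.01"])  # 0.625
```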
Hi, according to my terminal I have successfully installed pywsd on Python 3 (see install log below); however, when I import pywsd from Python I get the following error. Can you help me fix it? Thanks a lot!
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
import pywsd
File "/usr/share/java/pycharm-community/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pywsd/__init__.py", line 14, in
from wn import WordNet
File "/usr/share/java/pycharm-community/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/wn/__init__.py", line 10, in
from wn.constants import *
File "/usr/share/java/pycharm-community/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/wn/constants.py", line 196, in
exception_map = load_exception_map()
File "/usr/local/lib/python3.7/site-packages/wn/constants.py", line 126, in load_exception_map
with open(wordnet_dir+'%s.exc' % suffix) as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/site-packages/wn/data/wordnet-3.3/adj.exc'
########## INSTALL LOG ###########
Collecting pywsd
Downloading https://files.pythonhosted.org/packages/8c/79/39597ff5510a63f44c9d4ce2f6a8200bbb1ae9c7b50ef90fe1f851f2c10d/pywsd-1.2.1.tar.gz (23.7MB)
100% |████████████████████████████████| 23.7MB 1.3MB/s
Requirement already satisfied: nltk in /usr/local/lib/python3.7/site-packages (from pywsd) (3.4.4)
Requirement already satisfied: numpy in /usr/local/lib64/python3.7/site-packages (from pywsd) (1.16.4)
Collecting pandas (from pywsd)
Downloading https://files.pythonhosted.org/packages/7e/ab/ea76361f9d3e732e114adcd801d2820d5319c23d0ac5482fa3b412db217e/pandas-0.25.1-cp37-cp37m-manylinux1_x86_64.whl (10.4MB)
100% |████████████████████████████████| 10.4MB 1.2MB/s
Collecting wn (from pywsd)
Downloading https://files.pythonhosted.org/packages/c4/ee/171109f853370256cce3fc10e2574bc4b4165503332e1c327217f855bf92/wn-0.0.20.tar.gz (12.0MB)
100% |████████████████████████████████| 12.1MB 2.9MB/s
Requirement already satisfied: six in /usr/lib/python3.7/site-packages (from pywsd) (1.11.0)
Requirement already satisfied: python-dateutil>=2.6.1 in /usr/lib/python3.7/site-packages (from pandas->pywsd) (2.7.5)
Requirement already satisfied: pytz>=2017.2 in /usr/lib/python3.7/site-packages (from pandas->pywsd) (2018.5)
Building wheels for collected packages: pywsd, wn
Running setup.py bdist_wheel for pywsd ... done
Stored in directory: /root/.cache/pip/wheels/0f/44/85/3829bb6c6188f30e13ba8981e8038c61db494a9788ea3bed01
Running setup.py bdist_wheel for wn ... done
Stored in directory: /root/.cache/pip/wheels/80/68/3b/f1101703d1b65ef59fb45b1e4d2623d8329349785304db5fa2
Successfully built pywsd wn
Installing collected packages: pandas, wn, pywsd
Successfully installed pandas-0.25.1 pywsd-1.2.1 wn-0.0.20
Since a contextual sentence is provided, it might be a good idea to run a POS tagger and filter only senses with matching POS before running any WSD algorithms.
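A sketch of that pre-filtering step: map Penn Treebank tags (as produced by a POS tagger such as nltk.pos_tag) to WordNet POS letters, then keep only candidate senses whose POS matches. The sense tuples below are stand-ins for real Synset objects so the example runs without WordNet data:

```python
def penn_to_wn(tag):
    """Map a Penn Treebank tag prefix to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"   # adjectives
    if tag.startswith("V"):
        return "v"   # verbs
    if tag.startswith("N"):
        return "n"   # nouns
    if tag.startswith("R"):
        return "r"   # adverbs
    return None

# Stand-in candidate senses as (name, pos) pairs:
senses = [("bank.n.01", "n"), ("bank.n.09", "n"), ("bank.v.05", "v")]
tagged_word = ("bank", "NN")             # e.g. output of nltk.pos_tag
wn_pos = penn_to_wn(tagged_word[1])
filtered = [name for name, pos in senses if pos == wn_pos]
print(filtered)  # ['bank.n.01', 'bank.n.09']
```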
In NLTK there exist classes that support loading WordNet-like objects for other languages. Where does your library depend on WordNet?
If the dependency is explicit, it might be possible to easily extend your work for other languages.
After installing pywsd with the command pip3 install --user pywsd
I get the following error when importing the module in python3.
>>> from pywsd.similarity import max_similarity
Warming up PyWSD (takes ~10 secs)... Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/anon/.local/lib/python3.6/site-packages/pywsd/__init__.py", line 19, in <module>
import pywsd.lesk
File "/home/anon/.local/lib/python3.6/site-packages/pywsd/lesk.py", line 19, in <module>
from pywsd.utils import word_tokenize
File "/home/anon/.local/lib/python3.6/site-packages/pywsd/utils.py", line 20, in <module>
_treebank_word_tokenizer.STARTING_QUOTES.insert(0, (improved_open_quote_regex, r' \1 '))
AttributeError: 'TreebankWordTokenizer' object has no attribute 'STARTING_QUOTES'
I have Python 2.7 and I work on the Windows 10 operating system. I installed the pywsd library as per the documentation. When I try to import the module using
from pywsd.lesk import simple_lesk
I end up getting this error:
Warming up PyWSD (takes ~10 secs)...
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
c:\python35\lib\site-packages\pandas\io\pickle.py in try_read(path, encoding)
51 1 1 6
---> 52 2 2 7
53 3 3 8
c:\python35\lib\site-packages\pandas\core\indexes\base.py in <module>()
16 from pandas.core.accessor import CachedAccessor
---> 17 from pandas.core.arrays import ExtensionArray
18 from pandas.core.dtypes.generic import (
c:\python35\lib\site-packages\pandas\core\arrays\__init__.py in <module>()
1 from .base import ExtensionArray # noqa
----> 2 from .categorical import Categorical # noqa
c:\python35\lib\site-packages\pandas\core\arrays\categorical.py in <module>()
13 ABCSeries, ABCIndexClass, ABCCategoricalIndex)
---> 14 from pandas.core.dtypes.missing import isna, notna
15 from pandas.core.dtypes.inference import is_hashable
c:\python35\lib\site-packages\pandas\core\dtypes\missing.py in <module>()
9 ABCExtensionArray)
---> 10 from .common import (is_string_dtype, is_datetimelike,
11 is_datetimelike_v_numeric, is_float_dtype,
c:\python35\lib\site-packages\pandas\core\dtypes\common.py in <module>()
16 ABCIndexClass, ABCDateOffset)
---> 17 from .inference import is_string_like, is_list_like
18 from .inference import * # noqa
c:\python35\lib\site-packages\pandas\core\dtypes\inference.py in <module>()
7 from numbers import Number
----> 8 from pandas.compat import (PY2, string_types, text_type,
9 string_and_binary_types, re_type)
ImportError: cannot import name 're_type'
(The same "ImportError: cannot import name 're_type'" chain repeats several more times here, omitted.)
During handling of the above exception, another exception occurred:
ImportError Traceback (most recent call last)
<ipython-input-4-a91d9624c173> in <module>()
1 #from pywsd.lesk import simple_lesk
----> 2 import pywsd.lesk
c:\python35\lib\site-packages\pywsd\__init__.py in <module>()
17 start = time.time()
18
---> 19 import pywsd.lesk
20 import pywsd.baseline
21 import pywsd.similarity
c:\python35\lib\site-packages\pywsd\lesk.py in <module>()
24 EN_STOPWORDS = set(stopwords.words('english') + list(string.punctuation) + pywsd_stopwords)
25 signatures_picklefile = os.path.dirname(os.path.abspath(__file__)) + '/data/signatures/signatures.pkl'
---> 26 cached_signatures = pd.read_pickle(signatures_picklefile)
27
28 def synset_signatures_from_cache(ss, hyperhypo=True, adapted=False, original_lesk=False):
c:\python35\lib\site-packages\pandas\io\pickle.py in read_pickle(path)
66 >>> import os
67 >>> os.remove("./dummy.pkl")
---> 68 """
69 path = _stringify_path(path)
70 inferred_compression = _infer_compression(path, compression)
c:\python35\lib\site-packages\pandas\io\pickle.py in try_read(path, encoding)
60 0 0 5
61 1 1 6
---> 62 2 2 7
63 3 3 8
64 4 4 9
c:\python35\lib\site-packages\pandas\compat\pickle_compat.py in load(fh, encoding, compat, is_verbose)
115 ('pandas.core.arrays', 'Categorical'),
116
--> 117 # 19939, add timedeltaindex, float64index compat from 15998 move
118 ('pandas.tseries.tdi', 'TimedeltaIndex'):
119 ('pandas.core.indexes.timedeltas', 'TimedeltaIndex'),
c:\python35\lib\pickle.py in load(self)
1037 raise EOFError
1038 assert isinstance(key, bytes_types)
-> 1039 dispatch[key[0]](self)
1040 except _Stop as stopinst:
1041 return stopinst.value
c:\python35\lib\pickle.py in load_global(self)
1332 module = self.readline()[:-1].decode("utf-8")
1333 name = self.readline()[:-1].decode("utf-8")
-> 1334 klass = self.find_class(module, name)
1335 self.append(klass)
1336 dispatch[GLOBAL[0]] = load_global
c:\python35\lib\pickle.py in find_class(self, module, name)
1382 elif module in _compat_pickle.IMPORT_MAPPING:
1383 module = _compat_pickle.IMPORT_MAPPING[module]
-> 1384 __import__(module, level=0)
1385 if self.proto >= 4:
1386 return _getattribute(sys.modules[module], name)[0]
c:\python35\lib\site-packages\pandas\core\indexes\base.py in <module>()
15
16 from pandas.core.accessor import CachedAccessor
---> 17 from pandas.core.arrays import ExtensionArray
18 from pandas.core.dtypes.generic import (
19 ABCSeries, ABCDataFrame,
c:\python35\lib\site-packages\pandas\core\arrays\__init__.py in <module>()
1 from .base import ExtensionArray # noqa
----> 2 from .categorical import Categorical # noqa
c:\python35\lib\site-packages\pandas\core\arrays\categorical.py in <module>()
12 from pandas.core.dtypes.generic import (
13 ABCSeries, ABCIndexClass, ABCCategoricalIndex)
---> 14 from pandas.core.dtypes.missing import isna, notna
15 from pandas.core.dtypes.inference import is_hashable
16 from pandas.core.dtypes.cast import (
c:\python35\lib\site-packages\pandas\core\dtypes\missing.py in <module>()
8 ABCIndexClass, ABCGeneric,
9 ABCExtensionArray)
---> 10 from .common import (is_string_dtype, is_datetimelike,
11 is_datetimelike_v_numeric, is_float_dtype,
12 is_datetime64_dtype, is_datetime64tz_dtype,
c:\python35\lib\site-packages\pandas\core\dtypes\common.py in <module>()
15 ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex,
16 ABCIndexClass, ABCDateOffset)
---> 17 from .inference import is_string_like, is_list_like
18 from .inference import * # noqa
19
c:\python35\lib\site-packages\pandas\core\dtypes\inference.py in <module>()
6 from collections import Iterable
7 from numbers import Number
----> 8 from pandas.compat import (PY2, string_types, text_type,
9 string_and_binary_types, re_type)
10 from pandas._libs import lib
ImportError: cannot import name 're_type'
Hi, I'm super new to GitHub and NLTK, and your project pywsd seemed like the only one that could return the Lesk measure between two words, but it looks like it's only meant for comparing words to sentences. Is there a way to compare just two words and get their similarity score, or does your program not do that? If not, is there anything for Python that can? Sorry, I just didn't know of any other way to contact you.
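For what it's worth, the core of Lesk is just gloss overlap, which can be applied to two words directly by comparing their sense glosses. A toy sketch with hardcoded stand-in glosses (in practice they would come from WordNet definitions via NLTK); `gloss_overlap` is an illustrative helper, not pywsd's API:

```python
# Toy gloss-overlap comparison in the spirit of Lesk.
# The glosses below are hand-written stand-ins, not real WordNet output.

def gloss_overlap(gloss_a: str, gloss_b: str) -> int:
    """Count word types shared by two glosses (case-insensitive)."""
    tokens_a = set(gloss_a.lower().split())
    tokens_b = set(gloss_b.lower().split())
    return len(tokens_a & tokens_b)

bank_gloss = "a financial institution that accepts deposits"
money_gloss = "the official currency issued by a financial institution"

print(gloss_overlap(bank_gloss, money_gloss))  # 3: 'a', 'financial', 'institution'
```

To score two words rather than a word against a sentence, you would iterate this over every pair of their senses' glosses and take the maximum overlap.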
First of all, thank you for this library. I've been using your simple_lesk implementation for a project. But now, after installing the latest version of your lib on a different machine, I can no longer call wup_similarity on the object returned by the simple_lesk function.
example = simple_lesk("This is an example", "example")
example.wup_similarity(example)
>>>Traceback (most recent call last):
>>> File "<input>", line 1, in <module>
>>>AttributeError: 'Synset' object has no attribute 'wup_similarity'
In a previous implementation I have been able to do that. I just wanted to understand whether this is my system causing the issue or if there is a fix to that.
Thank you for your time.
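In the meantime, a defensive wrapper can avoid the AttributeError, assuming simple_lesk may return None (no sense found) or an object without similarity methods. `safe_wup` and the stub below are illustrative stand-ins, not pywsd's API:

```python
# Defensive Wu-Palmer call: only invoke wup_similarity when both
# arguments actually expose it; otherwise return None.

def safe_wup(sense_a, sense_b):
    """Return the Wu-Palmer score, or None when it cannot be computed."""
    if sense_a is None or sense_b is None:
        return None
    if not hasattr(sense_a, "wup_similarity"):
        return None
    return sense_a.wup_similarity(sense_b)
```

With this guard in place, a missing method degrades to a None score instead of a crash, which makes it easier to isolate whether the problem is the returned object or the environment.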
Let's test the different algorithms on these corpora.
Even though it compiles, I think there is a superfluous curly bracket in your pywsd citation.
The version below works only with \usepackage[square,sort,comma,numbers]{natbib}
(but it turns all your citations into [digit] format, which looks really bad)
@misc{pywsd14,
author = {Liling Tan},
title = {Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]},
howpublished = {https://github.com/alvations/pywsd}},
year = {2014}
}
Also, here is an alternative version that does not break the compilation in case \usepackage{natbib}
is used.
@misc{pywsd14,
author = {Liling Tan},
title = {{Pywsd: Python Implementations of Word Sense Disambiguation (WSD) Technologies [software]}},
howpublished = "\url{https://github.com/alvations/pywsd}",
year = {2014},
note = "[Online; accessed 17-July-2016]"
}
Hi,
the function disambiguate seems to throw an exception when used like this:
disambiguate(' letters oed much co')
The exception is:
Traceback (most recent call last):
File "C:\Program Files\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
disambiguate(' letters oed much co')
File "C:\Program Files\Anaconda3\lib\site-packages\pywsd\allwords_wsd.py", line 35, in disambiguate
surface_words, lemmas, morphy_poss = lemmatize_sentence(sentence, keepWordPOS=True)
File "C:\Program Files\Anaconda3\lib\site-packages\pywsd\utils.py", line 107, in lemmatize_sentence
lemmatizer, stemmer))
File "C:\Program Files\Anaconda3\lib\site-packages\pywsd\utils.py", line 79, in lemmatize
stem = stemmer.stem(ambiguous_word)
File "C:\Program Files\Anaconda3\lib\site-packages\nltk\stem\porter.py", line 665, in stem
stem = self._step1b(stem)
File "C:\Program Files\Anaconda3\lib\site-packages\nltk\stem\porter.py", line 376, in _step1b
lambda stem: (self._measure(stem) == 1 and
File "C:\Program Files\Anaconda3\lib\site-packages\nltk\stem\porter.py", line 258, in _apply_rule_list
if suffix == '*d' and self._ends_double_consonant(word):
File "C:\Program Files\Anaconda3\lib\site-packages\nltk\stem\porter.py", line 214, in _ends_double_consonant
word[-1] == word[-2] and
IndexError: string index out of range
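Until the stemmer handles this upstream, one workaround is to skip tokens that are too short to index safely: the Porter code in the traceback reads word[-2], which fails on empty or one-character strings. A minimal sketch; `safe_stem` is an illustrative helper, not part of pywsd or NLTK:

```python
# Guarded stemming: pass very short tokens through unchanged instead of
# handing them to a stemmer that indexes word[-2].

def safe_stem(stem, tokens):
    """Apply `stem` to each token, keeping tokens shorter than 2 chars as-is."""
    out = []
    for tok in tokens:
        if len(tok) < 2:
            out.append(tok)      # too short to stem safely; keep surface form
        else:
            out.append(stem(tok))
    return out

# str.upper stands in for a real stemmer callable here.
print(safe_stem(str.upper, ["letters", "a", ""]))  # ['LETTERS', 'a', '']
```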
TODO: change the NLTK Synset properties to methods appropriately because of http://goo.gl/hO79KO
from pywsd.allwords_wsd import disambiguate
disambiguate('I have five lights')
Traceback (most recent call last):
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'light.n.04'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/allwords_wsd.py", line 51, in disambiguate
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 251, in simple_lesk
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 226, in simple_signatures
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 123, in signatures
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 48, in synset_signatures
return synset_signatures_from_cache(ss, hyperhypo, adapted, original_lesk)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 35, in synset_signatures_from_cache
return cached_signatures[ss.name()][signature_type]
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'light.n.04'
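A possible defensive pattern, assuming the KeyError means the pickled signature cache predates some synsets (here 'light.n.04'): fall back to recomputing the signature on a cache miss. `compute_signature` below is a hypothetical stand-in, not pywsd's API:

```python
# Cache lookup with a recompute-and-memoize fallback for missing keys.

def signature_from_cache(cache, key, compute_signature):
    """Return the cached signature, recomputing it on a cache miss."""
    try:
        return cache[key]
    except KeyError:
        sig = compute_signature(key)
        cache[key] = sig          # memoize so the miss is paid only once
        return sig

cache = {"bank.n.01": {"simple", "signature"}}
sig = signature_from_cache(cache, "light.n.04", lambda k: {k})
print(sig)  # {'light.n.04'}
```

This keeps the speed benefit of the cache for known synsets while degrading gracefully for anything the pickle does not cover.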
The max similarity algorithm is not giving correct results.
Results Obtained:
disambiguate('I went to the bank to deposit my money', algorithm=maxsim, similarity_option='wup', keepLemmas=True)
[('I', 'i', None), ('went', 'go', Synset('travel.v.01')), ('to', 'to', None), ('the', 'the', None), ('bank', 'bank', None), ('to', 'to', None), ('deposit', 'deposit', Synset('deposit.v.02')), ('my', 'my', None), ('money', 'money', None)]
Expected Result:
[('I', 'i', None), ('went', u'go', Synset('sound.v.02')), ('to', 'to', None), ('the', 'the', None), ('bank', 'bank', Synset('bank.n.06')), ('to', 'to', None), ('deposit', 'deposit', Synset('deposit.v.02')), ('my', 'my', None), ('money', 'money', Synset('money.n.01'))]
It is not disambiguating nouns.
Buggy POS reliance.
When extracting signatures, pywsd lemmatizes with POS knowledge, but when counting overlaps, POS was not considered. This happened when we started to rely on POS for lemmatization, because of http://stackoverflow.com/questions/27659179/porter-stemming-of-fried/27660340#27660340
For individual WSD it's fine, since we can specify the POS from the start and that resolves the issue. The main problem comes with all-words WSD: the POS is recognized when lemmatizing, but it goes wrong when disambiguating.
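The "fried" case from the linked Stack Overflow thread illustrates why POS matters for lemmatization. A toy lookup table, purely illustrative (a real lemmatizer would consult WordNet's morphy):

```python
# Toy POS-aware lemmatization: the same surface form maps to different
# lemmas depending on the tag. The table is hand-made, not WordNet.

LEMMA_TABLE = {
    ("fried", "v"): "fry",     # "he fried an egg"  -> verb, lemma 'fry'
    ("fried", "a"): "fried",   # "a fried egg"      -> adjective, unchanged
}

def lemmatize(word, pos):
    """Look up the lemma for (word, pos); fall back to the surface form."""
    return LEMMA_TABLE.get((word, pos), word)

print(lemmatize("fried", "v"))  # fry
print(lemmatize("fried", "a"))  # fried
```

If signatures are lemmatized with one POS assumption and overlaps are counted with another, the same word can fail to match itself, which is exactly the all-words failure mode described above.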
Please update the link.
Could you please add a setup.py so this can be easily installed using pip?
When running from pywsd.lesk import cached_signatures, simple_lesk
I get the following error
Warming up PyWSD (takes ~10 secs)... Traceback (most recent call last):
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'bar.n.04'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "", line 1, in
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/__init__.py", line 33, in <module>
pywsd.lesk.simple_lesk('This is a foo bar sentence', 'bar')
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 251, in simple_lesk
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 226, in simple_signatures
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 123, in signatures
from_cache=from_cache)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 48, in synset_signatures
return synset_signatures_from_cache(ss, hyperhypo, adapted, original_lesk)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pywsd/lesk.py", line 35, in synset_signatures_from_cache
return cached_signatures[ss.name()][signature_type]
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/rreilly/anaconda3/envs/-/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'bar.n.04'
Hi, when I run adapted_lesk on a given sentence and word, I get an IndexError. This happens very often in my corpus and this is a disaster for my project. Could you please tell me what to do to avoid this error? Thanks in advance!
Here are two examples:
answer = adapted_lesk("Because of the attacks, in which at least 11 vehicles were gutted by flames, transportation companies suspended all cargo_shipments along the highway, police said. The raids were carried out by national_liberation_army (ELN) rebels who have ordered the suspension of all economic activity in the eastern part of Antioquia province this week, military officials said.","gutted",'v')
answer = adapted_lesk("In Kiev, foreign ministry official Stanislav Lazebnyk also said agreements were being readied on the black_sea fleet issue.","readied",'v')
And the error:
Traceback (most recent call last):
File "test-lesk.py", line 7, in
answer = adapted_lesk("In Kiev, foreign ministry official Stanislav Lazebnyk also said agreements were being readied on the black_sea fleet issue.","readied",'v')
File "/data2/REUTERS/pywsd-master/lesk.py", line 180, in adapted_lesk
normalizescore=normalizescore)
File "/data2/REUTERS/pywsd-master/lesk.py", line 74, in compare_overlaps
return ranked_synsets[0]
IndexError: list index out of range
I get the following error, possibly because no synset exists.
File "lesk.py", line 67, in compare_overlaps return ranked_synsets[0]
IndexError: list index out of range
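Both reports trace to the same pattern: compare_overlaps indexes ranked_synsets[0] even when no synset survives ranking (for instance, when the target word has no synsets or no gloss overlaps at all). A minimal guard, sketched in plain Python; `best_ranked` is an illustrative name, not pywsd's API:

```python
# Guarded "take the top-ranked item" that tolerates an empty ranking.

def best_ranked(ranked_synsets, default=None):
    """Return the top-ranked item, or `default` when the ranking is empty."""
    return ranked_synsets[0] if ranked_synsets else default

print(best_ranked([]))                     # None
print(best_ranked([(3, "deposit.v.02")]))  # (3, 'deposit.v.02')
```

Returning None for the no-candidate case matches how disambiguate already marks undisambiguated words, so callers can treat both uniformly.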
How many of these can we get implemented? https://web.stanford.edu/~jurafsky/slp3/C.pdf
I have written a thesis paper about a computer system in which I used pywsd. I would like to cite the usage of it in my paper, and as of now I am citing it like this:
Alvations (2014) Pywsd. GitHub Repository. Retrieved
20 April, 2014, from https://github.com/alvations/pywsd
Now this is no good because I am citing your user name. Instead I would like to cite your actual name. It might be a good idea to include a small note in the readme with a sample citation for anyone who wants to cite it. If this is possible, please let me know asap.
This is more of a performance issue than a theoretical one. In theory, the algorithms are implemented as presented in their respective papers, as simple overlaps.
Going after the state of the art would mean the implementation no longer matches what the papers describe. The supervised learning part is a long shot, since feature extraction is another headache.
For the current code, improving the overlaps is probably the better move.
The implementation is just great! I would love it if you also provided detailed documentation. If this is too much to ask, then even one line per parameter in the docstrings would be great. Right now I'm fumbling around with all the combinations and inferring what they mean from the outputs.
Hi,
I got this error when calling adapted_lesk
File "concept_extraction/wordnet_extractor.py", line 78, in annotate_wordnet_concept_lesk
synset = adapted_lesk(text_split, pos[0], 'n')
File "/users/iris/gnguyen/miniconda3/lib/python3.5/site-packages/pywsd/lesk.py", line 197, in adapted_lesk
signature = [lemmatize(i) for i in signature]
UnboundLocalError: local variable 'signature' referenced before assignment
Can you check what the possible reason is?
Thank you,
Hi,
I've noticed there are some problems with WordNet; I'll give two examples.
Word | Context | Definition by Wordnet | Correct sense (Longman) |
---|---|---|---|
reflex | Virtual assistants also require a conscious decision to stop doing the current task and actively seek out the virtual assistant, which is a reflex many users haven't developed. | an automatic instinctive unlearned reaction to a stimulus | something that you do without thinking, as a reaction to a situation (there's conditioned reflex and unconditioned unlearned reflex) |
impetus | Companies with the resources to invest in AI are already creating an impetus for others to follow suit or risk not having a competitive seat at the table. | the act of applying force suddenly | an influence that makes something happen or makes it happen more quickly |
So I wonder if I can use dictionaries like Longman to replace Wordnet.
Thank you!