Comments (7)
@coiby, to use a custom dictionary, you could use original_lesk(), where you can input your own dictionary:
from nltk import word_tokenize
from pywsd.lesk import original_lesk
dictionary = {'reflex.a.1' : 'Physiology. noting or pertaining to an involuntary response to a stimulus, the nerve impulse from a receptor being transmitted inward to a nerve center that in turn transmits it outward to an effector.',
'reflex.a.2' : 'occurring in reaction; responsive.',
'reflex.a.3' : 'cast back; reflected, as light, color, etc.',
'reflex.a.4' : 'bent or turned back.',
'reflex.a.5' : 'designating a radio apparatus in which the same circuit or part performs two functions.',
'reflex.n.1' : 'Physiology. Also called reflex act. movement caused by a reflex response. Also called reflex action. the entire physiological process activating such movement.',
'reflex.n.2' :'any automatic, unthinking, often habitual behavior or response.',
'reflex.n.3' : 'the reflection or image of an object, as exhibited by a mirror or the like.',
'reflex.n.4' : 'a reproduction, as if in a mirror.',
'reflex.n.5' : 'a copy; adaptation.',
'reflex.n.6' : 'reflected light, color, etc.',
'reflex.n.7' : 'Historical Linguistics. an element in a language, as a sound, that has developed from a corresponding element in an earlier form of the language'}
# Tokenize your definitions from the custom dictionary.
# It's a weird step (possibly I should change the function
# to do this automatically).
dictionary = {k:" ".join(word_tokenize(v)) for k,v in dictionary.items()}
context = "Virtual assistants also require a conscious decision to stop doing the current task and actively seek out the virtual assistant, which is a reflex many users haven't developed."
context = " ".join(word_tokenize(context))
disambiguated = original_lesk(context, 'reflex', dictionary)
print(disambiguated)
print(dictionary[disambiguated])
[out]:
reflex.n.7
Historical Linguistics . an element in a language , as a sound , that has developed from a corresponding element in an earlier form of the language
Note that the original Lesk is flawed, hence the various versions of Lesk.
Also note that a dictionary has many senses per word, like synsets, so when you build your custom dictionary you need one key-value pair per sense, not per word.
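To illustrate the one-pair-per-sense point, here is a minimal sketch that flattens a per-word structure into per-sense keys. The helper name `build_sense_dictionary` and the nested input layout are hypothetical, and the `word.pos.N` key format simply mirrors the example dictionary above; it is not something pywsd is known to require.

```python
def build_sense_dictionary(entries):
    """Flatten {word: {pos: [definition, ...]}} into {'word.pos.N': definition}.

    Hypothetical helper: the input layout is an assumption made for this sketch.
    """
    sense_dict = {}
    for word, by_pos in entries.items():
        for pos, definitions in by_pos.items():
            # Number the senses from 1 within each part of speech.
            for number, definition in enumerate(definitions, start=1):
                key = '{}.{}.{}'.format(word, pos, number)
                sense_dict[key] = definition
    return sense_dict


entries = {
    'reflex': {
        'a': ['occurring in reaction; responsive.', 'bent or turned back.'],
        'n': ['a copy; adaptation.'],
    },
}
sense_dict = build_sense_dictionary(entries)
for key in sorted(sense_dict):
    print(key, '->', sense_dict[key])
```

The resulting dictionary has three entries (two adjective senses, one noun sense) rather than a single entry for "reflex".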
For copyright reasons, I am not able to let the software use Longman or Oxford. But I could use other dictionaries that are open, like Wiktionary. I will have some free time after April, and I'll see whether I can move the software to version 1.1. I'll add Wiktionary as one of the things to add.
But if you're working with a dictionary company that allows me to use the dictionary freely for pywsd, I'll surely add it as an alternative resource to WordNet ;P
from pywsd.
Ah yes, that synset would have been the only adjective indexed with "reflex". But do note that in NLTK using WordNet v3.0, its POS is satellite adjective ('s') instead of 'a'.
>>> from pywsd.lesk import simple_lesk
>>> from nltk.corpus import wordnet as wn
>>> simple_lesk("Virtual assistants also require a conscious decision to stop doing the current task and actively seek out the virtual assistant, which is a reflex many users haven't developed.", 'reflex', pos='s')
Synset('automatic.s.03')
>>> wn.synset('automatic.s.03').definition()
u'without volition or conscious control'
>>> wn.synset('automatic.s.03').offset()
2522669
from pywsd.
@alvations Thank you for the comprehensive instructions.
My goal is to build up my vocabulary by memorizing new words with the spaced repetition learning technique. Given a word and its context, I want to automatically find the definition with the matching sense, together with its example sentence, from a dictionary like Longman. That way, building flashcards for memorization can also be done automatically. That's why I'm interested in this project.
I'm trying to minimize the effort of manually editing flashcards, which would be boring. So WSD precision is one of my major concerns.
Unfortunately, I'm not from a dictionary company. But if I buy the CD-ROM and extract the data for personal use, it will not infringe copyright, right?
from pywsd.
We do have in the Princeton WordNet a sense for "reflex" that is similar to the sense from Longman:
http://wnpt.brlcloud.com/wn/synset?id=02522669-a
http://compling.hss.ntu.edu.sg/omw/cgi-bin/wn-gridx.cgi?synset=02522669-a
Same for impetus:
http://wnpt.brlcloud.com/wn/synset?id=11447851-n
from pywsd.
@arademaker Thank you for providing info about WordNet.
The other sense of reflex, "without volition or conscious control", is considered to be a satellite adjective by WordNet, as pointed out by @alvations. But Longman and Oxford consider both senses to be nouns. Btw, I'm curious why WordNet introduced the satellite adjective.
The reason I think WordNet doesn't provide a proper definition for the first sense is that a reflex can be unlearned (unconditioned) or learned (conditioned), while the WordNet noun entry only covers the unlearned kind:
noun
an automatic instinctive unlearned reaction to a stimulus (Freq. 5)
• Syn:
reflex response, reflex action, instinctive reflex, innate reflex, inborn reflex, unconditioned reflex, physiological reaction
• Hypernyms: reaction, response
But I think pyWSD makes the right choice.
For impetus, you mean "a force that moves something along", right? pyWSD fails to choose the correct one this time. But I think the metaphorical sense (an influence that makes something happen or makes it happen more quickly) is better explained by Longman/Oxford.
from pywsd.
@coiby IMHO, the satellite adjective is a historical issue. This QnA describes it partially: http://stackoverflow.com/questions/18817396/what-part-of-speech-does-s-stand-for-in-nltk-synsets
I usually normalize all my 's' -> 'a' when I use word sense related features for NLP.
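A minimal sketch of that normalization, assuming the POS tags are the single-letter WordNet codes ('n', 'v', 'a', 's', 'r'). The function name `normalize_pos` is made up for illustration and is not part of pywsd or NLTK.

```python
def normalize_pos(pos):
    """Map WordNet's satellite-adjective tag 's' onto plain adjective 'a'.

    Hypothetical helper: every other tag is returned unchanged.
    """
    return 'a' if pos == 's' else pos


tags = ['n', 'v', 's', 'a', 'r']
normalized = [normalize_pos(tag) for tag in tags]
print(normalized)  # ['n', 'v', 'a', 'a', 'r']
```

With NLTK, the tag to pass in would typically come from `synset.pos()`.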
I'm not sure about the license of the dictionary that you've purchased, so I can't really comment on that.
As for pywsd accuracy, I really need to sit down and evaluate the algorithms, but note that when using Lesk-related functions, it usually boils down to which words appear in both the context and the definition (aka "signatures" in the original Lesk paper).
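To make the overlap idea concrete, here is a simplified sketch of overlap counting. It is not pywsd's actual implementation: the function name `overlap_lesk` is hypothetical, tokenization is a plain whitespace split, and there is no stopword removal or stemming.

```python
def overlap_lesk(context, sense_definitions):
    """Return the sense key whose definition shares the most words with the context.

    Simplified illustration only: lowercased whitespace tokens, no stopword
    filtering, and ties are resolved in favor of the first sense seen.
    """
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        definition_words = set(definition.lower().split())
        overlap = len(context_words & definition_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


senses = {
    'reflex.n.2': 'any automatic , unthinking , often habitual behavior or response .',
    'reflex.n.6': 'reflected light , color , etc .',
}
context = 'an unthinking habitual response many users have not developed'
print(overlap_lesk(context, senses))  # reflex.n.2
```

Because every shared token counts equally in this sketch, function words and punctuation can sway the result as much as content words, which is one reason the Lesk variants filter or weight the signatures.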
from pywsd.
Actually, this is a decision about how PWN is encoded in text files. See https://wordnet.princeton.edu/documentation/wndb5wn and https://wordnet.princeton.edu/documentation/lexnames5wn. The synset type is not the syntactic category, or part of speech as we call it nowadays. This is confusing because the same values are used: a, v, n, and r (but not s). Many adjectives in PWN are organized into clusters: an 'a' synset can be the HEAD of a cluster, and 's' synsets are satellites of a cluster.
from pywsd.