cfiltnlp / pyiwn
A Python based API to access Indian language WordNets.
Home Page: http://www.cfilt.iitb.ac.in/
License: Creative Commons Attribution Share Alike 4.0 International
Trying to import the pyiwn module using:
from pyiwn import pyiwn
gives the following error:
2020-05-29:19:19:06,219 INFO [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...
2020-05-29:19:19:07,468 ERROR [helpers.py:25] HTTPSConnectionPool(host='www.dropbox.com', port=443): Max retries exceeded with url: /s/t29eqq19nt5eygs/iwn_data.tar.gz?dl=1 (Caused by SSLError(SSLError("bad handshake: SysCallError(10054, 'WSAECONNRESET')")))
2020-05-29:19:19:07,473 ERROR [__init__.py:14] Could not download IndoWordNet data.
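When the automatic download fails like this (the SSL handshake to www.dropbox.com is reset while fetching iwn_data.tar.gz), one possible workaround is to fetch the archive another way, for example in a browser, and unpack it yourself. The sketch below only handles the unpacking. Several things are assumptions, not verified against pyiwn: the destination should presumably be whatever `constants.IWN_DATA_PATH` points to in your installation, the archive layout used in the demo (a top-level `iwn_data/synsets/all.<lang>`) is made up, and whether pyiwn then skips its own download is untested. `extract_iwn_archive` and `demo_roundtrip` are hypothetical helpers, not part of pyiwn.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_iwn_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its top-level names.

    Hypothetical helper for a manually downloaded iwn_data.tar.gz.
    """
    dest = Path(dest_dir).expanduser().resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse entries that would land outside dest (absolute paths, '..').
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist on Python 3.12+ and some patch releases.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted({Path(m.name).parts[0] for m in members if m.name})


def demo_roundtrip():
    """Build a tiny made-up archive in a temp dir, extract it, return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "src" / "iwn_data" / "synsets"
        src.mkdir(parents=True)
        (src / "all.hindi").write_text("sample line\n", encoding="utf-8")
        archive = tmp / "iwn_data.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(tmp / "src" / "iwn_data", arcname="iwn_data")
        names = extract_iwn_archive(archive, tmp / "dest")
        extracted = (tmp / "dest" / "iwn_data" / "synsets" / "all.hindi").read_text(
            encoding="utf-8"
        )
    return names, extracted
```

The path check before extraction is there because the archive comes from outside the package manager; on Python versions that support it, the `"data"` extraction filter adds a second layer of the same protection.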
Hi, when I execute this code:
iwn = pyiwn.IndoWordNet()
I'm getting this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 16: character maps to <undefined>
Could you please help out?
Thanks
Is there any way to retrieve all the senses of a word?
UnicodeDecodeError Traceback (most recent call last)
in
----> 1 iwn = pyiwn.IndoWordNet(lang=pyiwn.Language.KASHMIRI)
~\anaconda3\lib\site-packages\pyiwn\iwn.py in __init__(self, lang)
43 logger.info(f'Loading {lang.value} language synsets...')
44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
46 self._synset_relations_dict = self._load_synset_relations()
47
~\anaconda3\lib\site-packages\pyiwn\iwn.py in _load_synset_file(self, lang)
49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
53 synset_df = synset_df.dropna()
~\anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 5: character maps to <undefined>
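The frames above show `_load_synset_file` calling `open(filename)` with no `encoding` argument, so Python falls back to the locale's preferred encoding, which is cp1252 on many Windows machines. Bytes such as 0x8d and 0x90 are undefined in cp1252, so UTF-8 text containing them cannot be decoded. That the synset files are UTF-8 is an assumption here, though it is consistent with the error. The sketch below reproduces the failure with a made-up Devanagari sample and shows that an explicit `encoding="utf-8"` avoids it; `read_lines_utf8` and `demo` are hypothetical names, not pyiwn code.

```python
import tempfile
from pathlib import Path


def read_lines_utf8(path):
    """Read a text file as UTF-8, independent of the platform's default encoding."""
    # Without encoding=, open() uses the locale's preferred encoding,
    # e.g. cp1252 on many Windows setups.
    with open(path, encoding="utf-8") as f:
        return f.readlines()


def demo():
    """Return (whether cp1252 could decode the sample, lines read as UTF-8)."""
    sample = "हिन्दी\n"  # the virama U+094D encodes to E0 A5 8D in UTF-8
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "all.hindi"  # name mirrors the 'all.<lang>' pattern
        path.write_text(sample, encoding="utf-8")
        try:
            path.read_bytes().decode("cp1252")  # what a cp1252 default attempts
            cp1252_ok = True
        except UnicodeDecodeError:
            cp1252_ok = False
        lines = read_lines_utf8(path)
    return cp1252_ok, lines
```

Applying the same idea inside pyiwn would mean changing the `open(filename)` call shown in the traceback. A workaround that needs no code change is Python's UTF-8 mode (`python -X utf8`, or the environment variable `PYTHONUTF8=1`), available from Python 3.7.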
I'm facing a problem accessing Kannada-language synsets:
from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet('kannada')
I get the same error for all three words mentioned above, even though these words are present in the all.kannada file.
I request you to help me resolve this issue.
Thanks and regards
Shashirekha
iwn = pyiwn.IndoWordNet()
UnicodeDecodeError Traceback (most recent call last)
Cell In [5], line 2
1 # language defaults to Hindi
----> 2 iwn = pyiwn.IndoWordNet()
File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:45, in IndoWordNet.__init__(self, lang)
43 logger.info(f'Loading {lang.value} language synsets...')
44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
46 self._synset_relations_dict = self._load_synset_relations()
File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:51, in IndoWordNet._load_synset_file(self, lang)
49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
53 synset_df = synset_df.dropna()
File ~\anaconda3\envs\py38torch\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>
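Before changing any code, it may help to confirm which encoding `open()` falls back to in the environment that raises this error, and whether Python's UTF-8 mode (`-X utf8` or `PYTHONUTF8=1`, Python 3.7+) is active. A minimal standard-library check; `default_text_encoding` is a hypothetical helper name, and the `getattr` keeps it from failing on Python versions before 3.7:

```python
import locale
import sys


def default_text_encoding():
    """Report the encoding text-mode open() should fall back to without encoding=.

    Returns (encoding_name, utf8_mode_enabled).
    """
    # With do_setlocale=False this reflects the current locale (and reports
    # UTF-8 when UTF-8 mode is on), matching open()'s default.
    encoding = locale.getpreferredencoding(False)
    # sys.flags.utf8_mode exists from Python 3.7; treat it as off on older versions.
    utf8_mode = bool(getattr(sys.flags, "utf8_mode", 0))
    return encoding, utf8_mode


if __name__ == "__main__":
    enc, utf8 = default_text_encoding()
    print(f"default encoding: {enc}, UTF-8 mode: {utf8}")
```

An output of `cp1252` with UTF-8 mode off would be consistent with the tracebacks in this thread.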
I am trying to set up pyiwn on my local system. I ran the commands below to install the library in PyCharm:
git clone https://github.com/riteshpanjwani/pyiwn.git
cd pyiwn
python setup.py install --user
cd ..
After this, I tried to run the commands shown in example.ipynb, but every command raises an AttributeError.
For example, executing list(map(str, pyiwn.Language)) gives the error below:
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "", line 1, in
AttributeError: module 'pyiwn' has no attribute 'Language'
iwn = pyiwn.IndoWordNet()
AttributeError: module 'pyiwn' has no attribute 'IndoWordNet'
The same thing happens in Google Colab, although there it starts working after I restart the runtime. However, I want to run this library locally, so I would appreciate your help with this.
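One possible explanation, which is a guess and not confirmed by the report, is that the name `pyiwn` resolves to something other than the installed package. Candidates include the cloned `pyiwn` directory in the current working directory, or a stale module cached by an interactive session (which would also fit a Colab runtime restart fixing it). The standard library can show where a name would be imported from; `where_is` is a hypothetical helper. If the module is already imported, `find_spec` returns the cached module's spec, which is exactly what matters in a long-running console.

```python
import importlib.util


def where_is(module_name):
    """Describe where `module_name` would be imported from.

    Returns None if the module cannot be found; otherwise a dict with the
    spec's origin (a file path, or None for a namespace package on recent
    Python versions) and the directories searched for its submodules.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        "submodule_search_locations": list(spec.submodule_search_locations or []),
    }


if __name__ == "__main__":
    # Run this from the same working directory and environment that raises the error.
    print(where_is("pyiwn"))
```

If the reported location is the cloned repository rather than site-packages, running Python from a different directory may be enough to pick up the installed package.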
How to find antonyms for a given Hindi word?
The hypernym-hyponym and meronym-holonym relations have some discrepancies: if A is a hypernym of B, then B should be a hyponym of A (and similarly for meronym-holonym), but this is not always the case. The same discrepancies are present in the database itself (the Dropbox link in the constants.py file).
A basic example:
- total number of direct hyponym relations: 30884
- total number of direct hypernym relations: 3530
Another example:
- total number of meronym (component object) relations: 718
- total number of holonym (component object) relations: 714
In both cases the two numbers should be equal (as they are in the English WordNet provided by NLTK).
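Code for finding hypernyms:
from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet()  # language defaults to Hindi
num_hypernym = 0
for v in iwn.all_synsets():  # count direct relations of this type across all synsets
    num_hypernym += len(iwn.synset_relation(v, pyiwn.SynsetRelations.HYPONYMY))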
I'm not sure whether this is also the case in the original Hindi WordNet/IndoWordNet database or whether it is specific to pyiwn.
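The symmetry described above can be checked mechanically: every (hypernym, hyponym) pair found through one relation should also be found through its inverse, and with consistent data the two totals match. The sketch below works on plain dictionaries with made-up entries rather than pyiwn objects; building those dictionaries from `iwn.all_synsets()` and `iwn.synset_relation(...)`, for example keyed by synset ID, is left as an assumption. `inverse_mismatches` is a hypothetical helper, and the same function applies unchanged to meronym/holonym pairs.

```python
from typing import Dict, Hashable, Set, Tuple

Edges = Set[Tuple[Hashable, Hashable]]


def inverse_mismatches(
    hyponyms: Dict[Hashable, Set[Hashable]],
    hypernyms: Dict[Hashable, Set[Hashable]],
) -> Tuple[Edges, Edges]:
    """Compare a relation with its expected inverse.

    hyponyms[a] is the set of b such that b is a hyponym of a;
    hypernyms[b] is the set of a such that a is a hypernym of b.
    Both are normalized to (a, b) = (hypernym, hyponym) pairs.
    Returns (pairs only found via hyponyms, pairs only found via hypernyms).
    """
    from_hyponyms = {(a, b) for a, bs in hyponyms.items() for b in bs}
    from_hypernyms = {(a, b) for b, parents in hypernyms.items() for a in parents}
    return from_hyponyms - from_hypernyms, from_hypernyms - from_hyponyms


# Toy data (made up): "fruit" lists two hyponyms, but only "mango" points back,
# and "sparrow" names a hypernym that does not list it as a hyponym.
hyponyms = {"fruit": {"mango", "apple"}}
hypernyms = {"mango": {"fruit"}, "sparrow": {"bird"}}
only_down, only_up = inverse_mismatches(hyponyms, hypernyms)
print(sorted(only_down))  # [('fruit', 'apple')]
print(sorted(only_up))    # [('bird', 'sparrow')]
```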
from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet('telugu')
print(iwn.synsets('జ్వరం'),pos=pyiwn.NOUN)
Traceback (most recent call last):
print(iwn.synsets('జ్వరం'),pos=pyiwn.NOUN)
synset_data = utils.synset_data(sp, pos)
examples = re.sub('"', '', gloss_examples_sp[1]).split(' / ')
IndexError: list index out of range
Using pyiwn, is it possible to convert an English word into the corresponding Hindi word?
The paper "pyiwn: A Python-based API to access Indian Language WordNets" describes a linkage between the English WordNet and IWN. I have the offsets of words from the English WordNet; can I map them to Hindi or other Indian-language words?
I have tried IndoWordNet online and could access English words there as well, but English does not seem to be supported in this library. Kindly provide the required file.
Although English is listed as a supported language in demo.py, the download does not seem to contain English data.
Regards,
Anoop.
I am using Python 3.6.4 and am still facing this issue.
Thank you.
Hi,
While trying to access IndoWordNet for some languages, I encounter an IndexError.
Here is an example:
from pyiwn import pyiwn

syn_count = {}  # number of synsets per language
for lang in ['assamese']:
    print("===== Language: {}".format(lang))
    # Choose a language and create an object of the IndoWordNet class.
    iwn = pyiwn.IndoWordNet(lang)
    # All Synsets
    syns = iwn.all_synsets()
    syn_count[lang] = len(syns)
The following is the error observed:
===== Language: assamese
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-28-bbfe767b3595> in <module>()
7
8 # All Synsets
----> 9 syns = iwn.all_synsets()
10 syn_count[lang]=len(syns)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyiwn-1.0-py3.6.egg\pyiwn\pyiwn.py in all_synsets(self, pos)
58 for line in fo:
59 sp = utils.clean_line(line)
---> 60 synset_data = utils.synset_data(sp, pos)
61 synset_id, head_word, lemma_names, pos, gloss, examples = synset_data[0], synset_data[1], synset_data[2], synset_data[3], synset_data[4], synset_data[5]
62 synsets.append(Synset(synset_id, head_word, lemma_names, pos, gloss, examples))
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyiwn-1.0-py3.6.egg\pyiwn\utils.py in synset_data(data, pos)
19 gloss_examples_sp = data[2].split(':')
20 gloss = gloss_examples_sp[0]
---> 21 examples = re.sub('"', '', gloss_examples_sp[1]).split(' / ')
22 else:
23 gloss, examples = '', []
IndexError: list index out of range
I also get this error when trying to access all words in some other languages.
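The traceback points at `utils.synset_data`, which splits a field on ':' and indexes `[1]` unconditionally, so any gloss without a ':' (that is, without examples) raises IndexError. A defensive variant is sketched below as a hypothetical replacement, not the library's actual fix; the field format 'gloss:"example" / "example"' is inferred from the code in the traceback. It uses `str.partition`, which always returns three parts. One behavioral difference: `partition` keeps everything after the first ':', whereas the original `split(':')[1]` drops anything after a second ':'.

```python
import re


def split_gloss_examples(field):
    """Split a 'gloss:"example 1" / "example 2"' field into (gloss, examples).

    Hypothetical helper mirroring the parsing shown in the traceback, but it
    returns an empty example list instead of raising when ':' is missing.
    """
    gloss, sep, rest = field.partition(":")
    if not sep:
        return gloss, []
    # Same post-processing as the original: drop double quotes, split on ' / '.
    return gloss, re.sub('"', "", rest).split(" / ")
```

For example, `split_gloss_examples('a sweet fruit:"mango is sweet" / "ripe mango"')` returns `('a sweet fruit', ['mango is sweet', 'ripe mango'])`, while a field with no ':' comes back with an empty example list.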
I am trying to use the morphological analyzer of IndoWordNet for Hindi and Marathi.
First, I executed the following commands:
from pyiwn import pyiwn
pyiwn.download()
iwn = pyiwn.IndoWordNet('hindi')
iwn.all_words()
All commands ran successfully, but when I used
iwn.morph('some word in hindi')
it shows the following error:
AttributeError: 'IndoWordNet' object has no attribute 'morph'
Can you please help.
iwn = pyiwn.IndoWordNet()
2019-10-04:13:09:42,355 INFO [iwn.py:43] Loading hindi language synsets...
Traceback (most recent call last):
File "<pyshell#2>", line 1, in
iwn = pyiwn.IndoWordNet()
File "C:\Users\mahe\AppData\Local\Programs\Python\Python37\lib\site-packages\pyiwn\iwn.py", line 45, in __init__
self._synset_df = self._load_synset_file(lang.value)
File "C:\Users\mahe\AppData\Local\Programs\Python\Python37\lib\site-packages\pyiwn\iwn.py", line 51, in _load_synset_file
synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
File "C:\Users\mahe\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>
Nice work! Just a question on pyiwn.util.Searcher: what does the object do? I might have missed it, but I can't find where it is used in the code base.
from pyiwn import pyiwn
Error:
ImportError Traceback (most recent call last)
C:\Users\PURVIJ~1\AppData\Local\Temp/ipykernel_1344/650345743.py in
----> 1 from pyiwn import pyiwn
ImportError: cannot import name 'pyiwn' from 'pyiwn' (C:\ProgramData\Anaconda3\lib\site-packages\pyiwn\__init__.py)
I used pip3 install --upgrade pyiwn and it installed successfully.
Please guide me on the import issue.
I am using Python 3.9.
While integrating pyiwn with my own product, I get the following error:
2020-08-27:12:31:07,306 INFO [pythServer.py:239] Evaluation of script Completed.
2020-8-27 12:31:07,3 ERROR [2020-08-27 12:31:07,306] INFO in pythServer: Evaluation of script Completed.
2020-8-27 13:03:13,5 ERROR 2020-08-27:13:03:13,578 INFO [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...
2020-8-27 13:03:28,6 ERROR --- Logging error ---
2020-8-27 13:03:28,6 ERROR 2020-08-27:13:03:28,661 ERROR [pythServer.py:235] 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
2020-8-27 13:03:28,6 ERROR Traceback (most recent call last):
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\PythService\pythServer.py", line 200, in evaluatePythonScript
2020-8-27 13:03:28,6 ERROR exec(str1,globals(),locals())
2020-8-27 13:03:28,6 ERROR File "", line 13, in
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\site-packages\pyiwn\__init__.py", line 13, in
2020-8-27 13:03:28,6 ERROR if not download():
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\site-packages\pyiwn\helpers.py", line 39, in download
2020-8-27 13:03:28,6 ERROR sys.stdout.write('\r[{}{}]'.format('\u2588' * done, '.' * (50 - done)))
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\encodings\cp1252.py", line 19, in encode
2020-8-27 13:03:28,6 ERROR return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2020-8-27 13:03:28,6 ERROR UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
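The failing line in helpers.py writes the block character '\u2588' to `sys.stdout`, and in this host process stdout uses cp1252, which has no such character. Two environment-level workarounds that should avoid this without touching pyiwn are setting `PYTHONIOENCODING=utf-8` for the host process, or calling `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) before the download starts, provided the host's stdout is a regular text stream. Alternatively, a progress bar can degrade gracefully; the sketch below is a hypothetical helper, not pyiwn's code, that falls back to '#' when the target encoding cannot represent the block character.

```python
import sys


def render_bar(done, width=50, encoding=None):
    """Return a '[####....]'-style bar that the given encoding can represent.

    Uses U+2588 FULL BLOCK when `encoding` (default: sys.stdout's) can encode
    it, and falls back to '#' otherwise, e.g. for cp1252.
    """
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    done = max(0, min(done, width))  # clamp to the bar width
    block = "\u2588"
    try:
        block.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        block = "#"
    return "[{}{}]".format(block * done, "." * (width - done))


if __name__ == "__main__":
    sys.stdout.write("\r" + render_bar(20) + "\n")
```

Checking the encoding once per call keeps the helper independent of where stdout points, which matters when a host application replaces it, as appears to happen here.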
from pyiwn import pyiwn
Traceback (most recent call last):
File "", line 1, in
File "/Users/suman_ajit/anaconda/lib/python2.7/site-packages/pyiwn/pyiwn.py", line 1, in
from pyiwn import utils
ImportError: cannot import name utils