pyiwn's Issues

Download Error

Trying to import the pyiwn module using:
from pyiwn import pyiwn
gives the following error

2020-05-29:19:19:06,219 INFO [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...
2020-05-29:19:19:07,468 ERROR [helpers.py:25] HTTPSConnectionPool(host='www.dropbox.com', port=443): Max retries exceeded with url: /s/t29eqq19nt5eygs/iwn_data.tar.gz?dl=1 (Caused by SSLError(SSLError("bad handshake: SysCallError(10054, 'WSAECONNRESET')")))
2020-05-29:19:19:07,473 ERROR [__init__.py:14] Could not download IndoWordNet data.

UnicodeDecodeError

Hi, when I execute this code:
iwn = pyiwn.IndoWordNet()

I get this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 16: character maps to <undefined>

Could you please help?

Thanks

Want to explore IndoWordNet

2021-02-14:11:58:58,246 INFO [iwn.py:43] Loading kashmiri language synsets...

UnicodeDecodeError Traceback (most recent call last)
in
----> 1 iwn = pyiwn.IndoWordNet(lang=pyiwn.Language.KASHMIRI)

~\anaconda3\lib\site-packages\pyiwn\iwn.py in __init__(self, lang)
43 logger.info(f'Loading {lang.value} language synsets...')
44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
46 self._synset_relations_dict = self._load_synset_relations()
47

~\anaconda3\lib\site-packages\pyiwn\iwn.py in _load_synset_file(self, lang)
49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
53 synset_df = synset_df.dropna()

~\anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 5: character maps to <undefined>
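
The traceback shows iwn.py calling open(filename) without an encoding argument, so on Windows the file is decoded with the locale default (cp1252 here) rather than UTF-8. A minimal sketch of the failure and of the usual remedy, passing the encoding explicitly. The read_lines and demo helpers and the sample text are illustrative assumptions, not pyiwn code:

```python
import os
import tempfile

# Devanagari sample; U+094D encodes to UTF-8 bytes ending in 0x8d,
# one of the byte values that cp1252 leaves undefined.
SAMPLE = '\u0939\u093f\u0928\u094d\u0926\u0940'


def read_lines(path, encoding='utf-8'):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as f:
        return f.read().splitlines()


def demo():
    """Write SAMPLE as UTF-8, then read it back as cp1252 and as UTF-8."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', encoding='utf-8') as f:
            f.write(SAMPLE + '\n')
        try:
            read_lines(path, encoding='cp1252')
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True
        return cp1252_failed, read_lines(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo())
```

If editing the installed package is not an option, running Python 3.7 or later in UTF-8 mode (python -X utf8, or the PYTHONUTF8=1 environment variable) changes the default encoding used by open() and may avoid the error without code changes.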

Problem with accessing kannada language synsets

I'm facing a problem accessing Kannada language synsets.

from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet('kannada')

For some words I am able to get the synsets:

print(iwn.synsets('ಗಂಡಸು'))
[Synset('ಮಾನವ.None.858')]

For the words ಮನೆ, ಮಾನವ and ಗುಡುಗುಡು, however:
print(iwn.synsets('ಮನೆ'))
print(iwn.synsets('ಮಾನವ'))
print(iwn.synsets('ಗುಡುಗುಡು'))
I get the following error:

File "<pyshell#11>", line 1, in <module>
print(iwn.synsets('ಮನೆ'))
File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pyiwn\pyiwn.py", line 61, in synsets
pos = sp[2] if pos == None else pos
IndexError: list index out of range

The same error occurs for all three words mentioned above, even though they are present in the all.kannada file.
Please help me resolve this issue.

Thanks and regards
Shashirekha

'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

iwn = pyiwn.IndoWordNet()

2022-11-06:13:21:14,789 INFO [iwn.py:43] Loading hindi language synsets...

UnicodeDecodeError Traceback (most recent call last)
Cell In [5], line 2
1 # language defaults to Hindi
----> 2 iwn = pyiwn.IndoWordNet()

File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:45, in IndoWordNet.__init__(self, lang)
43 logger.info(f'Loading {lang.value} language synsets...')
44 self._synset_idx_map = {}
---> 45 self._synset_df = self._load_synset_file(lang.value)
46 self._synset_relations_dict = self._load_synset_relations()

File ~\anaconda3\envs\py38torch\lib\site-packages\pyiwn\iwn.py:51, in IndoWordNet._load_synset_file(self, lang)
49 filename = os.path.join(*[constants.IWN_DATA_PATH, 'synsets', 'all.{}'.format(lang)])
50 f = open(filename)
---> 51 synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
52 synset_df = pd.DataFrame(synsets, columns=['synset_id', 'synsets', 'pos'])
53 synset_df = synset_df.dropna()

File ~\anaconda3\envs\py38torch\lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

Unable to set up pyiwn in local system

I am trying to set up pyiwn on my local system. I ran the commands below to install the library in PyCharm:
git clone https://github.com/riteshpanjwani/pyiwn.git
cd pyiwn
python setup.py install --user
cd ..

After this I tried to run the commands shown in example.ipynb, but every command raises an AttributeError.

For example, executing list(map(str, pyiwn.Language)) gives the error below:
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "", line 1, in
AttributeError: module 'pyiwn' has no attribute 'Language'

iwn = pyiwn.IndoWordNet()
AttributeError: module 'pyiwn' has no attribute 'IndoWordNet'

The same thing happens when I run it in Google Colab, although there it starts working once I restart the runtime. However, I am interested in running this library locally, so I would appreciate your help with this.
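
When an import succeeds but the expected names are missing, a first diagnostic step is to check which file Python actually loaded and which attributes it exposes. A minimal sketch, assuming nothing about pyiwn's internals; inspect_module is a hypothetical helper name:

```python
import importlib


def inspect_module(name, expected_attrs):
    """Import `name` and report where it was loaded from and which names are missing."""
    module = importlib.import_module(name)
    return {
        # None here usually means a namespace package (a bare directory was picked up).
        'file': getattr(module, '__file__', None),
        'missing': [attr for attr in expected_attrs if not hasattr(module, attr)],
    }


if __name__ == '__main__':
    # Demonstrated on a standard-library module so the sketch runs anywhere.
    print(inspect_module('json', ['loads', 'no_such_name']))
```

Calling inspect_module('pyiwn', ['Language', 'IndoWordNet']) in the failing environment would show whether the loaded file is the installed package or some other pyiwn directory on the path.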

Hypernyms-hyponyms, meronym-holonym not symmetric

The hypernym-hyponym and meronym-holonym relations have some discrepancies: if A is a hypernym of B, then B should be a hyponym of A, and similarly for meronym-holonym, but this is not the case. These discrepancies are also present in the database (the Dropbox link in the constants.py file).

  • A basic example:
      • the total number of direct hyponym relations is: 30884
      • the total number of direct hypernym relations is: 3530

  • Another example:
      • the total number of meronym (component object) relations is: 718
      • the total number of holonym (component object) relations is: 714

The two numbers should be the same in both cases (as they are in the English WordNet provided by nltk).

Code used for counting (shown here with the HYPONYMY relation):
count = 0
for v in iwn.all_synsets():
    # Add the number of synsets related to v by the chosen relation.
    count += len(iwn.synset_relation(v, pyiwn.SynsetRelations.HYPONYMY))

I am not sure whether this is also the case in the original Hindi/IndoWordNet database or whether it is specific to pyiwn.
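
Comparing totals shows that something is off, but not where. A minimal sketch of a pairwise check that lists the exact links lacking an inverse, assuming the relations have first been collected into plain dictionaries mapping a synset key to its related keys. The asymmetric_pairs function and the toy data are hypothetical and do not use the pyiwn API:

```python
def asymmetric_pairs(forward, inverse):
    """Return (a, b) pairs where b is in forward[a] but a is not in inverse[b]."""
    missing = []
    for a, targets in forward.items():
        for b in targets:
            if a not in inverse.get(b, ()):
                missing.append((a, b))
    return missing


# Toy data: 'animal' lists 'cat' as a hyponym, but 'cat' has no hypernym entry.
hyponyms = {'animal': ['dog', 'cat'], 'dog': ['puppy']}
hypernyms = {'dog': ['animal'], 'puppy': ['dog']}

if __name__ == '__main__':
    print(asymmetric_pairs(hyponyms, hypernyms))
    print(asymmetric_pairs(hypernyms, hyponyms))
```

Running the check in both directions separates links missing from the hypernym side from links missing from the hyponym side.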

IndexError: list index out of range while obtaining telugu synsets.

from pyiwn import pyiwn
iwn = pyiwn.IndoWordNet('telugu')
print(iwn.synsets('జ్వరం'),pos=pyiwn.NOUN)

Traceback (most recent call last):
print(iwn.synsets('జ్వరం'),pos=pyiwn.NOUN)
synset_data = utils.synset_data(sp, pos)
examples = re.sub('"', '', gloss_examples_sp[1]).split(' / ')
IndexError: list index out of range

English word to hindi

Is it possible to use pyiwn to convert an English word into the corresponding Hindi word?

The paper "pyiwn: A Python-based API to access Indian Language WordNets" describes a linkage between the English WordNet and IWN. I have the offsets of words from the English WordNet. Can I map them to words in Hindi or another Indian language?

English language not supported

I have tried IndoWordNet online and could access English words there as well, but English does not seem to be supported in this library. Kindly provide the required file.

English support

Although English is listed as a supported language in demo.py, it looks like the download does not contain English data.

Regards,
Anoop.

IndexError: list index out of range for many languages

Hi,

While trying to access indowordnet for some languages, I encounter IndexError.

Here is an example:

for lang in ['assamese']: 

        print("===== Language: {}".format(lang))

        # Choose a language and create an object of the IndoWordNet class.
        iwn = pyiwn.IndoWordNet(lang)

        # All Synsets
        syns = iwn.all_synsets()
        syn_count[lang]=len(syns)

The following is the error observed:

===== Language: assamese
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-bbfe767b3595> in <module>()
      7 
      8         # All Synsets
----> 9         syns = iwn.all_synsets()
     10         syn_count[lang]=len(syns)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyiwn-1.0-py3.6.egg\pyiwn\pyiwn.py in all_synsets(self, pos)
     58             for line in fo:
     59                 sp = utils.clean_line(line)
---> 60                 synset_data = utils.synset_data(sp, pos)
     61                 synset_id, head_word, lemma_names, pos, gloss, examples = synset_data[0], synset_data[1], synset_data[2], synset_data[3], synset_data[4], synset_data[5]
     62                 synsets.append(Synset(synset_id, head_word, lemma_names, pos, gloss, examples))

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pyiwn-1.0-py3.6.egg\pyiwn\utils.py in synset_data(data, pos)
     19         gloss_examples_sp = data[2].split(':')
     20         gloss = gloss_examples_sp[0]
---> 21         examples = re.sub('"', '', gloss_examples_sp[1]).split(' / ')
     22     else:
     23         gloss, examples = '', []

IndexError: list index out of range

I also get this error when trying to access all words in some languages.
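
The traceback shows utils.synset_data indexing gloss_examples_sp[1] after splitting the field on ':', which raises IndexError whenever a field has a gloss but no ':' separator. A minimal sketch of a more tolerant split, assuming that is the cause; split_gloss_examples is a hypothetical helper, not part of pyiwn:

```python
import re


def split_gloss_examples(field):
    """Split 'gloss:"ex1 / ex2"' into (gloss, [examples]); examples may be absent."""
    # partition() always returns three parts, so a missing ':' yields empty examples.
    gloss, _, raw_examples = field.partition(':')
    cleaned = re.sub('"', '', raw_examples)
    examples = [e for e in cleaned.split(' / ') if e.strip()]
    return gloss, examples


if __name__ == '__main__':
    print(split_gloss_examples('a small house:"ex one / ex two"'))
    print(split_gloss_examples('gloss without examples'))
```

Note one behavioural difference from the original split(':'): partition keeps everything after the first ':' as example text, whereas indexing [1] on the split result would drop anything after a second ':'.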

AttributeError: 'IndoWordNet' object has no attribute 'morph'

I am trying to use the Morphological Analyzer of IndoWordNet for Hindi and Marathi.

First I executed the following commands:
from pyiwn import pyiwn
pyiwn.download()
iwn = pyiwn.IndoWordNet('hindi')
iwn.all_words()

All commands ran successfully. But when I called
iwn.morph('some word in hindi')

I got the following error:
AttributeError: 'IndoWordNet' object has no attribute 'morph'

Can you please help?

I am new to pyiwn and installed it via pip. The command iwn = pyiwn.IndoWordNet() gives the error: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>. Please help me as soon as possible.

iwn = pyiwn.IndoWordNet()
2019-10-04:13:09:42,355 INFO [iwn.py:43] Loading hindi language synsets...
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
iwn = pyiwn.IndoWordNet()
File "C:\Users\mahe\AppData\Local\Programs\Python\Python37\lib\site-packages\pyiwn\iwn.py", line 45, in __init__
self._synset_df = self._load_synset_file(lang.value)
File "C:\Users\mahe\AppData\Local\Programs\Python\Python37\lib\site-packages\pyiwn\iwn.py", line 51, in _load_synset_file
synsets = list(map(lambda line: self._load_synset(line), f.readlines()))
File "C:\Users\mahe\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 13: character maps to <undefined>

Searcher usage

Nice work! Just a question about pyiwn.util.Searcher.

What does the object do? I might have missed it but I can't find the use in the code base.

Import error in the statement from pyiwn import pyiwn

from pyiwn import pyiwn
Error:
ImportError Traceback (most recent call last)
C:\Users\PURVIJ~1\AppData\Local\Temp/ipykernel_1344/650345743.py in <module>
----> 1 from pyiwn import pyiwn
ImportError: cannot import name 'pyiwn' from 'pyiwn' (C:\ProgramData\Anaconda3\lib\site-packages\pyiwn\__init__.py)

I used pip3 install --upgrade pyiwn and it installed successfully.
Please guide me on the import issue.
I am using Python 3.9.

'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>

While integrating pyiwn with my own product, I am getting the following error:

2020-08-27:12:31:07,306 INFO [pythServer.py:239] Evaluation of script Completed.
2020-8-27 12:31:07,3 ERROR [2020-08-27 12:31:07,306] INFO in pythServer: Evaluation of script Completed.
2020-8-27 13:03:13,5 ERROR 2020-08-27:13:03:13,578 INFO [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...
2020-8-27 13:03:28,6 ERROR --- Logging error ---
2020-8-27 13:03:28,6 ERROR 2020-08-27:13:03:28,661 ERROR [pythServer.py:235] 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
2020-8-27 13:03:28,6 ERROR Traceback (most recent call last):
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\PythService\pythServer.py", line 200, in evaluatePythonScript
2020-8-27 13:03:28,6 ERROR exec(str1,globals(),locals())
2020-8-27 13:03:28,6 ERROR File "", line 13, in
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\site-packages\pyiwn\__init__.py", line 13, in
2020-8-27 13:03:28,6 ERROR if not download():
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\site-packages\pyiwn\helpers.py", line 39, in download
2020-8-27 13:03:28,6 ERROR sys.stdout.write('\r[{}{}]'.format('\u2588' * done, '.' * (50 - done)))
2020-8-27 13:03:28,6 ERROR File "C:\Program Files\Tramp\lib\encodings\cp1252.py", line 19, in encode
2020-8-27 13:03:28,6 ERROR return codecs.charmap_encode(input,self.errors,encoding_table)[0]
2020-8-27 13:03:28,6 ERROR UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 2: character maps to <undefined>
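
The log shows helpers.py writing the block character '\u2588' for its progress bar to a sys.stdout whose encoding is cp1252, which cannot represent that character. A minimal sketch of a progress bar that falls back to an ASCII character when the stream cannot encode the preferred one. The bar_char and render_bar names are hypothetical and this is not the library's code:

```python
import io
import sys


def bar_char(stream, preferred='\u2588', fallback='#'):
    """Return `preferred` if the stream's encoding can represent it, else `fallback`."""
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    try:
        preferred.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return fallback
    return preferred


def render_bar(done, width=50, stream=None):
    """Build the progress-bar text for `done` filled cells out of `width`."""
    stream = sys.stdout if stream is None else stream
    ch = bar_char(stream)
    return '\r[{}{}]'.format(ch * done, '.' * (width - done))


if __name__ == '__main__':
    legacy = io.TextIOWrapper(io.BytesIO(), encoding='cp1252')
    print(repr(render_bar(3, width=5, stream=legacy)))
```

Where the host application allows it, calling sys.stdout.reconfigure(encoding='utf-8') (available on text streams since Python 3.7) before the download starts is another option, assuming stdout is a regular text stream in that environment.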

pyiwn import problem

from pyiwn import pyiwn
Traceback (most recent call last):
File "", line 1, in
File "/Users/suman_ajit/anaconda/lib/python2.7/site-packages/pyiwn/pyiwn.py", line 1, in
from pyiwn import utils
ImportError: cannot import name utils
