Upgraded Sumy and get this error upon running it. The debugged progr

Error with using 'English' as language. about sumy HOT 19 CLOSED

shavid commented on May 25, 2024

Error with using 'English' as language.

from sumy.

Comments (19)

shavid commented on May 25, 2024

Never mind, updating to the most recent version seems to fix this. However, I now have to enter 'English' rather than 'english' as the language. This causes problems, as I believe NLTK calls it 'english', not 'English'.

I am currently working around this by setting the language to 'czech'.

miso-belica commented on May 25, 2024

Hi, I don't understand where the problem is after updating sumy. Passing english in lower case should be fine. There is a test for it.

I also don't understand what you mean by "listing the language as 'czech'". Can you provide a little more context and a minimal example of the failing code (or ideally a test case)?

shavid commented on May 25, 2024

Hey,

My apologies, I wrote the above in a bit of a rush and should have provided more information.

I get the following error when I run my code: LookupError: Stemmer is not available for language english.

The beginning of the file looks something like:

import unicodedata, os, re, hashlib, sys
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
import traceback
LANGUAGE = "english"  # Set the language to English

If I set LANGUAGE to "English", I get: Resource 'tokenizers/punkt/English.pickle' not found.

What I meant by listing the language as 'czech' is that, currently, the only way to get my file to execute is to change the language setting to 'czech', as opposed to 'English' or 'english'.
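
For readers following along, here is a minimal sketch of why the capitalized name fails at the tokenizer step. It assumes, based only on the 'tokenizers/punkt/English.pickle' path in the error above, that the language name is pasted straight into the resource path. The helper punkt_resource_path is hypothetical and is not part of sumy or NLTK; it simply normalizes the name, since the Punkt data files are named in lower case.

```python
def punkt_resource_path(language):
    """Build the NLTK resource path for a Punkt sentence tokenizer.

    Hypothetical helper (not sumy's or NLTK's API): it lower-cases the
    name because the Punkt data files use lower-case names such as
    english.pickle, so 'English' would otherwise point at a missing file.
    """
    return "tokenizers/punkt/%s.pickle" % language.strip().lower()


# Both spellings now resolve to the same resource path.
for name in ("english", "English"):
    print(name, "->", punkt_resource_path(name))
```

Normalizing the name only addresses the tokenizer lookup; the stemmer error reported for 'english' is a separate lookup.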

shavid commented on May 25, 2024

Full error for 'English'

Traceback (most recent call last):
  File "/home/david/Desktop/mypthon/summary.py", line 165, in <module>
    parser = PlaintextParser.from_file(FileName, Tokenizer(LANGUAGE)) #Initalizes what kind of parser we will use (in this case plain text)
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/tokenizers.py", line 25, in __init__
    self._sentence_tokenizer = self._sentence_tokenizer(tokenizer_language)
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/tokenizers.py", line 33, in _sentence_tokenizer
    return nltk.data.load(path)
  File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 594, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 673, in _open
    return find(path).open()
  File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 455, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'tokenizers/punkt/English.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource: >>>
  nltk.download().
  Searched in:
    - '/home/david/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

shavid commented on May 25, 2024

Full error for 'english':

Traceback (most recent call last):
  File "/home/david/Desktop/mypthon/summary.py", line 166, in <module>
    stemmer = Stemmer(LANGUAGE) #initalizes the stemmer in our set language
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
    raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.

Maggie Smith for Alan Bennett film.txt
Traceback (most recent call last):
    return nltk.data.load(path)
  File "<stdin>", line 1
    return nltk.data.load(path)
   ^
IndentationError: unexpected indent

shavid commented on May 25, 2024

Hi,

Sorry about that. I wrote in a bit of a rush. I've posted the details on GitHub.

Date: Tue, 13 May 2014 10:52:59 -0700
Subject: Re: [sumy] Error - Unsure how to proceed (#13)

miso-belica commented on May 25, 2024

OK, I now know where the problem is. You have not downloaded the tokenizers for NLTK. You have to run the following in the Python interpreter: nltk.download("punkt"). The command downloads the Punkt sentence tokenizer models, and everything should be fine for LANGUAGE = "english".

shavid commented on May 25, 2024

I should have mentioned that I have executed said command and have downloaded punkt (along with every other nltk.download option).

Previously I used the option 'english', but this is the one that is now coming up as a non-existent language. And I do believe 'English' is not part of NLTK, but I could be wrong.

miso-belica commented on May 25, 2024

Are you running it on Python 3.4? I have the same issue with NLTK and Python 3.4: I have executed nltk.download(), but the stemmer is still not found. For Python 2.6+/3.2+ it's OK.

shavid commented on May 25, 2024

I'm using Python 2.7; not sure if this makes any difference?
The summaries seem to work fine with Czech listed, but it would obviously be useful if English could become an option.

miso-belica commented on May 25, 2024

Sorry, but I don't think I can help with this. Try to fetch the English NLTK tokenizer directly with nltk.data.load('tokenizers/punkt/english.pickle'). If it fails, the error is somewhere in NLTK and I really don't know where. But if you figure it out, please let me know.

shavid commented on May 25, 2024

I'll take a look. For whatever reason it just seems to be rejecting 'english' and accepting 'English', whereas before it was doing the opposite.

shavid commented on May 25, 2024

Just to reaffirm, I tried a direct fetch of the tokenizer as you suggested and still receive this error:

Traceback (most recent call last):
  File "/home/david/Desktop/mypthon/summary.py", line 166, in <module>
    stemmer = Stemmer(LANGUAGE) #initalizes the stemmer in our set language
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
    raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.

Again, I didn't have this issue in sumy 0.1, so I'm not sure whether a change has caused it. If I get time tomorrow, I will try to look through your updates for anything worth noticing.

miso-belica commented on May 25, 2024

@shavid what was the cause? Did you solve it, and how?

jbroudou commented on May 25, 2024

I too am seeing that error about the English stemmer not being available. I'm using Python 2.7 and executing the command sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization. I've followed the instructions to download the English-specific tokenizers, but I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/sumy", line 9, in <module>
    load_entry_point('sumy==0.3.0', 'console_scripts', 'sumy')()
  File "/usr/local/lib/python2.7/dist-packages/sumy/__main__.py", line 65, in main
    summarizer, parser, items_count = handle_arguments(args)
  File "/usr/local/lib/python2.7/dist-packages/sumy/__main__.py", line 102, in handle_arguments
    stemmer = Stemmer(language)
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
    raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.

miso-belica commented on May 25, 2024

Sorry, but I have no problem with that, so I can't help you. Try to import the stemmer directly: from nltk.stem.snowball import EnglishStemmer. Does it work? Then try to fetch the English stemmer like this. What is the error then?

import nltk.stem.snowball as nltk_stemmers_module
language = "english"
stemmer_classname = language.capitalize() + 'Stemmer'
stemmer_class = getattr(nltk_stemmers_module, stemmer_classname)
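
The lookup above can be wrapped so that a missing class produces the same LookupError seen in the tracebacks rather than a bare AttributeError. This is only a sketch under stated assumptions: find_stemmer_class is a hypothetical helper, not sumy's actual code, and stand_in_module is a fake namespace used so the example runs without NLTK installed.

```python
from types import SimpleNamespace


def find_stemmer_class(stemmers_module, language):
    """Look up '<Language>Stemmer' on a module, or raise LookupError.

    Hypothetical helper that mirrors the snippet above; it is not
    sumy's implementation.
    """
    class_name = language.capitalize() + "Stemmer"
    stemmer_class = getattr(stemmers_module, class_name, None)
    if stemmer_class is None:
        raise LookupError("Stemmer is not available for language %s." % language)
    return stemmer_class


class GermanStemmer(object):
    """Dummy class standing in for a real stemmer."""


# Stand-in for nltk.stem.snowball on an install that lacks EnglishStemmer.
stand_in_module = SimpleNamespace(GermanStemmer=GermanStemmer)

print(find_stemmer_class(stand_in_module, "german").__name__)
try:
    find_stemmer_class(stand_in_module, "english")
except LookupError as error:
    print(error)
```

With NLTK installed, passing nltk.stem.snowball as the first argument performs the same check against the real module.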

jbroudou commented on May 25, 2024

This looks like a problem with my local setup. The following

from nltk.stem.snowball import SnowballStemmer
print(" ".join(SnowballStemmer.languages))

produces danish dutch finnish french german hungarian italian norwegian portuguese romanian russian spanish swedish

I've no idea as to why english is missing or how I would go about installing it.
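
A small sketch of the same check as a reusable test, using the language list reported above as sample data. The helper has_stemmer is hypothetical; with NLTK installed, SnowballStemmer.languages could be passed in place of the reported list.

```python
# Language list exactly as reported in the output above.
reported = (
    "danish dutch finnish french german hungarian italian "
    "norwegian portuguese romanian russian spanish swedish"
).split()


def has_stemmer(language, supported):
    """Return True if the language name appears in the supported list.

    Hypothetical helper; the comparison is done in lower case because
    the names in the list are lower case.
    """
    return language.lower() in supported


print(has_stemmer("english", reported))  # False
print(has_stemmer("german", reported))   # True
```

A False result for "english" points at the installed NLTK rather than at sumy.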

jbroudou commented on May 25, 2024

Seems like upgrading NLTK to version 3.0 fixed this.

miso-belica commented on May 25, 2024

Glad to hear and thanks for providing more info about the issue here :)
