Comments (19)

shavid commented on May 25, 2024

Never mind, updating to the most recent version seems to fix this. However, I now have to enter 'English', not 'english', as the language. This causes problems, as I believe NLTK calls it 'english', not 'English'.

I am currently working around this by setting the language to 'czech'.

from sumy.

miso-belica commented on May 25, 2024

Hi, I don't understand what the problem is after updating sumy. Passing english in lower case should be fine. There is a test for it.

I also don't understand what you mean by "listing the language as 'czech'". Can you provide a little more context and a minimal example of the failing code (or, ideally, a test case)?

shavid commented on May 25, 2024

Hey,

My apologies, I wrote the above in a bit of a rush and should have provided more information.

I get the following error when I run my code: LookupError: Stemmer is not available for language english.

The beginning of the file looks something like:

import unicodedata, os, re, hashlib, sys
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
import traceback
LANGUAGE = "english"  # Set the language to English

If I set LANGUAGE to "English", I get: Resource 'tokenizers/punkt/English.pickle' not found.

What I meant by listing the language as 'czech' is that currently the only way for me to get my file to execute is to change the language setting to 'czech', as opposed to 'English' or 'english'.
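
The two error messages above suggest that the language string is used verbatim in two separate lookups: one builds an NLTK resource path for the tokenizer, the other looks up a stemmer for that language. Below is a minimal sketch of how a language name could be normalized before either lookup. It assumes the 'tokenizers/punkt/<language>.pickle' naming seen in the error message and a '<Language>Stemmer' class naming; both helper names are hypothetical and are not part of sumy's API.

```python
def punkt_resource_path(language):
    """Build an NLTK resource path for a Punkt sentence tokenizer.

    The language is lower-cased first, because the lookup is
    case-sensitive on a case-sensitive file system
    ("English" -> "english").
    """
    return "tokenizers/punkt/%s.pickle" % language.strip().lower()


def snowball_class_name(language):
    """Build a Snowball-style stemmer class name, e.g. "EnglishStemmer"."""
    # str.capitalize() upper-cases the first letter and lower-cases the rest.
    return language.strip().capitalize() + "Stemmer"


print(punkt_resource_path("English"))
print(snowball_class_name("english"))
```

With this kind of normalization, 'English' and 'english' would resolve to the same tokenizer resource, so only a genuinely missing resource or stemmer would still raise a LookupError.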

shavid commented on May 25, 2024

Full error for 'English'

Traceback (most recent call last):
  File "/home/david/Desktop/mypthon/summary.py", line 165, in <module>
    parser = PlaintextParser.from_file(FileName, Tokenizer(LANGUAGE)) #Initalizes what kind of parser we will use (in this case plain text)
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/tokenizers.py", line 25, in __init__
    self._sentence_tokenizer = self._sentence_tokenizer(tokenizer_language)
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/tokenizers.py", line 33, in _sentence_tokenizer
    return nltk.data.load(path)
  File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 594, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 673, in _open
    return find(path).open()
  File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 455, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'tokenizers/punkt/English.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource: >>>
  nltk.download().
  Searched in:
    - '/home/david/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

shavid commented on May 25, 2024

Full error for 'english':

Traceback (most recent call last):
  File "/home/david/Desktop/mypthon/summary.py", line 166, in <module>
    stemmer = Stemmer(LANGUAGE) #initalizes the stemmer in our set language
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
    raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.

Maggie Smith for Alan Bennett film.txt
Traceback (most recent call last):
    return nltk.data.load(path)
  File "<stdin>", line 1
    return nltk.data.load(path)
   ^
IndentationError: unexpected indent

shavid commented on May 25, 2024

Hi,

Sorry about that.

Wrote in a bit of a rush. I've posted on Github.

Date: Tue, 13 May 2014 10:52:59 -0700
Subject: Re: [sumy] Error - Unsure how to proceed (#13)

miso-belica commented on May 25, 2024

OK, I know where the problem is now. You have not downloaded the tokenizers for NLTK. You have to run the following in the Python interpreter: nltk.download("punkt"). The command downloads the tokenizer models, and everything should be fine for LANGUAGE = "english".

shavid commented on May 25, 2024

Should have mentioned I have executed said 'command' and have downloaded punkt (along with every other nltk.download option).

Previously I used the option 'english', but this is the one that now comes up as a non-existent language. And I do believe 'English' is not part of NLTK, but I could be wrong.

miso-belica commented on May 25, 2024

Are you running it on Python 3.4? I have the same issue with NLTK and Python 3.4: I have executed nltk.download(), but the stemmer is not found anyway. For Python 2.6+/3.2+ it's OK.

shavid commented on May 25, 2024

I'm using Python version 2.7; not sure if this makes any difference.
The summaries seem to work fine with Czech listed, but it would obviously be useful if English could become an option.

miso-belica commented on May 25, 2024

Sorry, but I don't think I can help with this. Try to fetch the English NLTK tokenizer directly with nltk.data.load('tokenizers/punkt/english.pickle'). If it fails, the error is somewhere in NLTK and I really don't know where. But if you figure it out, please let me know.
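
The direct fetch suggested above can be wrapped in a small diagnostic so that the failure mode is reported rather than raised. This is only a sketch: check_punkt is a hypothetical helper name, and the resource path is the one quoted in this thread, which may differ in other NLTK releases.

```python
def check_punkt(language="english"):
    """Return (ok, message) for loading the Punkt model for `language`."""
    try:
        import nltk.data
    except ImportError as error:
        return False, "NLTK is not importable: %s" % error

    # Resource path as quoted in the thread; the name is case-sensitive.
    path = "tokenizers/punkt/%s.pickle" % language
    try:
        nltk.data.load(path)
    except LookupError:
        return False, "%s not found; try nltk.download('punkt')" % path
    except Exception as error:
        # Any other failure points at NLTK or its data rather than at sumy.
        return False, "loading %s failed: %s" % (path, error)
    return True, "%s loaded" % path


ok, message = check_punkt("english")
print(message)
```

If this reports success while sumy still raises the stemmer error, the tokenizer data is probably not the cause and the stemmer lookup is the next thing to check.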

shavid commented on May 25, 2024

I'll take a look. For whatever reason it just seems to be rejecting 'english' and accepting 'English', whereas before it was doing the opposite?

shavid commented on May 25, 2024

Just to reaffirm, I tried the direct fetch of the tokenizer you posted and still receive this error:

Traceback (most recent call last):
  File "/home/david/Desktop/mypthon/summary.py", line 166, in <module>
    stemmer = Stemmer(LANGUAGE) #initalizes the stemmer in our set language
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
    raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.

Again, I didn't have this issue in sumy 0.1, so I'm not sure whether a change has caused it. If I get time tomorrow, I will try to look through your updates for anything worth noticing.

miso-belica commented on May 25, 2024

@shavid what was the cause? Did you solve it, and if so, how?

jbroudou commented on May 25, 2024

I too am seeing that error about the English stemmer not being available. I'm using Python 2.7 and executing the command sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization. I've followed the instructions to download the English-specific tokenizers, but I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/sumy", line 9, in <module>
    load_entry_point('sumy==0.3.0', 'console_scripts', 'sumy')()
  File "/usr/local/lib/python2.7/dist-packages/sumy/__main__.py", line 65, in main
    summarizer, parser, items_count = handle_arguments(args)
  File "/usr/local/lib/python2.7/dist-packages/sumy/__main__.py", line 102, in handle_arguments
    stemmer = Stemmer(language)
  File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
    raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.

miso-belica commented on May 25, 2024

Sorry, but I can't reproduce the problem, so I can't help much. Try importing the stemmer directly: from nltk.stem.snowball import EnglishStemmer. Does it work? Then try to fetch the English stemmer like this. What is the error then?

import nltk.stem.snowball as nltk_stemmers_module
language = "english"
stemmer_classname = language.capitalize() + 'Stemmer'
stemmer_class = getattr(nltk_stemmers_module, stemmer_classname)

jbroudou commented on May 25, 2024

This looks like a problem with my local setup. The following

from nltk.stem.snowball import SnowballStemmer
print(" ".join(SnowballStemmer.languages))

produces danish dutch finnish french german hungarian italian norwegian portuguese romanian russian spanish swedish

I've no idea as to why english is missing or how I would go about installing it.

jbroudou commented on May 25, 2024

Seems like upgrading NLTK to version 3.0 fixed this.
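
For anyone hitting the same error, a quick way to confirm whether the installed NLTK is the culprit is to print its version next to the Snowball language list. The sketch below is an illustration under that assumption; snowball_report and supports are hypothetical helper names. If 'english' is missing from the list, upgrading NLTK (for example with pip install --upgrade nltk) is the fix reported in this thread.

```python
def snowball_report():
    """Return the NLTK version and Snowball languages, or None without NLTK."""
    try:
        import nltk
        from nltk.stem.snowball import SnowballStemmer
    except ImportError:
        return None
    return {
        "version": getattr(nltk, "__version__", "unknown"),
        "languages": sorted(SnowballStemmer.languages),
    }


def supports(report, language):
    """True when `language` is listed in a report from snowball_report()."""
    return report is not None and language.lower() in report["languages"]


report = snowball_report()
if report is None:
    print("NLTK is not installed")
else:
    print("NLTK %s, english available: %s"
          % (report["version"], supports(report, "english")))
```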

miso-belica commented on May 25, 2024

Glad to hear it, and thanks for providing more info about the issue here :)
