Comments (19)
Never mind, updating to the most recent version seems to fix this. However, I now have to enter 'English' rather than 'english' as the language. This causes problems, as I believe NLTK calls it 'english', not 'English'.
I am currently working around this by setting the language to 'czech'.
from sumy.
Hi, I don't understand what the problem is after updating sumy. Passing english in lower case should be fine; there is a test for it.
I also don't understand what you mean by "listing the language as 'czech'". Can you provide a little more context and a minimal example of the failing code (or, ideally, a test case)?
from sumy.
Hey,
My apologies, I wrote the above in a bit of a rush and should have provided more information.
I get the following error when I run my code: LookupError: Stemmer is not available for language english.
The beginning of the file looks something like:
import unicodedata, os, re, hashlib, sys
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
import traceback
LANGUAGE = "english"  # Set the language to English
If I set LANGUAGE to "English", I get: Resource 'tokenizers/punkt/English.pickle' not found.
What I meant by listing the language as 'czech' is that, currently, the only way I can get my file to execute is to change the language setting to 'czech', as opposed to 'English' or 'english'.
from sumy.
Full error for 'English'
Traceback (most recent call last):
File "/home/david/Desktop/mypthon/summary.py", line 165, in <module>
parser = PlaintextParser.from_file(FileName, Tokenizer(LANGUAGE)) #Initalizes what kind of parser we will use (in this case plain text)
File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/tokenizers.py", line 25, in __init__
self._sentence_tokenizer = self._sentence_tokenizer(tokenizer_language)
File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/tokenizers.py", line 33, in _sentence_tokenizer
return nltk.data.load(path)
File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 594, in load
resource_val = pickle.load(_open(resource_url))
File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 673, in _open
return find(path).open()
File "/usr/lib/python2.7/dist-packages/nltk/data.py", line 455, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource 'tokenizers/punkt/English.pickle' not found. Please
use the NLTK Downloader to obtain the resource: >>>
nltk.download().
Searched in:
- '/home/david/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
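The resource name in the traceback above contains the language string exactly as it was passed in, which would explain why 'English' fails: the Punkt files that NLTK ships have lower-case names such as english.pickle. A minimal sketch of that behaviour, assuming the path is built by plain string formatting; punkt_resource and normalize_language are hypothetical names used for illustration, not part of sumy or NLTK:

```python
def punkt_resource(language):
    """Build a Punkt resource name, using the language string verbatim.

    Assumption: this mirrors the resource name seen in the traceback.
    """
    return "tokenizers/punkt/%s.pickle" % language


def normalize_language(language):
    """Hypothetical helper: trim and lower-case the language name."""
    return language.strip().lower()


print(punkt_resource("English"))                      # tokenizers/punkt/English.pickle
print(punkt_resource(normalize_language("English")))  # tokenizers/punkt/english.pickle
```

Lower-casing the name before it reaches the tokenizer avoids the mismatch on case-sensitive file systems.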
from sumy.
Full error for 'english':
Traceback (most recent call last):
File "/home/david/Desktop/mypthon/summary.py", line 166, in <module>
stemmer = Stemmer(LANGUAGE) #initalizes the stemmer in our set language
File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.
Maggie Smith for Alan Bennett film.txt
Traceback (most recent call last):
return nltk.data.load(path)
File "<stdin>", line 1
return nltk.data.load(path)
^
IndentationError: unexpected indent
from sumy.
Hi,
Sorry about that.
I wrote in a bit of a rush. I've posted on GitHub.
Date: Tue, 13 May 2014 10:52:59 -0700
Subject: Re: [sumy] Error - Unsure how to proceed (#13)
from sumy.
OK, I now know where the problem is. You have not downloaded the tokenizers for NLTK. You have to run the following in the Python interpreter: nltk.download("punkt"). The command downloads the tokenizer data, and everything should then be fine for LANGUAGE = "english".
from sumy.
I should have mentioned that I have already executed that command and downloaded punkt (along with every other nltk.download option).
Previously I used the option 'english', but that is the one now coming up as a non-existent language. I do believe 'English' is not part of NLTK, but I could be wrong.
from sumy.
Are you running it on Python 3.4? I have the same issue with NLTK and Python 3.4: I have executed nltk.download(), but the stemmer is still not found. For v26+/3.2+ it's OK.
from sumy.
I'm using Python 2.7; I'm not sure whether that makes any difference.
The summaries seem to work fine with Czech listed, but it would obviously be useful if English could become an option.
from sumy.
Sorry, but I don't think I can help with this. Try to fetch the English NLTK tokenizer directly with nltk.data.load('tokenizers/punkt/english.pickle'). If that fails, the error is somewhere in NLTK and I really don't know where. If you figure it out, please let me know.
from sumy.
I'll take a look. For whatever reason it just seems to be rejecting 'english' and accepting 'English', whereas before it was doing the opposite.
from sumy.
Just to confirm, I tried the direct fetch of the tokenizer that you posted and still receive this error:
Traceback (most recent call last):
File "/home/david/Desktop/mypthon/summary.py", line 166, in <module>
stemmer = Stemmer(LANGUAGE) #initalizes the stemmer in our set language
File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.
Again, I didn't have this issue in sumy 0.1, so I'm not sure whether a change has caused it. If I get time tomorrow, I will look through your updates for anything worth noting.
from sumy.
@shavid what was the cause? Did you solve it, and if so, how?
from sumy.
I too am seeing that error about the English stemmer not being available. I'm using Python 2.7 and executing the command sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization. I've followed the instructions to download the English-specific tokenizers, but I get the following error:
Traceback (most recent call last):
File "/usr/local/bin/sumy", line 9, in <module>
load_entry_point('sumy==0.3.0', 'console_scripts', 'sumy')()
File "/usr/local/lib/python2.7/dist-packages/sumy/__main__.py", line 65, in main
summarizer, parser, items_count = handle_arguments(args)
File "/usr/local/lib/python2.7/dist-packages/sumy/__main__.py", line 102, in handle_arguments
stemmer = Stemmer(language)
File "/usr/local/lib/python2.7/dist-packages/sumy/nlp/stemmers/__init__.py", line 28, in __init__
raise LookupError("Stemmer is not available for language %s." % language)
LookupError: Stemmer is not available for language english.
from sumy.
Sorry, but I don't have this problem, so I can't help you. Try importing the stemmer directly: from nltk.stem.snowball import EnglishStemmer. Does that work? Then try to fetch the English stemmer like this. What is the error then?
import nltk.stem.snowball as nltk_stemmers_module
language = "english"
stemmer_classname = language.capitalize() + 'Stemmer'
stemmer_class = getattr(nltk_stemmers_module, stemmer_classname)
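The lookup above can be tried without NLTK installed. The sketch below is a minimal illustration, assuming the stemmer class is resolved with getattr as in the snippet above. Here old_module and new_module are hypothetical stand-ins for nltk.stem.snowball (modelled without and with EnglishStemmer), and resolve_stemmer_class is an illustrative name, not sumy's API:

```python
from types import SimpleNamespace  # Python 3.3+


def _make_class(name):
    """Create an empty placeholder class with the given name."""
    return type(name, (object,), {})


# Hypothetical stand-ins for nltk.stem.snowball (NOT the real module).
old_module = SimpleNamespace(GermanStemmer=_make_class("GermanStemmer"))
new_module = SimpleNamespace(
    GermanStemmer=_make_class("GermanStemmer"),
    EnglishStemmer=_make_class("EnglishStemmer"),
)


def resolve_stemmer_class(language, module):
    """Return module.<Language>Stemmer or raise LookupError if it is missing."""
    class_name = language.capitalize() + "Stemmer"
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise LookupError("Stemmer is not available for language %s." % language)


print(resolve_stemmer_class("english", new_module).__name__)  # EnglishStemmer
try:
    resolve_stemmer_class("english", old_module)
except LookupError as error:
    print(error)  # Stemmer is not available for language english.
```

Because str.capitalize() turns both 'english' and 'English' into 'English', the case of the language name does not matter for this particular lookup; what matters is whether the module exposes the class at all.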
from sumy.
This looks like a problem with my local setup. The following
from nltk.stem.snowball import SnowballStemmer
print(" ".join(SnowballStemmer.languages))
produces danish dutch finnish french german hungarian italian norwegian portuguese romanian russian spanish swedish
I have no idea why english is missing or how I would go about installing it.
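A small diagnostic along these lines can show which NLTK version is installed and whether its Snowball stemmer lists English. This is a sketch only: snowball_report is a hypothetical helper name, and it assumes the installed release exposes nltk.__version__ and SnowballStemmer.languages.

```python
def snowball_report():
    """Collect the NLTK version and Snowball language list without raising."""
    try:
        import nltk
        from nltk.stem.snowball import SnowballStemmer
    except ImportError as error:
        # NLTK (or its snowball module) is not importable in this environment.
        return {"nltk_installed": False, "error": str(error)}

    languages = tuple(SnowballStemmer.languages)
    return {
        "nltk_installed": True,
        "nltk_version": nltk.__version__,
        "languages": languages,
        "english_available": "english" in languages,
    }


report = snowball_report()
print(report)
```

If english_available comes back False, comparing nltk_version against a newer release is a reasonable next step before looking at sumy itself.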
from sumy.
Seems like upgrading NLTK to version 3.0 fixed this.
from sumy.
Glad to hear it, and thanks for providing more info about the issue here :)
from sumy.