sumbasic failed on text: https://github.com/miso-belica/sumy/files/907502

sumbasic: KeyError about sumy HOT 5 OPEN

mrx23dot commented on June 20, 2024

sumbasic: KeyError

Comments (5)

slvcsl commented on June 20, 2024

My understanding is that this happens because _get_content_words_in_sentence and _get_all_content_words_in_doc apply different preprocessing.

I modified _get_all_content_words_in_doc so that it uses the same preprocessing, as follows:

def _get_all_content_words_in_doc(self, sentences):
    # Same order as _get_content_words_in_sentence: normalize, filter stop words, stem.
    normalized_words = []
    for s in sentences:
        normalized_words += self._normalize_words(s.words)
    normalized_content_words = self._filter_out_stop_words(normalized_words)
    stemmed_normalized_content_words = self._stem_words(normalized_content_words)
    return stemmed_normalized_content_words

It works now, but I have not yet had time to double-check that this is the correct solution.
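
To illustrate how a different preprocessing order can produce the KeyError, here is a minimal standalone sketch. It does not use sumy: the stop-word set, the toy stemmer, and all function names are assumptions made up for illustration. It assumes the document-level path stems before filtering stop words, while the sentence-level path filters before stemming.

```python
# Hypothetical stop-word list; "own" is included to mirror the KeyError: 'own' in this thread.
STOP_WORDS = {"own", "the"}


def normalize(words):
    return [w.lower() for w in words]


def filter_stop_words(words):
    return [w for w in words if w not in STOP_WORDS]


def stem(words):
    # Toy stemmer (assumption): strips a trailing "ing" only.
    return [w[:-3] if w.endswith("ing") else w for w in words]


def doc_content_words(words):
    # Assumed document path: stem, then filter stop words, then normalize.
    return normalize(filter_stop_words(stem(words)))


def sentence_content_words(words):
    # Assumed sentence path: normalize, then filter stop words, then stem.
    return stem(filter_stop_words(normalize(words)))


words = ["owning", "the", "house"]
doc_content = doc_content_words(words)
sentence_content = sentence_content_words(words)

# Frequency table built from the document path only.
word_freq = {w: doc_content.count(w) / len(doc_content) for w in doc_content}

# Words the sentence path produces that the frequency table never saw.
missing = [w for w in sentence_content if w not in word_freq]

failed_key = None
try:
    total = sum(word_freq[w] for w in sentence_content)
except KeyError as error:
    failed_key = error.args[0]

print(missing, failed_key)
```

In this sketch "owning" is stemmed to "own" and then dropped as a stop word on the document path, but survives the stop-word filter on the sentence path and only becomes "own" afterwards, so the lookup fails.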

slvcsl commented on June 20, 2024

Hi! Any news on this? Thanks a lot for your work!

mrx23dot commented on June 20, 2024

Maybe this could help: word_freq_in_doc.get(w, 0)
I guess it encounters a word that is not in the dict.
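
As a rough sketch of that suggestion, the lookup could fall back to 0 for unknown words. The function name and the sample data below are hypothetical and not taken from sumy; only the word_freq_in_doc.get(w, 0) call comes from the comment above.

```python
def average_probability(word_freq_in_doc, content_words_in_sentence):
    """Average word probability, treating words missing from the dict as 0."""
    if not content_words_in_sentence:
        return 0.0
    # dict.get returns the default (0) instead of raising KeyError.
    total = sum(word_freq_in_doc.get(w, 0) for w in content_words_in_sentence)
    return total / len(content_words_in_sentence)


# Hypothetical data: "own" is absent from the frequency table.
word_freq_in_doc = {"house": 0.5, "garden": 0.5}
average = average_probability(word_freq_in_doc, ["own", "house"])
print(average)
```

Note that this only avoids the crash: a word that is missing from the dict still contributes 0 to the sentence score, so any mismatch between the two word lists stays hidden.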

tezer commented on June 20, 2024

Same error from the docker version:

Traceback (most recent call last):
  File "/usr/local/bin/sumy", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/sumy/__main__.py", line 70, in main
    for sentence in summarizer(parser.document, items_count):
  File "/usr/local/lib/python3.10/site-packages/sumy/summarizers/sum_basic.py", line 27, in __call__
    ratings = self._compute_ratings(sentences)
  File "/usr/local/lib/python3.10/site-packages/sumy/summarizers/sum_basic.py", line 110, in _compute_ratings
    best_sentence_index = self._find_index_of_best_sentence(word_freq, sentences_as_words)
  File "/usr/local/lib/python3.10/site-packages/sumy/summarizers/sum_basic.py", line 92, in _find_index_of_best_sentence
    word_freq_avg = self._compute_average_probability_of_words(word_freq, words)
  File "/usr/local/lib/python3.10/site-packages/sumy/summarizers/sum_basic.py", line 75, in _compute_average_probability_of_words
    word_freq_sum = sum([word_freq_in_doc[w] for w in content_words_in_sentence])
  File "/usr/local/lib/python3.10/site-packages/sumy/summarizers/sum_basic.py", line 75, in <listcomp>
    word_freq_sum = sum([word_freq_in_doc[w] for w in content_words_in_sentence])
KeyError: 'own'

nefastosaturo commented on June 20, 2024

Hello there.

I encountered this error too.

The problems are in two functions in sum_basic.py: _get_content_words_in_sentence and _get_all_content_words_in_doc.

The different steps in those functions create two different lists of words, because the stop-word filter is applied before normalization and stemming in one case and after them in the other. The _get_all_words_in_doc function also calls the stemmer, which confuses the stop-word filtering.

So I just changed them like this:

    def _get_all_words_in_doc(self, sentences):
        # return self._stem_words([w for s in sentences for w in s.words])
        return [w for s in sentences for w in s.words]

    def _get_content_words_in_sentence(self, sentence): 
        # firstly normalize
        normalized_words = self._normalize_words(sentence.words) 
        # then filter out stop words
        normalized_content_words = self._filter_out_stop_words(normalized_words)
        # then stem
        stemmed_normalized_content_words = self._stem_words(normalized_content_words)
        return stemmed_normalized_content_words

    def _get_all_content_words_in_doc(self, sentences):
        all_words = self._get_all_words_in_doc(sentences)
        normalized_words = self._normalize_words(all_words)
        normalized_content_words = self._filter_out_stop_words(normalized_words)
        stemmed_normalized_content_words = self._stem_words(normalized_content_words)
        return stemmed_normalized_content_words
