Hello, I am trying to use the KEA algorithm for French language. I want to kno

Language Issue: "French" & clarifications about inputs (pke issue, closed)

Comments (3)

boudinfl commented on May 29, 2024

Hi @fatimalaoui,

The df file contains the document frequency counts (i.e. the number of documents that contain each word) and the model_file contains the parameters for the classifier.

pke only ships with pre-trained English models, so that may be why you are facing this issue. Nevertheless, can you give me a code snippet so I can reproduce your error?
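
To make "document frequency counts" concrete, here is a minimal, library-free sketch. The function name `document_frequencies` and the toy documents are made up for illustration, and plain whitespace tokenization is a simplifying assumption; pke's own tooling applies its own tokenization and normalization, which this sketch does not try to reproduce.

```python
from collections import Counter


def document_frequencies(documents):
    """Count, for each word, the number of documents that contain it."""
    counts = Counter()
    for text in documents:
        # Using a set means a word is counted at most once per document,
        # no matter how often it occurs inside that document.
        counts.update(set(text.lower().split()))
    return counts


# Hypothetical toy collection, for illustration only.
docs = [
    "keyphrase extraction with kea",
    "kea uses document frequency",
    "document frequency counts documents",
]
df = document_frequencies(docs)

print(df["kea"])         # 2
print(df["frequency"])   # 2
print(df["extraction"])  # 1
```

A word repeated many times in one document still contributes 1 to its count, which is what separates document frequency from raw term frequency.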

fatimalaoui commented on May 29, 2024

Hi @boudinfl, thank you for your reply.
Here is a code snippet that I use for English, but it doesn't work either.
```python
# Kea keyphrase extraction model. Parameterized example:
import pke
from nltk.corpus import stopwords
from pke.supervised.feature_based.kea import Kea

# define a list of stopwords
stoplist = stopwords.words('english')

# create a Kea extractor.
extractor = Kea()

# load the content of the document.
extractor.load_document(input='C-1.txt',
                        language='./Programs/Python/Python37/Lib/site-packages/spacy/lang/en',
                        normalization=None)

# select 1-3 grams that do not start or end with a stopword as
# candidates. Candidates that contain punctuation marks as words
# are discarded.
extractor.candidate_selection(stoplist=stoplist)

# classify candidates as keyphrase or not keyphrase.
df = pke.load_document_frequency_file(input_file='./Desktop/model.txt')
model_file = './Desktop/df-semeval2010.tsv'
extractor.candidate_weighting(self,model_file=model_file,df=df)

# get the 10-highest scored candidates as keyphrases
keyphrases = extractor.get_n_best(n=10)
```

For the df and model_file, I'm not sure exactly what to put there (referring to the files that already exist in the pke package).
Here is the spacy file I have in my Python directory. Is something missing, or is it installed correctly?
The error I'm getting is the following:

Could not read meta.json from C:\Users\Fatima\AppData\Local\Programs\Python\Python37\Lib\site-packages\spacy\lang\en\meta.json

boudinfl commented on May 29, 2024

The language parameter of `load_document()` should be set to `en`, not to a filesystem path. I also think you have issues with the df and model files: df should contain document frequency counts, and model should be a sklearn model parameters file. To use Kea with the standard models, simply call `extractor.candidate_weighting()` with no arguments.
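
Putting those two suggestions together, the snippet from the question might be adjusted roughly as follows. This is a sketch under assumptions, not a verified fix: it assumes pke and the NLTK stopwords corpus are installed, that the installed pke version accepts `stoplist` in `candidate_selection()` as in the snippet above, and the wrapper name `extract_keyphrases` is made up for illustration.

```python
def extract_keyphrases(path, n=10):
    """Run Kea on one English document using the models bundled with pke."""
    # Imported lazily so that defining this function does not require pke;
    # both packages are assumed to be installed when it is called.
    from nltk.corpus import stopwords
    from pke.supervised.feature_based.kea import Kea

    stoplist = stopwords.words('english')

    extractor = Kea()

    # Pass the language code, not a path into the spaCy installation.
    extractor.load_document(input=path, language='en', normalization=None)

    # Same candidate selection call as in the question's snippet.
    extractor.candidate_selection(stoplist=stoplist)

    # No model_file/df arguments: fall back to the standard models.
    extractor.candidate_weighting()

    # Return the n highest-scored candidates as keyphrases.
    return extractor.get_n_best(n=n)
```

It would be called as `extract_keyphrases('C-1.txt')`. Since pke only ships with pre-trained English models, this does not cover the French case raised in the issue title.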
