lpty / nlp_base

Basic natural language processing models

Python 93.55% Makefile 2.83% Batchfile 3.62%

nlp-machine-learning

nlp_base's Issues

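The Chinese interrogative-sentence classifier returns the same score for every input

I built my own dataset to train the Chinese interrogative-sentence classifier. At test time, whatever sentence I enter, the score prob is identical and the prediction is always the negative class. What could be the cause?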

Running the interrogative example fails with: No such file or directory: 'data/question_recog.csv'

The corpus used to train the model appears to be missing.

AttributeError: 'float' object has no attribute 'decode'

Sorry to bother you again.

When I call "train()", the following error occurs:
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.173 seconds.
Prefix dict has been built succesfully.

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "interrogative/api.py", line 17, in train
model.train()
File "interrogative/model.py", line 76, in train
self.initialize_model()
File "interrogative/model.py", line 31, in initialize_model
train, label = self.corpus.generator()
File "interrogative/corpus.py", line 62, in generator
corpus = cls.read_corpus_from_file(corpus_path)
File "interrogative/corpus.py", line 34, in perform_word_segment
tokenizer = jieba.Tokenizer()
File "/home1/liuxin/anaconda3/envs/py27/lib/python2.7/site-packages/pandas/core/series.py", line 3591, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
File "interrogative/corpus.py", line 34, in <lambda>
tokenizer = jieba.Tokenizer()
File "/home1/liuxin/.local/lib/python2.7/site-packages/jieba/__init__.py", line 282, in cut
sentence = strdecode(sentence)
File "/home1/liuxin/.local/lib/python2.7/site-packages/jieba/_compat.py", line 37, in strdecode
sentence = sentence.decode('utf-8')
AttributeError: 'float' object has no attribute 'decode'
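
A possible cause, offered as an assumption rather than a confirmed diagnosis: the traceback shows a float reaching jieba, and an empty cell in a CSV file is typically loaded by pandas as NaN, which is a float. The sketch below uses only the standard library and is written for Python 3 (the traceback is from Python 2.7, where the text type check would differ). The helper names `is_usable_text` and `drop_non_text_rows` are hypothetical and are not part of nlp_base.

```python
def is_usable_text(value):
    """Return True only for non-empty strings (NaN floats and blanks fail)."""
    return isinstance(value, str) and value.strip() != ""


def drop_non_text_rows(rows):
    """Keep (content, label) pairs whose content can be passed to a tokenizer."""
    return [(content, label) for content, label in rows if is_usable_text(content)]


# Simulated corpus rows: the second row mimics an empty CSV cell read as NaN.
rows = [("在么", 1), (float("nan"), 0), ("你好", 0), ("", 1)]
cleaned = drop_non_text_rows(rows)
print(cleaned)
```

Filtering before segmentation keeps the tokenizer call unchanged; the trade-off is that silently dropped rows should probably be counted or logged.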

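Is there a corpus for the XGBoost-based Chinese interrogative-sentence classifier? Even a sample would help

I would like to build on your XGBoost-based Chinese interrogative-sentence classifier, but I need a corpus.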

data set error

Content of question_recog.csv is:
content,label
在么,1
你好,0
公司在哪里,1
需要多少钱,1
未成年可以贷款吗,1
你现在在干什么,1
我在这里,0

And when I use 'train()', the error occurs:

Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.201 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "interrogative/api.py", line 17, in train
model.train()
File "interrogative/model.py", line 77, in train
_, best_param, best_iter_round = self.model_param_select()
File "interrogative/model.py", line 63, in model_param_select
early_stopping_rounds=self.early_stopping_rounds) # stop when metrics not get better
File "/home1/liuxin/anaconda3/envs/py27/lib/python2.7/site-packages/xgboost/training.py", line 446, in cv
res = aggcv([f.eval(i, feval) for f in cvfolds])
File "/home1/liuxin/anaconda3/envs/py27/lib/python2.7/site-packages/xgboost/training.py", line 234, in eval
return self.bst.eval_set(self.watchlist, iteration, feval)
File "/home1/liuxin/anaconda3/envs/py27/lib/python2.7/site-packages/xgboost/core.py", line 1173, in eval_set
ctypes.byref(msg)))
File "/home1/liuxin/anaconda3/envs/py27/lib/python2.7/site-packages/xgboost/core.py", line 178, in _check_call
raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: [10:15:09] /workspace/src/metric/rank_metric.cc:144: Check failed: !auc_error AUC: the dataset only contains pos or neg samples
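
A likely explanation, stated as an assumption: the sample file above has 5 positive and 2 negative rows, and the configuration quoted in a later issue sets nfold to 5. When a class has fewer samples than there are folds, at least one fold contains a single class, and AUC cannot be computed for it. The sketch below checks this before training; `can_stratify` is a hypothetical helper, not a function of nlp_base or XGBoost.

```python
from collections import Counter


def can_stratify(labels, nfold):
    """Return True if every class has at least `nfold` samples.

    With fewer samples than folds, some fold must miss that class entirely,
    so a per-fold AUC is undefined.
    """
    counts = Counter(labels)
    return len(counts) >= 2 and min(counts.values()) >= nfold


# Labels from the question_recog.csv sample quoted in the issue.
labels = [1, 0, 1, 1, 1, 1, 0]
print(Counter(labels))
print(can_stratify(labels, 5))
print(can_stratify(labels, 2))
```

In practice this points to either adding more labelled sentences of each class or lowering the number of folds.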

Could you provide the dataset for Chinese interrogative-sentence detection?

As the title says.

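If I want to train the POS tagger model on my own corpus, what should I prepare?

I have finished a word segmentation model and am satisfied with the results. If I want to use your POS tagger to train a tagger that assigns a part of speech to each word token produced by segmentation, what structure should I convert the corpus into?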

Could you provide a download link for the training data?

As the title says.

Do you have a trained model?

I want to run your code, but I could not find any usable model. Could you send me the model you trained and the vocabulary dictionary?

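When training a CRF for named-entity recognition, how can POS features be added on top of character features?

When training a CRF for named-entity recognition, how can part-of-speech features be added on top of the character features to improve precision?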
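
One common way to do this, sketched under assumptions: run a POS tagger over the segmented words, copy each word's tag onto its characters, and add that tag to every character's feature dict. The function and feature names below are hypothetical, and the code assumes a CRF toolkit that accepts one feature dict per character (as sklearn-crfsuite does); it is not taken from the repository's ner module.

```python
def project_pos_to_chars(tagged_words):
    """Expand (word, pos) pairs into parallel lists of characters and POS tags."""
    chars, tags = [], []
    for word, pos in tagged_words:
        for ch in word:
            chars.append(ch)
            tags.append(pos)
    return chars, tags


def char_features(chars, pos_tags, i):
    """Character features plus the POS of the word containing the character."""
    feats = {"bias": 1.0, "char": chars[i], "pos": pos_tags[i]}
    if i > 0:
        feats["-1:char"] = chars[i - 1]
        feats["-1:pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(chars) - 1:
        feats["+1:char"] = chars[i + 1]
        feats["+1:pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tagged_words):
    chars, pos_tags = project_pos_to_chars(tagged_words)
    return [char_features(chars, pos_tags, i) for i in range(len(chars))]


# Example input: words with POS tags from any segmenter and tagger.
features = sentence_features([("李鹏", "nr"), ("来到", "v"), ("北京", "ns")])
print(features[0])
```

Whether this raises precision depends on the quality of the POS tags, so it is worth comparing scores with and without the extra features.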

Sorry to bother you: how do I fix xgboost.core.XGBoostError: Invalid Parameter format for silent expect boolean but value='91'? Thanks

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\Lenovo\AppData\Local\Temp\jieba.cache
Loading model cost 0.906 seconds.
Prefix dict has been built succesfully.
Traceback (most recent call last):
File "C:/Users/Lenovo/Desktop/interrogative/manage.py", line 3, in <module>
train()
File "C:\Users\Lenovo\Desktop\interrogative\src\api.py", line 17, in train
model.train()
File "C:\Users\Lenovo\Desktop\interrogative\src\model.py", line 81, in train
_, best_param, best_iter_round = self.model_param_select()
File "C:\Users\Lenovo\Desktop\interrogative\src\model.py", line 67, in model_param_select
early_stopping_rounds=self.early_stopping_rounds) # stop when metrics not get better
File "D:\python project\venv\lib\site-packages\xgboost\training.py", line 445, in cv
fold.update(i, obj)
File "D:\python project\venv\lib\site-packages\xgboost\training.py", line 230, in update
self.bst.update(self.dtrain, iteration, fobj)
File "D:\python project\venv\lib\site-packages\xgboost\core.py", line 1109, in update
dtrain.handle))
File "D:\python project\venv\lib\site-packages\xgboost\core.py", line 176, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: Invalid Parameter format for silent expect boolean but value='91'
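
A hint, offered as an assumption: 91 is the character code of '[', which suggests that a list-like value such as [1] (or its string form) reached XGBoost where a single value was expected. The configuration quoted in a later issue does store `silent` as [1]. The sketch below expands such a grid of lists into dicts of scalar values before they would be handed to a trainer; `expand_grid` is a hypothetical helper, not part of nlp_base or XGBoost. Note also that newer XGBoost releases use `verbosity` in place of `silent`.

```python
from itertools import product


def expand_grid(grid):
    """Turn {'name': [v1, v2], ...} into a list of dicts with scalar values."""
    keys = sorted(grid)
    value_lists = [grid[k] if isinstance(grid[k], list) else [grid[k]] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


grid = {
    "max_depth": [4, 5, 6],
    "eta": [0.1, 0.05, 0.02],
    "silent": [1],
    "objective": ["binary:logistic"],
}
params = expand_grid(grid)
print(len(params), params[0])
print(ord("["))  # the character code that shows up in the error message
```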

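About the F1 score for NER

Hello,

I was glad to come across your code. While running the CRF I wondered: isn't computing F1 over the per-token BIO classes a poor choice? I think it would be better to compute F1 over the entities that are actually recognized.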
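
To make the distinction concrete, here is a minimal sketch of entity-level F1 computed from BIO tag sequences: an entity counts as correct only when its span and type both match the gold annotation. The function names are hypothetical and this is not the evaluation code of the repository. An I- tag that does not continue an open entity of the same type is ignored in this sketch.

```python
def bio_to_spans(tags):
    """Extract (start, end, label) spans from a BIO sequence; end is exclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold_tags, pred_tags):
    """Precision, recall and F1 over exactly matching entity spans."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O"]
token_accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(token_accuracy, entity_f1(gold, pred))
```

In this example four of five tokens are tagged correctly, yet only one of the two entities is fully recovered, which is why entity-level scores are usually lower and more informative than token-level ones.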

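Error converting config values to JSON in the XGBoost Chinese interrogative-sentence classifier

Hello, while studying the code I downloaded from you, running /interrogative/manage.py fails with this error: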

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1758, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1752, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1147, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/jinglun/PycharmProjects/DownloadProjects/nlp_base/interrogative/manage.py", line 4, in <module>
    train()
  File "/Users/jinglun/PycharmProjects/DownloadProjects/nlp_base/interrogative/src/api.py", line 17, in train
    model.train()
  File "/Users/jinglun/PycharmProjects/DownloadProjects/nlp_base/interrogative/src/model.py", line 81, in train
    self.initialize_model()
  File "/Users/jinglun/PycharmProjects/DownloadProjects/nlp_base/interrogative/src/model.py", line 40, in initialize_model
    self.max_depth = to_json(self.config.get('model', 'max_depth'))
  File "/Users/jinglun/PycharmProjects/DownloadProjects/nlp_base/interrogative/src/util.py", line 16, in to_json
    return demjson.decode(text, encoding='utf-8')
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 5699, in decode
    return_stats=(return_stats or write_stats) )
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 4915, in decode
    raise errors[0]
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 2428, in set_input
    self.buf = buffered_stream( txt, encoding=encoding )
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 1614, in __init__
    self.set_text( txt, encoding )
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 1685, in set_text
    raise newerr
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 1675, in set_text
    decoded = helpers.unicode_decode( txt, encoding )
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/site-packages/demjson.py", line 1256, in unicode_decode
    unitxt, numbytes = cdk.decode( txt, **cdk_kw )  # DO THE DECODE HERE!
  File "/Users/jinglun/software/miniconda2/envs/nlp_base36/lib/python3.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
demjson.JSONDecodeError: a Unicode decoding error occurred

The specific line is this one in /interrogative/model.py:

self.max_depth = to_json(self.config.get('model', 'max_depth'))

The model section in my config.py is unchanged:

'model': {
                'max_depth': [4, 5, 6],
                'eta': [0.1, 0.05, 0.02],
                'subsample': [0.5, 0.7, 1.0],
                'max_iterations': 100,
                'objective': ['binary:logistic'],
                'silent': [1],
                'num_boost_round': 2000,
                'nfold': 5,
                'stratified': 1,
                'metrics': 'auc',
                'early_stopping_rounds': 50,
                'model_path': ' src/data/{}.model'
            }

I do not understand why this error is raised, and searching online did not turn up a solution. How can I fix it?
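
One plausible reading of the traceback, stated as an assumption: under Python 3.6 the config value is already a `str`, and asking demjson to decode it as UTF-8 bytes fails. The sketch below shows a stand-in for `to_json` built on the standard library's `ast.literal_eval`; the name `to_literal` is hypothetical, and this is not the repository's fix.

```python
import ast


def to_literal(text):
    """Parse a config value such as '[4, 5, 6]' or '100' into a Python object.

    Bytes are decoded first; values that are not Python literals (for example
    the bare word auc) are returned unchanged as strings.
    """
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    try:
        return ast.literal_eval(text.strip())
    except (ValueError, SyntaxError):
        return text


print(to_literal("[4, 5, 6]"))
print(to_literal("['binary:logistic']"))
print(to_literal(b"100"))
print(to_literal("auc"))
```

`ast.literal_eval` accepts only literals, so it does not execute arbitrary code from the config file, and unlike `json.loads` it tolerates single-quoted strings.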

help

Hello, could you share the data, or point me to a way of obtaining the training corpus?

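Hello, regarding your post (Dependency parsing: a Chinese dependency parser implemented as sequence labelling, https://blog.csdn.net/sinat_33741547/article/details/79321401): could you provide the corpus preprocessing program? My run also fails with ConfigParser.NoSectionError: No section: 'depparser'

Hello, regarding your post (Dependency parsing: a Chinese dependency parser implemented as sequence labelling, https://blog.csdn.net/sinat_33741547/article/details/79321401): could you provide the program that preprocesses the corpus? My run also fails with the error ConfigParser.NoSectionError: No section: 'depparser'. If possible, could you guide me remotely? I have been debugging for many days. Contact: [email protected].
I have already got your word segmentation code working and would be very grateful for your help.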
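
A frequent cause of NoSectionError, mentioned here as a possibility rather than a diagnosis: the parser's `read()` silently skips files it cannot find (for example when the script runs from a different working directory), so the parser ends up with no sections at all. The sketch below uses Python 3's `configparser` (the module is named `ConfigParser` in Python 2) and checks the return value of `read()`. `load_section`, the file name, and the `model_path` key are hypothetical.

```python
import configparser
import os
import tempfile


def load_section(path, section):
    """Read one section, failing loudly if the file or the section is missing."""
    parser = configparser.ConfigParser()
    loaded = parser.read(path, encoding="utf-8")  # list of files actually parsed
    if not loaded:
        raise FileNotFoundError("config file not found: %s" % os.path.abspath(path))
    if not parser.has_section(section):
        raise KeyError("no section %r in %s" % (section, path))
    return dict(parser[section])


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "config.ini")
    with open(cfg_path, "w", encoding="utf-8") as fh:
        fh.write("[depparser]\nmodel_path = data/dep.model\n")
    settings = load_section(cfg_path, "depparser")

print(settings)
```

Printing the absolute path in the error makes it easy to see which directory the program actually looked in.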

ModuleNotFoundError: No module named 'model'

When I use 'from interrogative.api import *'

The error occurs:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home1/lx/nlp_base/interrogative/interrogative/api.py", line 7, in <module>
from model import get_model
ModuleNotFoundError: No module named 'model'
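
This error is typical of Python 2 style implicit relative imports being run under Python 3: inside a package, `from model import get_model` looks for a top-level module named `model`, whereas `from .model import get_model` refers to the sibling module. The self-contained sketch below builds a throwaway package to show the difference; `demo_pkg` and its files are hypothetical and only mirror the layout in the traceback.

```python
import importlib
import os
import sys
import tempfile


def write(path, text):
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "demo_pkg")  # hypothetical package name
os.mkdir(pkg)
write(os.path.join(pkg, "__init__.py"), "")
write(os.path.join(pkg, "model.py"), "def get_model():\n    return 'model'\n")
# Explicit relative import: works under Python 3.
write(os.path.join(pkg, "api.py"), "from .model import get_model\n")
# Implicit relative import: only worked under Python 2.
write(os.path.join(pkg, "old_api.py"), "from model import get_model\n")

sys.path.insert(0, tmp)
api = importlib.import_module("demo_pkg.api")
result = api.get_model()

try:
    importlib.import_module("demo_pkg.old_api")
    old_style_failed = False
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    old_style_failed = True

print(result, old_style_failed)
```

Running the project under Python 2.7, which the other tracebacks in these issues use, is the alternative if the imports are left unchanged.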

After training, I followed the example above for named-entity recognition, but no entities were recognized.

from ner.api import recognize
sentence = (u'新华社北京十二月三十一日电(**人民广播电台记者刘振英、新华社记者张宿堂)今天是一九九七年的最后一天。'
            u'辞旧迎新之际,国务院总理李鹏今天上午来到北京石景山发电总厂考察,向广大企业职工表示节日的祝贺,'
            u'向将要在节日期间坚守工作岗位的同志们表示慰问')
predict = recognize(sentence)

##########################################################################
y_predict = self.model.predict(features)
这一步出来的是<type 'list'>: [['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']]，请问遇见过这个问题吗

The NER training corpus can no longer be found

The NER training corpus can no longer be found. Could you provide a link to it?