hit-scir / pyltp
This project forked from huangfj/pyltp
pyltp: the python extension for LTP
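For example, the sentence 阿里联手工商破获重庆亿元刷单案 is segmented as 阿里|联手|工商|破获|重庆亿|元|刷单|案. How can I stop 重庆 and 亿 from being glued together as 重庆亿? I have added a custom dictionary, but the problem still occurs.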
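Named entity recognition and semantic role labeling return empty results when called. Word segmentation, POS tagging, and dependency parsing still work.
I am using the Spyder editor bundled with Anaconda, with Python 3.5.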
`from pyltp import Parser,Postagger,Segmentor,NamedEntityRecognizer,SementicRoleLabeller
segmentor = Segmentor() # word segmentation
segmentor.load('E:\ltp-data-v3.3.1(1)\ltp_data\cws.model') # load the model
words = segmentor.segment('运用几个功能返回结果为空,但是并未报错。') # segment the sentence
print ('\t'.join(words))
segmentor.release() # release the model
postagger=Postagger() # POS tagging
postagger.load('E:\ltp-data-v3.3.1(1)\ltp_data\pos.model')
postages=postagger.postag(words)
print( '\t'.join(postages))
postagger.release()
parser = Parser() # dependency parsing
parser.load('E:\ltp-data-v3.3.1(1)\ltp_data\parser.model') # load the model
arcs = parser.parse(words, postages) # parse
print ("\t".join("%d:%s" % (arc.head, arc.relation) for arc in arcs))
parser.release() # release the model
recognizer = NamedEntityRecognizer() # named entity recognition
recognizer.load('E:\ltp-data-v3.3.1(1)\ltp_data\ner.model') # load the model
netags = recognizer.recognize(words, postages) # recognize named entities
print ('\t'.join(netags))
recognizer.release()
labeller = SementicRoleLabeller() # semantic role labeling
labeller.load('E:\ltp-data-v3.3.1(1)\ltp_data\srl') # load the model
roles = labeller.label(words, postages, netags, arcs) # label semantic roles
for role in roles:
    print (role.index, "".join(
        ["%s:(%d,%d)" % (arg.name, arg.range.start, arg.range.end) for arg in role.arguments]))
labeller.release() # release the model
The returned result is:
运用 几 个 功能 返回 结果 为 空 , 但是 并 未 报错 。
v m q n v n v a wp c d d v wp
0:HED 3:ATT 4:ATT 1:VOB 1:COO 5:VOB 5:COO 7:VOB 1:WP 13:ADV 13:ADV 13:ADV 1:COO 1:WP
`
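One possible cause, offered as a guess rather than a confirmed diagnosis: the model paths above are ordinary string literals containing backslashes, and in the NER path the sequence `\n` in `\ner.model` is a newline escape, so the path handed to `recognizer.load` may not be the one intended. The other model file names happen to start with letters that form no escape sequence. The sketch below uses a shortened, hypothetical relative path to show the difference; it does not call pyltp.

```python
import os

# Hypothetical relative path, shortened for illustration.
plain = 'ltp_data\ner.model'   # "\n" is interpreted as a newline character
raw = r'ltp_data\ner.model'    # raw string: the backslash is kept literally

print('\n' in plain)   # True
print('\n' in raw)     # False

# Two other ways to write the path that avoid the escape problem.
forward = 'ltp_data/ner.model'
joined = os.path.join('ltp_data', 'ner.model')
```

If NER then returns tags, semantic role labeling may start working as well, since it takes `netags` as input; that has not been verified here.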
Here is an example:
sentence = "**进出口银行与**银行加强合作"
segmentor = Segmentor()
segmentor.load(os.path.join(MODELDIR, "cws.model"))
words = segmentor.segment(sentence)
print "\t".join(words)
The output is: 涓浗 杩涘嚭鍙� 閾惰 涓� 涓浗閾惰 鍔犲己 鍚堜綔
If I use:
sentence = u"**进出口银行与**银行加强合作"
the program raises an error.
How should this be used with Python 2.7? Thanks!
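The garbled output above looks like UTF-8 bytes being rendered by a console that expects GBK, which is common on Chinese-locale Windows; that is an inference from the characters shown, not something the report confirms. The Python 3 sketch below reproduces the effect on a sample word chosen for illustration, without using pyltp.

```python
text = "银行"  # sample word, chosen for illustration

utf8_bytes = text.encode("utf-8")

# Decoding UTF-8 bytes with the wrong codec yields unrelated characters.
garbled = utf8_bytes.decode("gbk", errors="replace")

# Decoding with the matching codec restores the text.
restored = utf8_bytes.decode("utf-8")

print(garbled != text)
print(restored == text)
```

Under Python 2.7, a commonly suggested workaround was to keep the sentence as a UTF-8 byte string for pyltp and convert only when printing, for example `print "\t".join(words).decode("utf-8")`; treat that as an untested suggestion.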
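Using the example I got the result below, but how should the dependency parse on the third line be read? Could someone explain each item?
For example:
**, 3:ATT: does this mean that '**' and '与' are in an ATT relation, where ATT is the attribute (modifier-head) relation? ** <- 与?
与, 5:LAD: does this mean that '与' and '加强' are in a LAD relation, where LAD is the left adjunct relation? 与 <- 加强?
Is my understanding correct? The result seems wrong, though: shouldn't '与' be the left adjunct of '**银行'?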
** 进出口 银行 与 **银行 加强 合作
ns v n c ni v v
3:ATT 3:ATT 6:SBV 5:LAD 3:COO 0:HED 6:VOB
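A likely source of the confusion is the index base. Another report on this page notes that in pyltp `arc.head` uses 0 for the root, which implies that words are numbered from 1. Under that assumption, `3:ATT` on the first word points at the third word (银行), and `5:LAD` on 与 points at the fifth word (**银行), which is what the poster expected. The sketch below pairs each word with its head using the output quoted above; the helper name `describe_arcs` is made up for illustration.

```python
words = ["**", "进出口", "银行", "与", "**银行", "加强", "合作"]
arcs = "3:ATT 3:ATT 6:SBV 5:LAD 3:COO 0:HED 6:VOB".split()

def describe_arcs(words, arcs):
    """Return (child, relation, head) triples, assuming 1-based heads and 0 for the root."""
    triples = []
    for child, arc in zip(words, arcs):
        head_index, relation = arc.split(":")
        head_index = int(head_index)
        head = "ROOT" if head_index == 0 else words[head_index - 1]
        triples.append((child, relation, head))
    return triples

for child, relation, head in describe_arcs(words, arcs):
    print("%s --%s--> %s" % (child, relation, head))
```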
梓翔, please add yourself to it 😄
Env: Win7 + cygwin x64
Python: [GCC 4.8.3] on cygwin
GCC: 4.9.2
Both the pip install and the development-version install fail to compile, with the following error output:
running build
running build_ext
building 'pyltp' extension
gcc -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.8-1.x86_64/build=/usr/src/debug/python-2.7.8-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.8-1.x86_64/src/Python-2.7.8=/usr/src/debug/python-2.7.8-1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Iltp/include/ -Iltp/thirdparty/boost/include/ -Iltp/thirdparty/maxent/ -Iltp/src/ -Iltp/src/segmentor/ -Iltp/src/postagger/ -Iltp/src/ner/ -Iltp/src/parser -Iltp/src/srl/ -Iltp/src/utils/ -Iltp/src/__util/ -Iltp/src/srl/ -Ipatch/include/ -I/usr/include/python2.7 -c src/pyltp.cpp -o build/temp.cygwin-1.7.35-x86_64-2.7/src/pyltp.o
cc1plus: 警告:command line option ‘-Wimplicit-function-declaration’ is valid for C/ObjC but not for C++
cc1plus: 警告:command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from patch/include/boost/python/detail/prefix.hpp:13:0,
from patch/include/boost/python/args.hpp:8,
from patch/include/boost/python.hpp:11,
from src/pyltp.cpp:14:
patch/include/boost/python/detail/wrap_python.hpp:88:0: 警告:“SIZEOF_LONG”重定义
^
In file included from patch/include/boost/python/detail/wrap_python.hpp:50:0,
from patch/include/boost/python/detail/prefix.hpp:13,
from patch/include/boost/python/args.hpp:8,
from patch/include/boost/python.hpp:11,
from src/pyltp.cpp:14:
/usr/include/python2.7/pyconfig.h:1013:0: 附注:这是先前定义的位置
#define SIZEOF_LONG 8
^
In file included from /usr/include/python2.7/Python.h:58:0,
from patch/include/boost/python/detail/wrap_python.hpp:142,
from patch/include/boost/python/detail/prefix.hpp:13,
from patch/include/boost/python/args.hpp:8,
from patch/include/boost/python.hpp:11,
from src/pyltp.cpp:14:
/usr/include/python2.7/pyport.h:886:2: 错误:#error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."
#error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)."
^
In file included from patch/include/boost/python/object/make_instance.hpp:9:0,
from patch/include/boost/python/object/make_ptr_instance.hpp:8,
from patch/include/boost/python/to_python_indirect.hpp:11,
from patch/include/boost/python/converter/arg_to_python.hpp:10,
from patch/include/boost/python/call.hpp:15,
from patch/include/boost/python/object_core.hpp:14,
from patch/include/boost/python/args.hpp:25,
from patch/include/boost/python.hpp:11,
from src/pyltp.cpp:14:
patch/include/boost/python/object/instance.hpp:14:36: 警告:类型属性在定义后被忽略 [-Wattributes]
struct BOOST_PYTHON_DECL_FORWARD instance_holder;
^
error: command 'gcc' failed with exit status 1
How can I fix this? Thanks.
Environment: Ubuntu 14.04 x64, download
The system default Python is python2.7
cmake was also installed with apt
python-config --includes
-I/usr/include/python3.4m -I/usr/include/python3.4m
2. Running cmake -DLTP_HOME=/home/he/ltp-3.1.2 fails with the following output:
CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
Could NOT find PythonLibs (missing: PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS)
Call Stack (most recent call first):
/usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-2.8/Modules/FindPythonLibs.cmake:208 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
boost_python/CMakeLists.txt:1 (find_package)
CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message):
Could NOT find PythonLibs (missing: PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS)
Call Stack (most recent call first)
How can I fix this? Thanks.
After installing pyltp with the following commands:
$ git clone https://github.com/HIT-SCIR/pyltp
$ git submodule init
$ git submodule update
$ python setup.py install
regardless of whether I replace ltp_data, I keep getting the following output and Python crashes:
$ python example.py
Segmentor: Model not loaded!
����P��S��
Python(10114,0x7fff7f118000) malloc: *** error for object 0x10c721bd8: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
After commenting out postagger, parser, recognizer, and labeller, Python no longer crashes, but "Segmentor: Model not loaded!" is still shown.
The Python process and machine information are as follows:
Process: Python [10123]
Path: /usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: 2.7.11 (2.7.11)
Code Type: X86-64 (Native)
Parent Process: bash [9210]
Responsible: iTerm [7398]
Model: MacBookPro12,1, BootROM MBP121.0167.B16, 2 processors, Intel Core i7, 3.1 GHz, 16 GB, SMC 2.28f7
Graphics: Intel Iris Graphics 6100, Intel Iris Graphics 6100, Built-In
Memory Module: BANK 0/DIMM0, 8 GB, DDR3, 1867 MHz, 0x80CE, 0x4B3445424533303445422D45474346202020
Memory Module: BANK 1/DIMM0, 8 GB, DDR3, 1867 MHz, 0x80CE, 0x4B3445424533303445422D45474346202020
Thunderbolt Bus: MacBook Pro, Apple Inc., 27.1
Loading cws.model does not seem to have succeeded.
I want to call LTP from Python using a multiprocessing pool:
from multiprocessing import Pool
if __name__ == '__main__':
    p = Pool(int(arguments[u'--count']))
    for page in range(start, end):
        p.apply_async(segement_task, args=(arguments[u'LTP_DATA_MODEL'], page, ))
    print u'Waiting for all subprocesses done...'
    p.close()
    p.join
The child process loads the models with the following code:
def segement_task(model_path, page):
    segmentor = Segmentor()
    segmentor.load(os.path.join(model_path, "cws.model"))
    print u'load segmentor'
    postagger = Postagger()
    postagger.load(os.path.join(model_path, "pos.model"))
    print u'load postagger'
    parser = Parser()
    parser.load(os.path.join(model_path, "parser.model"))
    print u'load parser'
    recognizer = NamedEntityRecognizer()
    recognizer.load(os.path.join(model_path, "ner.model"))
    print u'load recognizer'
    labeller = SementicRoleLabeller()
    labeller.load(os.path.join(model_path, "srl/"))
    print u'load labeller'
The console prints only "load segmentor"
and then the program exits.
How should LTP be called from multiple processes?
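Two things in the snippet above may explain the early exit, though neither is confirmed by the report. First, `p.join` is written without parentheses, so the method is never called and the main process can finish while workers are still loading models. Second, `apply_async` stores any exception raised in a worker inside its result object, and it only surfaces when `.get()` is called. The sketch below addresses both; `run_safely` and `run_pages` are illustrative names, and `task` stands for a top-level function such as `segement_task`.

```python
import traceback
from multiprocessing import Pool

def run_safely(task, *args):
    """Call task(*args) and report failures instead of losing them."""
    try:
        return ("ok", task(*args))
    except Exception:
        return ("error", traceback.format_exc())

def run_pages(task, model_path, pages, workers=2):
    """Run task(model_path, page) for every page in a process pool."""
    pool = Pool(workers)
    pending = [
        pool.apply_async(run_safely, args=(task, model_path, page))
        for page in pages
    ]
    pool.close()
    pool.join()  # parentheses matter: this waits for the workers to finish
    # .get() returns each worker's ("ok", ...) or ("error", ...) tuple
    return [result.get() for result in pending]
```

Call `run_pages(segement_task, model_dir, range(start, end))` from inside the `if __name__ == '__main__':` guard. A crash inside the native extension itself would still kill the worker without producing a Python traceback, so this only helps with ordinary Python exceptions.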
Sorry to bother you. Can pyltp load my own supplementary word dictionary so that segmentation better fits my use case?
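I need to plug pyltp into scikit-learn's CountVectorizer as the tokenizer, but during the call its internals cannot apply decode('utf8') to the UTF-8 strings that pyltp returns. I hope the character encoding conversion can be built into pyltp, that is, the encode('utf8') and decode('utf8') handling would be done inside pyltp. That would be a great convenience. Thanks!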
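Until something like this is built in, a thin wrapper on the caller's side can do the conversion. The sketch below assumes a segmenter that takes UTF-8 bytes and returns UTF-8 byte tokens, as pyltp appears to under Python 2 according to the reports on this page; `wrap_byte_segmenter` and `fake_segment` are hypothetical names, and the stand-in segmenter simply splits on spaces.

```python
def wrap_byte_segmenter(segment_bytes, encoding="utf-8"):
    """Turn a bytes-in, bytes-out segmenter into a text-in, text-out tokenizer."""
    def tokenize(text):
        tokens = segment_bytes(text.encode(encoding))
        return [token.decode(encoding) for token in tokens]
    return tokenize

def fake_segment(sentence_bytes):
    """Stand-in for a real segmenter: split the byte string on spaces."""
    return sentence_bytes.split(b" ")

tokenize = wrap_byte_segmenter(fake_segment)
print(tokenize("元芳 你 怎么 看"))
```

A callable like `tokenize` can be passed as the `tokenizer` argument of `CountVectorizer`.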
Installation fails on Mac OS with the following messages:
clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
error: command 'clang++' failed with exit status 1
----------------------------------------
Command "/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python -u -c "import setuptools, tokenize;file='/private/var/folders/dq/15h1sz4j16502ttfyfvs09tw0000gn/T/pip-build-sNJWOa/pyltp/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /var/folders/dq/15h1sz4j16502ttfyfvs09tw0000gn/T/pip-kTPFza-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/dq/15h1sz4j16502ttfyfvs09tw0000gn/T/pip-build-sNJWOa/pyltp/
clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
error: command 'clang++' failed with exit status 1
Failed building wheel for pyltp
Running setup.py clean for pyltp
Failed to build pyltp
Installing collected packages: pyltp
Running setup.py install for pyltp ... error
Complete output from command /Users/wangshang1011/anaconda/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/0y/9qm2kz4s1jz3ghpbhqt6chg00000gn/T/pip-build-oagyNZ/pyltp/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/0y/9qm2kz4s1jz3ghpbhqt6chg00000gn/T/pip-Wljke6-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'pyltp' extension
creating build
creating build/temp.macosx-10.6-x86_64-2.7
creating build/temp.macosx-10.6-x86_64-2.7/src
creating build/temp.macosx-10.6-x86_64-2.7/ltp
creating build/temp.macosx-10.6-x86_64-2.7/ltp/thirdparty
creating build/temp.macosx-10.6-x86_64-2.7/ltp/thirdparty/boost
creating build/temp.macosx-10.6-x86_64-2.7/ltp/thirdparty/boost/libs
creating build/temp.macosx-10.6-x86_64-2.7/ltp/thirdparty/boost/libs/regex
creating build/temp.macosx-10.6-x86_64-2.7/ltp/thirdparty/boost/libs/regex/src
creating build/temp.macosx-10.6-x86_64-2.7/ltp/thirdparty/maxent
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src/splitsnt
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src/segmentor
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src/postagger
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src/ner
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src/parser.n
creating build/temp.macosx-10.6-x86_64-2.7/ltp/src/srl
creating build/temp.macosx-10.6-x86_64-2.7/patch
creating build/temp.macosx-10.6-x86_64-2.7/patch/libs
creating build/temp.macosx-10.6-x86_64-2.7/patch/libs/python
creating build/temp.macosx-10.6-x86_64-2.7/patch/libs/python/src
creating build/temp.macosx-10.6-x86_64-2.7/patch/libs/python/src/object
creating build/temp.macosx-10.6-x86_64-2.7/patch/libs/python/src/converter
clang++ -fno-strict-aliasing -I/Users/wangshang1011/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Iltp/include/ -Iltp/thirdparty/boost/include/ -Iltp/thirdparty/eigen-3.2.4 -Iltp/thirdparty/maxent/ -Iltp/src/ -Iltp/src/splitsnt -Iltp/src/segmentor/ -Iltp/src/postagger/ -Iltp/src/ner/ -Iltp/src/parser.n/ -Iltp/src/srl/ -Iltp/src/utils/ -Iltp/src/srl/ -Ipatch/include/ -I/Users/wangshang1011/anaconda/include/python2.7 -c src/pyltp.cpp -o build/temp.macosx-10.6-x86_64-2.7/src/pyltp.o -std=c++11 -Wno-c++11-narrowing -stdlib=libc++
clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
error: command 'clang++' failed with exit status 1
>>> from pyltp import Segmentor
>>> segmentor = Segmentor()
>>> segmentor.load("d:/data/ltp/cws.model")
>>> words = segmentor.segment("元芳你怎么看")
>>> print "|".join(words)
>>>
My environment is 64-bit Windows 7. Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
running install
running bdist_egg
running egg_info
creating pyltp.egg-info
writing pyltp.egg-info/PKG-INFO
writing top-level names to pyltp.egg-info/top_level.txt
writing dependency_links to pyltp.egg-info/dependency_links.txt
writing manifest file 'pyltp.egg-info/SOURCES.txt'
reading manifest file 'pyltp.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '.hpp' under directory 'ltp/src/framework'
warning: no files found matching '.hpp' under directory 'ltp/src/segmentor'
warning: no files found matching '.hpp' under directory 'ltp/src/postagger'
warning: no files found matching '.hpp' under directory 'ltp/src/ner'
warning: no files found matching '.hpp' under directory 'ltp/src/parser.n'
warning: no files found matching '.hpp' under directory 'ltp/src/srl'
writing manifest file 'pyltp.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.6-intel/egg
running install_lib
running build_ext
building 'pyltp' extension
creating build
creating build/temp.macosx-10.6-intel-2.7
creating build/temp.macosx-10.6-intel-2.7/src
creating build/temp.macosx-10.6-intel-2.7/ltp
creating build/temp.macosx-10.6-intel-2.7/ltp/thirdparty
creating build/temp.macosx-10.6-intel-2.7/ltp/thirdparty/boost
creating build/temp.macosx-10.6-intel-2.7/ltp/thirdparty/boost/libs
creating build/temp.macosx-10.6-intel-2.7/ltp/thirdparty/boost/libs/regex
creating build/temp.macosx-10.6-intel-2.7/ltp/thirdparty/boost/libs/regex/src
creating build/temp.macosx-10.6-intel-2.7/ltp/thirdparty/maxent
creating build/temp.macosx-10.6-intel-2.7/ltp/src
creating build/temp.macosx-10.6-intel-2.7/ltp/src/splitsnt
creating build/temp.macosx-10.6-intel-2.7/ltp/src/segmentor
creating build/temp.macosx-10.6-intel-2.7/ltp/src/postagger
creating build/temp.macosx-10.6-intel-2.7/ltp/src/ner
creating build/temp.macosx-10.6-intel-2.7/ltp/src/parser.n
creating build/temp.macosx-10.6-intel-2.7/ltp/src/srl
creating build/temp.macosx-10.6-intel-2.7/patch
creating build/temp.macosx-10.6-intel-2.7/patch/libs
creating build/temp.macosx-10.6-intel-2.7/patch/libs/python
creating build/temp.macosx-10.6-intel-2.7/patch/libs/python/src
creating build/temp.macosx-10.6-intel-2.7/patch/libs/python/src/object
creating build/temp.macosx-10.6-intel-2.7/patch/libs/python/src/converter
clang++ -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Iltp/include/ -Iltp/thirdparty/boost/include/ -Iltp/thirdparty/eigen-3.2.4 -Iltp/thirdparty/maxent/ -Iltp/src/ -Iltp/src/splitsnt -Iltp/src/segmentor/ -Iltp/src/postagger/ -Iltp/src/ner/ -Iltp/src/parser.n/ -Iltp/src/srl/ -Iltp/src/utils/ -Iltp/src/srl/ -Ipatch/include/ -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/pyltp.cpp -o build/temp.macosx-10.6-intel-2.7/src/pyltp.o -std=c++11 -Wno-c++11-narrowing -stdlib=libc++
clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
clang: error: invalid deployment target for -stdlib=libc++ (requires OS X 10.7 or later)
error: command 'clang++' failed with exit status 1
System Version: macOS 10.12 (16A323)
Kernel Version: Darwin 16.0.0
https://ci.appveyor.com/project/Oneplus/ltp4j/build/14
The vc9 (Python 2.7) build environment does not have cstdint.
@endyul I prefer the first option.
In file included from ltp/src/ner/decoder.cpp:1:
In file included from ltp/src/ner/decoder.h:8:
ltp/src/utils/unordered_set.hpp:75:27: error: redefinition of '__gnu_cxx::hash<unsigned long long>'
template<> struct hash<unsigned long long> {
^~~~~~~~~~~~~~~~~~~~~~~~
ltp/src/utils/unordered_map.hpp:75:27: note: previous definition is here
template<> struct hash<unsigned long long> {
^
In file included from ltp/src/ner/decoder.cpp:1:
In file included from ltp/src/ner/decoder.h:8:
ltp/src/utils/unordered_set.hpp:80:37: error: redefinition of 'hash<type-parameter-0-0 *>'
template<typename T> struct hash<T *> {
^~~~~~~~~
ltp/src/utils/unordered_map.hpp:80:37: note: previous definition is here
template<typename T> struct hash<T *> {
^
In file included from ltp/src/ner/decoder.cpp:1:
In file included from ltp/src/ner/decoder.h:8:
ltp/src/utils/unordered_set.hpp:85:27: error: redefinition of '__gnu_cxx::hash<std::string>'
template<> struct hash<std::string> {
^~~~~~~~~~~~~~~~~
ltp/src/utils/unordered_map.hpp:85:27: note: previous definition is here
template<> struct hash<std::string> {
^
In file included from ltp/src/ner/decoder.cpp:1:
ltp/src/ner/decoder.h:15:8: error: no template named 'unordered_set' in namespace 'std'; did you mean 'unordered_map'?
std::unordered_set<size_t> rep;
~~~~~^~~~~~~~~~~~~
unordered_map
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__hash_table:217:85: note: 'unordered_map' declared here
template <class, class, class, class, class> friend class _LIBCPP_TYPE_VIS_ONLY unordered_map;
^
In file included from ltp/src/ner/decoder.cpp:1:
ltp/src/ner/decoder.h:15:8: error: too few template arguments for class template 'unordered_map'
std::unordered_set<size_t> rep;
^
(A pile of warnings and notes omitted here.)
56 warnings and 5 errors generated.
error: command 'cc' failed with exit status 1
Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/private/tmp/pip_build_root/pyltp/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().
Error message:
ltp/src/utils/unordered_map.hpp:8:12: fatal error: 'tr1/unordered_map' file not found
#include <tr1/unordered_map>
^
1 error generated.
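In pyltp, arc.head starts from 0 (0 denotes the root), but semantic role labeling and LTP itself both seem to use -1 for the root.
Some words are segmented wrongly relative to my needs. Is it possible to add a custom word list so that segmentation works better for a specific domain? If so, how do I add it? Thanks!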
Anaconda virtualenv, Python 2.7.11
(venv) $ pip install pyltp
Collecting pyltp
Downloading pyltp-0.1.8.tar.gz (3.4MB)
Installing collected packages: pyltp
Running setup.py install for pyltp: started
Running setup.py install for pyltp: finished with status 'error'
running install
running build
running build_ext
building 'pyltp' extension
creating build
creating build/temp.macosx-10.5-x86_64-2.7
creating build/temp.macosx-10.5-x86_64-2.7/src
creating build/temp.macosx-10.5-x86_64-2.7/ltp
creating build/temp.macosx-10.5-x86_64-2.7/ltp/thirdparty
creating build/temp.macosx-10.5-x86_64-2.7/ltp/thirdparty/boost
... ...
ltp/src/ner/decoder.cpp:27:7: error: use of undeclared identifier 'rep'
rep.insert(from * T + to);
^
ltp/src/ner/decoder.cpp:38:10: error: use of undeclared identifier 'rep'
return rep.find(code) != rep.end();
^
ltp/src/ner/decoder.cpp:38:28: error: use of undeclared identifier 'rep'
return rep.find(code) != rep.end();
^
In file included from ltp/src/ner/decoder.cpp:1:
In file included from ltp/src/ner/decoder.h:7:
ltp/src/utils/smartmap.hpp:72:5: warning: field '_cap_entries' will be initialized after field '_num_buckets' [-Wreorder]
_cap_entries(INIT_CAP_ENTRIES),
^
ltp/src/utils/smartmap.hpp:589:3: note: in instantiation of member function 'ltp::utility::SmartMap<int, ltp::utility::__Default_CharArray_HashFunction, ltp::utility::__Default_CharArray_EqualFunction>::SmartMap' requested here
IndexableSmartMap() : entries(0), cap_entries(0) {}
^
ltp/src/utils/smartmap.hpp:75:5: warning: field '_len_key_buffer' will be initialized after field '_hash_buckets' [-Wreorder]
_len_key_buffer(0),
^
ltp/src/utils/smartmap.hpp:79:5: warning: field '_val_buffer' will be initialized after field '_hash_buckets_volumn' [-Wreorder]
_val_buffer(0),
^
In file included from ltp/src/ner/decoder.cpp:1:
In file included from ltp/src/ner/decoder.h:6:
In file included from ltp/src/framework/decoder.h:5:
ltp/src/utils/math/sparsevec.h:154:10: warning: private field 'norm' is not used [-Wunused-private-field]
double norm;
^
12 warnings and 8 errors generated.
error: command 'gcc' failed with exit status 1
$ gcc -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
segmentor = Segmentor()
segmentor.load_with_lexicon('E:/code/LTP/ltp_data/cws.model','E:/code/python_code/extractEventsTestApi/searchTrigger/dict/mydict.txt')
mydict.txt is UTF-8 encoded, in the following format:
恒生
农银信用债
and so on
I get "Segmentor: Model not loaded!". What is going on?
I am not sure whether I am using it incorrectly.
In [1]: from pyltp import Postagger
In [2]: postagger = Postagger()
In [3]: postagger.load("/data/ltp/ltp-models/3.2.0-server/ltp_data/pos.model")
In [4]: postagger.postag(["A", "B", "C"])
---------------------------------------------------------------------------
ArgumentError Traceback (most recent call last)
<ipython-input-4-8af9244afe40> in <module>()
----> 1 postagger.postag(["A", "B", "C"])
ArgumentError: Python argument types in
Postagger.postag(Postagger, list)
did not match C++ signature:
postag(Postagger {lvalue}, std::vector<std::string, std::allocator<std::string> >)
The main reason is that when we wrapped the library, Boost.Python did not provide an interface for a Python list of str.
Dear All,
Thanks for your work. I recently tried to use pyltp to process some new data and found it no longer working. It looks like the program stops at the iteration over the second file in the directory. I used to have more than two thousand files processed with pyltp a few months ago.
I have tried using a file pointer as well, but still got nothing beyond the first file's result. Could you have a look at my shabby Python code?
Best regards,
Y.
import os
from pyltp import Segmentor
segmentor = Segmentor()
filename = r'/home/jingjin/code/py_code/ltp_data/cws.model'
if os.path.exists(filename):
    print 'cws.model exists'
else:
    print 'sorry'
segmentor.load(filename)
words = segmentor.segment("元芳你怎么看")
print "|".join(words)
Output:
cws.model exists
Segmentor: Model not loaded!
Environment:
Ubuntu 14.04 LTS
32-bit
Python 2.7.6
If this is possible, where can it be set? The ltp command line seems to provide it?
Segmentation works for me, but sentence splitting fails:
Boost.Python.ArgumentError: Python argument types in
SentenceSplitter.split(unicode)
did not match C++ signature:
split(class std::basic_string<char,struct std::char_traits,class std::allocator >)
Environment: Windows 10, Python 2.7
`segmentor = Segmentor()
segmentor.load('E:/LTP/3.3.0/ltp_data/cws.model')
postagger = Postagger()
postagger.load('E:/LTP/3.3.0/ltp_data/pos.model')
parser = Parser()
parser.load('E:/LTP/3.3.0/ltp_data/parser.model')`
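When I call the code above, the program fails with this error:
Process finished with exit code -1073741819 (0xC0000005)
I searched around and really cannot figure out where the error comes from. I installed with Anaconda; version 0.1.8, which I used before, worked fine. I switched to a new machine two days ago and installed with pip install. I just checked, and the installed version is: Anaconda2\Lib\site-packages\pyltp-0.1.9.1.dist-info.
What could be the cause? Thanks!!
Error message: Boost.Python.ArgumentError: Python argument types in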
Segmentor.segment(Segmentor, unicode)
did not match C++ signature:
segment(struct Segmentor {lvalue}, class std::basic_string<char,struct std::char_traits,class std::allocator >)
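Could you explain what is going on?
On Windows 7 I installed pyltp using pip only, and after commenting out the line that loads the lib file, example.py ran successfully. My question is: with only pyltp installed, can all LTP features be run correctly?
A single sentence of length 0, or of length greater than 1024, causes a "Command terminated" abort. I suggest mentioning this limit in the documentation to make the error easier to locate.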
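Until the limit is documented or lifted, callers can screen sentences before handing them to the segmenter. The sketch below assumes the limit is counted in characters and that 1024 itself is still accepted; the report does not say whether the unit is characters or bytes, so adjust `MAX_LEN` as needed. All names are illustrative.

```python
MAX_LEN = 1024  # limit quoted in the report; unit assumed to be characters

def is_safe_sentence(sentence, max_len=MAX_LEN):
    """Return True if the sentence is non-empty and within the assumed limit."""
    return 0 < len(sentence) <= max_len

def split_safe(sentences, max_len=MAX_LEN):
    """Separate sentences into (accepted, rejected) lists."""
    accepted = [s for s in sentences if is_safe_sentence(s, max_len)]
    rejected = [s for s in sentences if not is_safe_sentence(s, max_len)]
    return accepted, rejected

ok, skipped = split_safe(["元芳你怎么看", "", "长" * 2000])
print(len(ok), len(skipped))  # 1 2
```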
The release method
Add model download instructions to the readme
Traceback (most recent call last):
File "example/example.py", line 11, in
from pyltp import Segmentor, Postagger, Parser, NamedEntityRecognizer, SementicRoleLabeller
ImportError: dlopen(/Library/Python/2.7/site-packages/pyltp-0.1.3-py2.7-macosx-10.10-intel.egg/pyltp.so, 2): Symbol not found: __ZNSbIwSt11char_traitsIwESaIwEE4_Rep11_S_terminalE
Referenced from: /Library/Python/2.7/site-packages/pyltp-0.1.3-py2.7-macosx-10.10-intel.egg/pyltp.so
Expected in: flat namespace
in /Library/Python/2.7/site-packages/pyltp-0.1.3-py2.7-macosx-10.10-intel.egg/pyltp.so
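On Mac OS the pip install succeeded, but pyltp cannot be called from my program. I then tried installing via git, and it still does not work. Why is that?
Is this Python interface equivalent to the Language Cloud (语言云) analysis when using the 3.3 model? Sorry to bother you, and thanks!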
I installed pyltp with pip in an Anaconda 4.2.0, Ubuntu 16.04 LTS environment, and I get a problem when I run import pyltp:
... undefined symbol: _ZTVNSt7__cxx1119basic_istringstreamIcSt11char_traitsIcESaIcEEE
Installing from source gives the same problem. I then tried sudo python setup.py install
and added the package path to sys.path like this:
import sys
sys.path.append('/usr/local/lib/python2.7/dist-packages/pyltp-0.1.9.1-py2.7-linux-x86_64.egg')
import pyltp
The error message becomes:
/home/?/anaconda2/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /usr/local/lib/python2.7/dist-packages/pyltp-0.1.9.1-py2.7-linux-x86_64.egg/pyltp.so)
So I think the problem comes from the libstdc++ in the Anaconda 4.2.0 environment. I ran strings /home/ldy/anaconda2/bin/../lib/libstdc++.so.6 | grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_3.4.14
GLIBCXX_3.4.15
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_FORCE_NEW
GLIBCXX_DEBUG_MESSAGE_LENGTH
There is no GLIBCXX_3.4.20 in the Anaconda 4.2.0 libstdc++.
Solution:
My solution is simple:
cd ~/anaconda2/lib
rm libstdc++.so.6.0.19
ln -s /usr/lib/x86_64-linux-gnu/libstdc++.so.6 libstdc++.so.6.0.19
Check with strings libstdc++.so.6 | grep GLIBCXX, and you will find the following:
...
GLIBCXX_3.4.16
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.19
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBCXX_DEBUG_MESSAGE_LENGTH
I am writing this down to help other people who run into the same problem.
words = segmentor.segment(sentence)
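In Python, words is of type VectorOfString. For example, if I want to get the position of the word '**', how do I do that? Mainly I want to see how this type should be handled.
For example, if it were a list, I could get the index like this:
words.index('**')
I just tried it and it raises an error.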
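One approach that relies only on the object being iterable, which the `"\t".join(words)` calls elsewhere on this page suggest it is, is to copy it into a regular list first. The sketch below uses a hypothetical stand-in class, `FakeVector`, because it does not import pyltp.

```python
class FakeVector(object):
    """Stand-in for an iterable container that lacks list methods such as index()."""
    def __init__(self, items):
        self._items = tuple(items)

    def __iter__(self):
        return iter(self._items)

words = FakeVector(["进出口", "银行", "加强", "合作"])

word_list = list(words)            # plain Python list with the same items
position = word_list.index("银行")
print(position)  # 1
```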