Comments (5)
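There is no special reason. The segmentation step could use a more accurate deep learning model; I use jieba only because it is simple and fast.
For a specific domain, jieba needs to be used together with a lexicon for that domain, otherwise the results are indeed poor.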
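To illustrate why a domain lexicon matters, here is a minimal sketch. It does not use jieba or reproduce jieba's algorithm; it is a toy forward maximum matching segmenter, and the function name `fmm_segment`, the two lexicons, and the sample sentence are all made up for illustration. With jieba itself, the corresponding hooks would be `jieba.load_userdict` and `jieba.add_word`.

```python
# Toy forward maximum matching (FMM) segmenter, NOT jieba's algorithm.
# All names and lexicon entries below are hypothetical examples.

GENERAL_LEXICON = {"患者", "出现", "糖尿病"}
DOMAIN_LEXICON = {"糖尿病肾病"}  # assumed medical term missing from the general lexicon


def fmm_segment(text, lexicon, max_len=8):
    """Greedily match the longest lexicon entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


sentence = "患者出现糖尿病肾病"
print(fmm_segment(sentence, GENERAL_LEXICON))
# ['患者', '出现', '糖尿病', '肾', '病']
print(fmm_segment(sentence, GENERAL_LEXICON | DOMAIN_LEXICON))
# ['患者', '出现', '糖尿病肾病']
```

Without the domain entry, the disease name is split into fragments, which is the kind of error that later makes the entity hard to tag.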
from information-extraction-chinese.
Hi, jieba segments poorly in specialized domains. Could you explain why you chose jieba? Thanks.
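For example, in the Chinese medical domain, jieba struggles to segment accurately (even with a partial medical dictionary). When segmentation is wrong, medical entities are also hard to tag. So apart from using only char embeddings, is there really no better solution?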
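I'm not familiar with the medical domain. My suggestion is to try the following ideas:
- As I understand it, new terms do not appear often in the medical domain, so continually expanding the domain lexicon should still be a fairly reliable approach for both segmentation and NER (I believe Sogou has a fairly large medical dictionary?)
- If you have a large medical corpus, I suggest retraining word/char embeddings specifically for the medical domain
- If you have enough annotated data, you can try training a model with char embeddings as input and see how it performs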
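As a rough sketch of the character-level option: if the model consumes one character at a time, the training labels can be built directly from annotated entity spans, so segmentation errors never reach the tagger. The function name `char_bio_tags`, the `DISEASE` label, and the sample annotation are assumptions for illustration, not this repository's data format.

```python
# Convert span annotations into per-character BIO tags.
# The label set and the example sentence are hypothetical.

def char_bio_tags(text, entities):
    """entities: list of (start, end, label) with end exclusive."""
    tags = ["O"] * len(text)
    for start, end, label in entities:
        tags[start] = "B-" + label          # first character of the entity
        for k in range(start + 1, end):
            tags[k] = "I-" + label          # remaining characters
    return list(zip(text, tags))


sentence = "患者出现糖尿病肾病"
annotations = [(4, 9, "DISEASE")]  # covers "糖尿病肾病"

for char, tag in char_bio_tags(sentence, annotations):
    print(char, tag)
```

Each character then maps to one embedding and one tag, so a sequence model can be trained on these pairs without any word segmentation step.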
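Thank you very much for your suggestions. Using a dictionary is indeed a practical approach for domain-specific NER. The problem now is that annotated data in the medical domain is too scarce, annotation is too expensive, and the data is not public. What do you think of the joint extraction paper from the Institute of Automation at this year's ACL? Or what is your understanding of joint extraction?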