Is it possible to use character embeddings directly and get rid of the dependency on Jieba?


[NER_IDCNN_CRF] Get rid of dependency with Jieba? · information-extraction-chinese · Open

crownpku commented on July 21, 2024


from information-extraction-chinese.

Comments (5)

crownpku commented on July 21, 2024

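There is no special reason. The segmentation step could use a more accurate deep learning model; I used jieba only because it is simple and fast.
In a specific domain, jieba segmentation needs to be used together with a lexicon for that domain; otherwise the results are indeed poor.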
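Why a domain lexicon matters can be seen with forward maximum matching (FMM), the simplest dictionary-based segmenter. This is a swapped-in illustration, not jieba's algorithm (jieba builds a DAG over a prefix dictionary and uses an HMM for unknown words); `fmm_segment` and both toy lexicons are hypothetical.

```python
from typing import List, Set


def fmm_segment(text: str, lexicon: Set[str]) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word that matches, falling back to a single character."""
    max_len = max([1] + [len(w) for w in lexicon])
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical lexicons: a tiny general one, and the same plus one medical term.
general = {"患者", "出现", "症状"}
medical = general | {"糖尿病酮症酸中毒"}  # "diabetic ketoacidosis"

# "The patient presents with symptoms of diabetic ketoacidosis."
sentence = "患者出现糖尿病酮症酸中毒症状"
print(fmm_segment(sentence, general))
# ['患者', '出现', '糖', '尿', '病', '酮', '症', '酸', '中', '毒', '症状']
print(fmm_segment(sentence, medical))
# ['患者', '出现', '糖尿病酮症酸中毒', '症状']
```

With only the general lexicon the disease name is shattered into single characters, which is the failure mode that makes the entity hard to tag downstream. With jieba itself, the analogous step is loading a user dictionary via `jieba.load_userdict` or adding terms with `jieba.add_word`.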

mingmingfu commented on July 21, 2024

Hi, jieba's segmentation works poorly in specialized domains. Could you tell me why you chose jieba? Thanks.

mingmingfu commented on July 21, 2024

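For example, in the Chinese medical domain it is hard to get accurate segmentation with jieba (even with a partial medical dictionary). When the segmentation is wrong, medical entities are also hard to tag. Is there, then, no better solution than using char embeddings alone?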
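What "using char embeddings alone" removes can be sketched as follows, assuming the model takes per-character segmentation tags alongside character ids (a common design in char-based Chinese NER). The 0/1/2/3 tag scheme, `seg_features`, `encode`, and the toy vocabulary are assumptions for illustration, not code from this repository.

```python
from typing import Dict, List, Optional, Tuple


def seg_features(words: List[str]) -> List[int]:
    """Map segmenter output to one tag per character:
    0 = single-character word, 1 = word start, 2 = word middle, 3 = word end."""
    tags: List[int] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens so tags stay aligned with chars
        if len(word) == 1:
            tags.append(0)
        else:
            tags.extend([1] + [2] * (len(word) - 2) + [3])
    return tags


def encode(
    text: str,
    char_to_id: Dict[str, int],
    words: Optional[List[str]] = None,
) -> Tuple[List[int], Optional[List[int]]]:
    """Return (char_ids, seg_tags); pass words=None for char-only input."""
    unk_id = char_to_id.get("<UNK>", 0)
    char_ids = [char_to_id.get(ch, unk_id) for ch in text]
    if words is None:
        # Char-only input: no segmenter is needed, so no segmentation errors.
        return char_ids, None
    if "".join(words) != text:
        raise ValueError("segmentation must cover the text exactly")
    return char_ids, seg_features(words)


# Hypothetical toy vocabulary.
vocab = {"<UNK>": 0, "糖": 1, "尿": 2, "病": 3, "患": 4, "者": 5}
print(encode("糖尿病患者", vocab, ["糖尿病", "患者"]))
# ([1, 2, 3, 4, 5], [1, 2, 3, 1, 3])
print(encode("糖尿病患者", vocab))
# ([1, 2, 3, 4, 5], None)
```

A wrong segmentation changes only the second channel, so dropping it trades the lexical boundary signal for immunity to segmentation errors.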

crownpku commented on July 21, 2024

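I'm not familiar with the medical domain. My suggestion is to try the following directions:

My understanding is that new-word discovery shouldn't come up much in the medical domain, so continually expanding the domain lexicon should still be a fairly reliable approach, for both segmentation and NER (I believe Sogou has a fairly large medical dictionary?).
If you have a large medical corpus, I suggest retraining word/char embeddings specifically for the medical domain.
If you have enough labeled data, you could try training the model with char embeddings as input and see how it performs.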
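A domain lexicon can also help on the NER side when annotated data is scarce: dictionary matching yields rough character-level BIO labels, usable as pre-annotation for humans to correct or as weak (distantly supervised) training data for a char-input model. A minimal sketch, assuming a hypothetical term-to-type lexicon; `lexicon_bio_tags` and the DIS/DRUG types are illustrative, and dictionary matching will both miss entities and mislabel ambiguous strings.

```python
from typing import Dict, List


def lexicon_bio_tags(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Greedy longest-match BIO tagging of `text` against a term -> type lexicon.
    Characters not covered by any lexicon term are tagged "O"."""
    max_len = max([1] + [len(term) for term in lexicon])
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        # Find the longest lexicon term starting at position i, if any.
        match_len = next(
            (size for size in range(min(max_len, len(text) - i), 0, -1)
             if text[i:i + size] in lexicon),
            0,
        )
        if match_len == 0:
            i += 1
            continue
        entity_type = lexicon[text[i:i + match_len]]
        tags[i] = f"B-{entity_type}"
        for j in range(i + 1, i + match_len):
            tags[j] = f"I-{entity_type}"
        i += match_len
    return tags


# Hypothetical lexicon; DIS / DRUG are illustrative entity types.
lexicon = {"糖尿病": "DIS", "胰岛素": "DRUG"}
# "A diabetic patient is injected with insulin."
print(lexicon_bio_tags("糖尿病患者注射胰岛素", lexicon))
# ['B-DIS', 'I-DIS', 'I-DIS', 'O', 'O', 'O', 'O', 'B-DRUG', 'I-DRUG', 'I-DRUG']
```

Greedy longest match is the simplest choice; it mirrors what a segmenter does with a user dictionary, but unlike a trained tagger it cannot use context to resolve overlapping or ambiguous terms.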

mingmingfu commented on July 21, 2024

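Thank you very much for your suggestions. Using a dictionary is indeed a very practical approach for domain-specific NER. The problem is that labeled data in the medical domain is too scarce: it is expensive to produce and not publicly available. What do you think of the joint-extraction paper from the Institute of Automation (Chinese Academy of Sciences) at this year's ACL? Or, more generally, what is your take on joint extraction?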
