Comments (5)

crownpku commented on July 21, 2024

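There is no special reason. The segmentation step could use a more accurate deep learning model; jieba is used only because it is simple and fast.
In a specialized domain, jieba needs to be used together with a lexicon for that domain; otherwise the results are indeed poor.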
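
To illustrate why a domain lexicon matters, here is a minimal sketch using plain forward maximum matching. This is a stand-in technique, not jieba's actual algorithm, and both lexicons and the sentence are made up for the example. With jieba itself, domain terms can be supplied through `jieba.load_userdict` or `jieba.add_word`.

```python
def forward_max_match(text, lexicon, max_len=7):
    """Greedy forward maximum matching: take the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general-purpose lexicon (illustrative only).
general_lexicon = {"患者", "服用", "治疗", "高血压"}
# Hypothetical domain lexicon: the same words plus one drug name.
medical_lexicon = general_lexicon | {"阿司匹林肠溶片"}

sentence = "患者服用阿司匹林肠溶片治疗高血压"
without_domain = forward_max_match(sentence, general_lexicon)
with_domain = forward_max_match(sentence, medical_lexicon)

print(without_domain)  # the drug name falls apart into single characters
print(with_domain)     # the drug name is kept as one token
```

Without the domain entry, the drug name is split into single characters, which is the kind of failure described above.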

from information-extraction-chinese.

mingmingfu commented on July 21, 2024

jieba segments poorly in specialized domains. Could you explain why you chose jieba? Thanks.


mingmingfu commented on July 21, 2024

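For example, in the Chinese medical domain, jieba has trouble segmenting accurately, even with a partial medical dictionary. When segmentation is wrong, medical entities are also hard to label. So apart from using only char embeddings, there seems to be no better solution?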
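
The problem can be made concrete with a small sketch. When a segmenter cuts through an entity, no word-level tag sequence can mark it exactly, whereas character-level BIO tags do not depend on segmentation at all. The sentence, the annotation, the `DIS` label, and both segmentations below are hypothetical.

```python
def token_boundaries(tokens):
    """Return the set of character offsets at which a token starts or ends."""
    boundaries, pos = {0}, 0
    for tok in tokens:
        pos += len(tok)
        boundaries.add(pos)
    return boundaries


def entity_is_aligned(tokens, start, end):
    """True if the entity [start, end) begins and ends on token boundaries."""
    boundaries = token_boundaries(tokens)
    return start in boundaries and end in boundaries


def char_bio_tags(text, entities):
    """Character-level BIO tags; entities are (start, end, label), end exclusive."""
    tags = ["O"] * len(text)
    for start, end, label in entities:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


text = "患者有糖尿病史"
# Hypothetical gold annotation: "糖尿病" is a disease entity at offsets [3, 6).
entities = [(3, 6, "DIS")]
# Hypothetical wrong segmentation that cuts through the entity.
bad_tokens = ["患者", "有", "糖尿", "病史"]
good_tokens = ["患者", "有", "糖尿病", "史"]

print(entity_is_aligned(bad_tokens, 3, 6))   # entity cannot be tagged on these words
print(entity_is_aligned(good_tokens, 3, 6))  # entity matches one word exactly
print(char_bio_tags(text, entities))
```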


crownpku commented on July 21, 2024

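I'm not familiar with the medical domain. My suggestion is to try the following directions:

  1. As I understand it, new words do not emerge very often in medicine, so continually expanding the domain lexicon should still be a fairly reliable approach for both segmentation and NER (Sogou should have a fairly large medical dictionary?).
  2. If you have a large medical corpus, I suggest retraining word/char embeddings for the medical domain.
  3. If you have enough labeled data, try training a model with char embeddings as input and see how it performs.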
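
For the third suggestion, the input side of a character-level model can be sketched as follows: build a character vocabulary from the corpus and turn each sentence into a fixed-length sequence of ids that an embedding layer could look up. The special tokens, function names, and the tiny corpus are assumptions for illustration; the model itself is not shown.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_char_vocab(sentences, min_count=1):
    """Map each character seen at least min_count times to an integer id."""
    counts = Counter(ch for s in sentences for ch in s)
    vocab = {PAD: 0, UNK: 1}
    # Sort by frequency, then by character, so ids are deterministic.
    for ch, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(sentence, vocab, max_len):
    """Convert a sentence to character ids, truncated or padded to max_len."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in sentence[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical two-sentence corpus.
corpus = ["高血压", "高血糖"]
vocab = build_char_vocab(corpus)
ids = encode("高血脂", vocab, max_len=5)  # "脂" is not in the corpus
print(ids)
```

Characters unseen in the corpus map to the unknown id, so no segmentation step or word dictionary is needed at this stage.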


mingmingfu commented on July 21, 2024

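Thank you very much for your suggestions. Using a dictionary really is one of the more practical approaches to domain-specific NER. The problem right now is that labeled data for the medical domain is too scarce: annotation is too costly and the data is not public. What do you think of the joint extraction paper from the Institute of Automation at this year's ACL? Or what is your understanding of joint extraction?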

