Are there any plans for adding NER capabilities to Spacy soon? Any recommendations on t

Question: NER with Spacy (spacy, closed)

explosion commented on April 27, 2024

Question: NER with Spacy


Comments (8)

syllog1sm commented on April 27, 2024

I'm not really happy with any of the existing algorithms, so I've been working on a novel shift-reduce approach. Briefly, where previous work usually encodes the structure into sequence tags, so that a finite-state machine can be used, I think it makes more sense to use a push-down automaton, now that work in parsing with shift-reduce grammars is so well understood.
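The thread doesn't describe the transition system itself, so what follows is only a minimal sketch of a stack-based (shift-reduce) entity segmenter, written under stated assumptions. The three actions (SHIFT, REDUCE with a label, OUT), the function names `apply_actions` and `oracle`, and the span format are hypothetical illustrations. They are not spaCy's actual implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical actions (illustration only, not spaCy's transition system):
#   ("SHIFT", None)    push the next buffer token onto the stack
#   ("REDUCE", label)  pop the whole stack and emit it as one entity with `label`
#   ("OUT", None)      consume the next buffer token as outside any entity
Action = Tuple[str, Optional[str]]
Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def apply_actions(n_tokens: int, actions: List[Action]) -> List[Span]:
    """Run an action sequence over a sentence and return the entity spans it builds."""
    stack: List[int] = []  # token indices of the entity under construction
    buffer_pos = 0         # index of the next unread token
    spans: List[Span] = []
    for name, label in actions:
        if name == "SHIFT":
            if buffer_pos >= n_tokens:
                raise ValueError("SHIFT with an empty buffer")
            stack.append(buffer_pos)
            buffer_pos += 1
        elif name == "REDUCE":
            if not stack or label is None:
                raise ValueError("REDUCE needs a non-empty stack and a label")
            spans.append((stack[0], stack[-1] + 1, label))
            stack.clear()
        elif name == "OUT":
            if stack or buffer_pos >= n_tokens:
                raise ValueError("OUT needs an empty stack and a non-empty buffer")
            buffer_pos += 1
        else:
            raise ValueError(f"unknown action {name!r}")
    if stack or buffer_pos != n_tokens:
        raise ValueError("actions did not consume the sentence cleanly")
    return spans


def oracle(n_tokens: int, gold: List[Span]) -> List[Action]:
    """Static oracle: the action sequence that rebuilds non-overlapping gold spans."""
    starts = {start: (end, label) for start, end, label in gold}
    actions: List[Action] = []
    i = 0
    while i < n_tokens:
        if i in starts:
            end, label = starts[i]
            actions.extend([("SHIFT", None)] * (end - i))
            actions.append(("REDUCE", label))
            i = end
        else:
            actions.append(("OUT", None))
            i += 1
    return actions


tokens = "Over a cup of coffee , Mr. Stone told his story .".split()
gold = [(6, 8, "PERSON")]
actions = oracle(len(tokens), gold)
print(actions[6:9])  # [('SHIFT', None), ('SHIFT', None), ('REDUCE', 'PERSON')]
print(apply_actions(len(tokens), actions))  # [(6, 8, 'PERSON')]
```

In a trained system, a classifier would score the legal actions from features of the stack and buffer at each step. The oracle here only supplies the gold sequence that such a classifier would be trained to reproduce.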

I'm just starting to get results for this. Currently accuracy is only 77% on OntoNotes, where the Stanford NER system reportedly gets around 84%. I still need to do a lot of bug-fixing and tuning, and I'm not using gazetteers or any semi-supervised learning at the moment.

So, in short: yes, NER is planned, and the bulk of the work is done. It remains to be seen whether my approach will hit comparable accuracy to previous work, but imo it should. Once the accuracy is good, I then need to design and implement the Python API, and write the testing and deployment code. Probably about 1 month all up, given other things I'm working on.


viksit commented on April 27, 2024

Thanks, that's an interesting approach. Are there any specific papers you recommend for PDA-based NER?

Also, are you inviting code/collaboration on this yet?


syllog1sm commented on April 27, 2024

As far as I know PDA for NER is a new idea, since most of the previous work uses HMMs and CRFs. If it works, I'll write it up.
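The push-down approach described earlier in this thread can be contrasted with the sequence-tag encoding that HMM and CRF taggers typically predict. The sketch below assumes the common BIO scheme (others, such as BILOU, exist). The helper names `spans_to_bio` and `bio_to_spans` are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; an I- tag that cannot continue an entity starts a new one."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open entity
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and start is not None and tag_label == label:
            continue  # the current entity carries on
        if start is not None:
            spans.append((start, i, label))
        if prefix in ("B", "I"):
            start, label = i, tag_label
        else:
            start, label = None, None
    return spans


tags = spans_to_bio(12, [(6, 8, "PERSON")])
print(tags[5:9])  # ['O', 'B-PERSON', 'I-PERSON', 'O']
print(bio_to_spans(tags))  # [(6, 8, 'PERSON')]
```

Under this encoding, a first-order HMM or linear-chain CRF only has to model transitions between adjacent tags. That is what makes finite-state (Viterbi) decoding possible.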

I need to set up the contributor agreement, but then I could accept contributions. But I think it's easiest if I do the research parts myself. Collaborating on that gets complicated.

If you want to weigh in on what sort of API you'd like to see though, that would be very welcome.


viksit commented on April 27, 2024

Ah, I didn't realize it hadn't been tried before. I remember coming across a Chinese NER system that used PDAs, but I can't find that paper. Would you be interested in sharing some high-level thoughts on the PDA/NER approach that you're taking?

Re: collaborating on the research parts, just an idea, but it might be interesting to have a shared ipynb or some such on one of the GitHub-style research collaboration platforms.

Definitely, let me think about the APIs. I've always thought that the GATE or UIMA style, and even the Stanford NER APIs, have been super heavyweight.

It would be good to have a visual representation of the parse tree, like NLTK's, as this progresses:

(S
  Over/IN
  (NP a/DT cup/NN)
  of/IN
  (NP coffee/NN)
  ,/,
  (NP Mr./NNP Stone/NNP)
  told/VBD
  (NP his/PRP$ story/NN)
  ./.)
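NLTK's `nltk.Tree` is what prints chunk trees in the layout above. The same view can be produced without NLTK from token and tag lists plus labelled spans, as in the minimal dependency-free sketch below. The function name and span format are hypothetical. The sketch reproduces the example above exactly, and it would render entities the same way with labels such as PERSON in place of NP.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def to_bracketed(words: List[str], tags: List[str], chunks: List[Span], root: str = "S") -> str:
    """Render tagged tokens plus non-overlapping chunks in an NLTK-like bracketed layout."""
    starts = {start: (end, label) for start, end, label in chunks}
    lines = [f"({root}"]
    i = 0
    while i < len(words):
        if i in starts:  # a chunk begins here: print all its tokens on one line
            end, label = starts[i]
            inner = " ".join(f"{w}/{t}" for w, t in zip(words[i:end], tags[i:end]))
            lines.append(f"  ({label} {inner})")
            i = end
        else:  # a token outside any chunk
            lines.append(f"  {words[i]}/{tags[i]}")
            i += 1
    lines[-1] += ")"  # close the root on the last line, matching the example
    return "\n".join(lines)


words = "Over a cup of coffee , Mr. Stone told his story .".split()
tags = ["IN", "DT", "NN", "IN", "NN", ",", "NNP", "NNP", "VBD", "PRP$", "NN", "."]
chunks = [(1, 3, "NP"), (4, 5, "NP"), (6, 8, "NP"), (9, 11, "NP")]
print(to_bracketed(words, tags, chunks))
```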


syllog1sm commented on April 27, 2024

Quick update:

This is progressing well: I'm now getting 81% on the OntoNotes WSJ corpus. I expect gazetteers from Wikidata will bring this in line with the current state of the art.
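The thread doesn't say how gazetteer entries would be used. One common option is to turn them into per-token features for the statistical model. The sketch below assumes a hypothetical in-memory dictionary whose entries are made up for illustration, not extracted from Wikidata. It uses greedy longest-match lookup, and the function name `gazetteer_features` is likewise hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercased token tuples -> entity type.
# These entries are made up for illustration, not taken from Wikidata.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("new", "york"): "GPE",
    ("new", "york", "times"): "ORG",
}


def gazetteer_features(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str], max_len: int = 4) -> List[str]:
    """Tag each token with the type of the longest gazetteer match covering it (greedy, left to right)."""
    feats = ["NONE"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # longest n-gram first
            entry_type = gazetteer.get(tuple(lowered[i:i + n]))
            if entry_type is not None:
                for j in range(i, i + n):
                    feats[j] = f"GAZ={entry_type}"
                i += n
                break
        else:  # no entry starts at token i
            i += 1
    return feats


tokens = "She reads the New York Times in New York".split()
print(gazetteer_features(tokens, GAZETTEER))
# ['NONE', 'NONE', 'NONE', 'GAZ=ORG', 'GAZ=ORG', 'GAZ=ORG', 'NONE', 'GAZ=GPE', 'GAZ=GPE']
```

Features like these would typically sit alongside the word features rather than override the model, so context can still outvote an ambiguous entry.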

It's hard to say, but this might be ready by the end of April.


honnibal commented on April 27, 2024

NER is now included, although the model still needs accuracy improvements. Currently it's getting 82% F-score on OntoNotes, and 86% on CoNLL '03. State-of-the-art is around 85% and 90% on these benchmarks. Improvements are in the works.
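The comment doesn't spell out the metric. NER results on CoNLL '03 and OntoNotes are usually reported as exact-match, entity-level F1, which is the harmonic mean F = 2PR / (P + R) of precision P and recall R. The sketch below assumes gold and predicted entities are given as hypothetical (sentence_id, start, end, label) tuples, and the function name `entity_prf` is hypothetical too.

```python
from typing import Set, Tuple

# (sentence_id, start, end_exclusive, label): the id keeps spans from different sentences apart
Span = Tuple[int, int, int, str]


def entity_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted entity counts only if its boundaries and label both match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 6, 8, "PERSON"), (1, 0, 2, "ORG"), (1, 5, 6, "GPE")}
pred = {(0, 6, 8, "PERSON"), (1, 0, 1, "ORG"), (1, 5, 6, "GPE"), (1, 9, 10, "DATE")}
p, r, f = entity_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

The truncated ORG prediction earns no partial credit, which is one reason entity-level F-scores tend to run lower than per-token accuracy.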


viksit commented on April 27, 2024

Sweet - is there a pointer on usage?


lock commented on April 27, 2024

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
