Comments (4)
Hello, has anyone here tried to use a model or framework other than spaCy (NER) as a labeling function? I tried to work with Flair, but it didn't work. Can anyone help? Thanks in advance.
I am also having this issue at the moment! So far, I've been able to use external models which can be fit into the SpaCy framework: https://spacy.io/universe/category/models
Namely, I've had success with Stanza as there's already a version with the SpaCy wrapper available for this model: spacy-stanza
I'm currently trying to use Flair using a myriad of ways (the SpaCy-wrap tool, changing to the SpaCy tokenizer), but so far have not had any success. I will update if I manage to do this!
In the meantime, having the option to integrate non-SpaCy-based models more easily would definitely be appreciated. I will see how far I get with trying to get Flair to work and keep this thread updated.
This is the error I get right now when I just use the Flair model, retrieve the entities and pass them to the FunctionAnnotator:
IndexError: [E035] Error creating span with start 1 and end 6 for Doc of length 2.
It seems to me that even though I override Flair by using the SpacyTokenizer, there are still some differences which result in a conflict on certain documents.
UPDATE: It seems to me that if there's a way to use character offsets instead of token indices, this issue could very easily be mitigated and virtually any model could be used. If there's a way to use SpanAnnotator or FunctionAnnotator in this manner, I'd really appreciate knowing how this is done!
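To illustrate why token indices from one tokenizer cannot be reused on a doc produced by another, while character offsets can, here is a minimal pure-Python sketch. The two regex tokenizers and the helper `char_to_token_span` are hypothetical stand-ins for illustration only; they are not part of skweak, spaCy or Flair.

```python
import re
from bisect import bisect_left


def token_offsets(text, pattern):
    """Return the (start, end) character offsets of every token matched by pattern."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


def char_to_token_span(offsets, start_char, end_char):
    """Map the character span [start_char, end_char) to token indices [start, end).

    Returns None when the character span does not begin and end exactly on
    token boundaries of this particular tokenization.
    """
    starts = [s for s, _ in offsets]
    ends = [e for _, e in offsets]
    i = bisect_left(starts, start_char)
    j = bisect_left(ends, end_char)
    if i == len(starts) or starts[i] != start_char:
        return None
    if j == len(ends) or ends[j] != end_char or j < i:
        return None
    return i, j + 1


text = "New York-based firm"

# Hypothetical tokenizer A: split on whitespace only (3 tokens).
coarse = token_offsets(text, r"\S+")
# Hypothetical tokenizer B: also split off punctuation (5 tokens).
fine = token_offsets(text, r"\w+|[^\w\s]")

# The word "firm" covers characters [15, 19) under both tokenizations,
# but its token index differs: 2 under A, 4 under B.
firm_in_coarse = char_to_token_span(coarse, 15, 19)
firm_in_fine = char_to_token_span(fine, 15, 19)

# "New York" covers characters [0, 8); it only exists as a token span under B.
ny_in_coarse = char_to_token_span(coarse, 0, 8)
ny_in_fine = char_to_token_span(fine, 0, 8)
```

Using B's token indices on A's shorter token list would point past the end of the doc, which is the same kind of mismatch that the E035 error reports. Going through character offsets avoids this, at the cost of occasionally finding no matching token span at all.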
from skweak.
Yes, I agree that using character spans instead of token-level spans would make skweak less spaCy-dependent and provide more flexibility. But it would mean rewriting a lot of the code, since right now the results of the labelling functions are stored as Span objects, which require token-level indexing. So it's definitely something that would be worth looking into, but it's not in the pipeline at the moment.
Thank you for your input @plison! I can understand that this may not be as straightforward to implement.
At the moment, I have been able to figure out a workaround to get Flair working with the FunctionAnnotator, by using the function doc.char_span to manually set the entities such that they are supported (and then tokenized) using the existing doc object. You can see an example below.
This workaround should in theory work for any model which provides character spans for the retrieved entities. However, I have only tested it with Flair so far, and haven't had any major issues.
from skweak import heuristics
from flair.data import Sentence
from flair.models import SequenceTagger
...
flair_classifier = SequenceTagger.load("flair/ner-english-large")
def find_flair_spans(doc):
    # Run Flair on the raw text so that its entities come back with character offsets
    sentence = Sentence(doc.text)
    flair_classifier.predict(sentence)
    for entity in sentence.get_spans('ner'):
        # Map the character offsets onto the tokens of the existing spaCy doc
        span = doc.char_span(entity.start_position, entity.end_position, entity.labels[0].value)
        if span is not None:  # char_span returns None when the offsets do not match token boundaries
            yield span.start, span.end, span.label_
...
# declare and set Flair annotations
flair_annotator = heuristics.FunctionAnnotator("flair_annotator", find_flair_spans)
docs = list(flair_annotator.pipe(docs))
...
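As a side note on the None check in the workaround above: doc.char_span returns None when the character offsets do not fall exactly on token boundaries of the spaCy doc, so those entities are silently dropped. The sketch below shows that behaviour on a blank English pipeline, along with the `alignment_mode` argument of `Doc.char_span` available in recent spaCy versions. The sentence, the offsets and the "GPE" label are made up for illustration, and whether expanding a span is acceptable depends on your data.

```python
import spacy

# A blank pipeline only tokenizes, so no model download is assumed here.
nlp = spacy.blank("en")
doc = nlp("I live in New York.")

# "New York" covers characters [10, 18). Pretend an external model returned
# [10, 16) instead ("New Yo"), which ends in the middle of a spaCy token.
strict = doc.char_span(10, 16, label="GPE")

# With alignment_mode="expand", the span is widened to cover every token
# that the character range touches, instead of returning None.
expanded = doc.char_span(10, 16, label="GPE", alignment_mode="expand")

print(strict)         # the strict default finds no matching token span
print(expanded.text)  # the widened span
```

Passing alignment_mode="contract" instead would shrink the range to the tokens fully inside it, which can still yield None when no whole token fits.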
Thanks, that's very useful indeed!