Comments (6)
Unfortunately, I can't reproduce the error. Which versions of spacy and skweak are you using? Could the train.spacy data have been generated by an older version that is no longer compatible?
from skweak.
Thanks for the quick reply! I'm using the latest versions.
skweak==0.3.1
spacy==3.2.3
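For anyone comparing environments, here is a minimal sketch for printing the installed versions in the same `name==version` form. It uses only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is made up for this example, and checking `srsly` as well is just a suggestion, since it appears in the traceback further down.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print each package in pip's "name==version" style, or note that it is missing.
    for name in ("skweak", "spacy", "srsly"):
        version = installed_version(name)
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in both the environment that wrote the .spacy file and the one that reads it makes a version mismatch easy to spot.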
Could you let me know if the example on https://github.com/NorskRegnesentral/skweak/blob/main/examples/quick_start.ipynb works for you (including in particular the last part that runs the spacy training script)?
Hi, the example script works for me, apart from the last line of code, where the spacy model is trained: that step never finishes.
As for my classification model, I assume there are conflicts between my labelling functions, since some of them may overlap and label the same spans. That could be why writing the annotations to the spacy doc fails with "ValueError: [E1010] Unable to set entity information for token 10 which is included in more than one span in entities, blocked, missing or outside." I resolved this by filtering out the conflicting spans. The "unhashable type: 'list'" error still persists, even when run on a different machine.
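The thread does not show how the conflicting spans were filtered. As a rough sketch of one way to do it, the snippet below keeps the longest span whenever two spans share a token. It works on plain `(start, end, label)` tuples so it runs without spacy; the name `drop_overlaps` and the sample spans are made up for illustration. For real `Span` objects, spaCy's `spacy.util.filter_spans` applies a similar longest-span-first rule.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def drop_overlaps(spans: List[Span]) -> List[Span]:
    """Greedy filter: prefer longer spans, then earlier ones, and skip any
    span that shares a token with a span that was already kept."""
    kept: List[Span] = []
    taken = set()  # token indices already covered by a kept span
    for start, end, label in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if any(i in taken for i in range(start, end)):
            continue
        kept.append((start, end, label))
        taken.update(range(start, end))
    return sorted(kept)


# Token 10 is covered by two spans, which is the situation E1010 complains about.
spans = [(8, 11, "ORG"), (10, 12, "PERSON"), (0, 2, "PERSON")]
print(drop_overlaps(spans))  # [(0, 2, 'PERSON'), (8, 11, 'ORG')]
```

Preferring the longest span is only one possible policy; depending on the labelling functions, keeping the span from the more reliable source may be a better choice.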
I don't really know what might cause this problem, unfortunately. Could you send me a minimal example I could test?
Hi, I don't know whether it is related, but I get the same error: TypeError: unhashable type: 'list'
Here is how I obtain the error:
- annotate a bunch of spacy docs using a combined annotator
- add annotated docs to a DocBin object
- save the DocBin object to disk using DocBin.to_disk
Then, in another script:
- load the DocBin using DocBin.from_disk
- get the docs back using docs = list(db.get_docs(nlp.vocab)) -> Error
The error seems to be caused by voting.MajorityVoter: it disappears when I remove the majority voter from my pipeline.
Here is the full traceback:
Traceback (most recent call last):
  File "fit_model.py", line 48, in <module>
    docs = get_docs(db_path)
  File "fit_model.py", line 29, in get_docs
    docs = list(db.get_docs(nlp.vocab))
  File "/home/leguilln/workspace/nlp/corpus_annotation/skweak-corpus-annot/src/skweak-env/lib/python3.8/site-packages/spacy/tokens/_serialize.py", line 152, in get_docs
    doc.spans.from_bytes(self.span_groups[i])
  File "/home/leguilln/workspace/nlp/corpus_annotation/skweak-corpus-annot/src/skweak-env/lib/python3.8/site-packages/spacy/tokens/_dict_proxies.py", line 96, in from_bytes
    group = SpanGroup(doc).from_bytes(value_bytes)
  File "spacy/tokens/span_group.pyx", line 223, in spacy.tokens.span_group.SpanGroup.from_bytes
  File "/home/leguilln/workspace/nlp/corpus_annotation/skweak-corpus-annot/src/skweak-env/lib/python3.8/site-packages/srsly/_msgpack_api.py", line 27, in msgpack_loads
    msg = msgpack.loads(data, raw=False, use_list=use_list)
  File "/home/leguilln/workspace/nlp/corpus_annotation/skweak-corpus-annot/src/skweak-env/lib/python3.8/site-packages/srsly/msgpack/__init__.py", line 79, in unpackb
    return _unpackb(packed, **kwargs)
  File "srsly/msgpack/_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'
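The last frames are inside the msgpack unpacker, called while a span group is being restored. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a dict with tuple keys ended up in the serialized span group data: msgpack writes tuples as arrays, and when arrays are read back as lists they can no longer serve as dict keys. The sketch below shows that failure in plain Python, along with a hypothetical helper (`stringify_tuple_keys`, not part of skweak or spacy) that rewrites tuple keys as strings before saving. The `attrs` layout is made-up sample data.

```python
from typing import Any


def stringify_tuple_keys(obj: Any) -> Any:
    """Recursively copy `obj`, replacing tuple dict keys with "a|b|c" strings.
    Tuples used as values are turned into lists, as a serializer would do."""
    if isinstance(obj, dict):
        return {
            ("|".join(map(str, k)) if isinstance(k, tuple) else k): stringify_tuple_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [stringify_tuple_keys(v) for v in obj]
    return obj


# What an unpacker effectively attempts when a map key comes back as a list:
try:
    {[3, 5]: "PERSON"}
except TypeError as err:
    print(err)  # the message mentions: unhashable type: 'list'

# Made-up example of per-span data keyed by (start, end) tuples.
attrs = {"probs": {(3, 5): {"PERSON": 0.9}}}
print(stringify_tuple_keys(attrs))  # {'probs': {'3|5': {'PERSON': 0.9}}}
```

If the assumption holds, applying such a conversion to the span group attributes before calling DocBin.to_disk might avoid the error, at the cost of parsing the string keys back when the data is read.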