`eznlp` is a PyTorch-based package for neural natural language processing, currently supporting the following tasks:
- Text Classification
- Named Entity Recognition
  - Sequence Tagging
  - Span Classification
  - Boundary Selection
- Relation Extraction
- Attribute Extraction
- Machine Translation
- Image Captioning
Text classification results:

Dataset | Language | Our Best Acc. (%) | Model |
---|---|---|---|
IMDb | English | 95.78 | RoBERTa-base + Attention |
Yelp Full | English | 71.55 | RoBERTa-base + Attention |
Yelp 2013 | English | 70.80 | RoBERTa-base + Attention |
ChnSentiCorp | Chinese | 95.83 | BERT-base + Attention |
THUCNews-10 | Chinese | 98.98 | RoBERTa-base + Attention |
See Text Classification for more details.
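In the table above, "PLM + Attention" denotes a pretrained encoder whose token states are aggregated by attention pooling before the classification layer. Below is a minimal PyTorch sketch of that architecture using Hugging Face `transformers`; the class name and pooling head are illustrative assumptions, not eznlp's actual implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class AttentionPoolingClassifier(nn.Module):
    """Pretrained encoder + attention pooling + linear head (illustrative sketch)."""

    def __init__(self, num_labels: int, plm_name: str = "roberta-base"):
        super().__init__()
        self.plm = AutoModel.from_pretrained(plm_name)
        hidden = self.plm.config.hidden_size
        self.attn_query = nn.Linear(hidden, 1)   # learned score for each token state
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        states = self.plm(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.attn_query(states).squeeze(-1)                 # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)        # attention over tokens
        pooled = (weights * states).sum(dim=1)                       # (batch, hidden)
        return self.classifier(pooled)                               # (batch, num_labels)
```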
Named entity recognition results:

Dataset | Language | Our Best F1 (%) | Model |
---|---|---|---|
CoNLL 2003 | English | 93.26 | RoBERTa-large + LSTM + CRF |
OntoNotes 5 | English | 91.05 | RoBERTa-base + LSTM + CRF |
MSRA | Chinese | 96.18 | BERT + LSTM + CRF |
WeiboNER v2 | Chinese | 70.48 | BERT + LSTM + CRF |
ResumeNER | Chinese | 95.97 | BERT + LSTM + CRF |
OntoNotes 4 | Chinese | 82.29 | BERT + LSTM + CRF |
OntoNotes 5 | Chinese | 80.31 | BERT + LSTM + CRF |
See Named Entity Recognition for more details.
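The NER results above share a PLM → BiLSTM → CRF stack. A hedged sketch of this pipeline, built on the third-party `pytorch-crf` package (layer sizes and names are assumptions; eznlp's own modules may differ):

```python
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class BertLstmCrfTagger(nn.Module):
    """PLM + BiLSTM + CRF tagger (illustrative sketch, not eznlp's implementation)."""

    def __init__(self, num_tags: int, plm_name: str = "bert-base-cased", lstm_hidden: int = 256):
        super().__init__()
        self.plm = AutoModel.from_pretrained(plm_name)
        self.lstm = nn.LSTM(self.plm.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.hidden2emissions = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        states = self.plm(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(states)
        emissions = self.hidden2emissions(lstm_out)
        mask = attention_mask.bool()
        if tags is not None:                          # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```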
Relation extraction results (Ent: entity F1; Rel: relation F1; Rel+: relation F1 requiring correct entity types):

Dataset | Language | Our Best F1 (Ent / Rel / Rel+, %) | Model |
---|---|---|---|
CoNLL 2004 | English | 89.17 / - / 75.03 | SpERT (w/ RoBERTa-base + LSTM) |
SciERC | English | 69.29 / 48.93 / 36.65 | SpERT (w/ RoBERTa-base) |
See Relation Extraction for more details.
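SpERT scores enumerated token spans as entities and then classifies pairs of predicted entities as relations, all on top of a shared PLM encoding. A simplified sketch of the entity-scoring half (max-pooled span states concatenated with a span-width embedding and the `[CLS]` state, as in the SpERT paper; dimensions here are assumptions):

```python
import torch
import torch.nn as nn

class SpanClassifier(nn.Module):
    """SpERT-style entity scorer over precomputed PLM states (simplified sketch)."""

    def __init__(self, hidden: int, num_entity_types: int, max_width: int = 10):
        super().__init__()
        self.width_embedding = nn.Embedding(max_width + 1, 25)
        # Span representation: max-pooled states + width embedding + [CLS] state.
        self.scorer = nn.Linear(hidden + 25 + hidden, num_entity_types)

    def forward(self, states, cls_state, spans):
        # states: (seq_len, hidden); cls_state: (hidden,); spans: [(start, end), ...]
        reprs = []
        for start, end in spans:
            pooled = states[start:end].max(dim=0).values             # max pooling
            width = min(end - start, self.width_embedding.num_embeddings - 1)
            w_emb = self.width_embedding(torch.tensor(width))
            reprs.append(torch.cat([pooled, w_emb, cls_state]))
        return self.scorer(torch.stack(reprs))                       # (num_spans, num_types)
```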
Install from PyPI:

$ pip install eznlp

Alternatively, build from source:

$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz
Run the experiment scripts with a dataset name and optional arguments:

$ python scripts/text_classification.py --dataset <dataset> [options]
$ python scripts/entity_recognition.py --dataset <dataset> [options]
$ python scripts/relation_extraction.py --dataset <dataset> [options]
$ python scripts/attribute_extraction.py --dataset <dataset> [options]
Future plans:

- Unify the data interchange format as a dict, i.e., `entry` (see the sketch at the end of this list)
- Reorganize `JsonIO`
- Memory optimization for training PLMs on large datasets
- More relation extraction models
- Multi-hot classification
- Unify the aggregation interface of pooling and attention
- Radical-level features
- Data augmentation
- Loss increases in later training phases -> LR finder?
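A plausible shape for the unified `entry` dict is one plain dict per example, with task-specific fields left out when unused; the keys below are illustrative assumptions, not a finalized schema:

```python
# Hypothetical unified `entry`: one plain dict per example (keys are assumptions).
entry = {
    "tokens": ["John", "lives", "in", "New", "York", "."],
    "label": None,                                 # text classification label, if any
    "chunks": [("PER", 0, 1), ("LOC", 3, 5)],      # NER spans as (type, start, end)
    "relations": [("LiveIn", ("PER", 0, 1), ("LOC", 3, 5))],  # (type, head, tail)
}
```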