`eznlp` is a PyTorch-based package for neural natural language processing, currently supporting the following tasks:
- Text Classification
- Named Entity Recognition
  - Sequence Tagging
  - Span Classification
  - Boundary Selection
- Relation Extraction
- Attribute Extraction
- Machine Translation
- Image Captioning
Text classification results:

Dataset | Language | Our Best Acc. (%) | Model |
---|---|---|---|
IMDb | English | 95.78 | RoBERTa-base + Attention |
Yelp Full | English | 71.55 | RoBERTa-base + Attention |
Yelp 2013 | English | 70.80 | RoBERTa-base + Attention |
ChnSentiCorp | Chinese | 95.83 | BERT-base + Attention |
THUCNews-10 | Chinese | 98.98 | RoBERTa-base + Attention |
See Text Classification for more details.
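In the table above, "PLM + Attention" denotes a pretrained encoder whose token states are aggregated by attention pooling before the classification layer. Below is a minimal PyTorch sketch of that architecture using Hugging Face `transformers`; the class name and pooling head are illustrative assumptions, not eznlp's actual implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class AttentionPoolingClassifier(nn.Module):
    """Pretrained encoder + attention pooling + linear head (illustrative sketch)."""

    def __init__(self, num_labels: int, plm_name: str = "roberta-base"):
        super().__init__()
        self.plm = AutoModel.from_pretrained(plm_name)
        hidden = self.plm.config.hidden_size
        self.attn_query = nn.Linear(hidden, 1)   # learned score for each token state
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        states = self.plm(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.attn_query(states).squeeze(-1)                 # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)        # attention over tokens
        pooled = (weights * states).sum(dim=1)                       # (batch, hidden)
        return self.classifier(pooled)                               # (batch, num_labels)
```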
Named entity recognition results:

Dataset | Language | Our Best F1 (%) | Model |
---|---|---|---|
CoNLL 2003 | English | 93.26 | RoBERTa-large + LSTM + CRF |
OntoNotes 5 | English | 91.05 | RoBERTa-base + LSTM + CRF |
MSRA | Chinese | 96.18 | BERT + LSTM + CRF |
WeiboNER v2 | Chinese | 70.48 | BERT + LSTM + CRF |
ResumeNER | Chinese | 95.97 | BERT + LSTM + CRF |
OntoNotes 4 | Chinese | 82.29 | BERT + LSTM + CRF |
OntoNotes 5 | Chinese | 80.31 | BERT + LSTM + CRF |
See Named Entity Recognition for more details.
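The NER results above share a PLM → BiLSTM → CRF stack. A hedged sketch of this pipeline, built on the third-party `pytorch-crf` package (layer sizes and names are assumptions; eznlp's own modules may differ):

```python
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class BertLstmCrfTagger(nn.Module):
    """PLM + BiLSTM + CRF tagger (illustrative sketch, not eznlp's implementation)."""

    def __init__(self, num_tags: int, plm_name: str = "bert-base-cased", lstm_hidden: int = 256):
        super().__init__()
        self.plm = AutoModel.from_pretrained(plm_name)
        self.lstm = nn.LSTM(self.plm.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.hidden2emissions = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        states = self.plm(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(states)
        emissions = self.hidden2emissions(lstm_out)
        mask = attention_mask.bool()
        if tags is not None:                          # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```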
Relation extraction results (Ent: entity F1; Rel: relation F1; Rel+: relation F1 requiring correct entity types):

Dataset | Language | Our Best F1 (Ent / Rel / Rel+, %) | Model |
---|---|---|---|
CoNLL 2004 | English | 89.17 / - / 75.03 | SpERT (w/ RoBERTa-base + LSTM) |
SciERC | English | 69.29 / 48.93 / 36.65 | SpERT (w/ RoBERTa-base) |
See Relation Extraction for more details.
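SpERT scores enumerated token spans as entities and then classifies pairs of predicted entities as relations, all on top of a shared PLM encoding. A simplified sketch of the entity-scoring half (max-pooled span states concatenated with a span-width embedding and the `[CLS]` state, as in the SpERT paper; dimensions here are assumptions):

```python
import torch
import torch.nn as nn

class SpanClassifier(nn.Module):
    """SpERT-style entity scorer over precomputed PLM states (simplified sketch)."""

    def __init__(self, hidden: int, num_entity_types: int, max_width: int = 10):
        super().__init__()
        self.width_embedding = nn.Embedding(max_width + 1, 25)
        # Span representation: max-pooled states + width embedding + [CLS] state.
        self.scorer = nn.Linear(hidden + 25 + hidden, num_entity_types)

    def forward(self, states, cls_state, spans):
        # states: (seq_len, hidden); cls_state: (hidden,); spans: [(start, end), ...]
        reprs = []
        for start, end in spans:
            pooled = states[start:end].max(dim=0).values             # max pooling
            width = min(end - start, self.width_embedding.num_embeddings - 1)
            w_emb = self.width_embedding(torch.tensor(width))
            reprs.append(torch.cat([pooled, w_emb, cls_state]))
        return self.scorer(torch.stack(reprs))                       # (num_spans, num_types)
```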
Install from PyPI:

$ pip install eznlp

Alternatively, build from source:

$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz
Run the experiment scripts with a dataset name and optional arguments:

$ python scripts/text_classification.py --dataset <dataset> [options]
$ python scripts/entity_recognition.py --dataset <dataset> [options]
$ python scripts/relation_extraction.py --dataset <dataset> [options]
$ python scripts/attribute_extraction.py --dataset <dataset> [options]
Future plans:

- Unify the data interchange format as a dict, i.e., `entry` (see the sketch at the end of this list)
- Reorganize `JsonIO`
- Memory optimization for training PLMs on large datasets
- More relation extraction models
- Multi-hot classification
- Unify the aggregation interface of pooling and attention
- Radical-level features
- Data augmentation
- Loss increases in later training phases -> LR finder?
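A plausible shape for the unified `entry` dict is one plain dict per example, with task-specific fields left out when unused; the keys below are illustrative assumptions, not a finalized schema:

```python
# Hypothetical unified `entry`: one plain dict per example (keys are assumptions).
entry = {
    "tokens": ["John", "lives", "in", "New", "York", "."],
    "label": None,                                 # text classification label, if any
    "chunks": [("PER", 0, 1), ("LOC", 3, 5)],      # NER spans as (type, start, end)
    "relations": [("LiveIn", ("PER", 0, 1), ("LOC", 3, 5))],  # (type, head, tail)
}
```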