eznlp's Introduction

Easy Natural Language Processing

eznlp is a PyTorch-based package for neural natural language processing, currently supporting:

  • Text Classification
  • Named Entity Recognition
    • Sequence Tagging
    • Span Classification
    • Boundary Selection
  • Relation Extraction
  • Attribute Extraction
  • Machine Translation
  • Image Captioning

Experimental Results

Text Classification

| Dataset      | Language | Our Best Acc. | Model                    |
|--------------|----------|---------------|--------------------------|
| IMDb         | English  | 95.78         | RoBERTa-base + Attention |
| Yelp Full    | English  | 71.55         | RoBERTa-base + Attention |
| Yelp 2013    | English  | 70.80         | RoBERTa-base + Attention |
| ChnSentiCorp | Chinese  | 95.83         | BERT-base + Attention    |
| THUCNews-10  | Chinese  | 98.98         | RoBERTa-base + Attention |

See Text Classification for more details.

Named Entity Recognition

| Dataset     | Language | Our Best F1 | Model                      |
|-------------|----------|-------------|----------------------------|
| CoNLL 2003  | English  | 93.26       | RoBERTa-large + LSTM + CRF |
| OntoNotes 5 | English  | 91.05       | RoBERTa-base + LSTM + CRF  |
| MSRA        | Chinese  | 96.18       | BERT + LSTM + CRF          |
| WeiboNER v2 | Chinese  | 70.48       | BERT + LSTM + CRF          |
| ResumeNER   | Chinese  | 95.97       | BERT + LSTM + CRF          |
| OntoNotes 4 | Chinese  | 82.29       | BERT + LSTM + CRF          |
| OntoNotes 5 | Chinese  | 80.31       | BERT + LSTM + CRF          |

See Named Entity Recognition for more details.
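
For readers unfamiliar with the metric, the following is a minimal sketch of entity-level micro F1, the score conventionally reported on NER benchmarks. It assumes exact matching on entity type and boundaries; the function name `micro_f1` and the `(type, start, end)` tuple layout are illustrative choices, not eznlp's API, and the package's own scorer may differ in detail.

```python
from typing import List, Set, Tuple

# Illustrative representation (an assumption): (entity type, start, end)
Entity = Tuple[str, int, int]


def micro_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Micro-averaged F1 over sentences, counting exact matches only."""
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    # An entity is correct only if type, start, and end all match.
    n_true = sum(len(g & p) for g, p in zip(gold, pred))

    precision = n_true / n_pred if n_pred else 0.0
    recall = n_true / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [{("PER", 0, 2), ("LOC", 5, 6)}, {("ORG", 1, 3)}]
pred = [{("PER", 0, 2), ("LOC", 4, 6)}, {("ORG", 1, 3)}]
score = micro_f1(gold, pred)  # 2 of 3 entities match on both sides
```

In this toy example the predicted `LOC` span has the wrong start offset, so it counts as both a false positive and a false negative.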

Relation Extraction

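| Dataset    | Language | Our Best F1 (Ent / Rel / Rel+) | Model                             |
|------------|----------|--------------------------------|-----------------------------------|
| CoNLL 2004 | English  | 89.17 / - / 75.03              | SpERT (w/ RoBERTa-base + LSTM)    |
| SciERC     | English  | 69.29 / 48.93 / 36.65          | SpERT (w/ RoBERTa-base)           |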

See Relation Extraction for more details.

Installation

With pip

$ pip install eznlp

From source

$ python setup.py sdist
$ pip install dist/eznlp-<version>.tar.gz
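
After either installation route, one quick way to confirm that the package is visible to the current interpreter is to query its distribution metadata. This is a small sketch using only the Python standard library; the helper name `installed_version` is illustrative and not part of eznlp.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "eznlp") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"eznlp {version}" if version else "eznlp is not installed")
```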

Running the Code

Text classification

$ python scripts/text_classification.py --dataset <dataset> [options]

Entity recognition

$ python scripts/entity_recognition.py --dataset <dataset> [options]

Relation extraction

$ python scripts/relation_extraction.py --dataset <dataset> [options]

Attribute extraction

$ python scripts/attribute_extraction.py --dataset <dataset> [options]
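
The four commands above share one shape, so they can be driven from Python when launching several runs. The sketch below assumes it is executed from the repository root; the helper names `build_command` and `run_task` and the dataset value `my_dataset` are placeholders, and only the script paths and the `--dataset` flag come from the commands listed above.

```python
import subprocess
import sys
from typing import List, Sequence

# Script paths as listed in this README.
SCRIPTS = {
    "text_classification": "scripts/text_classification.py",
    "entity_recognition": "scripts/entity_recognition.py",
    "relation_extraction": "scripts/relation_extraction.py",
    "attribute_extraction": "scripts/attribute_extraction.py",
}


def build_command(task: str, dataset: str,
                  options: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list for one of the task scripts."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task: {task!r}")
    return [sys.executable, SCRIPTS[task], "--dataset", dataset, *options]


def run_task(task: str, dataset: str, options: Sequence[str] = ()) -> int:
    """Run the script and return its exit code (raises on failure)."""
    return subprocess.run(build_command(task, dataset, options),
                          check=True).returncode


# "my_dataset" is a placeholder; substitute a name the script accepts.
command = build_command("entity_recognition", "my_dataset")
```

Passing the arguments as a list avoids shell quoting issues; any further `[options]` accepted by a script can be appended through `options`.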

Future Plans

  • Unify the data interchange format as a dict, i.e., entry
  • Reorganize JsonIO
  • Memory optimization for training PLMs on large datasets
  • More relation extraction models
  • Multi-hot classification
  • Unify the aggregation interface of pooling and attention
  • Radical-level features
  • Data augmentation
  • Investigate why the loss increases in later training phases (try an LR finder?)

Contributors

  • syuoni
