- A PyTorch implementation of a common NER model: BiLSTM-CRF with character-, word-, capitalization-, and affix-level features.
- Tips: Some code is adapted from ZhixiuYe/NER-pytorch.
- Paper: Bidirectional LSTM-CRF Models for Sequence Tagging - arXiv
- Paper: Deep Affix Features Improve Neural Named Entity Recognizers - ACL
- NER dataset: CoNLL-2003-English, which can be downloaded from the release page.
- Pretrained word embedding: glove.6B.100d.txt, which can be downloaded from here.
- First, download eng.train, eng.testa, eng.testb, and glove.6B.100d.txt, and save them to the ./data/ folder.
- To train, execute the following script. You can edit some parameters in train.py.
Tips: The evaluation script requires a Perl interpreter, which is usually preinstalled on Linux systems.
python3 train.py \
--output_mapping ./output/mapping.pkl \
--output_affix_list ./output/affix_list.json \
--use_crf 1 \
--add_cap_feature 1 \
--add_affix_feature 1 \
--use_gpu 1 \
--model_path ./model
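
Before starting a long training run, it can help to confirm that the files and tools mentioned above are in place. The following is a minimal sketch, not part of this repository: the function names `missing_files` and `has_perl` are hypothetical, and it assumes only that the four files sit directly in `./data/` and that the evaluation script finds Perl through `PATH`.

```python
import shutil
from pathlib import Path

# File names taken from the download step above.
REQUIRED_FILES = ("eng.train", "eng.testa", "eng.testb", "glove.6B.100d.txt")


def missing_files(data_dir="./data"):
    """Return the required files that are not present in data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def has_perl():
    """Return True if a perl executable is found on PATH (hypothetical helper)."""
    return shutil.which("perl") is not None


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from ./data/:", ", ".join(missing))
    if not has_perl():
        print("perl not found on PATH; the evaluation script needs it.")
    if not missing and has_perl():
        print("All prerequisites found.")
```

Saved under any name (for example, a hypothetical `check_setup.py`) and run from the repository root, it prints which prerequisites, if any, are missing. It does not validate file contents or create the `./output` and `./model` paths used in the command above.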