Slot filling and intent detection tasks of spoken language understanding
- An implementation of the "focus" part of the paper "Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding".
- An implementation of BLSTM-CRF based on jiesutd/NCRFpp
- An implementation of joint training of slot filling and intent detection tasks (Bing Liu and Ian Lane, 2016).
- Tutorials on ATIS and SNIPS datasets.
Setup
- pytorch 1.0
- python 3.6.x
- pip install gpustat [if gpu is used]
About the evaluation of intent detection on the ATIS and SNIPS datasets
In ATIS, one utterance may carry multiple intents, while in SNIPS every utterance has exactly one intent. For example, "show me all flights and fares from denver to san francisco <=> atis_flight && atis_airfare". Because of this, a widely used trick is applied in the training and evaluation stages of intent detection on ATIS.
NOTE: Following the paper "What is left to be understood in ATIS?", almost all work on ATIS takes the first intent as the label and trains a "softmax" intent classifier. At evaluation time, a prediction is counted as correct if the predicted intent is one of the utterance's intents.
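The convention described in the note can be sketched in a few lines of Python. This is a minimal illustration, not the repository's evaluation code: the function names `first_intent` and `intent_accuracy` are hypothetical, and each utterance's gold intents are assumed to be already split into a list.

```python
def first_intent(gold_intents):
    """Training target: the first annotated intent of an utterance."""
    return gold_intents[0]


def intent_accuracy(predictions, gold):
    """Fraction of utterances whose predicted intent is among the gold intents."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not predictions:
        return 0.0
    correct = sum(1 for pred, intents in zip(predictions, gold) if pred in intents)
    return correct / len(predictions)


# Toy data (assumption): gold intents are already split into lists.
gold = [["atis_flight", "atis_airfare"], ["atis_airline"], ["atis_flight"]]
train_targets = [first_intent(g) for g in gold]

# The first prediction matches the second gold intent, so it counts as correct.
predictions = ["atis_airfare", "atis_airline", "atis_ground_service"]
accuracy = intent_accuracy(predictions, gold)
print(train_targets, round(accuracy, 4))
```

For SNIPS every gold list has a single element, so the same function reduces to ordinary accuracy.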
TODO:
- Add char-embeddings
Tutorials A: Slot filling and intent detection with pretrained word embeddings
- The pretrained word embeddings come from the CNN-BLSTM language model of ELMo, in which word embeddings are modelled by char-CNNs. We extract the pretrained word embeddings for the ATIS and SNIPS datasets with:
python3 scripts/get_ELMo_word_embedding_for_a_dataset.py \
--in_files data/atis-2/{train,valid,test} \
--output_word2vec local/word_embeddings/elmo_1024_cased_for_atis.txt
python3 scripts/get_ELMo_word_embedding_for_a_dataset.py \
--in_files data/snips/{train,valid,test} \
--output_word2vec local/word_embeddings/elmo_1024_cased_for_snips.txt
- Run scripts of training and evaluation at each epoch.
- BLSTM model:
bash run/atis_with_pretrained_word_embeddings.sh slot_tagger
bash run/snips_with_pretrained_word_embeddings.sh slot_tagger
- BLSTM-CRF model:
bash run/atis_with_pretrained_word_embeddings.sh slot_tagger_with_crf
bash run/snips_with_pretrained_word_embeddings.sh slot_tagger_with_crf
- Enc-dec focus model (BLSTM-LSTM), the same as Encoder-Decoder NN (with aligned inputs) (Liu and Lane, 2016):
bash run/atis_with_pretrained_word_embeddings.sh slot_tagger_with_focus
bash run/snips_with_pretrained_word_embeddings.sh slot_tagger_with_focus
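To inspect the embedding files produced by the extraction step in this tutorial, a small loader like the one below may help. It is a sketch under an assumption: the `--output_word2vec` flag name suggests the word2vec text layout (one word per line followed by its vector, optionally preceded by a `count dim` header), but the exact file format is not documented here. `load_word_vectors` is a hypothetical helper, not part of the repository.

```python
import io


def load_word_vectors(lines):
    """Parse `word v1 v2 ...` lines into a {word: [floats]} dictionary."""
    vectors = {}
    dim = None
    for i, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Assumption: an optional word2vec-style header `count dim` on the first line.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(f"line {i + 1}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors


# Tiny in-memory sample standing in for a real embedding file.
sample = io.StringIO("2 3\nflights 0.1 0.2 0.3\ndenver -0.4 0.5 0.6\n")
vectors = load_word_vectors(sample)
print(len(vectors), vectors["denver"])
```

For a real file, pass an open file handle instead of the sample, for example one opened on `local/word_embeddings/elmo_1024_cased_for_atis.txt` with `encoding="utf-8"`.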
Tutorials B: Slot filling and intent detection with ELMo
- Run scripts of training and evaluation at each epoch.
- BLSTM model:
bash run/atis_with_elmo.sh slot_tagger
bash run/snips_with_elmo.sh slot_tagger
- BLSTM-CRF model:
bash run/atis_with_elmo.sh slot_tagger_with_crf
bash run/snips_with_elmo.sh slot_tagger_with_crf
- Enc-dec focus model (BLSTM-LSTM), the same as Encoder-Decoder NN (with aligned inputs) (Liu and Lane, 2016):
bash run/atis_with_elmo.sh slot_tagger_with_focus
bash run/snips_with_elmo.sh slot_tagger_with_focus
Tutorials C: Slot filling and intent detection with BERT
- Model architectures:
- Joint BERT or "with pure bert":
- Our BERT + BLSTM (BLSTM-CRF\Enc-dec focus):
- Run scripts of training and evaluation at each epoch.
- BLSTM model:
bash run/atis_with_bert.sh slot_tagger
bash run/snips_with_bert.sh slot_tagger
- BLSTM-CRF model:
bash run/atis_with_bert.sh slot_tagger_with_crf
bash run/snips_with_bert.sh slot_tagger_with_crf
- Enc-dec focus model (BLSTM-LSTM), the same as Encoder-Decoder NN (with aligned inputs) (Liu and Lane, 2016):
bash run/atis_with_bert.sh slot_tagger_with_focus
bash run/snips_with_bert.sh slot_tagger_with_focus
Tutorials D: Slot filling and intent detection with XLNET [ToDo]
Results:
- For the "NLU + BERT" models, the hyper-parameters were not tuned carefully.
- Results of ATIS:

| models | intent Acc (%) | slot F1-score (%) |
|---|---|---|
| Atten. enc-dec NN with aligned inputs (Liu and Lane, 2016) | 98.43 | 95.87 |
| Atten.-BiRNN (Liu and Lane, 2016) | 98.21 | 95.98 |
| Enc-dec focus (Zhu and Yu, 2017) | - | 95.79 |
| Slot-Gated (Goo et al., 2018) | 94.1 | 95.2 |
| Intent Gating & self-attention | 98.77 | 96.52 |
| BLSTM-CRF + ELMo | 97.42 | 95.62 |
| Joint BERT | 97.5 | 96.1 |
| Joint BERT + CRF | 97.9 | 96.0 |
| BLSTM (A. Pre-train word emb.) | 98.10 | 95.67 |
| BLSTM-CRF (A. Pre-train word emb.) | 98.54 | 95.39 |
| Enc-dec focus (A. Pre-train word emb.) | 98.43 | 95.78 |
| BLSTM (B. +ELMo) | 98.66 | 95.52 |
| BLSTM-CRF (B. +ELMo) | 98.32 | 95.62 |
| Enc-dec focus (B. +ELMo) | 98.66 | 95.70 |
| BLSTM (C. +BERT) | 99.10 | 95.94 |

- Results of SNIPS:
- The cased BERT-base model gives better results than the uncased model.

| models | intent Acc (%) | slot F1-score (%) |
|---|---|---|
| Slot-Gated (Goo et al., 2018) | 97.0 | 88.8 |
| BLSTM-CRF + ELMo | 99.29 | 93.90 |
| Joint BERT | 98.6 | 97.0 |
| Joint BERT + CRF | 98.4 | 96.7 |
| BLSTM (A. Pre-train word emb.) | 99.14 | 95.75 |
| BLSTM-CRF (A. Pre-train word emb.) | 99.00 | 96.92 |
| Enc-dec focus (A. Pre-train word emb.) | 98.71 | 96.22 |
| BLSTM (B. +ELMo) | 98.71 | 96.32 |
| BLSTM-CRF (B. +ELMo) | 98.57 | 96.61 |
| Enc-dec focus (B. +ELMo) | 99.14 | 96.69 |
| BLSTM (C. +BERT) | 98.86 | 96.92 |
| BLSTM-CRF (C. +BERT) | 98.86 | 97.00 |
| Enc-dec focus (C. +BERT) | 98.71 | 97.17 |
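For readers unfamiliar with the slot F1-score reported above, the sketch below shows how a chunk-level F1 is commonly computed for slot filling. It assumes BIO-style tags and exact-match scoring of whole slot chunks (the conlleval convention); the repository's own scorer may differ in details. `extract_chunks` and `slot_f1` are hypothetical names.

```python
def extract_chunks(tags):
    """Return the set of (start, end, slot) spans in a BIO tag sequence."""
    chunks = set()
    start, slot = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, name = tag.partition("-")
        # The open chunk ends unless this tag is an I- tag with the same slot name.
        if slot is not None and (prefix != "I" or name != slot):
            chunks.add((start, i - 1, slot))
            start, slot = None, None
        # A chunk starts at a B- tag, or at an I- tag with no open chunk.
        if prefix == "B" or (prefix == "I" and slot is None):
            start, slot = i, name
    return chunks


def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching slot chunks."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold_chunks = extract_chunks(gold_tags)
        pred_chunks = extract_chunks(pred_tags)
        tp += len(gold_chunks & pred_chunks)
        n_gold += len(gold_chunks)
        n_pred += len(pred_chunks)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction finds one of the two gold slots.
gold = [["O", "B-fromloc", "I-fromloc", "O", "B-toloc"]]
pred = [["O", "B-fromloc", "I-fromloc", "O", "O"]]
f1 = slot_f1(gold, pred)
print(round(100 * f1, 2))
```

In the toy example precision is 1.0 and recall is 0.5, so a slot counts only when both its boundaries and its label match the gold annotation.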
Reference
- Su Zhu and Kai Yu, "Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5675-5679.