Demo page: https://github.com/sinlin0908/QAmodel_Demo
- QANet by Google
- QANet code by NLP_learn
- Dataset by Delta
- Embedding dataset by Chinese-Word-Vectors
- OS: Ubuntu 18.04 LTS
- GPU: GTX 1080Ti 11G
- CPU: i7-4770
- RAM: 16G
- Python 3.6
- NumPy
- tqdm
- TensorFlow >= 1.5
- Jieba
- opencc
- bottle
- train set: 26936 questions
- dev set: 3524 questions
- test set: 3493 questions
- word tokenization: segment with `jieba.cut(context, cut_all=False)`
- in `_get_word()`, the case-variant lookups from the original QANet preprocessing (`word.lower()`, `word.capitalize()`, `word.upper()`) are deleted, since Chinese tokens have no case
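The lookup described above can be sketched as follows. The names `word2idx` and `oov_idx` are illustrative (the repo's preprocessing may use different ones), and the tokens are assumed to come from `jieba.cut(context, cut_all=False)`:

```python
# Sketch of the embedding lookup, assuming a dict-like word2idx table built
# from the Chinese-Word-Vectors file (names here are illustrative).
# Tokens come from: tokens = list(jieba.cut(context, cut_all=False))

def get_word_index(word, word2idx, oov_idx=1):
    """Map a token to its embedding row, falling back to the OOV index.

    Chinese tokens carry no case, so the case-variant lookups from the
    original QANet preprocessing (word.lower(), word.capitalize(),
    word.upper()) are unnecessary here.
    """
    return word2idx.get(word, oov_idx)

# Toy table: index 0 = padding, index 1 = OOV.
word2idx = {"天氣": 2, "很好": 3}
indices = [get_word_index(w, word2idx) for w in ["今天", "天氣", "很好"]]
print(indices)  # → [1, 2, 3]  ("今天" is out-of-vocabulary here)
```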
- uses the 1292607-word, 300d word embedding dataset
- uses the 14082-character, 300d character embedding dataset
- word vocab size: 1292607
- hidden size: 128
- num_heads: 8
- batch size: 12
- char_emb_size: 300
- pretrained_char: True
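For reference, the hyperparameters above gathered in one place, as a plain dict with illustrative key names (the real flags live in `config.py` and may be named differently):

```python
# Hypothetical summary of the training settings listed above; the
# authoritative values are the flags defined in config.py.
config = {
    "word_vocab_size": 1292607,  # words in the Chinese-Word-Vectors set
    "word_emb_dim": 300,         # 300d word embeddings
    "char_vocab_size": 14082,    # characters with pretrained vectors
    "char_emb_dim": 300,         # 300d character embeddings
    "pretrained_char": True,     # load pretrained character vectors
    "hidden": 128,               # model hidden size
    "num_heads": 8,              # self-attention heads
    "batch_size": 12,
}
```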
- preprocess: `python config.py --mode prepro`
- train: `python config.py --mode train`
- test: `python config.py --mode test`
- demo: `python config.py --mode demo`
- F1 score: 70.0496230556
- EM: 70.0257658173
- training time: about 6 hours
- GPU memory used: 9.4 GB
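F1 and EM above are the standard SQuAD-style metrics; for Chinese answers they are typically computed over characters rather than whitespace tokens. A minimal sketch under that assumption (the repo's actual evaluation script may differ in details such as punctuation stripping):

```python
# Character-level F1 and exact match for a single prediction/answer pair.
from collections import Counter

def exact_match(pred, truth):
    # EM: 1.0 only if the predicted span matches the answer exactly.
    return float(pred == truth)

def char_f1(pred, truth):
    # Overlap of character multisets between prediction and ground truth.
    common = Counter(pred) & Counter(truth)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)

print(char_f1("台北市", "台北"))   # close to 0.8 (precision 2/3, recall 1)
print(exact_match("台北", "台北"))  # → 1.0
```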
tensorboard --logdir=./
number | hidden size | attention heads | steps | train set size | word vocab size | F1 | EM |
---|---|---|---|---|---|---|---|
1 | 96 | 1 | 60000 | 15320 | 636086 | 51 | 51 |
2 | 96 | 1 | 60000 | 26936 | 636086 | 63 | 63 |
3 | 128 | 8 | 60000 | 26936 | 1292607 | 70 | 70 |
4 | 128 | 8 | 150000 | 26936 | 1292607 | 69 | 69 |
Note: character embedding has only a small effect.