brightmart / bert_language_understanding

Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN

Python 100.00%

attention-is-all-you-need bert-model document-classification fasttext language-model language-understanding nlp pre-training question-answering self-attention text-classification textcnn transfer-learning transformer-encoder

bert_language_understanding's Issues

question about learning rate

Hi, what is the learning rate when you train TextCNN(No-pretrain)?

MIDDLE SIZE DATASET(cail2018, 450k)

Model	TextCNN(No-pretrain)	TextCNN(Pretrain-Finetuning)	Gain from pre-train
F1 Score after 1 epoch	0.09	0.58	0.49
F1 Score after 5 epoch	0.40	0.74	0.35
F1 Score after 7 epoch	0.44	0.75	0.31
F1 Score after 35 epoch	0.58	0.75	0.27
Training Loss at beginning	284.0	84.3	199.7
Validation Loss after 1 epoch	13.3	1.9	11.4
Validation Loss after 5 epoch	6.7	1.3	5.4
Training time(single gpu)	8h	2h	6h

error in run_classifier_predict_online.py

I'm trying to load a BERT model for text classification. It works with the original BERT batch-prediction code, but run_classifier_predict_online.py fails with the following error:

Caused by op 'save/Assign_200', defined at:
File "run_classifier_predict_online.py", line 351, in <module>
saver = tf.train.Saver()
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in __init__
self.build()
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 119, in restore
self.op.get_shape().is_fully_defined())
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
validate_shape=validate_shape)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
use_locking=use_locking, name=name)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2,768] rhs shape= [1152,768]
[[{{node save/Assign_200}} = Assign[T=DT_FLOAT, _class=["loc:@output_weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](output_weights, save/RestoreV2/_401)]]
[[{{node save/RestoreV2/_222}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_228_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
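
The error above says the graph's output_weights has shape [2, 768] while the checkpoint stores [1152, 768], which suggests the prediction script builds the classifier with a different number of labels than the checkpoint was trained with. A minimal sketch for spotting such mismatches before restoring is shown below. The helper name find_shape_mismatches and the hard-coded shape dictionaries are hypothetical; in practice the checkpoint shapes could be read with tf.train.list_variables and the graph shapes from the graph's variables.

```python
def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Return {name: (graph_shape, checkpoint_shape)} for variables whose shapes differ."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Only compare variables that exist on both sides.
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


# Shapes copied from the error message above; a real run would read them from
# the graph and the checkpoint instead of hard-coding them (assumption).
graph_shapes = {"output_weights": [2, 768]}
checkpoint_shapes = {"output_weights": [1152, 768]}

mismatches = find_shape_mismatches(graph_shapes, checkpoint_shapes)
for name, (graph_shape, ckpt_shape) in mismatches.items():
    print(f"{name}: graph {graph_shape} vs checkpoint {ckpt_shape}")
```

If the first dimension differs as it does here, checking that the label list used at prediction time matches the one used for fine-tuning is a reasonable first step.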

question on the implementation of bert_cnn_model

Very interesting work, which suggests that pre-training can be useful everywhere (small scale, CNN models).

I have a question about the implementation of bert_cnn_model.py.
# 3. get last hidden state of the masked position(s), and project it to make a predict.
I noticed that the language model predicts the masked word in base_model.py.
Why are these lines commented out in bert_cnn_model.py? Is it for performance?

Three undefined names

flake8 testing of https://github.com/brightmart/bert_language_understanding on Python 3.7.1

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./pretrain_task.py:260:21: F821 undefined name 'count'
            count = count + 1
                    ^
./model/bert_cnn_model.py:318:41: F821 undefined name 'batch_size'
            p_mask_lm=[i for i in range(batch_size)]
                                        ^
./model/bert_model.py:227:41: F821 undefined name 'batch_size'
            p_mask_lm=[i for i in range(batch_size)]
                                        ^
3     F821 undefined name 'batch_size'
3
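
For the two batch_size reports above, the name is used inside a method without being defined there. A minimal sketch of one common fix follows, assuming batch_size is meant to be an instance attribute. The class MaskedLMModel and its method are hypothetical stand-ins, not the repository's code.

```python
class MaskedLMModel:
    """Hypothetical stand-in for a model class that stores its batch size."""

    def __init__(self, batch_size):
        self.batch_size = batch_size

    def mask_positions(self):
        # Using self.batch_size avoids the F821 "undefined name" report that
        # flake8 gives for a bare batch_size in this scope.
        p_mask_lm = [i for i in range(self.batch_size)]
        return p_mask_lm


model = MaskedLMModel(batch_size=4)
positions = model.mask_positions()
print(positions)
```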

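Question about pre-trained word embedding

If Tencent_AILab_ChineseEmbedding_100w.txt is used for the pre-trained word embedding, isn't tf.stop_gradient needed after tf.assign? Otherwise the embedding is modified during training. Does it only serve as a better initialization?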
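
The distinction raised above can be illustrated with a toy update step. This is only a sketch in plain Python with made-up numbers, not the repository's training code: a trainable embedding drifts away from its pre-trained value after one SGD step, while a frozen one stays put. In TensorFlow 1.x, freezing is commonly done by creating the variable with trainable=False or by wrapping the lookup in tf.stop_gradient; the function sgd_step below is hypothetical.

```python
def sgd_step(embedding, grad, learning_rate, trainable=True):
    """Apply one SGD update; a frozen embedding is returned unchanged."""
    if not trainable:
        return list(embedding)
    return [w - learning_rate * g for w, g in zip(embedding, grad)]


pretrained = [0.5, -0.2, 0.1]  # made-up pre-trained vector
grad = [1.0, 1.0, -1.0]        # made-up gradient from a training batch

# Trainable: the pre-trained values act only as an initialization.
fine_tuned = sgd_step(pretrained, grad, learning_rate=0.1, trainable=True)
# Frozen: the pre-trained values are kept as they are.
frozen = sgd_step(pretrained, grad, learning_rate=0.1, trainable=False)

print(fine_tuned != pretrained, frozen == pretrained)
```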

Question about running the pre_train command

pre-train masked language with BERT:
python train_bert_lm.py [DONE]
What does [DONE] mean here? Doesn't the .py script need a file path after it? Any guidance would be appreciated.

Bert_model doesn't work

Hi, I tried your bert_model rather than bert_cnn_model. bert_model reaches about 75% F1 on the language model task, but when I use the pre-trained bert_model to fine-tune on the classification task, it does not work: the F1 score is still about 10% after several epochs. Is something wrong with bert_model?

In your TextCNN experiment, raising the learning rate by one order of magnitude gives the same result without pre-training

about pre_train

Hello, by pre-training do you mean starting from the Chinese pre-trained model provided by Google, adding the data on hand, and then continuing to train the model?

different result when set lr to 0.001

I changed the learning rate to 0.001 and kept the other settings at their defaults, then tested on the same dataset without pre-training. My results differ considerably from yours: pre-training can speed up convergence, but it may lead to worse final performance.

Epoch	valid loss (mine)	valid F1 (mine)	valid F1 (this repo)
1	4.606	57.9	58.0
5	2.234	71.3	74.0
7	1.774	73.0	75.0
15	1.449	75.3	-
35	-	-	75.0

What is the final f1 performance?

https://github.com/brightmart/bert_language_understanding#performance

What is the final f1 performance?

Thank you!

bug

1. Some parameters are not in config.py, such as self.sequence_length_lm and self.is_fine_tuning.
2. self.ckpt_dir is not clear.

What is the accuracy of mask word prediction?

Thank you!
Do we need to care about the accuracy of masked word prediction?
What value should we use to judge the performance of masked word prediction?
@brightmart

The reason why pretrain does work.

Do you think it is mainly because BERT is bidirectional, and a CNN can play the same role?
@brightmart Thank you!

About "scaled_dot_product_attention_batch"

Code implementation of Scaled Dot-Product Attention

2. dot product of Q,K

In your code, it is "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model, tf.float32)))".

According to the paper, I think it should be "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model/h, tf.float32)))".

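In the Transformer paper the dot products are scaled by 1/sqrt(d_k), where d_k = d_model/h is the per-head key dimension. The sketch below, in plain Python rather than TensorFlow, shows a single-head attention step with that scaling and compares the two scale factors. The function names are hypothetical, and the sizes d_model = 512 and h = 8 are taken from the paper's base configuration as an assumption about a typical setup, not from this repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention on lists of vectors, scaled by 1/sqrt(d_k)."""
    d_k = len(keys[0])  # per-head key dimension, not d_model
    scale = 1.0 / math.sqrt(d_k)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Assumed example sizes (Transformer base configuration).
d_model, h = 512, 8
d_k = d_model // h
scale_per_head = 1.0 / math.sqrt(d_k)        # 0.125
scale_full_model = 1.0 / math.sqrt(d_model)  # smaller than the per-head scale

out = scaled_dot_product_attention(
    [[1.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
    [[2.0, 0.0], [4.0, 2.0]],
)
print(scale_per_head, scale_full_model, out)
```

Scaling by sqrt(d_model) instead of sqrt(d_k) shrinks the logits further, which flattens the softmax; whether that matters in practice for this model is not something the sketch can settle.
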
the pre-trained MLM performance

Hi, I tried to use your bert_cnn_model to train on my corpus of 900k sentences and 300k words; the average sentence length is 30 after tokenization. But the model seems to be stuck in a local minimum: the accuracy on the validation set just fluctuates after the first 5 epochs.

using bert with unused/discarded data

Hi, I have a multiclass dataset of song lyrics (say 20 classes, with a total of 36,000 text pieces averaging 250 words each). Some of the classes overlap, and some are not appropriate for my work, so I decided to merge them into a reasonable final set of, say, 8 classes built from 14 of the original classes. The data for the remaining 20 - 14 = 6 classes was discarded because it was not suitable for my work. With a pre-trained BERT model I can fine-tune on the data for the 8 classes (14, with some condensed). But how could I make use of the remaining data from the 6 classes? I read somewhere that it is good to fine-tune a model trained on data close to your domain, but BERT was trained on Wikipedia and other text that is very different from song lyrics. So does using a pre-trained BERT model and then fine-tuning on my data benefit me, or should I train BERT from scratch on the discarded data and then fine-tune it on my selected data?
I hope someone replies.
TIA

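About the design of the pre-training model

Did you skip the encoder-decoder in BERT and instead use a CNN of your own design for pre-training and fine-tuning? If so, why? Is it because the original encoder-decoder performs worse than the CNN?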

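Error when running run_classifier_predict_online.py

Following the README, I ran python train_bert_lm.py and then python train_bert_fine_tuning.py, after which run_classifier_predict_online.py failed (at that point I had copied bert_config.json and vocab.txt from the open-source chinese-bert-base into the directory where the fine-tuned model was saved).
I would appreciate a reply with a solution; I have tried several approaches and they all raise the same error.
The error message is as follows: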
INFO:tensorflow:Restoring parameters from ./checkpoint_finetuing_law200_bert/model.ckpt-1
2019-10-28 19:41:26.392664: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
Traceback (most recent call last):
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_383]]
(1) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\training\saver.py", line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at /stephen的个人文件夹/my_code/预训练语言模型(包括词向量)/代码/bert_language_understanding-master/run_classifier_predict_online.py:352) ]]
[[save/RestoreV2/_383]]
(1) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at /stephen的个人文件夹/my_code/预训练语言模型(包括词向量)/代码/bert_language_understanding-master/run_classifier_predict_online.py:352) ]]
0 successful operations.
0 derived errors ignored.
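
The message above means the graph being restored expects a variable named bert/embeddings/LayerNorm/beta that the checkpoint does not contain. One possible cause, stated here as an assumption, is that the prediction script builds a model whose variable names differ from those written by the training scripts that produced the checkpoint. A minimal sketch for listing such missing names follows. missing_checkpoint_keys is a hypothetical helper, and the checkpoint names in the example are made up for illustration; real ones could be read with tf.train.list_variables.

```python
def missing_checkpoint_keys(graph_names, checkpoint_names):
    """Return graph variable names that are absent from the checkpoint, sorted."""
    return sorted(set(graph_names) - set(checkpoint_names))


# The first graph name comes from the error above; every other name here is a
# made-up placeholder (assumption), not taken from a real checkpoint.
graph_names = ["bert/embeddings/LayerNorm/beta", "output_weights"]
checkpoint_names = ["embedding/LayerNorm/beta", "output_weights"]

missing = missing_checkpoint_keys(graph_names, checkpoint_names)
print(missing)
```

If most of the graph's names turn out to be missing, the checkpoint and the model definition probably come from different code paths, and copying bert_config.json and vocab.txt alone would not reconcile them.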

run_classifier_predict_online.py

modeling.BertConfig
modeling.BertModel

error

Issue with tokenize_style=char

I downloaded the Chinese corpus from the network drive and set tokenize_style=char. Lines 71 and 232 of pretrain_task.py are:
string_list=[x for x in jieba.lcut(sentence.strip()) if x and x not in ["\"","：","、","，","）","（"]]
string_list = [x for x in jieba.lcut(sentence.strip()) if x and x not in ["\"", "：", "、", "，", "）", "（"]]
These lines may also need to handle the input differently depending on the switch:
string_list = [x for x in sentence.strip() if x and x not in ["\"", "：", "、", "，", "）", "（"]]
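
The switch suggested above could look roughly like the sketch below. The function tokenize and its word_tokenizer parameter are hypothetical names, not the repository's API; jieba.lcut could be passed as word_tokenizer, but the example uses str.split so that it runs without extra packages.

```python
PUNCTUATION_TO_DROP = ["\"", "：", "、", "，", "）", "（"]


def tokenize(sentence, tokenize_style="word", word_tokenizer=None):
    """Split a sentence into characters or words, dropping listed punctuation."""
    text = sentence.strip()
    if tokenize_style == "char":
        # Character-level: every character is a token.
        tokens = list(text)
    else:
        if word_tokenizer is None:
            raise ValueError("word_tokenizer is required unless tokenize_style is 'char'")
        # Word-level: delegate to the supplied tokenizer (e.g. jieba.lcut).
        tokens = word_tokenizer(text)
    return [x for x in tokens if x and x not in PUNCTUATION_TO_DROP]


char_tokens = tokenize(" 你好，世界 ", tokenize_style="char")
word_tokens = tokenize("hello, world", tokenize_style="word", word_tokenizer=str.split)
print(char_tokens, word_tokens)
```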

Thank you very much for your work.