brightmart / bert_language_understanding
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
Hi, what is the learning rate when you train TextCNN(No-pretrain)?
MIDDLE SIZE DATASET (cail2018, 450k)
Model | TextCNN (No-pretrain) | TextCNN (Pretrain-Finetuning) | Gain from pre-train |
---|---|---|---|
F1 score after 1 epoch | 0.09 | 0.58 | 0.49 |
F1 score after 5 epochs | 0.40 | 0.74 | 0.35 |
F1 score after 7 epochs | 0.44 | 0.75 | 0.31 |
F1 score after 35 epochs | 0.58 | 0.75 | 0.27 |
Training loss at beginning | 284.0 | 84.3 | 199.7 |
Validation loss after 1 epoch | 13.3 | 1.9 | 11.4 |
Validation loss after 5 epochs | 6.7 | 1.3 | 5.4 |
Training time (single GPU) | 8h | 2h | 6h |
I'm trying to load a BERT model for text classification. It works with the original BERT batch prediction code, but run_classifier_predict_online.py fails with the following error:
Caused by op 'save/Assign_200', defined at:
File "run_classifier_predict_online.py", line 351, in <module>
saver = tf.train.Saver()
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in __init__
self.build()
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 119, in restore
self.op.get_shape().is_fully_defined())
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
validate_shape=validate_shape)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
use_locking=use_locking, name=name)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2,768] rhs shape= [1152,768]
[[{{node save/Assign_200}} = Assign[T=DT_FLOAT, _class=["loc:@output_weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](output_weights, save/RestoreV2/_401)]]
[[{{node save/RestoreV2/_222}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_228_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
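The message says the graph variable output_weights has shape [2, 768] while the checkpoint stores [1152, 768]. One plausible reading, offered as an assumption, is that the prediction script was built with a different number of labels than the checkpoint was trained with. A minimal sketch for spotting such mismatches follows. `find_shape_mismatches` is a hypothetical helper, and the two dictionaries are assumed to be collected separately (for example from `tf.train.list_variables(checkpoint_path)` and from the variables of the prediction graph).

```python
def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Return {name: (graph_shape, checkpoint_shape)} for variables that exist
    in both mappings but have different shapes."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


# Shapes copied from the error message above.
graph_shapes = {"output_weights": [2, 768]}
checkpoint_shapes = {"output_weights": [1152, 768]}

for name, (graph_shape, ckpt_shape) in find_shape_mismatches(
        graph_shapes, checkpoint_shapes).items():
    print(f"{name}: graph {graph_shape} vs checkpoint {ckpt_shape}")
```

If the first dimension is the number of labels, building the prediction graph with the same label list used for training would be the first thing to check.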
Very interesting work, which indicates that pre-training can be applied everywhere (small scale, CNN models).
I have a question about the implementation of bert_cnn_model.py.
# 3. get last hidden state of the masked position(s), and project it to make a predict.
I noticed that the language model predicts the masked word in base_model.py.
I wonder why these lines are commented out in bert_cnn_model.py? Performance?
flake8 testing of https://github.com/brightmart/bert_language_understanding on Python 3.7.1
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./pretrain_task.py:260:21: F821 undefined name 'count'
count = count + 1
^
./model/bert_cnn_model.py:318:41: F821 undefined name 'batch_size'
p_mask_lm=[i for i in range(batch_size)]
^
./model/bert_model.py:227:41: F821 undefined name 'batch_size'
p_mask_lm=[i for i in range(batch_size)]
^
3 F821 undefined name 'batch_size'
3
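F821 flags a name that is read before anything binds it, which would fail at runtime with a NameError. A minimal sketch of the two patterns reported above and one way to fix them; the function names and their arguments are hypothetical and are not taken from pretrain_task.py or the model files.

```python
def count_nonempty_lines(lines):
    """Count non-empty lines; the counter is bound before it is incremented."""
    count = 0  # without this binding, `count = count + 1` reads an undefined name
    for line in lines:
        if line.strip():
            count = count + 1
    return count


def make_position_index(batch_size):
    """Build [0, 1, ..., batch_size - 1]; batch_size is passed in explicitly
    instead of being read from an enclosing scope that never defines it."""
    return [i for i in range(batch_size)]


print(count_nonempty_lines(["a", "", "b"]))
print(make_position_index(3))
```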
If Tencent_AILab_ChineseEmbedding_100w.txt is used as the pre-trained word embedding, isn't tf.stop_gradient needed after tf.assign? Otherwise the embedding is modified during training. Or is it only meant to provide a better initialization?
pre-train masked language with BERT:
python train_bert_lm.py [DONE]
What does [DONE] mean here? Doesn't the .py script need to be followed by a file path? Any pointers would be appreciated.
Hi, I tried your bert_model rather than bert_cnn_model. bert_model reaches about 75% F1 on the language model task, but fine-tuning the pre-trained bert_model on the classification task did not work: the F1 score was still about 10% after several epochs. Is something wrong with bert_model?
Hello, when you describe the effect of pre-training, do you mean starting from the Chinese pre-trained model provided by Google, adding the data on hand, and then continuing to train the model?
I changed the learning rate to 0.001, kept the other settings at their defaults, and tested on the same dataset without pre-training. My results are quite different from yours: pre-training speeds up convergence, but it may lead to worse final performance.
Epoch | valid loss (mine) | valid F1 (mine) | valid F1 (this repo) |
---|---|---|---|
1 | 4.606 | 57.9 | 58.0 |
5 | 2.234 | 71.3 | 74.0 |
7 | 1.774 | 73.0 | 75.0 |
15 | 1.449 | 75.3 | - |
35 | - | - | 75.0 |
https://github.com/brightmart/bert_language_understanding#performance
What is the final F1 score?
Thank you!
1. Some parameters are not in config.py, such as self.sequence_length_lm and self.is_fine_tuning.
2. It is not clear what self.ckpt_dir should be set to.
Thank you!
Do we need to care about the accuracy of masked word prediction?
What metric should we use to judge the performance of masked word prediction?
@brightmart
Do you think it is mainly because BERT is bidirectional, and a CNN could serve the same function?
@brightmart Thank you!
Code implementation of Scaled Dot-Product Attention
In your code, it is "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model, tf.float32)))".
I think it should be "dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model/h, tf.float32)))" according to the paper.
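For reference, the Transformer paper scales the dot products by 1/sqrt(d_k), where d_k = d_model/h is the per-head key dimension in multi-head attention. The sketch below is a plain-Python, single-head illustration of that formula, not the repository's code; whether the repository applies attention per head with dimension d_model/h is not shown here and is assumed. The function names are hypothetical, and d_model = 512 with h = 8 are the base-model values from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention over lists of vectors, scaled by 1/sqrt(d_k)."""
    d_k = len(keys[0])  # dimension of each key vector within this head
    scale = 1.0 / math.sqrt(d_k)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


d_model, h = 512, 8   # base-model values from the paper
d_k = d_model // h    # per-head dimension: 64
print(1.0 / math.sqrt(d_k))      # scale proposed in this issue: 0.125
print(1.0 / math.sqrt(d_model))  # scale currently in the code: about 0.0442
```

With these values the two scales differ by a factor of sqrt(h), so the current code would flatten the attention weights more than the paper's formula does.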
Hi, I have a multiclass dataset of song lyrics (say 20 classes, 36,000 texts in total, averaging 250 words each). Some classes overlap and some are not suitable for my work, so I merged 14 of the classes into a final set of 8 and discarded the data for the remaining 20 - 14 = 6 classes. I can fine-tune a pre-trained BERT model on the 8-class data, but how could I make use of the discarded data from the other 6 classes? I have read that it is best to fine-tune a model pre-trained on data close to your own domain, but BERT was trained on Wikipedia and other corpora that are very different from song lyrics. So does fine-tuning the pre-trained BERT model on my data still help, or should I train BERT from scratch on the discarded data and then fine-tune it on my selected data?
Hope someone can reply.
TIA
Did you skip BERT's encoder-decoder and instead design your own CNN for pre-training and fine-tuning? If so, why? Is it because the original encoder-decoder does not perform as well as the CNN?
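Following the README, I ran python train_bert_lm.py and then python train_bert_fine_tuning.py. Running run_classifier_predict_online.py afterwards failed (at that point I had copied bert_config.json and vocab.txt from the open-source chinese-bert-base into the directory where the fine-tuned model was saved).
I would appreciate it if the maintainer could take a look and suggest a fix. I have tried several ways to verify this, and they all report the same error.
The error message is as follows: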
INFO:tensorflow:Restoring parameters from ./checkpoint_finetuing_law200_bert/model.ckpt-1
2019-10-28 19:41:26.392664: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
Traceback (most recent call last):
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_383]]
(1) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\training\saver.py", line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at /stephen的个人文件夹/my_code/预训练语言模型(包括词向量)/代码/bert_language_understanding-master/run_classifier_predict_online.py:352) ]]
[[save/RestoreV2/_383]]
(1) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at /stephen的个人文件夹/my_code/预训练语言模型(包括词向量)/代码/bert_language_understanding-master/run_classifier_predict_online.py:352) ]]
0 successful operations.
0 derived errors ignored.
modeling.BertConfig
modeling.BertModel
error
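"Key ... not found in checkpoint" means the graph asks for a variable name that the checkpoint file does not store. One possible cause, stated here as an assumption rather than a diagnosis, is that the checkpoint was written by this repository's own model code, whose variable names differ from those created by modeling.BertModel. A minimal sketch for listing the missing names follows. `missing_checkpoint_keys` is a hypothetical helper, the checkpoint names in the example are made up for illustration, and real names could be read with `tf.train.list_variables(checkpoint_path)`.

```python
def missing_checkpoint_keys(graph_names, checkpoint_names):
    """Return the graph variable names that the checkpoint does not contain."""
    return sorted(set(graph_names) - set(checkpoint_names))


# The first name comes from the error message above; the checkpoint names are
# hypothetical placeholders, not the real contents of model.ckpt-1.
graph_names = ["bert/embeddings/LayerNorm/beta"]
checkpoint_names = ["hypothetical/embedding", "hypothetical/output_weights"]

for name in missing_checkpoint_keys(graph_names, checkpoint_names):
    print("missing from checkpoint:", name)
```

If nearly every graph variable is reported missing, the checkpoint and the graph most likely come from two different model definitions.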
I downloaded the Chinese corpus from the network drive and set tokenize_style=char. Lines 71 and 232 of pretrain_task.py read:
string_list=[x for x in jieba.lcut(sentence.strip()) if x and x not in ["\"",":","、",",",")","("]]
string_list = [x for x in jieba.lcut(sentence.strip()) if x and x not in ["\"", ":", "、", ",", ")", "("]]
These lines may also need to behave differently depending on that setting, for example:
string_list = [x for x in sentence.strip() if x and x not in ["\"", ":", "、", ",", ")", "("]]
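The suggestion above can be sketched as a single function that switches on the tokenization style. This is a minimal illustration, not a patch for pretrain_task.py: `tokenize_sentence`, `PUNCTUATION_TO_SKIP`, and the `word_cut` parameter are hypothetical names, and the word-level tokenizer is passed in (for example `jieba.lcut`) so the block runs without third-party packages.

```python
# Punctuation filtered out in both modes, as in the lines quoted above.
PUNCTUATION_TO_SKIP = {'"', ":", "、", ",", ")", "("}


def tokenize_sentence(sentence, tokenize_style="char", word_cut=None):
    """Split a sentence into characters or words, dropping listed punctuation.

    word_cut is a callable such as jieba.lcut and is only needed when
    tokenize_style is not "char".
    """
    text = sentence.strip()
    if tokenize_style == "char":
        tokens = list(text)  # one token per character
    else:
        if word_cut is None:
            raise ValueError("word_cut is required for word-level tokenization")
        tokens = word_cut(text)
    return [t for t in tokens if t and t not in PUNCTUATION_TO_SKIP]


print(tokenize_sentence(" 深度学习、模型 "))
# ['深', '度', '学', '习', '模', '型']
```

For word-level tokenization the call would look like `tokenize_sentence(sentence, "word", word_cut=jieba.lcut)`, assuming jieba is installed.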
Thank you very much for your work.