
bert_language_understanding's People

Contributors

brightmart, claushellsing, yuanxiaosc


bert_language_understanding's Issues

question about learning rate

Hi, what is the learning rate when you train TextCNN(No-pretrain)?

MIDDLE SIZE DATASET (cail2018, 450k)

Metric | TextCNN (no pre-train) | TextCNN (pre-train + fine-tuning) | Gain from pre-train
F1 score after 1 epoch | 0.09 | 0.58 | 0.49
F1 score after 5 epochs | 0.40 | 0.74 | 0.35
F1 score after 7 epochs | 0.44 | 0.75 | 0.31
F1 score after 35 epochs | 0.58 | 0.75 | 0.27
Training loss at beginning | 284.0 | 84.3 | 199.7
Validation loss after 1 epoch | 13.3 | 1.9 | 11.4
Validation loss after 5 epochs | 6.7 | 1.3 | 5.4
Training time (single GPU) | 8 h | 2 h | 6 h

error in run_classifier_predict_online.py

I'm trying to load a BERT model for text classification. It works with the original BERT batch-prediction code, but I get the following error from run_classifier_predict_online.py:

Caused by op 'save/Assign_200', defined at:
File "run_classifier_predict_online.py", line 351, in
saver = tf.train.Saver()
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1094, in init
self.build()
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 428, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 119, in restore
self.op.get_shape().is_fully_defined())
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 221, in assign
validate_shape=validate_shape)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 61, in assign
use_locking=use_locking, name=name)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/xichen/.conda/envs/bert/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1768, in init
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2,768] rhs shape= [1152,768]
[[{{node save/Assign_200}} = Assign[T=DT_FLOAT, _class=["loc:@output_weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](output_weights, save/RestoreV2/_401)]]
[[{{node save/RestoreV2/_222}} = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_228_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
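
The message shows that the graph's output_weights variable has shape [2, 768] while the checkpoint stores a tensor of shape [1152, 768], which suggests the graph and the checkpoint were built with different output dimensions. One way to confirm what the checkpoint actually holds (a minimal sketch, assuming TF 1.x; the path is a placeholder, not the real checkpoint):

    import tensorflow as tf

    ckpt_path = "path/to/model.ckpt"   # placeholder: point this at the checkpoint being restored
    for name, shape in tf.train.list_variables(ckpt_path):
        if "output_weights" in name:
            print(name, shape)         # compare with the shape of output_weights built in the graph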

question on the implementation of bert_cnn_model

Very interesting work, which shows that pre-training can help everywhere (small scale, CNN models).

I have a question on the implementation of bert_cnn_model.py.
# 3. get last hidden state of the masked position(s), and project it to make a predict.
I notice that the language model predicts the masked word in base_model.py.
I wonder why these lines are commented out in bert_cnn_model.py. Is it for performance reasons?
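
For reference, a minimal sketch (hypothetical tensor names and shapes, not the repository's own code) of what that commented step would do: take the last hidden state at the masked position and project it to vocabulary logits:

    import tensorflow as tf

    def mlm_logits(sequence_output, mask_positions, vocab_size):
        # sequence_output: [batch, seq_len, hidden]; mask_positions: [batch], index of the masked token
        batch = tf.shape(sequence_output)[0]
        idx = tf.stack([tf.range(batch), mask_positions], axis=1)  # (row, position) pairs, [batch, 2]
        masked_hidden = tf.gather_nd(sequence_output, idx)         # hidden state at the mask, [batch, hidden]
        return tf.layers.dense(masked_hidden, vocab_size)          # project to vocabulary logits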

Three undefined names

flake8 testing of https://github.com/brightmart/bert_language_understanding on Python 3.7.1

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./pretrain_task.py:260:21: F821 undefined name 'count'
            count = count + 1
                    ^
./model/bert_cnn_model.py:318:41: F821 undefined name 'batch_size'
            p_mask_lm=[i for i in range(batch_size)]
                                        ^
./model/bert_model.py:227:41: F821 undefined name 'batch_size'
            p_mask_lm=[i for i in range(batch_size)]
                                        ^
3     F821 undefined name 'batch_size'
3
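
A minimal sketch of the kind of fix these reports call for; the surrounding loop and the attribute name are assumptions, since only the flagged lines are shown here:

    # pretrain_task.py (around line 260): count must be initialized before it is incremented
    sentences = ["example one", "example two"]   # stand-in for whatever the loop actually iterates over
    count = 0
    for sentence in sentences:
        count = count + 1

    # bert_cnn_model.py / bert_model.py: 'batch_size' is undefined in that scope; using the
    # model's own attribute (assumed to be self.batch_size) would resolve the F821 warning:
    #     p_mask_lm = [i for i in range(self.batch_size)]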

Question about the pre-trained word embedding

If Tencent_AILab_ChineseEmbedding_100w.txt is used as the pre-trained word embedding, doesn't the tf.assign call need to be followed by tf.stop_gradient? Otherwise the embedding will be modified during training. Or is the pre-trained embedding only meant to provide a better initialization?
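
For context, a minimal sketch (TF 1.x style; variable names and sizes are assumptions, not the repository's own) of the two options: freezing the assigned embedding versus using it only as initialization:

    import numpy as np
    import tensorflow as tf

    vocab_size, embed_size = 100000, 200                                # assumed sizes
    pretrained = np.zeros((vocab_size, embed_size), dtype=np.float32)   # stand-in for the loaded Tencent vectors

    # trainable=False freezes the table after assignment; with trainable=True the
    # pre-trained vectors only serve as a better initialization and will be updated.
    embedding = tf.get_variable("embedding", shape=[vocab_size, embed_size], trainable=False)
    assign_op = tf.assign(embedding, pretrained)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(assign_op)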

Question about running the pre-train command

pre-train masked language with BERT:
python train_bert_lm.py [DONE]
What does [DONE] mean here? Doesn't the .py script need a file path after it? Please advise.

Bert_model doesn't work

Hi, I tried your bert_model instead of bert_cnn_model. bert_model reaches about 75% F1 on the language-model task, but fine-tuning the pre-trained bert_model on the classification task did not work: the F1 score was still about 10% after several epochs. Is something wrong with bert_model?

about pre_train

Hello, does the pre-training effect you describe mean starting from the Chinese pre-trained model released by Google, adding the data at hand, and then continuing to train the model?

different result when set lr to 0.001

I changed the learning rate to 0.001 and kept the other settings at their defaults, then tested on the same dataset without pre-training. My experimental results are quite different from yours: pre-training can accelerate convergence, but it may lead to worse final performance.

Epoch | valid loss (mine) | valid F1 (mine) | valid F1 (this repo)
1 | 4.606 | 57.9 | 58.0
5 | 2.234 | 71.3 | 74.0
7 | 1.774 | 73.0 | 75.0
15 | 1.449 | 75.3 | -
35 | - | - | 75.0

bug

1. Some parameters are missing from config.py, such as self.sequence_length_lm and self.is_fine_tuning.
2. The purpose of self.ckpt_dir is not clear.
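
A minimal sketch of how the missing attributes could be declared in config.py; the values below are placeholders, not the author's settings:

    class Config(object):                    # hypothetical name; adapt to the project's own config class
        def __init__(self):
            # ... existing settings ...
            self.sequence_length_lm = 200    # assumed max sequence length for the masked-LM pre-training task
            self.is_fine_tuning = True       # assumed flag: True for fine-tuning, False for pre-training
            self.ckpt_dir = "checkpoint/"    # assumed directory for saving/restoring model checkpoints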

About "scaled_dot_product_attention_batch"

Code implementation of Scaled Dot-Product Attention

2. dot product of Q,K

In your code it is: dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model, tf.float32))).

I think it should be dot_product = dot_product * (1.0 / tf.sqrt(tf.cast(self.d_model/h, tf.float32))) according to the paper, i.e. the scores should be scaled by sqrt(d_k) where d_k = d_model / h.

the pre-trained MLM performance

Hi, I tried to use your bert_cnn_model to train on my corpus: 900k sentences with a vocabulary of 300k words, and an average sentence length of 30 after tokenization. But the model seems to be stuck in a local minimum: the accuracy on the validation set just fluctuates after the first 5 epochs.


using bert with unused/discarded data

Hi, I have a multiclass dataset of song lyrics (say 20 classes, 36,000 texts in total, average length 250 words). Some of the classes overlap and some are not appropriate for my work, so I merged 14 of the classes into a final set of about 8 reasonable classes; the data for the remaining 20 - 14 = 6 classes was discarded as unsuitable. With the pre-trained BERT models I can fine-tune on the 8-class data, but how can I make use of the remaining data from the 6 discarded classes? I have read that it is good to fine-tune a model that was trained on data close to your own domain, but BERT was trained on Wikipedia and other corpora that are very different from song lyrics. So does using a pre-trained BERT model and fine-tuning it on my data give me a benefit, or should I train BERT from scratch on the discarded data and then fine-tune it on my selected data?
Hope someone replies.
TIA

About the design of the pre-training model

May I ask: instead of using the encoder/decoder in BERT, did you design your own CNN for pre-training and fine-tuning? Why is that? Is it because the original encoder-decoder does not perform as well as the CNN?

Error when running run_classifier_predict_online.py

Following the README, I ran python train_bert_lm.py and then python train_bert_fine_tuning.py in order, and then got an error when running run_classifier_predict_online.py (at that point I had copied bert_config.json and vocab.txt from the open-source chinese-bert-base into the directory where the fine-tuned model was saved).
Please take a look and kindly suggest a solution; I have tried several ways to verify this and always get the same error.
The error message is as follows:
INFO:tensorflow:Restoring parameters from ./checkpoint_finetuing_law200_bert/model.ckpt-1
2019-10-28 19:41:26.392664: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
Traceback (most recent call last):
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
[[save/RestoreV2/_383]]
(1) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[{{node save/RestoreV2}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\training\saver.py", line 1286, in restore
{self.saver_def.filename_tensor_name: save_path})
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
run_metadata_ptr)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1350, in _do_run
run_metadata)
File "E:\ProgramFiles\Anaconda3\envs\tf14_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at /stephen的个人文件夹/my_code/预训练语言模型(包括词向量)/代码/bert_language_understanding-master/run_classifier_predict_online.py:352) ]]
[[save/RestoreV2/_383]]
(1) Not found: Key bert/embeddings/LayerNorm/beta not found in checkpoint
[[node save/RestoreV2 (defined at /stephen的个人文件夹/my_code/预训练语言模型(包括词向量)/代码/bert_language_understanding-master/run_classifier_predict_online.py:352) ]]
0 successful operations.
0 derived errors ignored.
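
One way to check what the fine-tuned checkpoint actually contains (a minimal sketch assuming TF 1.x and the checkpoint path shown in the log above) is to list its variable names and see whether bert/embeddings/LayerNorm/beta is present; if it is not, the checkpoint was not produced by the BERT graph that run_classifier_predict_online.py builds, so the config and vocab copied from chinese-bert-base will not match it:

    import tensorflow as tf

    reader = tf.train.NewCheckpointReader("./checkpoint_finetuing_law200_bert/model.ckpt-1")
    for name, shape in sorted(reader.get_variable_to_shape_map().items()):
        print(name, shape)    # look for keys starting with bert/embeddings/...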

Question about tokenize_style=char

I downloaded the Chinese corpus from the network drive and set tokenize_style=char. Lines 71 and 232 of pretrain_task.py are:
string_list=[x for x in jieba.lcut(sentence.strip()) if x and x not in [""",":","、",",",")","("]]
string_list = [x for x in jieba.lcut(sentence.strip()) if x and x not in [""", ":", "、", ",", ")", "("]]
These probably also need to handle the two styles differently depending on the switch, e.g.:
string_list = [x for x in sentence.strip() if x and x not in [""", ":", "、", ",", ")", "("]]

Thank you very much for your work.
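
A minimal sketch (hypothetical helper, not the repository's own function) of how the two tokenization styles could be switched on tokenize_style:

    import jieba

    PUNCT = {'"', ":", "、", ",", ")", "("}   # punctuation to drop; adjust to the project's own list

    def tokenize(sentence, tokenize_style="word"):
        sentence = sentence.strip()
        if tokenize_style == "char":
            # character level: iterate over the string itself, no word segmentation
            return [x for x in sentence if x and x not in PUNCT]
        # word level: segment with jieba
        return [x for x in jieba.lcut(sentence) if x and x not in PUNCT]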
