
bert-flow's People

Contributors

bohanli


bert-flow's Issues

A question about predict

Thanks for the great work. At predict time, it looks like only the BERT weights are loaded from the trained checkpoint; although the flow part was trained unsupervised beforehand, its trained parameters do not seem to be loaded at predict time?

Part of the parameter-loading log is shown below:
INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), INIT_FROM_CKPT

The flow part does not load the trained parameters (note the missing INIT_FROM_CKPT marker)?
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/actnorm/actnorm_center/b:0, shape = (1, 1, 1, 64)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/actnorm/actnorm_scale/logs:0, shape = (1, 1, 1, 64)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_1/W:0, shape = (1, 1, 32, 32)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_1/actnorm/actnorm_center/b:0, shape = (1, 1, 1, 32)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_1/actnorm/actnorm_scale/logs:0, shape = (1, 1, 1, 32)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_2/W:0, shape = (1, 1, 32, 32)
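For reference, what I would expect at predict time is roughly the following, a minimal sketch assuming the TF1 init_from_checkpoint pattern from the BERT codebase (the checkpoint path and the scope filter here are my own illustration, not the repo's actual code):

import tensorflow as tf  # TF1-style API, as in the BERT codebase

# Hypothetical path: a checkpoint saved AFTER the unsupervised flow training,
# so that it also contains the bert/flow/* variables.
flow_trained_ckpt = "/path/to/ckpt_after_flow_training"

tvars = tf.trainable_variables()
# Map checkpoint variable names to graph variables, including bert/flow/*.
assignment_map = {v.name.split(":")[0]: v
                  for v in tvars if v.name.startswith("bert/")}
tf.train.init_from_checkpoint(flow_trained_ckpt, assignment_map)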

Training loss is negative

I suspect the reported training results may be questionable. First:

ops = [glow_ops.get_variable_ddi, glow_ops.actnorm, glow_ops.get_dropout]
encoder = glow_ops.encoder_decoder
self.z, encoder_objective, self.eps, _, _ = encoder(
    "flow", x, self.hparams, eps=None, reverse=False, init=init)
objective += encoder_objective

At this point the objective is positive: it accumulates the actnorm terms from the transformation, plus the log_prob of the split-off vectors under the Gaussian density, which is negative. The author does use +=, which is debatable. The sum of the parameter terms can be positive or negative, but log_prob must be negative, since probabilities lie between 0 and 1. Then:
self.z_top_shape = self.z.shape
prior_dist = self.top_prior()
prior_objective = paddle.sum(
    prior_dist.log_prob(self.z), axis=[1, 2, 3])
# self.z_sample = prior_dist.sample()
objective += prior_objective
So it is not at all surprising that the loss comes out negative. Whether the author's experimental results are really this good needs verification; I am verifying this now.
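For concreteness, the quantities being summed look roughly like this (a plain-numpy sketch, not the repo's code):

import numpy as np

def gaussian_log_prob(z, sigma=1.0):
    # Log-density of N(0, sigma^2). Note this is a log *density*, so it
    # can be positive whenever the density exceeds 1 (e.g. small sigma).
    return -0.5 * (np.log(2 * np.pi * sigma**2) + (z / sigma) ** 2)

z = np.zeros(64)                  # latent after the flow transform
log_det = 0.7                     # actnorm/coupling log-determinants (either sign)
objective = log_det + gaussian_log_prob(z, sigma=0.1).sum()
loss = -objective                 # a negative loss just means objective > 0
print(loss)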

I would ask the author to come out and explain this.

Correction for a mistake in 'last2avg'

Dear BERT-Flow authors,

I noticed a mistake in your code: you attempt to average the last 2 layers of BERT, but instead you average the first and last layers, as also pointed out in issue #11.
Specifically, the for-loop at line 172 of your run_siamese.py starts from 0 instead of 1, which means that you take the average of the first and the last layer.
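To make the difference concrete, here is a toy sketch (the layer indexing is illustrative, not the exact code from run_siamese.py):

import numpy as np

layers = [np.full(4, i, dtype=float) for i in range(13)]  # stand-in hidden states

buggy = (layers[0] + layers[-1]) / 2.0      # loop starting at 0: first + last layer
intended = (layers[-2] + layers[-1]) / 2.0  # true 'last2avg': last two layers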

As BERT-Flow is a very important line of work in STS that others need to compare against and build upon, please either correct the mistake in the paper or update the results using the actual last two layers. However, if I'm wrong, please correct me.

Best,
Minghan

May I get access to BERT-base-NLI-flow?

It seems that the provided Google Drive link only contains BERT-large models. Could you release the base version of BERT-flow for better reproducibility?

Does the corpus used for calibration need broad enough coverage?

I've read the paper and have a few questions.
When training BERT-flow, i.e., when calibrating the sentence semantic space, the corpus used is that of the downstream task.
(1) Does that mean a separate calibration is needed for each downstream task?
(2) Would a global calibration work, e.g., a one-off global calibration on a Wikipedia corpus?
(3) Could the calibration be done at the word level?
Thanks!

Fine-tuning bert-flow embedding with supervision

I fine-tuned the BERT model on a dataset labeled with [-1, 0, 1], meaning contradiction, neutral, and entailment, using a cosine-similarity loss to optimize the model.
After fine-tuning, the cosine similarities of the BERT embeddings vary from -1 to 1.
Then I used cos_loss + flow_loss to fit the flow model, but the cosine similarities of the bert-flow embeddings only vary from -0.4x to 1 (if I use only flow_loss to fit the flow model, they vary from -0.00x to 1).
Is this normal?
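For reference, the combination I mean is roughly the following (a plain-numpy sketch with illustrative names, not the actual training code):

import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def combined_loss(z1, z2, label, flow_nll, alpha=1.0):
    # Regress the cosine similarity toward the label in {-1, 0, 1},
    # then add the flow negative log-likelihood.
    cos_loss = (cosine(z1, z2) - label) ** 2
    return cos_loss + alpha * flow_nll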

Negative loss

Hi, when training on the QQP dataset I found that the loss is negative. What could be the cause?

About flow accuracy

I ran a test on the Ant Financial dataset (a Chinese sentence-pair similarity task, [0, 1] classification) with the following experiments:
1. bash scripts/train_siamese.sh train
"--exp_name=exp_${BERT_NAME}_${RANDOM_SEED}
--num_train_epochs=1.0
--learning_rate=2e-5
--train_batch_size=16
--cached_dir=${CACHED_DIR}"
Eval accuracy is around 85.

2. bash scripts/train_siamese.sh train
"--exp_name_prefix=exp
--cached_dir=${CACHED_DIR}
--flow=1 --flow_loss=1
--num_examples=0
--num_train_epochs=3.0
--flow_learning_rate=1e-3
--train_batch_size=16"
I used avg instead of avg-last-2 (see the pooling sketch below);
initial_ckpt is the output of experiment 1.
Eval accuracy stays around 50.
Why does accuracy drop when the flow is used?
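By avg vs. avg-last-2 I mean the following pooling strategies (an illustrative numpy sketch; the shapes and the exact semantics are my assumption):

import numpy as np

# layers: [num_layers, seq_len, hidden] hidden states for one sentence
layers = np.random.default_rng(0).normal(size=(13, 16, 768))

avg = layers[-1].mean(axis=0)                               # last layer, mean over tokens
avg_last2 = ((layers[-1] + layers[-2]) / 2.0).mean(axis=0)  # mean of last two layers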

Some questions about this paper

According to the article and the code, my understanding is that the method proposed in this paper essentially trains an invertible mapping using the Glow model, and then uses this mapping to transform BERT embeddings into the Gaussian distribution space; the parameters of the BERT model are not changed in this process. Of course, no label information is used during model training. Finally, when the model is saved, only the relevant parameters of the Glow model need to be saved.
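In miniature, my understanding looks like this (a toy numerical stand-in, not the repo's code):

import numpy as np

u = np.random.default_rng(0).normal(2.0, 3.0, size=(8, 4))  # fake "BERT embeddings"

# The simplest possible invertible map: elementwise affine z = (u - b) / sigma.
b, sigma = u.mean(axis=0), u.std(axis=0)
z = (u - b) / sigma
log_det = -np.sum(np.log(sigma))                             # log|det dz/du|
log_prior = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)  # log N(z; 0, I)
log_likelihood = log_prior + log_det  # maximized w.r.t. flow params only; BERT stays frozen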

Is my understanding correct?
