
bert-flow's People

Contributors

bohanli


bert-flow's Issues

A question about predict

Thanks for the great work. At predict time, it looks like only the BERT weights are loaded from the trained checkpoint; although the flow part was trained unsupervised beforehand, its trained parameters do not seem to be loaded at predict time?

Part of the parameter-loading log is shown below:
INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), INIT_FROM_CKPT

The flow part does not load the trained parameters (note the missing INIT_FROM_CKPT marker)?
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/actnorm/actnorm_center/b:0, shape = (1, 1, 1, 64)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/actnorm/actnorm_scale/logs:0, shape = (1, 1, 1, 64)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_1/W:0, shape = (1, 1, 32, 32)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_1/actnorm/actnorm_center/b:0, shape = (1, 1, 1, 32)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_1/actnorm/actnorm_scale/logs:0, shape = (1, 1, 1, 32)
INFO:tensorflow: name = bert/flow/revnet_0/revnet_step_0/additive/nn/conv_block/1_2/W:0, shape = (1, 1, 32, 32)
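For reference, what I would expect at predict time is roughly the following, a minimal sketch assuming the TF1 init_from_checkpoint pattern from the BERT codebase (the checkpoint path and the scope filter here are my own illustration, not the repo's actual code):

import tensorflow as tf  # TF1-style API, as in the BERT codebase

# Hypothetical path: a checkpoint saved AFTER the unsupervised flow training,
# so that it also contains the bert/flow/* variables.
flow_trained_ckpt = "/path/to/ckpt_after_flow_training"

tvars = tf.trainable_variables()
# Map checkpoint variable names to graph variables, including bert/flow/*.
assignment_map = {v.name.split(":")[0]: v
                  for v in tvars if v.name.startswith("bert/")}
tf.train.init_from_checkpoint(flow_trained_ckpt, assignment_map)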

Training loss is negative

I suspect the reported training results may be questionable. First:

ops = [glow_ops.get_variable_ddi, glow_ops.actnorm, glow_ops.get_dropout]
encoder = glow_ops.encoder_decoder
self.z, encoder_objective, self.eps, _, _ = encoder(
    "flow", x, self.hparams, eps=None, reverse=False, init=init)
objective += encoder_objective

At this point the objective is positive: it accumulates the actnorm terms from the transformation, plus the log_prob of the split-off vectors under the Gaussian density, which is negative. The author does use +=, which is debatable. The sum of the parameter terms can be positive or negative, but log_prob must be negative, since probabilities lie between 0 and 1. Then:
self.z_top_shape = self.z.shape
prior_dist = self.top_prior()
prior_objective = paddle.sum(
    prior_dist.log_prob(self.z), axis=[1, 2, 3])
# self.z_sample = prior_dist.sample()
objective += prior_objective
So it is not at all surprising that the loss comes out negative. Whether the author's experimental results are really this good needs verification; I am verifying this now.
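For concreteness, the quantities being summed look roughly like this (a plain-numpy sketch, not the repo's code):

import numpy as np

def gaussian_log_prob(z, sigma=1.0):
    # Log-density of N(0, sigma^2). Note this is a log *density*, so it
    # can be positive whenever the density exceeds 1 (e.g. small sigma).
    return -0.5 * (np.log(2 * np.pi * sigma**2) + (z / sigma) ** 2)

z = np.zeros(64)                  # latent after the flow transform
log_det = 0.7                     # actnorm/coupling log-determinants (either sign)
objective = log_det + gaussian_log_prob(z, sigma=0.1).sum()
loss = -objective                 # a negative loss just means objective > 0
print(loss)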

I would ask the author to come out and explain this.

Correction for a mistake in 'last2avg'

Dear BERT-Flow authors,

I noticed a mistake in your code: you attempt to average the last 2 layers of BERT, but instead you average the first and last layers, as also pointed out in issue #11.
Specifically, the for-loop at line 172 of your run_siamese.py starts from 0 instead of 1, which means that you take the average of the first and the last layer.
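To make the difference concrete, here is a toy sketch (the layer indexing is illustrative, not the exact code from run_siamese.py):

import numpy as np

layers = [np.full(4, i, dtype=float) for i in range(13)]  # stand-in hidden states

buggy = (layers[0] + layers[-1]) / 2.0      # loop starting at 0: first + last layer
intended = (layers[-2] + layers[-1]) / 2.0  # true 'last2avg': last two layers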

As BERT-Flow is a very important line of work in STS that others need to compare against and build upon, please either correct the mistake in the paper or update the results using the actual last two layers. However, if I'm wrong, please correct me.

Best,
Minghan

May I get access to BERT-base-NLI-flow?

It seems that the provided Google Drive link only contains BERT-large models. Could you release the base version of BERT-flow for better reproducibility?

Does the corpus used for calibration need broad enough coverage?

I've read the paper and have a few questions.
When training BERT-flow, i.e., when calibrating the sentence semantic space, the corpus used is that of the downstream task.
(1) Does that mean a separate calibration is needed for each downstream task?
(2) Would a global calibration work, e.g., a one-off global calibration on a Wikipedia corpus?
(3) Could the calibration be done at the word level?
Thanks!

Fine-tuning bert-flow embedding with supervision

I fine-tuned the BERT model on a dataset labeled with [-1, 0, 1], meaning contradiction, neutral, and entailment, using a cosine-similarity loss to optimize the model.
After fine-tuning, the cosine similarities of the BERT embeddings vary from -1 to 1.
Then I used cos_loss + flow_loss to fit the flow model, but the cosine similarities of the bert-flow embeddings only vary from -0.4x to 1 (if I use only flow_loss to fit the flow model, they vary from -0.00x to 1).
Is this normal?
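For reference, the combination I mean is roughly the following (a plain-numpy sketch with illustrative names, not the actual training code):

import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def combined_loss(z1, z2, label, flow_nll, alpha=1.0):
    # Regress the cosine similarity toward the label in {-1, 0, 1},
    # then add the flow negative log-likelihood.
    cos_loss = (cosine(z1, z2) - label) ** 2
    return cos_loss + alpha * flow_nll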

Negative loss

Hi, when training on the QQP dataset I found that the loss is negative. What could be the cause?

About flow accuracy

I ran a test on the Ant Financial dataset (a Chinese sentence-pair similarity task, [0, 1] classification) with the following experiments:
1. bash scripts/train_siamese.sh train
"--exp_name=exp_${BERT_NAME}_${RANDOM_SEED}
--num_train_epochs=1.0
--learning_rate=2e-5
--train_batch_size=16
--cached_dir=${CACHED_DIR}"
Eval accuracy is around 85.

2. bash scripts/train_siamese.sh train
"--exp_name_prefix=exp
--cached_dir=${CACHED_DIR}
--flow=1 --flow_loss=1
--num_examples=0
--num_train_epochs=3.0
--flow_learning_rate=1e-3
--train_batch_size=16"
I used avg instead of avg-last-2 (see the pooling sketch below);
initial_ckpt is the output of experiment 1.
Eval accuracy stays around 50.
Why does accuracy drop when the flow is used?
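By avg vs. avg-last-2 I mean the following pooling strategies (an illustrative numpy sketch; the shapes and the exact semantics are my assumption):

import numpy as np

# layers: [num_layers, seq_len, hidden] hidden states for one sentence
layers = np.random.default_rng(0).normal(size=(13, 16, 768))

avg = layers[-1].mean(axis=0)                               # last layer, mean over tokens
avg_last2 = ((layers[-1] + layers[-2]) / 2.0).mean(axis=0)  # mean of last two layers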

Some questions about this paper

According to the article and the code, my understanding is that the method proposed in this paper essentially trains an invertible mapping using the Glow model, and then uses this mapping to transform BERT embeddings into the Gaussian distribution space; the parameters of the BERT model are not changed in this process. Of course, no label information is used during model training. Finally, when the model is saved, only the relevant parameters of the Glow model need to be saved.
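In miniature, my understanding looks like this (a toy numerical stand-in, not the repo's code):

import numpy as np

u = np.random.default_rng(0).normal(2.0, 3.0, size=(8, 4))  # fake "BERT embeddings"

# The simplest possible invertible map: elementwise affine z = (u - b) / sigma.
b, sigma = u.mean(axis=0), u.std(axis=0)
z = (u - b) / sigma
log_det = -np.sum(np.log(sigma))                             # log|det dz/du|
log_prior = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)  # log N(z; 0, I)
log_likelihood = log_prior + log_det  # maximized w.r.t. flow params only; BERT stays frozen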

Is my understanding correct?
