Hello. The BP-Transformer paper is excellent, and I really like what this paper does on Text Classific

BPT vs BERT about bpt HOT 4 CLOSED

tanaka-jp commented on September 26, 2024

BPT vs BERT

from bpt.

Comments (4)

yzh119 commented on September 26, 2024

Hi @tanaka-jp, thanks for your interest!

I haven't compared BPT with BERT/ALBERT because they are not directly comparable: BPT is a network structure that could serve as the backbone of BERT/ALBERT.

The right approach would be to pre-train BPT-BERT on Wikipedia/BookCorpus and compare its performance against Transformer-BERT fine-tuned on GLUE/SuperGLUE. However, since there are already many variants of pre-training strategies (RoBERTa, XLNet, SpanBERT, ALBERT, and ELECTRA; I think SpanBERT is a perfect match for BPT because both model span-level representations), we thought it better to keep the focus of this paper on the Transformer itself instead of proposing a new network plus a better pre-training framework.

Of course, we are interested in improving BERT using BPT. However, I'm not very familiar with BERT pre-training (how to sample, how to set hyperparameters), so the plan is postponed.

GloVe might be a little old. If necessary, I'll try replacing GloVe with pre-trained ELMo/BERT embeddings (without fine-tuning) as the input in the text-classification experiments, compare BPT against LSTM/CNN/Transformer, and report the results in this repo, though I don't know whether the numbers would be meaningful.

tanaka-jp commented on September 26, 2024

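Thank you for your answer.
I am working on a text-classification project.
In my experiments, ALBERT is clearly more accurate than BERT.
With limited hardware, ALBERT needs less GPU memory than BERT, so a larger model becomes feasible.

Since the spans at different scales in BPT reduce the cost of computing attention,
ALBERT+BPT should need even less computation, which would allow scaling the model up further.
I expect this would lead to higher accuracy.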
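To put rough numbers on the claim that multi-scale spans shrink the attention cost, here is a minimal sketch. It relies on the BPT paper's statement that the model yields O(k·n·log(n/k)) attention connections, where k controls attention density, versus n² for full self-attention. The function names, the choice of `k=4`, and the base-2 logarithm are assumptions made for illustration; constant factors are ignored, so the ratios only indicate the trend and say nothing about actual GPU memory use.

```python
import math


def full_attention_edges(n: int) -> int:
    """Full self-attention: each of n tokens attends to all n tokens."""
    return n * n


def bpt_edge_estimate(n: int, k: int) -> float:
    """Order-of-growth estimate k * n * log2(n / k); constant factors ignored."""
    if not 0 < k <= n:
        raise ValueError("expected 0 < k <= n")
    return k * n * math.log2(n / k)


if __name__ == "__main__":
    # k=4 is an illustrative value, not a recommendation from the thread.
    for n in (512, 2048, 8192):
        full = full_attention_edges(n)
        sparse = bpt_edge_estimate(n, k=4)
        print(f"n={n:5d}  full={full:>10d}  bpt~{sparse:>10.0f}  ratio={full / sparse:.1f}")
```

The gap widens with sequence length, which is consistent with the expectation above that a BPT backbone would leave more room for a larger model on long inputs. Whether that turns into higher accuracy is not something this estimate can show.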

LiFXe commented on September 26, 2024

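> Thank you for your answer.
> I am working on a text-classification project.
> In my experiments, ALBERT is clearly more accurate than BERT.
> With limited hardware, ALBERT needs less GPU memory than BERT, so a larger model becomes feasible.
>
> Since the spans at different scales in BPT reduce the cost of computing attention,
> ALBERT+BPT should need even less computation, which would allow scaling the model up further.
> I expect this would lead to higher accuracy.

Hello, may I ask whether your experiments confirmed this hypothesis?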

yzh119 commented on September 26, 2024

Just a notification: the code on the master branch is outdated and might not be compatible with the latest versions of PyTorch (1.7) or DGL (0.5.2).
Please let me know if you find that the code does not run, and I'll refactor it.
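Before reporting that the code does not run, it may help to note which versions are installed. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names `torch` and `dgl` are assumptions (CUDA builds of DGL may be published under a different distribution name, in which case the lookup returns `None`).

```python
from importlib import metadata

# Versions mentioned in the comment above as possibly incompatible with master.
MENTIONED = {"torch": "1.7", "dgl": "0.5.2"}


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(MENTIONED).items():
        print(f"{name}: installed={version}, mentioned in thread={MENTIONED[name]}")
```

Including this output in a bug report gives the maintainer a concrete starting point for any refactor.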
