
convbert's Introduction

ConvBERT

Introduction

In this repo, we introduce ConvBERT, a new architecture for pre-training language models. The code is tested on a V100 GPU. For a detailed description and experimental results, please refer to our NeurIPS 2020 paper ConvBERT: Improving BERT with Span-based Dynamic Convolution.

Requirements

  • Python 3
  • tensorflow 1.15
  • numpy
  • scikit-learn

Experiments

Pre-training

These instructions pre-train a medium-small sized ConvBERT model (17M parameters) using the OpenWebText corpus.

To build the tf-records and pre-train the model, download the OpenWebText corpus (12G) and set up your data directory in build_data.sh and pretrain.sh. Then run

bash build_data.sh

The processed data require roughly 30G of disk space. Then, to pre-train the model, run

bash pretrain.sh

See configure_pretraining.py for the details of the supported hyperparameters.
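For orientation, here is a hedged sketch of the kind of hyperparameter overrides such ELECTRA-style pre-training setups typically expose; the names and values below are illustrative assumptions on my part, not the repo's actual defaults, so check configure_pretraining.py and pretrain.sh for the real ones.

import json

# Illustrative overrides only; the authoritative names live in configure_pretraining.py.
hparam_overrides = {
    "model_size": "medium-small",  # assumed name for the 17M-parameter configuration
    "train_batch_size": 128,       # illustrative value
    "learning_rate": 2e-4,         # default value mentioned in the issues below
    "max_seq_length": 128,         # illustrative value
    "num_train_steps": 1000000,    # illustrative value
}

# ELECTRA-style pre-training scripts usually accept such overrides as a JSON string;
# check how pretrain.sh invokes the pre-training script for the exact wiring.
print(json.dumps(hparam_overrides))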

Fine-tuning

Here we give instructions for fine-tuning a pre-trained medium-small sized ConvBERT model (17M parameters) on GLUE. You can refer to the Google Colab notebook for a quick example. See our paper for more details on model performance. The pre-trained model can be found here. (You can also download it from Baidu cloud with extraction code m9d2.)

To evaluate the performance on GLUE, you can download the GLUE data by running

python3 download_glue_data.py

Set up the data by running

mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data

After preparing the GLUE data, set up your data directory in finetune.sh and run

bash finetune.sh

You can evaluate different tasks by changing the configs in finetune.sh.

If you find this repo helpful, please consider citing

@inproceedings{NEURIPS2020_96da2f59,
 author = {Jiang, Zi-Hang and Yu, Weihao and Zhou, Daquan and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
 pages = {12837--12848},
 publisher = {Curran Associates, Inc.},
 title = {ConvBERT: Improving BERT with Span-based Dynamic Convolution},
 url = {https://proceedings.neurips.cc/paper/2020/file/96da2f590cd7246bbde0051047b0d6f7-Paper.pdf},
 volume = {33},
 year = {2020}
}

References

Here are some great resources we benefit from:

Codebase: Our codebase is based on ELECTRA.

Dynamic convolution: Implementation from Pay Less Attention with Lightweight and Dynamic Convolutions

Dataset: OpenWebText from Language Models are Unsupervised Multitask Learners

convbert's People

Contributors

philipmay, zhoudaquan, zihangjiang


convbert's Issues

Are the released pre-trained models Chinese or English, and what data were they trained on? Could you share some details?

I downloaded the pre-trained models convbert_base, convbert_medium and convbert_small from the README. None of the three model folders contains a vocabulary file. Based on the vocab.txt in this project (30522 entries), I assume these are English pre-trained models; is that correct? (Judging from ELECTRA, the English vocabulary has 30522 entries while the Chinese one has 21128.) Thanks for your answer.
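(As a quick illustrative check, not something shipped with the repo: when a vocab.txt is available, counting its entries tells the two vocabularies apart.)

# Count WordPiece entries: 30522 is the standard English (uncased) BERT/ELECTRA
# vocabulary size, 21128 the Chinese one.
with open("vocab.txt", encoding="utf-8") as f:
    print(sum(1 for _ in f))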

UnboundLocalError: local variable 'seq_length' referenced before assignment

Hi, I am using the ConvBertForTokenClassification model in transformers and encountered a bug when passing only inputs_embeds to forward().
The traceback points to line 833 in modeling_convbert.py:

if token_type_ids is None:
    if hasattr(self.embeddings, "token_type_ids"):
        buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]

Here seq_length is unassigned.

I noticed that just above this piece of code, in

elif input_ids is not None:
    input_shape = input_ids.size()
    batch_size, seq_length = input_shape
elif inputs_embeds is not None:
    input_shape = inputs_embeds.size()[:-1]

seq_length is not assigned if the program enters the elif inputs_embeds is not None branch.

I am not sure whether a batch_size, seq_length = input_shape assignment is simply missing for the inputs_embeds branch, or whether I am using the model incorrectly.
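For what it is worth, a minimal sketch of the likely fix (my assumption, not an official patch) is to derive seq_length from inputs_embeds the same way the input_ids branch does:

import torch

def resolve_shape(input_ids=None, inputs_embeds=None):
    # Mirror the transformers-style shape handling so seq_length is defined
    # on both code paths.
    if input_ids is not None:
        input_shape = input_ids.size()
    elif inputs_embeds is not None:
        input_shape = inputs_embeds.size()[:-1]  # drop the hidden dimension
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")
    batch_size, seq_length = input_shape
    return batch_size, seq_length

# Embeddings-only input of shape (batch, seq_len, hidden):
print(resolve_shape(inputs_embeds=torch.zeros(2, 16, 256)))  # (2, 16)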

Inference performance

Hi,
are there any inference performance numbers available?
For example, for bert_base, bert_tiny, conv_bert, conv_bert_small?

Question about LConv

[screenshot of the relevant figure from the paper]
In the paper's description, this part is LConv. I am a bit confused by it and would appreciate an explanation. Thanks.

Question about the span-based lightweight convolution

Hello, I would like to ask: in the span-based lightweight convolution, tf.layers.separable_conv1d already produces the span-aware matrix key_conv_attn_layer, so why is the element-wise product with query_layer still needed, i.e. conv_attn_layer = tf.multiply(key_conv_attn_layer, query_layer)? The multiplication does not seem strictly necessary to me.
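To make the question concrete, here is a simplified PyTorch sketch of the span-based dynamic convolution as I read the paper (my own re-implementation under simplifying assumptions, not the repo's TF code). As I understand it, the element-wise product of the query with the span-aware keys is what makes the generated convolution kernel depend on the current query token as well as its local span; without it the kernel would depend only on the span.

import torch
import torch.nn.functional as F

def span_dynamic_conv(query, key_span, value, kernel_gen, kernel_size=9):
    # query, key_span, value: (batch, seq_len, dim). key_span plays the role of
    # key_conv_attn_layer, i.e. the output of the depthwise-separable conv over keys.
    conv_attn = query * key_span                        # the tf.multiply step in question
    kernels = F.softmax(kernel_gen(conv_attn), dim=-1)  # per-position kernels: (batch, seq_len, kernel_size)
    pad = kernel_size // 2
    v = F.pad(value, (0, 0, pad, pad))                  # pad along the sequence axis
    windows = v.unfold(1, kernel_size, 1).permute(0, 1, 3, 2)  # (batch, seq_len, kernel_size, dim)
    return torch.einsum("bsk,bskd->bsd", kernels, windows)

# Illustrative usage with random tensors and a single shared kernel generator.
dim, k = 64, 9
kernel_gen = torch.nn.Linear(dim, k)
q = ks = v = torch.randn(2, 16, dim)
print(span_dynamic_conv(q, ks, v, kernel_gen, k).shape)  # torch.Size([2, 16, 64])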

Questions about pre-training

I would like to ask: in practice, how do you decide how many pre-training steps are enough? Also, roughly what value should the training loss reach? Mine keeps hovering around 9-11; is something wrong?

Question about the inference speed of mixed attention

The paper reports FLOPs of 26.5G and 19.3G; how were these numbers obtained? When I profile the 12-layer medium-small model myself, the encoder as a whole comes to roughly 1 GFLOPs. Also, under what conditions was the reported inference speed measured?
On my side the inference is actually slower than the original self-attention. My guess is that although there are fewer floating-point operations, more time is spent on data movement (reshape, transpose).
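As a rough cross-check, one way to count FLOPs in TensorFlow 1.15 is the built-in profiler. The sketch below is only a suggestion of mine, with a stand-in graph; the paper's numbers may have been computed differently, e.g. analytically or for a different sequence length.

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    # Stand-in graph; replace this with the actual encoder graph to profile it.
    x = tf.placeholder(tf.float32, [1, 128, 512])
    y = tf.layers.dense(x, 512)

opts = tf.profiler.ProfileOptionBuilder.float_operation()
flops = tf.profiler.profile(g, options=opts)
print("total float ops:", flops.total_float_ops)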

Training on multiple GPUs for BASE or LARGE Models

Hi,

Here, in #16 (comment), you say

Our code is only tested on a single V100 GPU.

But in your paper you write about BASE-size ConvBERT models.

BASE-size models cannot be trained (created) on a single GPU, though. From my experience you need 8 GPUs.

Could you please explain this? I would like to create a new German BASE or maybe even LARGE language model.

In #16 (comment) you say that Hugging Face might be an option for multi-GPU training. From my experience they are good at downstream-task training but not at the initial language model creation.

I would be very happy about some help with creating a new ConvBERT BASE or larger model in different languages.

Many thanks
Philip

NaN loss when pre-training on my own data

Hello, and thanks for open-sourcing this. When I pre-train on my own data with the base model and the default learning rate of 2e-4, the loss becomes NaN right at the start of training. After switching to the medium-small model, with a learning rate of either 2e-4 or 2e-5, training still exits with a NaN loss after a few thousand steps. Could you advise how to resolve this?

Train on GPU instead of TPU - different distribution strategies

Hi,
many thanks for this nice new model type and your research.
We would like to train a ConvBERT on GPU rather than TPU.
Do you have any experience or tips on how to do this?
We have concerns regarding the different distribution strategies
between GPUs and TPUs.

Thanks
Philip
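A minimal sketch of one possible route (my assumption, not something the repo ships): in TensorFlow 1.15, replace the TPUEstimator path with a plain Estimator whose RunConfig carries a MirroredStrategy for local data-parallel multi-GPU training.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # data-parallel across all local GPUs
run_config = tf.estimator.RunConfig(
    model_dir="/tmp/convbert_gpu",           # hypothetical output directory
    train_distribute=strategy,
    save_checkpoints_steps=1000,
)
# estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
# estimator.train(input_fn=input_fn, max_steps=num_train_steps)
# model_fn and input_fn would still have to be adapted from the repo's TPUEstimator code.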
