
CSS-LM

CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models

  • WWW-Workshop 2021 Accepted.

  • IEEE/TASLP 2021 Accepted.

Overview

CSS-LM improves the fine-tuning phase of PLMs via contrastive semi-supervised learning. Given a specific task, we retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to the task. By performing contrastive semi-supervised learning on both the retrieved unlabeled instances and the original labeled instances, CSS-LM helps PLMs capture crucial task-related semantic features and achieve better performance in low-resource scenarios.

Setups

  • python>=3.6
  • torch>=2.0.0 (e.g., the CUDA 11.8 build, 2.0.0+cu118)
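
Before installing anything else, a quick sanity check of the environment can be run. This helper is not part of the repository; it only assumes `python` is on your PATH:

```shell
# Quick environment check (not part of the repo; assumes `python` is on PATH).
python - <<'EOF'
import sys
# The scripts require Python >= 3.6.
assert sys.version_info >= (3, 6), "Python >= 3.6 required"
try:
    import torch
    print("torch", torch.__version__)
except ImportError:
    print("torch is not installed yet -- install the requirements first")
EOF
```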

Requirements

pip install -r requirement.sh

Prepare the data

Download the open domain corpus (openwebtext) and backbone models (roberta-base, bert-base-uncased) and move them to the corresponding directories.

wget -O download.zip 'https://cloud.tsinghua.edu.cn/f/690e78d324ee44068857/?dl=1'
unzip download.zip

rm -rf __MACOSX
cp -r download/openwebtext data
cp -r download/roberta-base script/roberta-base-768
cp -r download/bert-base-uncased script/bert-base-768
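
The training scripts expect the corpus and backbone checkpoints at the paths above. A small (hypothetical) helper can confirm the layout after unpacking; the directory names are the ones used by the copy step, and the `check_dir` function is illustrative, not part of the repository:

```shell
# Hypothetical layout check -- the directory names are the ones the
# training scripts expect after the copy step above.
check_dir() {
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

for d in data/openwebtext script/roberta-base-768 script/bert-base-768; do
    check_dir "$d"
done
```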

Semi-supervised Contrastive Fine-tuning (CSS-LM)

The CSS-LM scripts (run_${DATASET}_sscl_dt_k.sh and run_bert_${DATASET}_sscl_dt_k.sh) implement our main method. Users can run the example in script/semeval_example.sh:

for i_th in {1..5};
do
    #RoBERTa-base Model
    bash run_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

    #BERT-base Model
    bash run_bert_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

done

We introduce the whole training pipeline and describe the arguments in detail in the following sections.

Run All Experiments

Execute script/run1.sh:

cd script
bash run1.sh

The run1.sh script:

for i_th in {1..5};
do
    #RoBERTa-based Model
    bash run_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

    #BERT-based Model
    bash run_bert_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
done

In run1.sh, we have two kinds of backbone models (BERT and RoBERTa).

RoBERTa-based

  • run_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
  • run_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
  • run_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
  • run_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)

BERT-based

  • run_bert_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
  • run_bert_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
  • run_bert_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
  • run_bert_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)

Arguments

  • ${DATASET}: Can be semeval, sst5, scicite, aclintent, sciie, or chemprot.
  • $gpu_0 $gpu_1 $gpu_2 $gpu_3: The ids of the GPUs to use.
  • $N_1 $N_2 $N_3: The numbers of annotated instances.
  • $N_times_1 $N_times_2: The numbers of training epochs.
  • $batch_size: Training batch size.
  • $max_length: The maximum length of the input sentence.
  • $i_th: The index of the random seed; the models are trained with 5 different random seeds, one per value of $i_th.
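
Putting the arguments together, a single CSS-LM run on SemEval might be assembled as follows. The values below are illustrative placeholders, not recommended settings:

```shell
# Illustrative argument values only -- tune for your own hardware and task.
gpu_0=0; gpu_1=1; gpu_2=2; gpu_3=3    # ids of the four GPUs
N_1=16; N_2=16; N_3=16                # numbers of annotated instances
N_times_1=20; N_times_2=20            # numbers of training epochs
batch_size=8
max_length=100
i_th=1                                # random-seed index (1..5)

# Assemble the full command so it can be inspected before running.
cmd="bash run_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th"
echo "$cmd"
```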

Citation

Please cite our paper if you use CSS-LM in your work:

@article{su2021csslm,
   title={CSS-LM: A Contrastive Framework for Semi-Supervised Fine-Tuning of Pre-Trained Language Models},
   volume={29},
   ISSN={2329-9304},
   url={http://dx.doi.org/10.1109/TASLP.2021.3105013},
   DOI={10.1109/taslp.2021.3105013},
   journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
   publisher={Institute of Electrical and Electronics Engineers (IEEE)},
   author={Su, Yusheng and Han, Xu and Lin, Yankai and Zhang, Zhengyan and Liu, Zhiyuan and Li, Peng and Zhou, Jie and Sun, Maosong},
   year={2021},
   pages={2930–2941}
}

Contact

Yusheng Su

Mail: [email protected]; [email protected]

