
CSS-LM

CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models

  • WWW-Workshop 2021 Accepted.

  • IEEE/TASLP 2021 Accepted.

Overview

CSS-LM improves the fine-tuning phase of PLMs via contrastive semi-supervised learning. Given a specific task, we retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to the task. By performing contrastive semi-supervised learning on both the retrieved unlabeled instances and the original labeled instances, CSS-LM helps PLMs capture crucial task-related semantic features and achieve better performance in low-resource scenarios.

Setups

  • python>=3.6
  • torch>=2.0.0 (e.g., the CUDA 11.8 build, 2.0.0+cu118)
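
Before installing anything else, a quick sanity check of the environment can be run. This helper is not part of the repository; it only assumes `python` is on your PATH:

```shell
# Quick environment check (not part of the repo; assumes `python` is on PATH).
python - <<'EOF'
import sys
# The scripts require Python >= 3.6.
assert sys.version_info >= (3, 6), "Python >= 3.6 required"
try:
    import torch
    print("torch", torch.__version__)
except ImportError:
    print("torch is not installed yet -- install the requirements first")
EOF
```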

Requirements

pip install -r requirement.sh

Prepare the data

Download the open domain corpus (openwebtext) and backbone models (roberta-base, bert-base-uncased) and move them to the corresponding directories.

wget -O download.zip 'https://cloud.tsinghua.edu.cn/f/690e78d324ee44068857/?dl=1'
unzip download.zip

rm -rf __MACOSX
cp -r download/openwebtext data
cp -r download/roberta-base script/roberta-base-768
cp -r download/bert-base-uncased script/bert-base-768
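
The training scripts expect the corpus and backbone checkpoints at the paths above. A small (hypothetical) helper can confirm the layout after unpacking; the directory names are the ones used by the copy step, and the `check_dir` function is illustrative, not part of the repository:

```shell
# Hypothetical layout check -- the directory names are the ones the
# training scripts expect after the copy step above.
check_dir() {
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

for d in data/openwebtext script/roberta-base-768 script/bert-base-768; do
    check_dir "$d"
done
```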

Semi-supervised Contrastive Fine-tuning (CSS-LM)

The CSS-LM scripts (run_${DATASET}_sscl_dt_k.sh and run_bert_${DATASET}_sscl_dt_k.sh) implement our main method. Users can run the example in script/semeval_example.sh:

for i_th in {1..5};
do
    #RoBERTa-base Model
    bash run_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

    #BERT-base Model
    bash run_bert_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

done

We introduce the whole training pipeline and describe the arguments in detail in the following sections.

Run All Experiments

Execute script/run1.sh:

cd script
bash run1.sh

The run1.sh script:

for i_th in {1..5};
do
    #RoBERTa-based Model
    bash run_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

    #BERT-based Model
    bash run_bert_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
    bash run_bert_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
done

In run1.sh, we have two kinds of backbone models (BERT and RoBERTa).

RoBERTa-based

  • run_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
  • run_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
  • run_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
  • run_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)

BERT-based

  • run_bert_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
  • run_bert_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
  • run_bert_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
  • run_bert_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)

Arguments

  • ${DATASET}: Can be semeval, sst5, scicite, aclintent, sciie, or chemprot.
  • $gpu_0 $gpu_1 $gpu_2 $gpu_3: The ids of the GPUs to use.
  • $N_1 $N_2 $N_3: The numbers of annotated instances.
  • $N_times_1 $N_times_2: The numbers of training epochs.
  • $batch_size: Training batch size.
  • $max_length: The maximum length of the input sentence.
  • $i_th: The index of the random seed; the models are trained with 5 different random seeds, one per value of $i_th.
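
Putting the arguments together, a single CSS-LM run on SemEval might be assembled as follows. The values below are illustrative placeholders, not recommended settings:

```shell
# Illustrative argument values only -- tune for your own hardware and task.
gpu_0=0; gpu_1=1; gpu_2=2; gpu_3=3    # ids of the four GPUs
N_1=16; N_2=16; N_3=16                # numbers of annotated instances
N_times_1=20; N_times_2=20            # numbers of training epochs
batch_size=8
max_length=100
i_th=1                                # random-seed index (1..5)

# Assemble the full command so it can be inspected before running.
cmd="bash run_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th"
echo "$cmd"
```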

Citation

Please cite our paper if you use CSS-LM in your work:

@article{su2021csslm,
   title={CSS-LM: A Contrastive Framework for Semi-Supervised Fine-Tuning of Pre-Trained Language Models},
   volume={29},
   ISSN={2329-9304},
   url={http://dx.doi.org/10.1109/TASLP.2021.3105013},
   DOI={10.1109/taslp.2021.3105013},
   journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
   publisher={Institute of Electrical and Electronics Engineers (IEEE)},
   author={Su, Yusheng and Han, Xu and Lin, Yankai and Zhang, Zhengyan and Liu, Zhiyuan and Li, Peng and Zhou, Jie and Sun, Maosong},
   year={2021},
   pages={2930–2941}
}

Contact

Yusheng Su

Mail: [email protected]; [email protected]

