Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
by Shanshan Wang, Zhumin Chen, Zhaochun Ren, Muyang Ma, Huasheng Liang, Qiang Yan, Pengjie Ren
If you use this code, please cite:

@article{wang2022paying,
  title={Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding},
  author={Wang, Shanshan and Chen, Zhumin and Ren, Zhaochun and Ma, Muyang and Liang, Huasheng and Yan, Qiang and Ren, Pengjie},
  journal={arXiv preprint arXiv:2204.02922},
  year={2022}
}
This code is written in PyTorch; any version from 1.9 onward is expected to work. See the official PyTorch website (https://pytorch.org) for installation instructions.
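A quick way to confirm the environment meets this requirement (a minimal sketch, not part of the repository):

```python
# Quick environment check (illustrative): confirm PyTorch >= 1.9 is installed.
import torch

major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 9), (
    f"Found PyTorch {torch.__version__}; version 1.9 or later is expected."
)
print(f"PyTorch {torch.__version__} detected, OK.")
```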
The datasets used in this work are MultiNLI for natural language inference, MedNLI for natural language inference in the medical domain, and Cross-genre-IR for cross-genre medical information retrieval.
Download the datasets and put them in the `.data/` folder.
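Once downloaded, a small check (illustrative; the exact file names depend on what each download provides) confirms the folder is in place:

```python
# Confirm the .data/ folder exists and list what was placed inside it.
from pathlib import Path

data_dir = Path(".data")
assert data_dir.is_dir(), "Create .data/ and place the downloaded datasets inside."
for p in sorted(data_dir.iterdir()):
    print(p.name)
```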
- Run one of the following commands to train the model on the corresponding dataset (a sketch of what the loss-weight flags control follows the commands):
python ./train/multiNLI/multiNLI_main_train.py --model_name='bert-base-uncased' --loss_type='task+both' --pd_factor=0.001 --ad_factor=0.001
python ./train/medNLI/medNLI_main_train.py --model_name='bert-base-uncased' --loss_type='task+both' --pd_factor=0.001 --ad_factor=0.001
python ./train/IR/IR_pair_main_train.py --model_name='bert-base-uncased' --loss_type='task+both' --pd_factor=0.001 --ad_factor=0.001
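The `--loss_type='task+both'` setting presumably trains with the task loss plus both attention-guiding regularizers from the paper, with `--pd_factor` and `--ad_factor` weighting the pattern-decorrelation and attention-map-discrimination terms respectively. The sketch below shows one way such a combined objective can be wired up; the two regularizers here are simplified stand-ins (cosine similarity between heads, and attention entropy), not the paper's exact formulations, so see the training scripts for those.

```python
# Illustrative sketch of combining a task loss with two attention-guiding
# regularizers, weighted by pd_factor and ad_factor as in the commands above.
# Both regularizers below are simplified stand-ins, NOT the paper's formulas.
import torch
import torch.nn.functional as F

def pattern_decorrelation_loss(attn: torch.Tensor) -> torch.Tensor:
    """Penalize similarity between heads so they attend to distinct patterns.

    attn: [num_heads, seq_len, seq_len] attention maps from one layer.
    """
    h = attn.size(0)
    flat = F.normalize(attn.reshape(h, -1), dim=-1)   # unit-norm map per head
    sim = flat @ flat.t()                             # [h, h] cosine similarities
    off_diag = sim - torch.eye(h, device=attn.device) # zero out the diagonal
    return off_diag.pow(2).sum() / (h * (h - 1))      # mean squared off-diagonal

def attention_discrimination_loss(attn: torch.Tensor) -> torch.Tensor:
    """Encourage sharp (discriminative) attention via an entropy penalty."""
    probs = attn.clamp_min(1e-9)                      # avoid log(0)
    entropy = -(probs * probs.log()).sum(dim=-1)      # [num_heads, seq_len]
    return entropy.mean()

def total_loss(task_loss, attn, pd_factor=0.001, ad_factor=0.001):
    return (task_loss
            + pd_factor * pattern_decorrelation_loss(attn)
            + ad_factor * attention_discrimination_loss(attn))
```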
- Run one of the following commands to evaluate the model on the corresponding dataset (an illustrative inference sketch follows the commands):
python ./test/multiNLI_test.py --model_name='bert-base-uncased' --loss_type='task+both' --pd_factor=0.001 --ad_factor=0.001
python ./test/medNLI_test.py --model_name='bert-base-uncased' --loss_type='task+both' --pd_factor=0.001 --ad_factor=0.001
python ./test/IR_pair_test.py --model_name='bert-base-uncased' --loss_type='task+both' --pd_factor=0.001 --ad_factor=0.001
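After training, a fine-tuned checkpoint can also be inspected qualitatively with Hugging Face Transformers. This is a sketch under assumptions: `checkpoints/mednli_bert` is a hypothetical save path, and the label order depends on how the training script encoded labels, so check the test scripts for the repository's actual loading logic.

```python
# Illustrative inference with a fine-tuned NLI checkpoint (sketch only).
# "checkpoints/mednli_bert" is a hypothetical path; use wherever the
# training script saved your model, and verify the label order it used.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "checkpoints/mednli_bert", num_labels=3
)
model.eval()

premise = "The patient was given antibiotics for the infection."
hypothesis = "The patient received medication."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print(pred)  # maps to {entailment, neutral, contradiction} in training label order
```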