This repository contains the source code and the dataset for vaccine attitude detection.

The annotations are given in the `Datasets_Raw` folder, one record per line in the format `ID,stance,aspect_span_start:aspect_span_end,opinion_span_start:opinion_span_end,aspect_category`.
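For illustration, a record in this format could be parsed as follows. This is a minimal sketch; the tweet ID, span values, and category in the example are made up, and the field order is taken from the format string above.

```python
def parse_annotation(line):
    """Split one annotation record of the form
    ID,stance,aspect_start:aspect_end,opinion_start:opinion_end,aspect_category."""
    tweet_id, stance, aspect_span, opinion_span, category = line.strip().split(",")
    aspect = tuple(int(i) for i in aspect_span.split(":"))
    opinion = tuple(int(i) for i in opinion_span.split(":"))
    return {
        "id": tweet_id,
        "stance": stance,
        "aspect_span": aspect,
        "opinion_span": opinion,
        "aspect_category": category,
    }

# Example with invented values:
record = parse_annotation("1234567890,positive,5:9,12:20,safety")
print(record["aspect_span"])  # (5, 9)
```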
To obtain the tweet text:
- `cd twitter_get_text_by_id_twitter4j`
- Open `./settings/crawler.properties` and set your consumerKey, consumerSecret, access token, and access token secret. To acquire these credentials, please refer to https://developer.twitter.com/en/docs/developer-portal/overview; Standard v1.1 access is sufficient.
- Run the crawler by either `java -jar twitter_vac_opi_cwl_by_id.jar ./settings/crawler.properties` or `javac -cp "./*" ./src/main/org/backingdata/twitter/crawler/rest/TwitterRESTTweetIDlistCrawler.java`.
- The tweets are stored in `./saves` in JSON format.
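The credential section of `./settings/crawler.properties` might look roughly like the sketch below. The exact property names here are assumptions; keep whatever names are already present in the shipped file and only fill in your own values.

```
consumerKey=YOUR_CONSUMER_KEY
consumerSecret=YOUR_CONSUMER_SECRET
accessToken=YOUR_ACCESS_TOKEN
accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
```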
`cd VADMlmFineTuning`

The VADtransformer is first trained unsupervised. The model will be saved to `../datasets/mlm-vad`.

To perform unsupervised training:
- Replace the tweet IDs in `UnannotatedTwitterID_training.csv` and `UnannotatedTwitterID_testing.csv` with the obtained tweet text.
- Put the tweet text file in `../datasets`. The format is the same as `vad_train_finetune.txt`.
- `cd src` and run `train_vad_albert_vae.py`.
`cd VADStanceAndTextspanPrediction`

In the previous step we obtained the unsupervised pre-trained VAD, i.e., the TopicDrivenMaskedLM. At this stage we wrap the model with classifiers and constraints, and train it.

To perform supervised training:
- Move the saved model (i.e., the `pytorch_model.bin` file) from the `../datasets/mlm-vad` folder of the VAD unsupervised training to the `./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache` folder. For your convenience, a saved TopicDrivenMaskedLM is ready to use in the `./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache` folder.
- Move the saved config of the model (i.e., the `config.json` file) from the `../datasets/mlm-vad` folder of the VAD unsupervised training to the `./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2` folder. For your convenience, a saved `config.json` is ready to use in the `./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2` folder.
- `cd src` and run `vadtrain_eval_predict.py` for training and testing.
  - Training: uncomment lines 1559-1578 of `vadtrain_eval_predict.py` and run the file. Checkpoints will be saved in `./datasets/vadcheckpoints/5-fold-211103/vadlm-albert-large-v2/`.
  - Testing: uncomment lines 1580-1608 of `vadtrain_eval_predict.py` and run the file. The predictions will be output in the same directory. A saved model can be downloaded via this link; you can place the saved model in `./datasets/vadcheckpoints/5-fold-211103/vadlm-albert-large-v2/` for a quick start.
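The setup steps above can be summarized as the following shell sketch, run from inside `VADStanceAndTextspanPrediction`. The location of the unsupervised `mlm-vad` output relative to this folder depends on where you ran the previous stage, so it is left as a placeholder.

```
# Copy the unsupervised outputs into place (destination paths are from
# the steps above; replace <path-to> with the actual mlm-vad location).
cp <path-to>/mlm-vad/pytorch_model.bin ./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache/
cp <path-to>/mlm-vad/config.json ./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2/

# Then train/test after uncommenting the relevant lines:
cd src
python vadtrain_eval_predict.py
```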