covidtweets's Introduction

VADet

This repository contains the source code and the dataset for vaccine attitude detection.

Vaccine Attitude Dataset

The annotations are in the Datasets_Raw folder, in the form
ID,stance,aspect_span_start:aspect_span_end,opinion_span_start:opinion_span_end,aspect_category
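
As an illustration of the record layout above, here is a minimal Python sketch that splits one annotation into typed fields. The names Annotation, parse_span and parse_annotation are hypothetical and not part of this repository. The sketch assumes the spans are integer offsets and that only the last field may contain a comma. The sample values in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Annotation:
    tweet_id: str
    stance: str
    aspect_span: Tuple[int, int]
    opinion_span: Tuple[int, int]
    aspect_category: str


def parse_span(field: str) -> Tuple[int, int]:
    """Turn 'start:end' into a pair of integers (assumed to be offsets)."""
    start, end = field.split(":")
    return int(start), int(end)


def parse_annotation(line: str) -> Annotation:
    """Parse one ID,stance,aspect,opinion,category record."""
    # maxsplit=4 keeps any comma inside the final category field intact
    tweet_id, stance, aspect, opinion, category = line.strip().split(",", 4)
    return Annotation(tweet_id, stance, parse_span(aspect),
                      parse_span(opinion), category)
```

For example, parse_annotation("123,pro,0:5,6:12,safety") yields an Annotation whose aspect_span is (0, 5). Check the actual stance labels and category names against the files in Datasets_Raw.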

To obtain tweet text,

  1. cd twitter_get_text_by_id_twitter4j
  2. Open ./settings/crawler.properties and set up your consumerKey, consumerSecret, access token and access token secret.
    1. To obtain the consumerKey, consumerSecret, access token and access token secret, please refer to https://developer.twitter.com/en/docs/developer-portal/overview. Standard v1.1 access is sufficient.
  3. Run twitter_get_text_by_id_twitter4j with either java -jar twitter_vac_opi_cwl_by_id.jar ./settings/crawler.properties or javac -cp "./*" ./src/main/org/backingdata/twitter/crawler/rest/TwitterRESTTweetIDlistCrawler.java
     The tweets are stored in ./saves in JSON format.
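
Once the crawler has finished, the saved JSON can be turned into an ID-to-text lookup. The following Python sketch is only an assumption about the output layout: it expects one JSON tweet object per line, with the id_str (or id) and full_text (or text) fields used by Twitter v1.1 status objects. The function name tweets_by_id is hypothetical. Inspect a file in ./saves first and adjust the field names if they differ.

```python
import json
from typing import Dict, Iterable


def tweets_by_id(json_lines: Iterable[str]) -> Dict[str, str]:
    """Map tweet ID to tweet text from JSON strings, one tweet per string."""
    texts: Dict[str, str] = {}
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        tweet = json.loads(raw)
        # Assumed field names, following Twitter v1.1 status objects
        tweet_id = str(tweet.get("id_str") or tweet["id"])
        text = tweet.get("full_text") or tweet.get("text", "")
        texts[tweet_id] = text
    return texts
```

The function takes any iterable of strings, so an open file handle can be passed directly.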

VAD unsupervised training

cd VADMlmFineTuning
VADtransformer is first trained without supervision. The model will be saved to ../datasets/mlm-vad.

To perform unsupervised training,

  1. Replace the tweet IDs in UnannotatedTwitterID_training.csv and UnannotatedTwitterID_testing.csv with the tweet text obtained above.
  2. Put the tweet text file in ../datasets. The format is the same as vad_train_finetune.txt.
  3. cd src and run train_vad_albert_vae.py
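
Step 1 above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the repository's own tooling: it assumes the tweet ID sits in the first CSV column, that a dictionary mapping tweet ID to text is already available, and that the target text file holds one tweet per line. Compare the result with vad_train_finetune.txt before training. The function name replace_ids_with_text is hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def replace_ids_with_text(csv_lines: Iterable[str],
                          texts: Dict[str, str]) -> List[str]:
    """Return the tweet text for each ID row, skipping IDs without text."""
    out: List[str] = []
    for row in csv.reader(csv_lines):
        if not row:
            continue  # skip empty rows
        tweet_id = row[0].strip()  # assumption: ID is the first column
        text = texts.get(tweet_id)
        if text is None:
            continue  # tweet was deleted or could not be crawled
        # Collapse newlines and repeated spaces so each tweet stays on one line
        out.append(" ".join(text.split()))
    return out
```

The returned list can then be written to a file in ../datasets with "\n".join(...).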

VAD supervised training

cd VADStanceAndTextspanPrediction

In the previous step we obtained the unsupervised pre-trained VAD, namely the TopicDrivenMaskedLM. At this stage we wrap the model with classifiers and constraints, and train it.

To perform supervised training,

  1. Move the saved model (i.e., the pytorch_model.bin file) from the ../datasets/mlm-vad folder of the VAD unsupervised training step to the ./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache folder. For your convenience, a ready-to-use saved TopicDrivenMaskedLM is already in the ./datasets/albertconfigs/vadlm-albert-large-v2/vad-cache folder.
  2. Move the saved config of the model (i.e., the config.json file) from the ../datasets/mlm-vad folder of the VAD unsupervised training step to the ./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2 folder. For your convenience, a ready-to-use saved config.json is already in the ./datasets/albertconfigs/vadlm-albert-large-v2/vadlm-albert-large-v2 folder.
  3. cd src and run vadtrain_eval_predict.py for training and testing.
    1. Training: Uncomment lines 1559-1578 of vadtrain_eval_predict.py and run the file. Checkpoints will be saved in ./datasets/vadcheckpoints/5-fold-211103/vadlm-albert-large-v2/
    2. Testing: Uncomment lines 1580-1608 of vadtrain_eval_predict.py and run the file. The predictions will be written to the same directory. A saved model can be downloaded via this link. You can place the saved model in ./datasets/vadcheckpoints/5-fold-211103/vadlm-albert-large-v2/ for a quick start.
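
Steps 1 and 2 of the list above (placing pytorch_model.bin and config.json) can also be scripted. The Python sketch below is a convenience under assumptions, not part of the repository: the function name stage_pretrained_model is hypothetical, both directories are passed in explicitly because the relative paths depend on where you run it, and it copies the files rather than moving them so the original output of the unsupervised step is kept.

```python
import shutil
from pathlib import Path
from typing import List

MODEL_FILE = "pytorch_model.bin"
CONFIG_FILE = "config.json"


def stage_pretrained_model(mlm_dir, albert_dir) -> List[Path]:
    """Copy the saved model and config into the folders the trainer reads.

    mlm_dir:    the mlm-vad folder written by the unsupervised step
    albert_dir: the albertconfigs/vadlm-albert-large-v2 folder
    """
    mlm_dir, albert_dir = Path(mlm_dir), Path(albert_dir)
    targets = {
        MODEL_FILE: albert_dir / "vad-cache",
        CONFIG_FILE: albert_dir / "vadlm-albert-large-v2",
    }
    copied = []
    for name, dest_dir in targets.items():
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps file metadata and overwrites an existing target file
        copied.append(Path(shutil.copy2(mlm_dir / name, dest_dir / name)))
    return copied
```

Note that this overwrites the ready-to-use files shipped in those folders, so back them up first if you want to keep them.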
