NewsMTSC: (Multi-)Target-dependent Sentiment Classification in News Articles

NewsMTSC is a dataset for target-dependent sentiment classification (TSC) on news articles reporting on policy issues. The dataset consists of more than 11k labeled sentences, which we sampled from articles published by online US news outlets. More information can be found in our paper published at EACL 2021.

This repository contains the dataset and our model, GRU-TSC, which achieves state-of-the-art TSC performance on NewsMTSC. Check it out - it works out of the box :-)

If you are only looking for the dataset, you can download it here or view it here.

To make the model accessible to people without a computer science background, we aimed to make installing and using it as easy as possible. If you run into a problem with the model or notice an error in our dataset, please feel free to open an issue.

Installation

It's super easy, we promise!

To keep things easy, we use Anaconda for setting up requirements. If you do not have it yet, follow Anaconda's installation instructions. NewsMTSC was tested on macOS and Ubuntu; other operating systems may work, too. Let us know :-)

1. Set up the conda environment:

conda create --yes -n newsmtsc python=3.7
conda activate newsmtsc

2. Clone the repository:

git clone git@github.com:fhamborg/NewsMTSC.git
cd NewsMTSC

3. Install PyTorch:

Choose one of the following. Use this command if your GPU supports CUDA:

conda install --yes "pytorch=1.7.1" torchvision cudatoolkit=10.1 -c pytorch

Or use this command if your GPU does not support CUDA, if you don't know what CUDA is, or if the previous command gives you an error:

conda install --yes "pytorch=1.7.1" torchvision -c pytorch

4. Install remaining packages:

conda install --yes pandas tqdm scikit-learn
conda install --yes -c conda-forge boto3 regex sacremoses jsonlines matplotlib tabulate imbalanced-learn "spacy>=2.1,<3"
conda install --yes -c anaconda requests gensim openpyxl networkx
pip install "transformers>=3.1.0,<4"
python -m spacy download en_core_web_sm

5. Download our model:

python download.py

You're all set now :-)
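
To confirm that the environment is complete, a quick check along the following lines may help. This is a minimal sketch and not part of the repository: the helper `find_missing` is hypothetical, and the module list only mirrors the packages installed in steps 3 and 4 (note that scikit-learn is imported as `sklearn` and imbalanced-learn as `imblearn`).

```python
import importlib.util

# Import names of the packages installed in steps 3 and 4 (assumed mapping).
REQUIRED_MODULES = [
    "torch", "pandas", "tqdm", "sklearn", "boto3", "regex", "sacremoses",
    "jsonlines", "matplotlib", "tabulate", "imblearn", "spacy", "requests",
    "gensim", "openpyxl", "networkx", "transformers",
]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        import torch  # only imported once we know it is installed

        print("All packages found. CUDA available:", torch.cuda.is_available())
```

If the script reports missing packages, rerunning the matching install command from step 3 or 4 inside the activated `newsmtsc` environment is the first thing to try.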

Target-dependent Sentiment Classification

Target-dependent sentiment classification works out-of-the-box. Have a look at infer.py or give it a try:

python infer.py
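
To illustrate what "target-dependent" means, the sketch below splits one sentence around two different targets, so that each target becomes its own classification input. It is only an illustration of the task setup: the names `TargetExample` and `split_on_mention` are hypothetical and are not taken from infer.py, whose actual interface may differ.

```python
from dataclasses import dataclass


@dataclass
class TargetExample:
    """Hypothetical container: one sentence split around one target mention."""
    left: str
    target: str
    right: str


def split_on_mention(sentence: str, mention: str) -> TargetExample:
    """Split `sentence` around the first occurrence of `mention`."""
    start = sentence.index(mention)  # raises ValueError if the mention is absent
    end = start + len(mention)
    return TargetExample(sentence[:start], sentence[start:end], sentence[end:])


sentence = "The senator praised the bill, but critics attacked the governor."

# One sentence can yield several examples, one per target. A classifier
# would then predict a separate sentiment for each of them.
senator = split_on_mention(sentence, "senator")
governor = split_on_mention(sentence, "governor")
print(senator)
print(governor)
```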

Training

There are two entry points to the system. train.py is used to train and evaluate a specific model on a specific dataset using specific hyperparameters. We call a single run an experiment. controller.py is used to run multiple experiments automatically. This is useful, for example, for model selection and for evaluating hundreds or thousands of combinations of models, hyperparameters, and datasets.

Running a single experiment

train.py allows fine-grained control over the training and evaluation process, yet for most command line arguments we provide useful defaults. Important arguments include

  • --model_name (which model is used, e.g., LCF_BERT),
  • --dataset_name (which dataset is used, e.g., newstsc), and
  • --default_lm (which language model is used, e.g., roberta-base).

For more information refer to train.py and combinations_absadata_0.py. If you just want to test the system, the command below should work out of the box.

python train.py --model_name lcf_bert --optimizer adam --initializer xavier_uniform_ --learning_rate 2e-5 --batch_size 16 --balancing None --num_epoch 3 --lsr True --use_tp_placeholders False --eval_only_after_last_epoch True --devmode False --local_context_focus cdm --SRD 3 --pretrained_model_name bert_news_ccnc_10mio_3ep --snem recall_avg --dataset_name newstsc --experiment_path ./experiments/newstsc_20191126-115759/0/ --crossval 0 --task_format newstsc
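
Because the command is long, it can be easier to keep the arguments in a dictionary and build the command line from it. The following is a minimal sketch: the argument names and values are copied from the command above, while the helper `build_command` is hypothetical and not part of the repository.

```python
import shlex
import sys

# Arguments copied verbatim from the example command; values stay strings
# so that they reach train.py exactly as written (e.g., "2e-5").
ARGS = {
    "model_name": "lcf_bert",
    "optimizer": "adam",
    "initializer": "xavier_uniform_",
    "learning_rate": "2e-5",
    "batch_size": "16",
    "balancing": "None",
    "num_epoch": "3",
    "lsr": "True",
    "use_tp_placeholders": "False",
    "eval_only_after_last_epoch": "True",
    "devmode": "False",
    "local_context_focus": "cdm",
    "SRD": "3",
    "pretrained_model_name": "bert_news_ccnc_10mio_3ep",
    "snem": "recall_avg",
    "dataset_name": "newstsc",
    "experiment_path": "./experiments/newstsc_20191126-115759/0/",
    "crossval": "0",
    "task_format": "newstsc",
}


def build_command(args, script="train.py"):
    """Turn a dict of argument values into an argv list for `script`."""
    command = [sys.executable, script]
    for name, value in args.items():
        command += ["--" + name, str(value)]
    return command


command = build_command(ARGS)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
# To launch the run from the repository root:
# import subprocess; subprocess.run(command, check=True)
```

Changing a single entry of `ARGS`, for instance `learning_rate`, then yields a new experiment without retyping the whole command.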

Running multiple experiments

controller.py takes a set of values for each argument, creates combinations of arguments, applies conditions to remove unnecessary combinations (e.g., some arguments may only be used for a specific model), and creates a multiprocessing pool to run experiments of these argument combinations in parallel. After completion, controller.py creates a summary, which contains detailed results, including evaluation performance, of all experiments. By using createoverview.py, you can export this summary into an Excel spreadsheet.
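
The general pattern described above (build combinations, filter them with conditions, run them in a process pool) can be sketched as follows. This is not the code of controller.py: the search space values, the model name `other_model`, the condition in `is_valid`, and the placeholder `run_experiment` are all assumptions made for illustration.

```python
import itertools
from multiprocessing import Pool

# Hypothetical search space. The argument names appear in this README;
# the candidate values (including "other_model") are illustrative only.
SEARCH_SPACE = {
    "model_name": ["lcf_bert", "other_model"],
    "learning_rate": ["2e-5", "3e-5"],
    "local_context_focus": ["cdm", None],
}


def is_valid(combo):
    """Assumed condition: local_context_focus is set only for lcf_bert."""
    uses_lcf = combo["local_context_focus"] is not None
    return uses_lcf == (combo["model_name"] == "lcf_bert")


def make_combinations(space):
    """Yield every argument combination that passes the condition."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        combo = dict(zip(names, values))
        if is_valid(combo):
            yield combo


def run_experiment(combo):
    """Placeholder: a real run would call train.py with these arguments."""
    return {"args": combo, "status": "done"}


if __name__ == "__main__":
    combos = list(make_combinations(SEARCH_SPACE))
    with Pool(processes=2) as pool:
        results = pool.map(run_experiment, combos)
    print(len(results), "experiments finished")
```

Filtering before the pool starts keeps invalid combinations from occupying worker processes, which matters once the search space grows to thousands of experiments.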

Acknowledgements

This repository is in part based on ABSA-PyTorch. We thank Song et al. for making their excellent repository open source.

How to cite

If you use the dataset or model, please cite our paper (PDF):

@InProceedings{Hamborg2021b,
  author    = {Hamborg, Felix and Donnay, Karsten},
  title     = {NewsMTSC: (Multi-)Target-dependent Sentiment Classification in News Articles},
  booktitle = {Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)},
  year      = {2021},
  month     = {Apr.},
  location  = {Virtual Event},
}
