MITweet

Ideology Takes Multiple Looks: A High-Quality Dataset for Multifaceted Ideology Detection (EMNLP 2023)

Multifaceted Ideology Schema

The multifaceted ideology schema contains five domains that reflect different aspects of society. Under the five domains, there are twelve facets, each with left- and right-leaning ideological attributes.

Illustration of Multifaceted Ideology Schema

The MITweet Dataset

Based on the schema, we construct a new high-quality dataset, MITweet, for a new multifaceted ideology detection (MID) task. MITweet contains 12,594 English Twitter posts. Each post is manually annotated along each facet with a Relevance label and, if the Relevance label is “Related”, an Ideology label. MITweet covers 14 highly controversial topics from recent years (e.g., abortion, COVID-19, and the Russo-Ukrainian war).

Label Distribution

Label Distribution of MITweet

Baselines

We develop baselines for the new MID task based on three widely used PLMs (BERT, RoBERTa, BERTweet) under both in-topic and cross-topic settings. We split the multifaceted ideology detection procedure into two sub-tasks, run as a pipeline:

  1. Relevance Recognition
  2. Ideology Detection
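
The two sub-tasks can be read as a per-facet gate: an ideology label only counts for facets that the relevance step marks as "Related". The following is a minimal sketch of that merging step, not the repository's implementation. It assumes the label encoding described in the column list of the Reproduce section (relevance 1/0, ideology 0/1/2, and -1 for "Unrelated"); the function name `combine_pipeline_outputs` and the example predictions are hypothetical.

```python
from typing import List

UNRELATED = -1  # value the dataset uses when a facet has no ideology label


def combine_pipeline_outputs(relevance: List[int], ideology: List[int]) -> List[int]:
    """Merge per-facet outputs of the two sub-tasks into final labels.

    relevance[i] is 1 ("Related") or 0 ("Unrelated") for facet i.
    ideology[i] is 0 (left), 1 (center) or 2 (right) for facet i.
    Facets judged unrelated get UNRELATED regardless of the ideology output.
    """
    if len(relevance) != len(ideology):
        raise ValueError("both lists need one entry per facet")
    return [ideo if rel == 1 else UNRELATED for rel, ideo in zip(relevance, ideology)]


# Hypothetical predictions for the 12 facets of a single tweet.
relevance_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
ideology_pred = [0, 2, 1, 2, 1, 1, 0, 0, 2, 1, 1, 1]

final_labels = combine_pipeline_outputs(relevance_pred, ideology_pred)
print(final_labels)
# [0, -1, -1, 2, -1, -1, -1, -1, -1, -1, -1, 1]
```

With this gating, an error in Relevance Recognition propagates to the final label, which is the usual trade-off of a pipeline design.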

In-topic Setting

Results under the in-topic setting

Cross-topic Setting

Results under the cross-topic setting

Reproduce

We provide the dataset and code for reproducing the results.

In the directory data, MITweet.csv is the complete dataset.

Each .csv data file contains the following columns:

  • topic

  • tweet

  • tokenized tweet: the tweet tokenized with the tweet segmentation tool in nltk

  • R1 ~ R5: relevance labels for the 5 domains. 1 means "Related", 0 means "Unrelated"

  • R1-1-1 ~ R512-5-3 : relevance labels for the 12 facets. 1 means "Related", 0 means "Unrelated"

  • I1 ~ I12: ideology labels for the 12 facets. 0, 1, 2 mean left-leaning, center, right-leaning, respectively. -1 means "Unrelated", so there is no ideology label
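
As a rough illustration of how these columns can be decoded, the sketch below parses one row with Python's standard csv module. The sample row is made up for the example and omits the tokenized tweet and facet-level relevance columns; the helper name `decode_row` is hypothetical, and the sketch assumes the label cells are stored as plain integers.

```python
import csv
import io

# Encoding of the I1 ~ I12 columns as described above.
IDEOLOGY_NAMES = {-1: "unrelated", 0: "left", 1: "center", 2: "right"}


def decode_row(row):
    """Return (related domain numbers, {facet number: ideology name}) for one row."""
    related_domains = [d for d in range(1, 6) if row[f"R{d}"] == "1"]
    facet_ideology = {}
    for facet in range(1, 13):
        label = int(row[f"I{facet}"])
        if label != -1:  # -1 marks an unrelated facet with no ideology label
            facet_ideology[facet] = IDEOLOGY_NAMES[label]
    return related_domains, facet_ideology


# Build a tiny in-memory CSV with one invented row (not taken from MITweet).
header = ["topic", "tweet"] + [f"R{d}" for d in range(1, 6)] + [f"I{f}" for f in range(1, 13)]
values = ["abortion", "example tweet text"] + ["1", "0", "0", "0", "1"] + ["0"] + ["-1"] * 10 + ["2"]
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(header)
writer.writerow(values)
sample.seek(0)

rows = list(csv.DictReader(sample))
domains, ideology = decode_row(rows[0])
print(domains)   # [1, 5]
print(ideology)  # {1: 'left', 12: 'right'}
```

To try this on the real file, the in-memory buffer could be replaced with the opened MITweet.csv; if the labels turn out to be stored in another form (for example as floats), the comparisons would need adjusting.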

How to Run

  • Indicator Detection

    python log_odds_ratio.py
    
  • Relevance Recognition

    python train_relevance.py \
    	--train_data_path your_path \
    	--val_data_path your_path \
    	--test_data_path your_path
    
  • Ideology Detection

    python train_ideology.py \
    	--train_data_path your_path \
    	--val_data_path your_path \
    	--test_data_path your_path \
    	--indicator_file_path your_path
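
Both training scripts expect separate train, validation and test files. The sketch below shows one way such files could be prepared for a cross-topic experiment, under the assumption that "cross-topic" means the evaluation topics do not appear in the training data. The function `cross_topic_split`, the choice of held-out topics and the toy rows are all hypothetical and are not taken from the repository.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


def cross_topic_split(
    rows: List[Row], val_topics: Set[str], test_topics: Set[str]
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Assign each row to train, val or test based only on its topic column."""
    if val_topics & test_topics:
        raise ValueError("validation and test topics must not overlap")
    train, val, test = [], [], []
    for row in rows:
        if row["topic"] in test_topics:
            test.append(row)
        elif row["topic"] in val_topics:
            val.append(row)
        else:
            train.append(row)
    return train, val, test


# Toy rows standing in for the rows of MITweet.csv.
topics = ["abortion", "abortion", "covid-19", "covid-19",
          "Russo-Ukrainian war", "Russo-Ukrainian war"]
rows = [{"topic": t, "tweet": f"tweet {i}"} for i, t in enumerate(topics)]

train, val, test = cross_topic_split(
    rows, val_topics={"covid-19"}, test_topics={"Russo-Ukrainian war"}
)
print(len(train), len(val), len(test))  # 2 2 2
```

Each resulting list can then be written out with csv.DictWriter and passed to the scripts through --train_data_path, --val_data_path and --test_data_path.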
    
