
persuasion_spans's Introduction

🪐 spaCy Project: Detecting Persuasion with spaCy

Configuration and code accompanying Detecting Persuasion with spaCy.

Persuasion techniques are shortcuts in the argumentation process, e.g. leveraging the emotions of the audience or using logical fallacies to influence it. This project creates a spaCy pipeline with a SpanCategorizer to detect and classify the spans of a text in which persuasion techniques are used.

Notes:

  • No pre-processing of the data is performed other than conversion to spaCy's binary format.
  • Default configuration files are used for small, large, and transformer models.
  • After each model is trained and evaluated, the created model is removed! For the purpose of the associated article, we are interested in the metrics, not the created models.
  • For the article describing this project, the suggester configuration was changed manually to switch between a maximum of 16-grams and a maximum of 32-grams.
  • The evaluation output of the different trained models (JSON format) is processed by report.py to allow for comparison.
  • GPU is used only for the transformer models.
  • A suggester configuration with a maximum of 32-grams and a transformer model runs out of memory on an 8 GB GPU. In the configuration provided here, batch sizes are tweaked to make it run, but at a loss of some twenty percent of accuracy.
  • On a 6-core CPU, a 32-grams configuration with a transformer model took some 14 hours to run!
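
To get a feel for why moving from a maximum of 16-grams to a maximum of 32-grams is costly, the sketch below counts the candidate spans an n-gram suggester would propose for a single text. It assumes the suggester proposes every contiguous token span of each length from 1 up to the configured maximum; the function name, the minimum size of 1, and the 200-token example are illustrative assumptions, not values taken from this project's configuration.

```python
def count_candidate_spans(n_tokens: int, max_size: int, min_size: int = 1) -> int:
    """Count contiguous token spans whose length lies in [min_size, max_size]."""
    total = 0
    for size in range(min_size, max_size + 1):
        # A text of n tokens contains n - size + 1 spans of the given size
        # (and none at all once size exceeds n).
        total += max(0, n_tokens - size + 1)
    return total


if __name__ == "__main__":
    # Hypothetical 200-token text, compared for both maximum sizes.
    for max_size in (16, 32):
        print(max_size, count_candidate_spans(200, max_size))
```

For a 200-token text this gives 3080 candidates with a maximum of 16 and 5904 with a maximum of 32, so roughly twice as many spans have to be scored per text.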

Python code is used to:

  • create the corpus in spaCy format from the original dataset.
  • extract data from generated metrics files for reporting.
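
The reporting step can be sketched roughly as follows. This is not the project's report.py, only a minimal illustration using the standard library. It assumes one JSON metrics file per trained model and that each file holds span scores under the keys `spans_sc_p`, `spans_sc_r` and `spans_sc_f` (the names spaCy uses for the default spans key `sc`); the function names and the example file name are hypothetical.

```python
import csv
import json
import sys
from pathlib import Path
from typing import Dict, Iterable, List, TextIO

# Assumed metric keys: precision, recall and F-score for the spans key "sc".
FIELDS = ["spans_sc_p", "spans_sc_r", "spans_sc_f"]


def load_metrics(paths: Iterable[str]) -> Dict[str, dict]:
    """Read each JSON metrics file, keyed by its file name without extension."""
    return {
        Path(p).stem: json.loads(Path(p).read_text(encoding="utf-8"))
        for p in paths
    }


def metrics_to_rows(named_metrics: Dict[str, dict]) -> List[dict]:
    """Build one row per model, sorted by model name."""
    rows = []
    for name, metrics in sorted(named_metrics.items()):
        row = {"model": name}
        for field in FIELDS:
            # A score missing from the file becomes an empty cell.
            row[field] = metrics.get(field, "")
        rows.append(row)
    return rows


def write_csv(rows: List[dict], handle: TextIO) -> None:
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["model"] + FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Usage (hypothetical file names): python sketch.py metrics/sm_16.json ...
    write_csv(metrics_to_rows(load_metrics(sys.argv[1:])), sys.stdout)
```

Keeping the loading, row building and writing steps separate makes it easy to add further columns, such as per-label scores, without touching the file handling.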

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

| Command | Description |
| --- | --- |
| corpus | Convert the data to spaCy's format |
| train_sm | Train and evaluate 'sm' model for 16-grams and 32-grams configurations |
| train_lg | Train and evaluate 'lg' model for 16-grams and 32-grams configurations |
| train_trf | Train and evaluate 'trf' model for 16-grams and 32-grams configurations |
| report | Convert metrics of the different trained models to CSV for reading into notebook |
| clean | Remove intermediate files |

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

| Workflow | Steps |
| --- | --- |
| all | corpus → train_sm → train_lg → train_trf → report |
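
The rule that commands are only re-run if their inputs have changed can be illustrated with content hashes. The sketch below is not spaCy's implementation (spaCy keeps its own bookkeeping in a project.lock file); it only shows the general idea, and the function names and the in-memory `recorded` dictionary are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_digest(path: Path) -> str:
    """Return a hex digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rerun(inputs: Iterable[Path], recorded: Dict[str, str]) -> bool:
    """True if any input is missing, unseen, or differs from its recorded digest."""
    for path in inputs:
        if not path.exists():
            return True
        if recorded.get(str(path)) != file_digest(path):
            return True
    return False


def record_run(inputs: Iterable[Path], recorded: Dict[str, str]) -> None:
    """Store the current digest of every input after a successful run."""
    for path in inputs:
        recorded[str(path)] = file_digest(path)
```

A command runner built on this would call `needs_rerun` before each step, skip the step when it returns `False`, and call `record_run` once the step has finished.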

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

| File | Source | Description |
| --- | --- | --- |
| assets | Git | Dev dataset from SemEval-2021 Task 6 'Detection of Persuasion Techniques in Texts and Images' |
