🪐 spaCy Project: Detecting Persuasion with spaCy

Configuration and code accompanying the article Detecting Persuasion with spaCy.

Persuasion techniques are shortcuts in the argumentation process, e.g. appealing to the emotions of the audience or using logical fallacies to influence it. This project creates a spaCy pipeline with a SpanCategorizer to detect and classify the spans of a text in which persuasion techniques are used.

Notes:

No pre-processing of the data is performed other than conversion to spaCy's binary format.
Default configuration files are used for the small, large, and transformer models.
After training and evaluating each model, the created model is removed! For the purpose of the associated article, we are interested in the metrics, not the created models.
For the article describing this project, the suggester configuration was changed manually to switch between a maximum of 16-grams and a maximum of 32-grams.
The evaluation output of the different models (JSON format) is processed by report.py to allow for comparison.
A GPU is used only for the transformer models.
A suggester configuration with a maximum of 32-grams and a transformer model runs out of memory on an 8 GB GPU. In the configuration provided here, batch sizes are tweaked to make it run, but at a loss of some twenty percent of accuracy.
On a 6-core CPU, a 32-grams configuration with a transformer model took some 14 hours to run!
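To give a feel for why the 32-grams configuration is so much heavier than the 16-grams one, the sketch below counts how many candidate spans an n-gram suggester proposes for a single document. It assumes the suggester proposes every contiguous token span whose length lies between a minimum and a maximum size; the function name `count_ngram_candidates` and the 100-token document are illustrative and not part of this project.

```python
def count_ngram_candidates(n_tokens: int, max_size: int, min_size: int = 1) -> int:
    """Count contiguous token spans with a length in [min_size, max_size].

    Assumption: the suggester proposes every such span as a candidate.
    """
    total = 0
    # A document of n tokens contains (n - size + 1) spans of a given size.
    for size in range(min_size, min(max_size, n_tokens) + 1):
        total += n_tokens - size + 1
    return total


if __name__ == "__main__":
    # Hypothetical 100-token document, compared for both maximum sizes.
    for max_size in (16, 32):
        print(max_size, count_ngram_candidates(100, max_size))
```

Every candidate has to be scored by the span categorizer, so the number of candidates per document directly drives memory use and run time. This is only a rough proxy: longer spans also cost more to represent, which the count does not capture.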

Python code is used to:

create the corpus in spaCy format from the original dataset.
extract data from the generated metrics files for reporting.
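The second step could look roughly like the sketch below, which flattens several JSON metrics documents into one CSV table. It is not the project's report.py. The key names `spans_sc_p`, `spans_sc_r` and `spans_sc_f` are an assumption about how the evaluation output labels span categorizer scores, and the model label and numbers in the usage example are placeholders, not results from this project.

```python
import csv
import io
import json

# Assumed metric keys for a span categorizer using the span key "sc".
FIELDS = ["model", "spans_sc_p", "spans_sc_r", "spans_sc_f"]


def metrics_to_rows(named_metrics):
    """Turn {model_name: json_string} into one flat row per model."""
    rows = []
    for name, raw in named_metrics.items():
        metrics = json.loads(raw)
        row = {"model": name}
        for key in FIELDS[1:]:
            # Missing keys become None, which the CSV writer emits as empty.
            row[key] = metrics.get(key)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = {"example_model": '{"spans_sc_p": 0.5, "spans_sc_r": 0.25, "spans_sc_f": 0.3333}'}
    print(rows_to_csv(metrics_to_rows(sample)))
```

In practice the JSON strings would be read from the metrics files written during evaluation, with one row per model and n-gram configuration, so that the resulting CSV can be loaded into a notebook for comparison.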

📋 project.yml

The project.yml defines the data assets required by the project, as well as the available commands and workflows. For details, see the spaCy projects documentation.

⏯ Commands

The following commands are defined by the project. They can be executed using spacy project run [name]. Commands are only re-run if their inputs have changed.

Command	Description
`corpus`	Convert the data to spaCy's format
`train_sm`	Train and evaluate 'sm' model for 16-grams and 32-grams configurations
`train_lg`	Train and evaluate 'lg' model for 16-grams and 32-grams configurations
`train_trf`	Train and evaluate 'trf' model for 16-grams and 32-grams configurations
`report`	Convert the metrics of the different trained models to CSV for reading into a notebook
`clean`	Remove intermediate files

⏭ Workflows

The following workflows are defined by the project. They can be executed using spacy project run [name] and will run the specified commands in order. Commands are only re-run if their inputs have changed.

Workflow	Steps
`all`	`corpus` → `train_sm` → `train_lg` → `train_trf` → `report`

🗂 Assets

The following assets are defined by the project. They can be fetched by running spacy project assets in the project directory.

File	Source	Description
`assets`	Git	Dev dataset from SemEval2021 Task-6 'Detection of Persuasive Techniques in Texts and Images'

Repository: ceesroele/persuasion_spans