
KAREN: Unifying Hatespeech Detection and Benchmarking

This project started as a course project for the 2021 Natural Language Processing course at Tsinghua University and is still a work in progress. Our final project report is available in report.pdf. Contributions for further work are welcome.

Introduction

Hate speech, also known as offensive or abusive language, is defined as “any form of communication that disparages a person or group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion or other characteristic” (Nockleby, 2000). Nowadays, thanks to the availability of the internet and the emergence of social media, people have the tools to express their opinions online, which includes the widespread dissemination of hate speech. Such speech can cause severe psychological harm to individuals and can promote verbal or even physical violence against a group. Because of these consequences, both industry and academia have been working to develop techniques that accurately detect such forms of hate. These solutions, however, are not unified: most research proposes a solution together with its own dataset and evaluates only on that dataset. This approach suffers from several problems.

Firstly, bias. Due to cultural differences and differing points of view between individuals, the perception of hate speech varies and is highly subjective, which results in some datasets being biased one way or another.

Secondly, dataset incompatibility. It is common for recent models to make use of metadata, which can improve results by providing background information, but this often leads to low compatibility between models and datasets.

Overall, it is hard to identify the current state of the art and the most promising research directions, since very few models can be directly compared: they are trained on different datasets.

To combat these issues we propose KAREN, a framework that intends to unify this research area. Our contribution is an easy-to-use system that unifies the testing platform and can be used by beginners and researchers at the forefront of the field alike. It eases the design of data pre-processing and model implementation, allowing researchers to compare models on their own machines or to contribute their own datasets, making it easy to obtain results on new research, compare against other baselines, and test the robustness of different models in different environments.

Running

To run the framework, you just need to run the run.py file at the root of the repository. To get started, simply run:

python3 run.py --model softmaxregression --dataset hatexplain --dropout 0.15 --max-epochs 5

You can check the parameters of each model in its source file or in the configuration printed when it starts running.

Contributing

You can contribute to the framework by adding models and datasets that fit the framework's format. Please note that, for simplicity, we treat this task as multi-class classification, so the model must output out_feat scores, which are then passed to a softmax function.
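As a minimal sketch of this contract (shapes only; the variable names here are illustrative, not framework API), the model returns raw scores of shape (batch_size, out_feat), and the softmax is applied downstream:

import torch

batch_size, out_feat = 32, 3
logits = torch.randn(batch_size, out_feat)  # model output: one score per class
probs = torch.softmax(logits, dim=-1)       # applied downstream by the framework
assert probs.shape == (batch_size, out_feat)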

Models

All implemented models must extend the superclass BaseModel in framework/models/base_model.py and override its methods (which are used by the training and testing scripts). You can see an example of a softmax classifier in framework/models/softmax_regression.py.

If your model requires specific arguments, you can request them from the parser using the add_required_arguments(parser) method. At the moment, running multiple models that request the same argument will fail. You should also create a make_model function that picks up the arguments from the parser and extracts the ones your model needs.

After implementing your model, register it with the framework by adding the @RegisterModel decorator. This makes sure the framework can find your model.

You'll also need to add an import in framework/models/__init__.py.
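Putting these steps together, a new model might look like the following. This is only a sketch under stated assumptions: the class name, layer sizes, and forward signature are hypothetical, and the exact BaseModel interface should be checked against softmax_regression.py.

import torch.nn as nn

from framework.models.base_model import BaseModel


@RegisterModel  # assumed importable alongside the other model utilities
class MeanPoolClassifier(BaseModel):
    """Hypothetical example: mean-pooled embeddings + MLP classifier."""

    def __init__(self, in_feat, out_feat, vocab_size, device, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, in_feat)
        self.classifier = nn.Sequential(
            nn.Linear(in_feat, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_feat),  # raw scores; softmax applied downstream
        )

    def forward(self, tokens):
        pooled = self.embedding(tokens).mean(dim=1)  # average over the sequence
        return self.classifier(pooled)

    @staticmethod
    def add_required_arguments(parser):
        parser.add_argument('--hidden-dim', type=int, default=128)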

Note: different models make use of different data, and this framework intends to provide a unified way of testing them while easing implementation. Each model has a collection of requirements that must be contained within the dataset for it to run. When naming these requirements, make sure you are not duplicating names, introducing typos, or spelling the same feature in different ways. You can check the available features of a dataset through its data_requirements() method.
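For example (the dataset class name below is assumed from the HateXPlain example, not verified against the code):

from framework.datasets.hatexplain import HateXPlain  # class name assumed

dataset = HateXPlain()
print(dataset.data_requirements())  # lists the features this dataset provides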

Available arguments

When developing a model, some extra arguments are always available for selection. Currently, the list is the following:

  • in_feat
  • out_feat
  • vocab_size
  • device

The make_model function should not use any arguments other than those in this list and the ones it requested via its own add_required_arguments.
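A make_model for the sketch above might look like this (again hypothetical; it reads only the always-available arguments plus the one the model registered itself):

def make_model(args):
    return MeanPoolClassifier(
        in_feat=args.in_feat,
        out_feat=args.out_feat,
        vocab_size=args.vocab_size,
        device=args.device,
        hidden_dim=args.hidden_dim,  # registered via add_required_arguments
    )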

Datasets

Datasets are implemented similarly to models. You must extend BaseDataset from the file framework/datasets/base_dataset.py and implement the required logic. framework/datasets/hatexplain.py provides an example of how to implement a dataset with lazy preprocessing.

To register a dataset, use the @RegisterDataset decorator and add the import in framework/datasets/__init__.py. All the remaining logic is the same as for the models.
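A minimal dataset sketch, under the same caveats as the model example (names hypothetical; check the exact BaseDataset interface against hatexplain.py):

from framework.datasets.base_dataset import BaseDataset


@RegisterDataset  # assumed importable alongside the other dataset utilities
class MyCorpus(BaseDataset):
    """Hypothetical example dataset with lazy preprocessing."""

    def __init__(self):
        super().__init__()
        self._data = None  # loaded and preprocessed on first access

    def data_requirements(self):
        # features this dataset can provide to models
        return ['tokens', 'labels']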

Results

The results are available in results.md.

Contributors

clairecyq, spkgyk, tiagomantunes

Issues

Add missing references

A few models are missing references in their model headers. The reference should follow the same pattern as in AngryBERT.

Models missing reference:

  • CharCNN
  • NetLSTM

Fix CharCNN evaluation

This model is designed to run on characters but it's currently being evaluated on word tokens.

Solution:

  • Add character embeddings to the datasets and run CharCNN on them (see the sketch below)
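One possible direction, purely as a sketch (the helper below is hypothetical and not part of the framework): derive fixed-length character indices from the existing word tokens.

def to_char_ids(tokens, char_vocab, max_word_len=16, pad_id=0):
    # map each word to a fixed-length list of character ids
    char_ids = []
    for word in tokens:
        ids = [char_vocab.get(c, pad_id) for c in word[:max_word_len]]
        ids += [pad_id] * (max_word_len - len(ids))  # pad short words
        char_ids.append(ids)
    return char_ids

char_vocab = {c: i for i, c in enumerate('abcdefghijklmnopqrstuvwxyz', start=1)}
print(to_char_ids(['hate', 'speech'], char_vocab))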

Next steps

The framework now provides an easy-to-use interface that speeds up model and dataset evaluation. Some things can still be improved.

Overall, we need more models. If you have any model that you would like to see implemented, feel free to submit a pull request.

For the future, a few functionalities are desired:

  • Support more embeddings
  • Compatibility is currently easy to break. It would be preferable to have a configuration file listing the available classifications
  • Add support for custom training: for example, some models would prefer a sparse Adam optimizer, or even support secondary tasks (see the sketch after this list)
  • More things can be added to the toolkit. Any suggestions?
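As one possible shape for the custom-training item, here is a hedged sketch (the preferred_optimizer attribute is hypothetical, not existing framework API); note that torch.optim.SparseAdam only works with sparse gradients, e.g. from nn.Embedding(..., sparse=True):

import torch

def build_optimizer(model, lr):
    # fall back to Adam unless the model declares a preference
    if getattr(model, 'preferred_optimizer', None) == 'sparse_adam':
        return torch.optim.SparseAdam(model.parameters(), lr=lr)
    return torch.optim.Adam(model.parameters(), lr=lr)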

If you have any suggestions, comment them below.
Thank you!

Deterministic computation

Currently this is how we handle the seeds for the computation, following what is stated in pytorch/pytorch#7068:

import os
import random

import numpy as np
import torch

os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)     # Python random module
np.random.seed(seed)  # NumPy module
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

But when running a simple SoftmaxRegression model twice, we get different outputs:

tiagoantunes:hatespeech/ (master) $ python3 run.py --model softmaxregression --dataset HATexplAin --max-epochs 1 --batch-size 64                                                                                                                              [20:07:09]
******************************  CONFIGURATION  ******************************
batch_size                              64
cpu                                     False
dataset                                 ['hatexplain']
dropout                                 0.1
embedding_dim                           200
embeddings                              None
lr                                      0.001
max_epochs                              1
model                                   ['softmaxregression']
savename_hatexplain                     HateXPlain.dataset
seed                                    12345
url_hatexplain                          https://raw.githubusercontent.com/hate-alert/HateXplain/master/Data/dataset.json
***************************************************************************** 

Preprocessing HateXPlain

Starting training of (Model=softmaxregression Dataset=hatexplain)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 309.05it/s, loss=972]
Epoch #1 validation accuracy = 0.294789
Accuracy increased from 0 to 0.29478907585144043, saving model.

Test accuracy: 0.2730281352996826
tiagoantunes:hatespeech/ (master) $ python3 run.py --model softmaxregression --dataset HATexplAin --max-epochs 1 --batch-size 64                                                                                                                              [20:07:19]
******************************  CONFIGURATION  ******************************
batch_size                              64
cpu                                     False
dataset                                 ['hatexplain']
dropout                                 0.1
embedding_dim                           200
embeddings                              None
lr                                      0.001
max_epochs                              1
model                                   ['softmaxregression']
savename_hatexplain                     HateXPlain.dataset
seed                                    12345
url_hatexplain                          https://raw.githubusercontent.com/hate-alert/HateXplain/master/Data/dataset.json
***************************************************************************** 

Preprocessing HateXPlain

Starting training of (Model=softmaxregression Dataset=hatexplain)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:00<00:00, 287.52it/s, loss=979]
Epoch #1 validation accuracy = 0.300744
Accuracy increased from 0 to 0.3007444143295288, saving model.

Test accuracy: 0.2961941659450531

I have tried using torch.use_deterministic_algorithms(True), but with no success either.

I haven't been able to find a fix. Solutions/Suggestions are appreciated.
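One avenue worth trying, offered only as a sketch (it assumes the nondeterminism comes from CUDA kernels or DataLoader shuffling, which is not confirmed):

import os
import torch

# some CUDA versions require this before strict deterministic mode works
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
torch.use_deterministic_algorithms(True)

# seed DataLoader shuffling and worker processes explicitly
g = torch.Generator()
g.manual_seed(seed)  # `seed` and `dataset` as defined in the framework
loader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=True, generator=g,
    worker_init_fn=lambda worker_id: torch.manual_seed(seed + worker_id),
)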
