Analysis on ML Model's Classification of Bengali Hate Speech in Different Social Contexts

4th Year, 2nd Semester Pattern Recognition Lab Project.
Hate Speech classification analysis project using different machine learning models.

Abstract: Hate speech (HS) has evolved with the rise of social media platforms and online streaming services. Given the huge volume of content created by users on social media sites, modern machine learning methods offer a feasible and affordable way to address offensive language. However, training models that work across the different social contexts where offensive language is common requires diverse datasets that reflect the many languages and contexts in which it occurs. In this paper, we identify the shortcomings of existing Bangla HS datasets and introduce BD-SHS, a large manually labelled dataset that includes HS in different social contexts. The labelling criteria were prepared following a hierarchical annotation process, which is, to the best of our knowledge, the first of its kind for Bangla HS. The dataset includes more than 50,200 offensive comments crawled from online social networking sites and is at least 60% larger than any existing Bangla HS dataset. We present benchmark results on our dataset by training different NLP models, the best of which achieves an F1-score of 91.0%. In our experiments, a word embedding trained solely on 1.47 million comments from social media and streaming sites consistently outperformed other pre-trained embeddings for HS detection.

Index Terms: Hate Speech Classification, Binary Classification, Machine Learning Models, Ensemble Techniques, BD-SHS

Objectives:

Data Preprocessing, Data Cleaning
Data Augmentation
Bigram, Trigram
Lemmatization
TF-IDF
Machine Learning Algorithm (Fine Tuning, Grid Search)
Ensemble Techniques
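
To make a few of the objectives above concrete, here is a minimal sketch of word n-gram extraction, TF-IDF weighting, and a hard-voting ensemble, written in plain Python. It is an illustration only, not the project's actual code: the function names (word_ngrams, tfidf, majority_vote), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for this example, and the sample inputs are placeholder tokens rather than dataset comments.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return whitespace-token n-grams for every n in [n_min, n_max]."""
    tokens = text.split()  # assumption: the text is already cleaned
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_min=1, n_max=2):
    """Compute one {term: weight} dict per document.

    Assumed variant: tf = count / total terms in the document,
    idf = ln((1 + N) / (1 + df)) + 1. Other TF-IDF variants exist.
    """
    term_counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())

    weights = []
    for counts in term_counts:
        total = sum(counts.values())
        weights.append({
            term: (c / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, c in counts.items()
        })
    return weights


def majority_vote(predictions):
    """Hard-voting ensemble over a list of per-model label lists.

    Use an odd number of models for binary labels to avoid ties.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]


if __name__ == "__main__":
    # Placeholder tokens, not real comments from the dataset.
    print(word_ngrams("w1 w2 w3", 2, 3))
    print(tfidf(["w1 w2", "w1 w3"], 1, 1))
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

In practice, a library such as scikit-learn would likely cover these steps, together with the grid search over hyperparameters; the sketch only shows what each step computes.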

GitHub repository: tonmoytalukder/analysis-on-ml-model-s-classification-of-bengali-hate-speech-in-different-social-contexts