Analysis on ML Model's Classification of Bengali Hate Speech in Different Social Contexts

4th Year, 2nd Semester Pattern Recognition Lab Project.
Hate Speech classification analysis project using different machine learning models.

Abstract: Hate speech (HS) has evolved with the rise of social media platforms and online streaming services. Given the huge volume of content created by users on social media sites, modern machine learning methods offer a feasible and affordable way to address offensive language. However, training models that can be used across the different social contexts where offensive language is common requires diverse datasets that reflect the many languages and contexts in which it occurs. In this paper, we identify the shortcomings of existing Bangla HS datasets and introduce BD-SHS, a large manually labelled dataset that includes HS in different social contexts. The labelling criteria were prepared following a hierarchical annotation process, which is, to the best of our knowledge, the first of its kind for Bangla HS. The dataset includes more than 50,200 offensive comments crawled from online social networking sites and is at least 60% larger than any existing Bangla HS dataset. We present benchmark results on our dataset by training different NLP models, the best of which achieves an F1-score of 91.0%. In our experiments, a word embedding trained solely on 1.47 million comments from social media and streaming sites consistently outperformed other pre-trained embeddings for HS detection.

Index Terms: Hate Speech Classification, Binary Classification, Machine Learning Models, Ensemble Techniques, BD-SHS
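
The abstract reports model quality as an F1-score. As a reminder of what that metric measures, here is a minimal sketch of the binary F1 computation in plain Python. The function name `f1_score` and the choice of label `1` for the hate-speech class are assumptions made for illustration; the source does not state which averaging scheme (binary, macro, or weighted) produced the reported figure.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class.

    Assumption: `positive` marks the hate-speech class; the labels used in the
    actual project are not specified in the source.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not project data.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

With the toy labels above there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and so is the F1-score.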

Objectives:

  • Data Preprocessing, Data Cleaning
  • Data Augmentation
  • Bigram, Trigram
  • Lemmatization
  • TF-IDF
  • Machine Learning Algorithms (Fine-Tuning, Grid Search)
  • Ensemble Techniques
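
Several of the objectives above (n-gram features, TF-IDF weighting, and ensembling) can be illustrated with a small dependency-free sketch. This is not the project's code: the helper names (`word_ngrams`, `extract_features`, `tfidf`, `majority_vote`), the whitespace tokenizer, and the unsmoothed `log(N / df)` IDF are all assumptions chosen for brevity. In practice a library such as scikit-learn (`TfidfVectorizer` with `ngram_range`, `GridSearchCV`, `VotingClassifier`) would be the more typical choice.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, ngram_sizes=(1, 2, 3)):
    """Whitespace-tokenize and collect unigrams, bigrams, and trigrams.

    Assumption: the text is already cleaned and lemmatized upstream.
    """
    tokens = text.split()
    features = []
    for n in ngram_sizes:
        features.extend(word_ngrams(tokens, n))
    return features


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf = term count / number of terms in the document
    idf = log(N / df), with no smoothing (a simplifying assumption)
    """
    feature_lists = [extract_features(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for feats in feature_lists:
        doc_freq.update(set(feats))  # count each term once per document
    weights = []
    for feats in feature_lists:
        counts = Counter(feats)
        total = len(feats)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def majority_vote(predictions):
    """Hard-voting ensemble over per-model label lists of equal length."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Placeholder tokens, not real comments from the dataset.
weights = tfidf(["a b c", "a b d"])
ensemble = majority_vote([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
```

Terms that appear in every document (here `a`, `b`, and `a b`) receive a weight of zero, while terms unique to one document are weighted up. For hard voting, an odd number of models avoids ties in the binary case.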

Contributors:

  • tonmoytalukder
