toxicer's Introduction

Comparing Toxic Texts

This project uses two datasets:

  1. The Ruddit dataset, available here: https://www.kaggle.com/datasets/rajkumarl/ruddit-jigsaw-dataset

  2. The Kaggle validation dataset for Jigsaw Rate Severity of Toxic Comments, available here: https://www.kaggle.com/competitions/jigsaw-toxic-severity-rating

The src directory contains three notebooks that preprocess the datasets and fine-tune a DistilBERT model for regression and for pair classification. Both approaches are used to compare two texts and identify which one is more toxic.
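As a rough illustration of how a regression model can be used to compare two texts, the sketch below scores each text in a pair and counts how often the text labeled as more toxic receives the higher score. This is not the code from the notebooks: `pairwise_accuracy` and `toy_score` are hypothetical names, `toy_score` is a keyword-counting stand-in for a fine-tuned DistilBERT regressor, and the example pairs are made up. It is also an assumption that pairs are evaluated this way, since this README does not state the metric.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    score: Callable[[str], float],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (less_toxic, more_toxic) pairs that `score` ranks correctly."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    # A pair counts as correct when the text labeled more toxic scores strictly higher.
    correct = sum(1 for less, more in pairs if score(less) < score(more))
    return correct / len(pairs)


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a regression model: share of rude words in the text."""
    rude_words = {"idiot", "stupid", "hate"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in rude_words for w in words) / max(len(words), 1)


# Made-up (less_toxic, more_toxic) pairs; the last one is labeled so that
# the toy scorer disagrees with the annotation.
pairs = [
    ("Thanks for the help.", "You are an idiot."),
    ("I disagree with this edit.", "I hate this stupid edit."),
    ("You are stupid.", "Have a nice day."),
]

accuracy = pairwise_accuracy(toy_score, pairs)
print(f"pairwise accuracy: {accuracy:.3f}")
```

In practice, `toy_score` would be replaced by a function that runs the fine-tuned regression model on a single text and returns its predicted toxicity score.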

Table: Results of the regression and pair classification methods.

                      Ruddit Test   Kaggle Validation
Regression            -             0.67364
Pair Classification   0.79179       0.65072

Installing required dependencies:

Depending on the platform you run the code on, you might need different dependencies, but in general you can install the packages listed in the requirements.txt file with the following command:

pip install -r requirements.txt

Running the code:

To run the code, first download the datasets and place them in the data directory inside the src directory. Then run the notebooks in the src directory.
