Code Monkey home page Code Monkey logo

sarcasmdetection's Introduction

Detecting Sarcasm in Reddit Comments

This was a small project I worked on, with Rubini and Vikram, during my 2020 Summer Internship at Carnegie Mellon University.

Aim

The aim is to detect sarcasm in comments found on Reddit, using the Sarcasm on Reddit dataset available from Kaggle. Through this, we also aim to identify features that are indicative of sarcasm, and explain our models' predictions.

Methodology and Results

We experimented with TF-IDF and BERT Sentence Embeddings to extract features from text. We tried using various combinations of features, such as using only the comment, its characteristics and also its parent comment, to provide context. Additionally, we tried to use PCA for dimensionality reduction.

The classifiers we used include the Random Forest Classifier, Gradient Boosting Classifier and the Multi-Layer Perceptron, among others.

Our best-performing model was a Random Forest Classifier trained on TF-IDF features extracted from raw text (comment and parent) and also the comment's characteristics such as the subreddit and author. It obtained an F1-Score of 0.66 on the validation set. The comment's characteristics were deemed as very important features by the models we built.

Code

The code is available as three Jupyter Notebook files, simply start up a Jupyter Notebook server and run the code. Ensure that the dependencies are installed before you run the code. To do so, simply execute this command in the Terminal:

pip install -r requirements.txt

Presentation

Our presentation is also available in this repository, and provides more information.

sarcasmdetection's People

Contributors

nandahkrishna avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.