Humor Hound

We present HumorHound, an NLP model that classifies news article titles as sarcastic or not. This project was undertaken as part of the Unstructured Data course within the Big Data Master's Degree program at Comillas ICAI University.

The team responsible for the project includes:

Name Email
Jorge Ayuso Martínez [email protected]
Carlota Monedero Herranz [email protected]
José Manuel Vega Gradit [email protected]

Overview

Our project aims to develop a classifier that can accurately identify whether a news article title is sarcastic or not. We use a dataset available on Kaggle [1] [2], which consists of news article titles labeled as sarcastic or not.

  1. Misra, Rishabh and Prahal Arora. "Sarcasm Detection using News Headlines Dataset." AI Open (2023).
  2. Misra, Rishabh and Jigyasa Grover. "Sculpting Data for ML: The first act of Machine Learning." ISBN 9798585463570 (2021).

We explore various natural language processing (NLP) techniques, including feature engineering, word embeddings, and transfer learning. Our goal is to identify the approach that achieves the highest accuracy while also scoring well on other evaluation metrics, such as precision, recall, and F1 score. Throughout the project, we continuously monitor the performance of our models and perform error analysis to identify common mistakes.
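As a reminder of how the metrics mentioned above relate to each other, here is a minimal sketch in plain Python. The function name and the toy labels are illustrative assumptions and are not taken from the project's code, which may well rely on a library for this.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (1 = sarcastic, 0 = not sarcastic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Precision penalizes headlines wrongly flagged as sarcastic, recall penalizes sarcastic headlines that were missed, and F1 is the harmonic mean of the two.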

Our final product, HumorHound, is a user-friendly web application that allows users to enter a news article title and determine whether it is sarcastic or not, using our best-performing model. We provide clear instructions on how to run the application and deploy it to a server.

We believe that our project will provide insights into the challenges of text classification and how to overcome them using various NLP techniques.

Interactive App

In addition to training and testing a wide variety of models, we have also developed a small web application using Streamlit. Streamlit is an open-source web framework that allows users to build lightweight web applications to serve Machine Learning and Data Science capabilities using Python.

However, Streamlit alone does not allow for the creation of REST API endpoints to interact with it, so we also built a backend with FastAPI to serve the model.
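To illustrate the split between the UI and the model-serving backend, the sketch below shows how a client could send a headline to a REST endpoint and read back a prediction, using only the Python standard library. The URL, the port, the /predict route, and the "headline", "label" and "probability" field names are all assumptions made for this example; the project's actual FastAPI routes and schemas may differ.

```python
import json
import urllib.request

# Hypothetical values: the real host, port, route and field names are
# defined by the project's FastAPI backend and may differ.
API_URL = "http://localhost:8000/predict"


def build_request(headline, url=API_URL):
    """Build a JSON POST request carrying the headline to classify."""
    payload = json.dumps({"headline": headline}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body):
    """Decode a JSON response body into a (label, probability) pair."""
    data = json.loads(body)
    return data["label"], float(data["probability"])


def get_prediction(headline, url=API_URL, timeout=10):
    """Send the headline to the backend and return (label, probability)."""
    request = build_request(headline, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read())
```

With the containers running, a call such as get_prediction("Some headline") would return the backend's answer, provided the endpoint matches the assumptions above.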

In addition, we used Docker to containerize these services, so that they can be run on any computer.

To run the application locally, the first step is to install Docker following the instructions for your operating system.

Once Docker is installed, you can run the run.sh script from a terminal. This file contains the commands that build the images and start the containers hosting the apps:

sudo bash run.sh

Once executed, you can run docker ps to check that the containers are up and running.

Finally, to access the UI, head to http://localhost:8501/, where you should see a small app that looks like this:

[Screenshot: HumorHound UI]

In the UI, you can write any headline you want in the text box and press "Get prediction" to get a prediction on whether the headline tone is sarcastic or not. The bar at the bottom of the info box graphically shows the probability assigned to the model's prediction.

[Screenshots: example predictions for a normal headline and for a sarcastic headline]
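The relationship between the predicted label and the probability bar can be sketched as follows. This is only an illustration under stated assumptions: the function name, the 0.5 decision threshold, and the text-based bar are hypothetical and do not describe how the actual app renders its output.

```python
def describe_prediction(p_sarcastic, threshold=0.5, bar_width=20):
    """Turn a sarcasm probability into a label, a confidence and a text bar.

    The threshold and the bar rendering are illustrative assumptions.
    """
    if not 0.0 <= p_sarcastic <= 1.0:
        raise ValueError("p_sarcastic must be between 0 and 1")

    is_sarcastic = p_sarcastic >= threshold
    label = "Sarcastic" if is_sarcastic else "Normal"
    # The bar shows the probability of the predicted class, not of sarcasm.
    confidence = p_sarcastic if is_sarcastic else 1.0 - p_sarcastic

    filled = round(confidence * bar_width)
    bar = "#" * filled + "-" * (bar_width - filled)
    return label, confidence, bar


label, confidence, bar = describe_prediction(0.25)
```

Here a sarcasm probability of 0.25 yields the "Normal" label with a confidence of 0.75, so three quarters of the bar is filled.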
