ULMFiT-Sentiment

Applying a semi-supervised ULMFiT model to Twitter US Airlines Sentiment.


Sentiment analysis classifies the emotion expressed in a piece of text as positive, negative or neutral. It is one of the basic tasks in Natural Language Processing, and businesses use it to gauge customer sentiment toward products, brands or services in online feedback.

In this project I was given the task of analysing sentiment in the Twitter US Airline Sentiment dataset. Instead of the conventional NLTK-based approach with classical machine learning models, I was expected to use fastai's ULMFiT method, which fine-tunes a pretrained LSTM-based (recurrent) language model for the classification task.

Dataset Used

Twitter US Airline Sentiment: https://www.kaggle.com/crowdflower/twitter-airline-sentiment

Method Deployed

Universal Language Model Fine-tuning for Text Classification (ULMFiT) method: https://www.aclweb.org/anthology/P18-1031.pdf

Getting Started

The first steps were the usual data loading, splitting and pre-processing. I plotted the correlations between the columns provided in the dataset and kept the most important ones. I then tokenized the text, removed stop words, and wrote regular expressions to weed out hashtags and links that might give the model an inaccurate picture of the dataset.
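The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `clean_tweet`, the regular expressions and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Assumed patterns for illustration; the notebook's own expressions may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")

# A deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = frozenset({"a", "an", "the", "is", "to", "and", "of"})


def clean_tweet(text, remove_stop_words=True):
    """Strip links and hashtags, then optionally drop stop words."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    # Whitespace tokenization; split() also collapses repeated spaces.
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("The flight to NYC is late #fail https://t.co/abc"))
# prints: flight NYC late
```

Passing `remove_stop_words=False` keeps the sentence intact apart from the removed link and hashtag, which can be useful when the downstream tokenizer is expected to see natural word order.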

Approach and Models

All other details of the ULMFiT approach to sentiment analysis are in the notebook, which is documented following the guidelines of the fellowship.ai team.
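For orientation, a typical ULMFiT run with the fastai 1.0.x text API looks roughly like the sketch below. It is not taken from the notebook: the function name `train_ulmfit`, the column names `text` and `airline_sentiment`, the encoder name `ft_enc` and all epoch counts, learning rates and dropout multipliers are assumptions chosen for illustration. Creating the language model learner downloads pretrained weights, so it needs network access.

```python
def train_ulmfit(train_df, valid_df, path=".",
                 text_col="text", label_col="airline_sentiment"):
    """Fine-tune a pretrained language model on the tweets, then a classifier.

    train_df and valid_df are pandas DataFrames holding the tweet text and
    the sentiment label; the default column names are assumptions.
    """
    # Imported inside the function so the sketch loads without fastai installed.
    from fastai.text import (AWD_LSTM, TextClasDataBunch, TextLMDataBunch,
                             language_model_learner, text_classifier_learner)
    from fastai.metrics import accuracy

    # Stage 1: fine-tune the pretrained language model on the tweet text.
    data_lm = TextLMDataBunch.from_df(path, train_df, valid_df,
                                      text_cols=text_col)
    lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
    lm.fit_one_cycle(1, 1e-2)   # train only the newly added layers first
    lm.unfreeze()
    lm.fit_one_cycle(1, 1e-3)   # then fine-tune the whole model
    lm.save_encoder("ft_enc")   # keep the encoder for the classifier

    # Stage 2: train the classifier, reusing the language model's vocabulary.
    data_clas = TextClasDataBunch.from_df(path, train_df, valid_df,
                                          vocab=data_lm.train_ds.vocab,
                                          text_cols=text_col,
                                          label_cols=label_col, bs=32)
    clf = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5,
                                  metrics=[accuracy])
    clf.load_encoder("ft_enc")

    # Gradual unfreezing with lower learning rates for earlier layers.
    clf.fit_one_cycle(1, 1e-2)
    clf.freeze_to(-2)
    clf.fit_one_cycle(1, slice(5e-3 / 2.6, 5e-3))
    clf.unfreeze()
    clf.fit_one_cycle(1, slice(2e-3 / 100, 2e-3))
    return clf
```

The two stages mirror the ULMFiT paper: the language model first adapts to the vocabulary and style of tweets without using labels, and the classifier then starts from that adapted encoder rather than from scratch.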

Final Results and Scoring Metric Used

The scoring metric used here is accuracy. The model reaches an overall accuracy of over 82% on the test set. This is a good result considering that informal writing, such as tweets, often does not follow conventional grammatical structure. Moreover, figures of speech such as sarcasm are difficult to account for in feature engineering and are sometimes hard even for humans to recognize.
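Accuracy here means the fraction of tweets whose predicted sentiment matches the labelled one. A minimal sketch of the computation, using made-up labels rather than results from this project:

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for illustration only.
truth = ["negative", "neutral", "positive", "negative", "negative"]
preds = ["negative", "neutral", "negative", "negative", "positive"]
print(accuracy(truth, preds))  # prints: 0.6
```

Accuracy treats every class equally per example, so on a dataset where one class dominates it can look high even when the smaller classes are predicted poorly.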

Prerequisites

I've used the following environment:

  • Kaggle kernels, chosen for the convenience they offer with common machine learning libraries and frameworks
  • python : 3.7.6
  • fastai : 1.0.61
  • fastprogress : 0.2.3
  • torch : 1.5.1
  • nvidia driver : 418.67
  • torch cuda : 10.1
  • torch cudnn : 7603

Hardware

  • nvidia gpus : 1
  • torch devices : 1
    • gpu0 : 16280MB | Tesla P100-PCIE-16GB

Contributing

Please feel free to fork this repository. Open an issue before submitting a pull request.

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

Contributors

anmolpant
