Supervised-learning-using-Word-Vectors

Data Set and Problem Statement

I analyzed a large Kaggle dataset of IMDB movie reviews to predict whether a review is positive or negative. The data is split into training and testing sets. I fit a Random Forest classifier on "Bag of Words" features, then used the fitted model to predict the sentiment label of each review in the test set. Through this project I learned how word vectors can play an important role in supervised learning predictions on a text corpus.

Methodology

  • Pre-processing : Cleaning, Lowercase, Tokenization, Stopwords removal
  • Feature Extraction : Bag of Words - CountVectorizer
  • Classification : Random Forest
  • Dimension Reduction : Feature importances
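The pre-processing and feature extraction steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the project uses CountVectorizer, while the sketch below hand-rolls the same idea, and the stopword list, helper names, and sample reviews are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the project's actual list is not shown).
STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}


def preprocess(review):
    """Clean, lowercase, tokenize, and remove stopwords from one review."""
    text = re.sub(r"<[^>]+>", " ", review)   # strip HTML tags such as <br />
    text = re.sub(r"[^a-zA-Z]", " ", text)   # keep letters only
    tokens = text.lower().split()            # lowercase and split on whitespace
    return [t for t in tokens if t not in STOPWORDS]


def build_vocabulary(token_lists, max_features):
    """Map the max_features most frequent tokens to column indices."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {token: i for i, (token, _) in enumerate(ranked[:max_features])}


def vectorize(tokens, vocabulary):
    """Turn one token list into a bag-of-words count vector."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector


# Hypothetical sample reviews, for illustration only.
reviews = [
    "This movie was great, great fun!<br />",
    "The plot is dull and the acting is dull.",
]
token_lists = [preprocess(r) for r in reviews]
vocabulary = build_vocabulary(token_lists, max_features=5)
vectors = [vectorize(tokens, vocabulary) for tokens in token_lists]
```

Each row of `vectors` is the kind of count vector a classifier such as a Random Forest would be trained on; in the project, `max_features` would be on the order of the 25000 features mentioned in the results.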

Result

Testing the model with 25,000 features yields about 85.3% accuracy. The figure below shows the precision, recall, and F1-score values on the test data.

The diagram below shows the confusion matrix obtained.
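For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, recall, and F1-score follow from the four confusion matrix counts for a binary sentiment label. The function names and the toy labels are assumptions for illustration; they are not the project's data or results.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means a positive review."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def scores(tn, fp, fn, tp):
    """Derive accuracy, precision, recall, and F1 for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_matrix(y_true, y_pred)
result = scores(*counts)
```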

After dimension reduction brings the feature count down to 20,000, the model yields about 86% accuracy. The figure below shows the precision, recall, and F1-score values on the test data.

The diagram below shows the confusion matrix obtained.
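The dimension reduction step behind this second result can be sketched as keeping only the highest-ranked features by importance. This is a rough illustration under assumptions: in the project the importances would come from the fitted Random Forest, whereas the values, the helper names, and the tiny matrix below are made up for the example.

```python
def top_k_features(importances, k):
    """Return column indices of the k most important features, in ascending order."""
    # Rank by descending importance; break ties by the lower column index.
    ranked = sorted(range(len(importances)), key=lambda i: (-importances[i], i))
    return sorted(ranked[:k])


def reduce_columns(vectors, keep):
    """Keep only the selected columns of each bag-of-words vector."""
    return [[row[i] for i in keep] for row in vectors]


# Hypothetical per-feature importances (not values from the project).
importances = [0.40, 0.05, 0.30, 0.00, 0.25]

# Hypothetical count vectors with five features each.
vectors = [
    [1, 0, 2, 3, 0],
    [0, 4, 1, 0, 5],
]

keep = top_k_features(importances, k=3)
reduced = reduce_columns(vectors, keep)
```

The classifier would then be retrained on `reduced`; in the project the same idea takes the feature count from 25000 down to 20000.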

How to run

Run the project code by submitting it to Spark.

spark-submit ClassificationRandomForest.py labeledTrainData.tsv 
