Sentiment-Analysis-of-Netflix-Reviews

This project uses a recurrent neural network (RNN) to predict whether a Netflix review conveys a positive or a negative sentiment. For each review, the model outputs the probability, expressed as a percentage, that the review conveys a particular sentiment.
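As a rough illustration of how a raw model score can become a percentage per sentiment, the sketch below applies a sigmoid to a single score. The function name sentiment_percentages and the choice of a sigmoid are assumptions made for this example; the repository does not state which output layer it uses.

```python
import math


def sentiment_percentages(logit: float) -> dict:
    """Map one raw score (logit) to positive/negative percentages.

    Assumption: a single sigmoid output; the actual model may use softmax.
    """
    p_positive = 1.0 / (1.0 + math.exp(-logit))  # sigmoid squashes to (0, 1)
    return {
        "positive": round(100.0 * p_positive, 1),
        "negative": round(100.0 * (1.0 - p_positive), 1),
    }


print(sentiment_percentages(0.0))  # {'positive': 50.0, 'negative': 50.0}
```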

The recurrent layers are Long Short-Term Memory (LSTM) networks. An input argument lets the user choose between bi-directional and uni-directional LSTMs. A dropout wrapper around the LSTMs reduces overfitting.
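To make the role of the keep probability concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme in which surviving activations are rescaled by 1/keep_prob. The dropout function is a hypothetical helper written for illustration; it is not the wrapper used in this repository.

```python
import random


def dropout(values, keep_prob, rng=random):
    """Keep each value with probability keep_prob, rescaling survivors by 1/keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so repeated runs drop the same units
activations = [1.0] * 10
print(dropout(activations, keep_prob=0.5, rng=rng))  # each entry is 0.0 or 2.0
```

A lower keep_prob zeroes more units per step, which is why it acts as stronger regularization.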

Getting Started

Prerequisites

TensorFlow 1.5

Python 3.7.1

Data

The data consists of 5000 negative and 5000 positive Netflix reviews. You can examine the reviews in data/raw/.

I am using the tf.data API to build a fast, high-performance input pipeline. The tf.data API works best when the data is stored in the TFRecord format.

The cleaned and formatted reviews in TFRecord format can be found in data\tf_records\.

You can clean the data yourself by executing src\preprocess\clean_file.py. To export the data in TFRecord format, execute src\data\tf_records_writer.py.
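For orientation, a typical text-cleaning step for review data might look like the sketch below. The function clean_review and its rules (lowercasing, stripping HTML tags and punctuation) are assumptions for illustration only and do not reproduce what clean_file.py actually does.

```python
import re


def clean_review(text: str) -> str:
    """Hypothetical cleaning step: lowercase, drop HTML tags and punctuation."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags such as <br />
    text = re.sub(r"[^a-z0-9' ]+", " ", text)   # replace other symbols with spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


print(clean_review("Great show!<br />Loved it..."))  # great show loved it
```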

Start the Training of the Model

To start training, run the Python script src\train.py, optionally passing your own hyperparameters to override the defaults. For example:

  python src\train.py \
        --num_epoch=25 \
        --batch_size=32 \
        --learning_rate=0.0005 \
        --architecture=uni_directional \
        --lstm_units=100 \
        --dropout_keep_prob=0.5 \
        --embedding_size=100 \
        --required_acc_checkpoint=0.7

The --architecture flag also accepts a bi-directional setting; see train.py for the exact value.

The meaning of these hyperparameters is documented in the tf.FLAGS definitions in train.py.
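The way command-line values override defaults can be sketched with the standard-library argparse module as a stand-in for tf.FLAGS. The flag names come from the example command above; the defaults, types, and help texts are assumptions and may differ from those in train.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in for the flag definitions; defaults here are assumed, not taken from train.py."""
    parser = argparse.ArgumentParser(description="Training hyperparameters (sketch)")
    parser.add_argument("--num_epoch", type=int, default=25, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=32, help="reviews per mini-batch")
    parser.add_argument("--learning_rate", type=float, default=0.0005, help="optimizer step size")
    parser.add_argument("--architecture", type=str, default="uni_directional", help="LSTM direction")
    parser.add_argument("--lstm_units", type=int, default=100, help="hidden units per LSTM cell")
    parser.add_argument("--dropout_keep_prob", type=float, default=0.5, help="probability of keeping a unit")
    parser.add_argument("--embedding_size", type=int, default=100, help="word embedding dimension")
    parser.add_argument("--required_acc_checkpoint", type=float, default=0.7,
                        help="test accuracy at which checkpoints start being saved")
    return parser


# Flags given on the command line replace the defaults; the rest stay unchanged.
args = build_parser().parse_args(["--batch_size=64", "--learning_rate=0.001"])
print(args.batch_size, args.learning_rate, args.num_epoch)  # 64 0.001 25
```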

Once the script is running, training starts and reports the training loss together with the accuracy on the training set and the test set. Accuracy is the fraction of reviews whose sentiment is predicted correctly. You may see results like these:

        epoch_nr: 0, train_loss: 0.654, train_acc: 0.629, test_acc: 0.737
        epoch_nr: 1, train_loss: 0.451, train_acc: 0.809, test_acc: 0.753
        epoch_nr: 2, train_loss: 0.294, train_acc: 0.889, test_acc: 0.762
        epoch_nr: 3, train_loss: 0.205, train_acc: 0.930, test_acc: 0.740
        epoch_nr: 4, train_loss: 0.141, train_acc: 0.951, test_acc: 0.754
        epoch_nr: 5, train_loss: 0.108, train_acc: 0.965, test_acc: 0.743
        epoch_nr: 6, train_loss: 0.085, train_acc: 0.973, test_acc: 0.723
        epoch_nr: 7, train_loss: 0.069, train_acc: 0.977, test_acc: 0.738
        epoch_nr: 8, train_loss: 0.055, train_acc: 0.984, test_acc: 0.725
        epoch_nr: 9, train_loss: 0.048, train_acc: 0.986, test_acc: 0.727
        epoch_nr: 10, train_loss: 0.044, train_acc: 0.987, test_acc: 0.723

During training the model overfits the training set: training accuracy climbs to 0.987 while test accuracy peaks at 0.762 in epoch 2 and then drifts downward. A lower value for dropout_keep_prob is therefore suggested.
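One way to quantify this from the console output is to parse the log lines and compare the best test accuracy with the final train/test gap. The sketch below uses three lines copied from the log above; the helper parse_log and the regular expression are written for this example and are not part of the repository.

```python
import re

LOG = """\
epoch_nr: 0, train_loss: 0.654, train_acc: 0.629, test_acc: 0.737
epoch_nr: 2, train_loss: 0.294, train_acc: 0.889, test_acc: 0.762
epoch_nr: 10, train_loss: 0.044, train_acc: 0.987, test_acc: 0.723
"""

PATTERN = re.compile(
    r"epoch_nr: (\d+), train_loss: ([\d.]+), train_acc: ([\d.]+), test_acc: ([\d.]+)"
)


def parse_log(text):
    """Turn each matching log line into a dict of numeric metrics."""
    return [
        {
            "epoch": int(m.group(1)),
            "train_loss": float(m.group(2)),
            "train_acc": float(m.group(3)),
            "test_acc": float(m.group(4)),
        }
        for m in PATTERN.finditer(text)
    ]


rows = parse_log(LOG)
best = max(rows, key=lambda r: r["test_acc"])   # epoch with the highest test accuracy
last = rows[-1]
gap = last["train_acc"] - last["test_acc"]      # a large gap signals overfitting
print(best["epoch"], round(gap, 3))  # 2 0.264
```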

Once the accuracy on the test set reaches required_acc_checkpoint, the model starts saving checkpoints of the underlying dataflow graph and the network parameters to checkpoints/model.ckpt.
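The effect of the threshold can be sketched as a simple per-epoch comparison. This is an assumption about the gating logic, since the exact condition lives in train.py; the helper epochs_to_checkpoint is hypothetical, and the accuracies are the first four test values from the log above.

```python
def epochs_to_checkpoint(test_accuracies, required_acc_checkpoint):
    """Return the epoch indices whose test accuracy meets the threshold (assumed rule)."""
    return [
        epoch
        for epoch, acc in enumerate(test_accuracies)
        if acc >= required_acc_checkpoint
    ]


test_acc = [0.737, 0.753, 0.762, 0.740]
print(epochs_to_checkpoint(test_acc, 0.7))   # [0, 1, 2, 3]
print(epochs_to_checkpoint(test_acc, 0.75))  # [1, 2]
```

Raising the threshold above the best achievable test accuracy would mean no checkpoint is ever written, so it is worth choosing a value the model can realistically reach.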

Deployment

For deployment purposes the model must be exported in the SavedModel format. To do so, execute the script src\inference.py:

        python src\inference.py \
              --export_path_base=model-export/

Attention: All other hyperparameters must be the same as those used during training.

The resulting SavedModel can then be served, for example from a Docker container in the cloud. (The documentation for this will be extended in the future.)

Contributors

artem-oppermann
