hugomartinbjork / nlp-sentiment-analysis-project

This project aims to improve NLP sentiment analysis through ensemble methods by proposing two simplified methods that combine BERT with VADER's VAD score. Inspired by Wang et al., the project seeks to increase performance in classifying the sentiment of IMDb reviews.

Jupyter Notebook 100.00%
bert imdb-dataset natural-language-processing transformers vader-sentiment-analysis

nlp-sentiment-analysis-project's Introduction

NLP-Project - Exploring BERT and VADER ensemble methods for sentiment analysis

Description

This project investigates whether ensemble methods can improve existing NLP sentiment analysis techniques. It takes inspiration from the work of Wang et al. but proposes two simplified methods that combine BERT with the VAD score from VADER, with the goal of improving performance when classifying the sentiment of IMDb reviews.


Dataset

The dataset consists of 50k labeled IMDb movie reviews. Due to hardware constraints, we keep only reviews of length 128 or lower. We then split the remaining data into 70% train, 15% validation, and 15% test.

Class     Train  Valid  Test
Positive  4,849  1,033  1,013
Negative  4,442    958    979
Total     9,291  1,991  1,992
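
The filtering and splitting step described above can be sketched as follows. This is a minimal illustration, not the project's actual notebook code: the function name `filter_and_split`, the `(text, label)` tuple layout, the fixed seed, and the choice to measure length in whitespace-separated tokens are all assumptions made for the example.

```python
import random


def filter_and_split(reviews, max_len=128, seed=0):
    """Keep short reviews, then split them 70/15/15 into train/valid/test.

    `reviews` is a list of (text, label) tuples (hypothetical layout).
    """
    # Assumption: "length" means the number of whitespace-separated tokens.
    short = [r for r in reviews if len(r[0].split()) <= max_len]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(short)

    n_train = int(0.70 * len(short))
    n_valid = int(0.15 * len(short))

    train = short[:n_train]
    valid = short[n_train:n_train + n_valid]
    test = short[n_train + n_valid:]  # the remainder, roughly 15%
    return train, valid, test
```

With 13,274 reviews remaining after filtering (the sum of the totals in the table), this rounding scheme gives 9,291 / 1,991 / 1,992 examples, which matches the table sizes. The class balance within each split would depend on the shuffle.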

Evaluation

To evaluate the effectiveness of the model, and later compare the baseline with the other implemented models, four performance metrics are used.

  1. Accuracy, which measures the proportion of all instances that the model classifies correctly.
  2. Precision, which measures the proportion of predicted positives that are actually positive.
  3. Recall, which measures the proportion of actual positives that the model correctly identifies.
  4. F1-score, which combines precision and recall as their harmonic mean.

Performance will mostly be judged on accuracy. However, precision, recall, and F1-score serve as complements to accuracy, giving additional insight into the model's behavior.
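
For reference, the four metrics can be computed from binary labels as in the sketch below. The function name `binary_metrics` and the convention that label 1 means "positive" are assumptions for illustration. A library routine would typically be used in practice.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

In this toy example there are two true positives, two true negatives, one false positive, and one false negative, so all four metrics equal 2/3.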

Baseline

As the baseline for this project, a regular BERT model was implemented and fine-tuned on the task of classifying the sentiment of IMDb reviews.

Training our baseline model for 1 epoch using a batch size of 32 yielded the following average results:

Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
87.7          86.3           90.4        88.3

Results

Method 1

Method 1 implements a multilayer perceptron (MLP) that combines the fine-tuned BERT model from our baseline with VAD scores from VADER. Training the MLP implementation yielded the following average results:

Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
88.5          88.2           89.6        88.9
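
The general idea behind Method 1 can be illustrated with a forward pass through a tiny MLP. This is only a sketch under several assumptions that are not taken from the project: the input is the BERT positive-class probability concatenated with the four scores VADER reports (neg, neu, pos, compound), there is a single hidden layer with ReLU, and the weights are random rather than trained. The function name `mlp_forward` and all values are hypothetical. A real implementation would train the weights with a framework such as PyTorch.

```python
import math
import random


def mlp_forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden ReLU layer followed by a sigmoid output, P(positive)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs for one review.
bert_prob = 0.83                          # P(positive) from the fine-tuned BERT
vader_scores = [0.10, 0.55, 0.35, 0.42]   # neg, neu, pos, compound (assumed features)
features = [bert_prob] + vader_scores

# Untrained, randomly initialised weights, for illustration only.
rng = random.Random(0)
n_hidden = 8
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in features] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

p_positive = mlp_forward(features, w_hidden, b_hidden, w_out, b_out)
predicted_label = 1 if p_positive >= 0.5 else 0
```

The point of the design is that the MLP learns how much to trust each signal, instead of fixing the trade-off by hand.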

Method 2

Method 2 assigns weights to the individual outputs of the fine-tuned BERT model and VADER, and evaluates different weight combinations. The best combination of weights yielded the following average results:

Accuracy (%)  Precision (%)  Recall (%)  F1-score (%)
88.6          87.2           91.2        89.1
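
A weighted combination of this kind could look like the sketch below. The details are assumptions, not the project's exact procedure: the weights are constrained to sum to 1, VADER's compound score in [-1, 1] is rescaled to [0, 1] so that it is comparable with a BERT probability, and the best weight is chosen by a simple grid search on accuracy. The names `combine` and `best_weight` and the toy data are hypothetical.

```python
def combine(bert_prob, vader_compound, w_bert):
    """Weighted vote between BERT and VADER; returns 1 (positive) or 0."""
    # Assumption: rescale VADER's compound score from [-1, 1] to [0, 1].
    vader_prob = (vader_compound + 1.0) / 2.0
    score = w_bert * bert_prob + (1.0 - w_bert) * vader_prob
    return 1 if score >= 0.5 else 0


def best_weight(bert_probs, vader_compounds, labels, steps=10):
    """Grid-search the BERT weight in [0, 1] that maximises accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        preds = [combine(b, v, w) for b, v in zip(bert_probs, vader_compounds)]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # keep the first weight that reaches the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation data: each model alone misclassifies two of the four reviews.
bert_probs = [0.90, 0.45, 0.10, 0.55]
vader_compounds = [-0.2, 0.8, 0.2, -0.8]
labels = [1, 1, 0, 0]

w, acc = best_weight(bert_probs, vader_compounds, labels)
```

On this toy data, using either model alone (weight 0 or 1) gives 50% accuracy, while a mixed weight classifies all four reviews correctly. The search should be run on the validation split so that the test split stays untouched for the final evaluation.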
