
aniruddha18-spd / multilingual-sentiment-analysis-with-bert-model


Languages: Jupyter Notebook 91.61%, Python 8.39%
Topics: bert, bert-model, bert-fine-tuning, plutchik, sentimental-analysis, python3, jupyter-notebook, multilingual

multilingual-sentiment-analysis-with-bert-model's Introduction

Multilingual-Sentiment-Analysis-with-Fine-tuned-Bidirectional-Encoder-Representations-from-Transformer

This project used a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model for sentiment analysis, evaluated on an open-source, multilingual dataset covering over 30 languages annotated with Plutchik's eight core emotions.

Part 1: INTRODUCTION

Sentiment Analysis involves understanding and scoring the underlying emotions behind a text. While significant work has been done on understanding emotions in a single language or a small set of languages, performing accurate analysis across many (ten or more) languages with a single model remains an unsolved research problem. With the advancement of multilingual transformer-based models, machines have gotten better at understanding and interpreting multiple languages. While expert, human-level translation and understanding of languages is still unrivaled, the scale of data generated on the internet makes this problem challenging. The proposed approach, if successful, may provide a high-level overview of national or regional sentiments and of how those sentiments change over time.

Part 2: ALIGNMENT AND DEGREE OF INNOVATION

Sentiment Analysis works incredibly well when data is in a single language, in particular English. However, a single approach that works and scales well across multiple, disparate languages is still a significant research problem in Natural Language Processing.
The base approach is derived from the seminal work on the Bidirectional Encoder Representations from Transformers (BERT) model. BERT has achieved state-of-the-art performance on most Natural Language Processing tasks and has spawned multiple variations. A recent variation, proposed in 2020, aims to learn the encodings of multiple languages in a single model. While improvements to the multilingual BERT model are still being made, the model has not yet been tested on the sentiment analysis task. To the best of our knowledge, this is the first work utilizing multilingual BERT for sentiment analysis across ten or more languages. The dataset used for this task, XED, was also released in 2020 and includes over 30 written languages; it is expected to expand to include 40 more.
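As a rough illustration of this setup (not the project's actual code), the sketch below fine-tunes a multilingual BERT checkpoint for eight-way emotion classification with the Hugging Face transformers and datasets libraries. The checkpoint name, file paths, column names, single-label framing, and hyperparameters are assumptions made for the example.

```python
# Minimal fine-tuning sketch, assuming the Hugging Face `transformers` and
# `datasets` libraries and the public "bert-base-multilingual-cased" checkpoint.
# File names, column names, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Plutchik's eight core emotions, treated here as single-label classes for simplicity.
PLUTCHIK_EMOTIONS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(PLUTCHIK_EMOTIONS),
)

# Hypothetical CSV files with a "text" column and an integer "label" column (0-7).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Fixed-length padding so the default data collator can batch examples directly.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="mbert-emotion",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```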
The proposed approach, if successful, can be extended to include more languages and additional data sources such as audio or video. In particular, online and social media sources could be mined for national or regional sentiment irrespective of language, and a dashboard tracking "change of perceptions" over time could also be created.
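To make the cross-lingual claim concrete, a hedged usage sketch: assuming a checkpoint fine-tuned and saved as in the example above, the same classifier can score text in several languages with no language-specific handling. The checkpoint path and example sentences are hypothetical.

```python
# Hypothetical usage once a checkpoint has been fine-tuned (e.g. saved to
# "mbert-emotion" as in the sketch above): the same pipeline scores text in
# any language the encoder covers, with no language-specific handling.
from transformers import pipeline

classifier = pipeline("text-classification", model="mbert-emotion")

samples = [
    "I am thrilled about the announcement!",      # English
    "Estoy muy decepcionado con el servicio.",    # Spanish
    "Tämä uutinen tekee minut surulliseksi.",     # Finnish
]

for text, prediction in zip(samples, classifier(samples)):
    # Each prediction is a dict with a "label" and a confidence "score".
    print(f"{prediction['label']:<12} ({prediction['score']:.2f})  {text}")
```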

Part 3: TEAM

Understanding emotion from text is a fundamental step towards Artificial General Intelligence, and with the shift towards multilingual architectures, language models are getting better every day. This proposed project represents an intellectual challenge and a growth opportunity for the team.
The team lead is an expert in machine learning, deep-learning approaches to computer vision, and natural language tasks. In addition to the team lead, the team comprises three goal-driven students experienced in Python programming. Under the direction of the project lead, the students will study, program, train, and evaluate the requisite model and novel variants using the available dataset for sentiment analysis. We see this project as an opportunity to learn and add to existing knowledge in the field of Natural Language Processing.

Part 4: TECHNOLOGY AND CONCEPT VIABILITY

The base model provides state-of-the-art performance on Natural Language tasks compared to existing language-embedding models. While the model itself is large, fine-tuning it for sentiment classification requires only modest, bounded computational resources compared to comparable approaches.
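One way this resource argument can play out in practice, sketched below purely as an illustration, is to freeze the pre-trained encoder and update only the small classification head during fine-tuning; the project's own training configuration is not documented here, and the checkpoint name and label count are assumptions carried over from the earlier sketches.

```python
# Illustration of the resource argument: freeze the pre-trained encoder and
# train only the small classification head. The checkpoint name and label
# count are assumptions carried over from the sketches above.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=8
)

# Disable gradients for the BERT encoder; only the classifier layer remains trainable.
for param in model.bert.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,}")
```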

