Sentiment Analysis in Russian

This repository contains links to models for sentiment analysis of Russian-language texts. The models were trained as part of two articles: "Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian" and "Deep Transfer Learning Baselines for Sentiment Analysis in Russian".

Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian

| Model | Score | Rank | SentiRuEval-2016 TC micro F1 | SentiRuEval-2016 TC macro F1 | SentiRuEval-2016 TC F1 | SentiRuEval-2016 Banks micro F1 | SentiRuEval-2016 Banks macro F1 | SentiRuEval-2016 Banks F1 | RuSentiment weighted F1 | RuSentiment F1 | KRND F1 | LINIS Crowd F1 | RuTweetCorp F1 | RuReviews F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SOTA | n/s | | 76.71 | 66.40 | 70.68 | 67.51 | 69.53 | 74.06 | 78.50 | n/s | 73.63 | 60.51 | 83.68 | 77.44 |
| XLM-RoBERTa-Large | 76.37 | 1 | 82.26 | 76.36 | 79.42 | 76.35 | 76.08 | 80.89 | 78.31 | 75.27 | 75.17 | 60.03 | 88.91 | 78.81 |
| SBERT-Large | 75.43 | 2 | 78.40 | 71.36 | 75.14 | 72.39 | 71.87 | 77.72 | 78.58 | 75.85 | 74.20 | 60.64 | 88.66 | 77.41 |
| MBARTRuSumGazeta | 74.70 | 3 | 76.06 | 68.95 | 73.04 | 72.34 | 71.93 | 77.83 | 76.71 | 73.56 | 74.18 | 60.54 | 87.22 | 77.51 |
| Conversational RuBERT | 74.44 | 4 | 76.69 | 69.09 | 73.11 | 69.44 | 68.68 | 75.56 | 77.31 | 74.40 | 73.10 | 59.95 | 87.86 | 77.78 |
| LaBSE | 74.11 | 5 | 77.00 | 69.19 | 73.55 | 70.34 | 69.83 | 76.38 | 74.94 | 70.84 | 73.20 | 59.52 | 87.89 | 78.47 |
| XLM-RoBERTa-Base | 73.60 | 6 | 76.35 | 69.37 | 73.42 | 68.45 | 67.45 | 74.05 | 74.26 | 70.44 | 71.40 | 60.19 | 87.90 | 78.28 |
| RuBERT | 73.45 | 7 | 74.03 | 66.14 | 70.75 | 66.46 | 66.40 | 73.37 | 75.49 | 71.86 | 72.15 | 60.55 | 86.99 | 77.41 |
| MBART-50-Large-Many-to-Many | 73.15 | 8 | 75.38 | 67.81 | 72.26 | 67.13 | 66.97 | 73.85 | 74.78 | 70.98 | 71.98 | 59.20 | 87.05 | 77.24 |
| SlavicBERT | 71.96 | 9 | 71.45 | 63.03 | 68.44 | 64.32 | 63.99 | 71.31 | 72.13 | 67.57 | 72.54 | 58.70 | 86.43 | 77.16 |
| EnRuDR-BERT | 71.51 | 10 | 72.56 | 64.74 | 69.07 | 61.44 | 60.21 | 68.34 | 74.19 | 69.94 | 69.33 | 56.55 | 87.12 | 77.95 |
| RuDR-BERT | 71.14 | 11 | 72.79 | 64.23 | 68.36 | 61.86 | 60.92 | 68.48 | 74.65 | 70.63 | 68.74 | 54.45 | 87.04 | 77.91 |
| MBART-50-Large | 69.46 | 12 | 70.91 | 62.67 | 67.24 | 61.12 | 60.25 | 68.41 | 72.88 | 68.63 | 70.52 | 46.39 | 86.48 | 77.52 |
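
The table mixes micro-, macro-, and weighted-averaged F1. As a reminder of how these averages differ, here is a minimal plain-Python sketch. The function name `f1_scores` and the toy labels are illustrative assumptions and are not taken from the repository's evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return micro, macro, and support-weighted F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true label but was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of the per-class F1 values.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class F1 weighted by the number of true samples per class.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted}


# Toy example (hypothetical labels, not drawn from any dataset above).
y_true = ["pos", "pos", "neg", "neu", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neu", "neu", "pos"]
scores = f1_scores(y_true, y_pred)
```

Macro averaging treats every class equally, so it is the stricter measure on imbalanced data. Micro and weighted averaging are dominated by the frequent classes.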

Deep Transfer Learning Baselines for Sentiment Analysis in Russian

This repository contains fine-tuned versions of Multilingual Bidirectional Encoder Representations from Transformers (M-BERT), RuBERT, and two variants of the Multilingual Universal Sentence Encoder (M-USE) for sentiment classification in Russian, as described in the article "Deep Transfer Learning Baselines for Sentiment Analysis in Russian".

| Dataset | Measure | Current SOTA | M-BERT | RuBERT | M-USE-CNN | M-USE-Trans |
|---|---|---|---|---|---|---|
| SentiRuEval-2016 TC | F1 | 68.42 | 66.29 | 70.68 | 63.64 | 68.27 |
| | macro F1PN | 66.07 | 61.78 | 66.40 | 58.97 | 62.77 |
| | micro F1PN | 74.11 | 72.45 | 76.71 | 71.31 | 75.00 |
| SentiRuEval-2016 Banks | F1 | 74.06 | 65.31 | 72.83 | 66.71 | 72.40 |
| | macro F1PN | 69.53 | 58.00 | 65.89 | 58.73 | 65.04 |
| | micro F1PN | 71.76 | 60.52 | 68.43 | 62.41 | 68.21 |
| SentiRuEval-2016 TC | F1 | 68.54 | 60.47 | 64.39 | 60.57 | 64.28 |
| | macro F1PN | 63.47 | 53.16 | 57.76 | 52.37 | 57.60 |
| | micro F1PN | 67.51 | 57.03 | 61.38 | 57.76 | 61.18 |
| SentiRuEval-2016 Banks | F1 | 79.51 | 67.65 | 70.58 | 66.32 | 69.62 |
| | macro F1PN | 67.44 | 56.97 | 60.95 | 54.74 | 59.12 |
| | micro F1PN | 70.09 | 59.32 | 63.33 | 57.61 | 62.17 |
| RuSentiment | F1 | n/s | 71.37 | 72.03 | 66.27 | 68.60 |
| | weighted F1 | 78.50 | 75.13 | 75.71 | 71.05 | 73.42 |
| Kaggle Russian News Dataset | F1 | 70.00 | 71.36 | 73.63 | 71.27 | 72.66 |
| LINIS Crowd | F1 | 37.29 | 42.73 | 60.51 | 56.34 | 56.95 |
| RuTweetCorp (binary) | F1 | 75.95 | 83.04 | 83.69 | 81.34 | 83.17 |
| RuTweetCorp (trinary) | F1 | 78.1 | 80.10 | 80.79 | 78.39 | 79.69 |
| RuReviews | F1 | 75.45 | 77.31 | 77.44 | 76.63 | 76.94 |
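
The measures "macro F1PN" and "micro F1PN" are not defined in this section. Assuming they follow the SentiRuEval convention of averaging F1 over the positive and negative classes only, a minimal sketch could look like the following. The neutral class still produces errors but gets no F1 term of its own. The function name `f1_pn` and the label strings are hypothetical.

```python
def f1_pn(y_true, y_pred, classes=("positive", "negative")):
    """Macro and micro F1 restricted to `classes` (assumed meaning of F1PN)."""
    stats = {c: {"tp": 0, "fp": 0, "fn": 0} for c in classes}
    for t, p in zip(y_true, y_pred):
        for c in classes:
            if t == c and p == c:
                stats[c]["tp"] += 1
            elif p == c:  # predicted c, but the true label was something else
                stats[c]["fp"] += 1
            elif t == c:  # true label was c, but something else was predicted
                stats[c]["fn"] += 1

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    # Macro: mean of the per-class F1 values over the selected classes.
    macro = sum(f1(**s) for s in stats.values()) / len(classes)
    # Micro: pool the counts of the selected classes, then compute one F1.
    total = {k: sum(s[k] for s in stats.values()) for k in ("tp", "fp", "fn")}
    micro = f1(**total)
    return macro, micro


# Toy example (hypothetical labels).
y_true = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "negative", "negative"]
macro_pn, micro_pn = f1_pn(y_true, y_pred)
```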

The SOTA approaches for RuReviews, RuSentiment, Kaggle Russian News Dataset, and RuTweetCorp are described in (Smetanin and Komarov, 2019), (Baymurzina et al., 2019), (Shalkarbayuli et al., 2018), and (Rubtsova, 2018), respectively. The SOTA approach for LINIS Crowd was implemented based on (Koltsova et al., 2016).

Sentiment Datasets in Russian

Although Russian is one of the most common languages on the World Wide Web, it is generally not as well-resourced as English, especially in the field of sentiment analysis. Many studies address sentiment classification, but only a few of them make their datasets publicly available to the research community.

| Dataset | Classes | Average length | Max length | Train samples | Test samples | Overall samples | Download link |
|---|---|---|---|---|---|---|---|
| SentiRuEval-2016 (Loukachevitch and Rubtsova, 2016) | 3 | 87.0928 | 172 | 18,035 | 5,560 | 23,595 | Project page |
| SentiRuEval-2015 Subtask (Loukachevitch et al., 2015) | 3 | 81.4986 | 172 | 8,580 | 7,738 | 16,318 | Project page |
| RuTweetCorp (Rubtsova, 2013) | 3 | 89.1725 | 189 | n/a | n/a | 334,836 | Project page |
| LINIS Crowd (Koltsova et al., 2016) | 5 | n/a | n/a | n/a | n/a | n/a | Project page |
| RuSentiment (Rogers et al., 2018) | 5 | 82.0279 | 800 | 28,218 | 2,967 | 31,185 | Project page |
| Kaggle Russian News Dataset | 3 | 3911.8501 | 381,498 | n/a | n/a | 8,263 | Kaggle page |
| RuReviews (Smetanin and Komarov, 2019) | 3 | 130.0693 | 1,007 | n/a | n/a | 90,000 | GitHub page |
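
Statistics of this kind can be recomputed from any labelled corpus. The sketch below assumes that lengths are measured in characters (the section does not state the unit) and that each sample is a `(text, label)` pair. The function name `describe_dataset` and the toy samples are hypothetical.

```python
def describe_dataset(train, test):
    """Summarise a corpus given as two lists of (text, label) pairs."""
    samples = list(train) + list(test)
    lengths = [len(text) for text, _ in samples]  # characters (an assumption)
    return {
        "classes": len({label for _, label in samples}),
        "average_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "train_samples": len(train),
        "test_samples": len(test),
        "overall_samples": len(samples),
    }


# Toy corpus (hypothetical, not drawn from the datasets above).
train = [("Отличный сервис", "positive"), ("Ужасно", "negative")]
test = [("Нормально", "neutral")]
stats = describe_dataset(train, test)
```

For datasets distributed without a fixed split, pass the whole corpus as `train` and an empty list as `test`.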

Fine-Tuned Models

To download fine-tuned models for Russian, please follow the link https://yadi.sk/d/Xp5vLG_5xCQL-Q.

Citation

@article{Smetanin2020Deep,
  title = {Deep transfer learning baselines for sentiment analysis in Russian},
  author = {Sergey Smetanin and Mikhail Komarov},
  journal = {Information Processing \& Management},
  volume = {58},
  number = {3},
  pages = {102484},
  year = {2021},
  issn = {0306-4573},
  doi = {10.1016/j.ipm.2020.102484},
  url = {https://www.sciencedirect.com/science/article/pii/S0306457320309730}
}

License

See LICENSE.
