The sentimentanalysisunisgml from farhadmohmand66

Sentiment analysis using Machine Learning and deployment of model

Python 3.89% CSS 0.97% HTML 2.35% Jupyter Notebook 92.79%

sentimentanalysisunisgml's Introduction

Sentiment Analysis of Pashto Text Using Machine Learning Techniques

Sentiment analysis has many applications, such as predicting political outcomes, supporting decisions about services and products, and recommending items. People express their opinions on social media in English as well as in their native languages. This project carries out sentiment analysis on one such native language, Pashto. Pashto is the national language of Afghanistan and is also spoken in many regions of Pakistan. We used a corpus collected from online social networks, which two native Pashto speakers familiar with the material annotated as positive or negative. We performed binary classification using supervised learning algorithms: Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, and AdaBoost. The results are evaluated using the standard performance measures Accuracy, F-measure, Precision, and Recall. The results show that Naive Bayes achieved better accuracy than the other ML algorithms.
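To make the evaluation measures concrete, here is a minimal from-scratch sketch of how Accuracy, Precision, Recall, and F-measure can be computed for a binary positive/negative task. It is not the project's own code: the function name `binary_metrics` and the toy labels are assumptions made up for illustration, and the numbers it produces are not results from this project.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall and F-measure for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels invented for illustration only (not data from the corpus).
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In practice the same measures would be computed on a held-out test split for each of the five classifiers, so that their scores can be compared on equal terms.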

A Django web app was developed and deployed on the PythonAnywhere server.
The project is live at: https://farhadmohmand66.pythonanywhere.com

About Corpus

The Pashto corpus was collected from Facebook. Each sentence is stored in a CSV file with the following fields: (i) ID, (ii) Source (link) from where the comments were collected and the topic of the comments, (iii) Pashto Text, the main text used for sentiment analysis (SA), (iv) English translation, (v) Annotator One, and (vi) Annotator Two. Both annotators were native speakers of Pashto and well familiar with Pashto text. The corpus covers three genres: Politics, Sports, and Dramas and Movies (combined into one genre). It contains 300 rows of Politics, 150 rows of Sports, and 150 rows of Dramas and Movies, with seven attributes. The link to the corpus is given below:

Corpus Link: https://www.kaggle.com/farhadkhan66/datasets
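As a rough illustration of the corpus layout described above, the sketch below reads a small CSV with seven columns, counts rows per genre, and measures how often the two annotators agree. The column names (`id`, `source`, `topic`, `pashto_text`, `english_translation`, `annotator_one`, `annotator_two`), the placeholder rows, and the helper `summarize_corpus` are all assumptions for illustration; the real headers in the Kaggle file may differ, and the README does not say how annotator disagreements were handled.

```python
import csv
import io
from collections import Counter

# Hypothetical header names and placeholder rows; not taken from the real corpus.
SAMPLE_CSV = """id,source,topic,pashto_text,english_translation,annotator_one,annotator_two
1,https://example.com/post/1,Politics,<Pashto text>,<translation>,positive,positive
2,https://example.com/post/2,Sports,<Pashto text>,<translation>,negative,positive
3,https://example.com/post/3,Dramas and Movies,<Pashto text>,<translation>,negative,negative
"""


def summarize_corpus(csv_file):
    """Count rows per topic and compute the share of rows where both annotators agree."""
    rows = list(csv.DictReader(csv_file))
    per_topic = Counter(row["topic"] for row in rows)
    agreed = sum(1 for row in rows if row["annotator_one"] == row["annotator_two"])
    agreement_rate = agreed / len(rows) if rows else 0.0
    return per_topic, agreement_rate


per_topic, agreement_rate = summarize_corpus(io.StringIO(SAMPLE_CSV))
print(dict(per_topic), agreement_rate)
```

To run this against the downloaded file instead of the inline sample, pass an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.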
