Sentiment analysis has many applications, such as predicting political results, supporting decisions about services and products, and recommending items. People express their opinions on social media in English as well as in their native languages. This project carries out sentiment analysis on one such native language, Pashto. Pashto is the national language of Afghanistan and is also spoken in many regions of Pakistan. We used a corpus collected from online social networks and had two native Pashto speakers, both well acquainted with the text, annotate each sentence as positive or negative. We performed binary classification using supervised learning algorithms: Support Vector Machine, Naive Bayes, Decision Tree, Random Forest, and AdaBoost. The results are evaluated using standard performance measures: Accuracy, Precision, Recall, and F-measure. The results show that Naive Bayes achieved better accuracy than the other ML algorithms.
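The four evaluation measures named above can be computed directly from the counts of true/false positives and negatives. The following is a minimal sketch in plain Python, not the project's own evaluation code; the function name `binary_metrics`, the label strings, and the toy label lists are assumptions made for illustration only.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall and F-measure for binary labels.

    `positive` names the class treated as the positive one (an assumption;
    the project may encode its labels differently).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy labels, invented for illustration (not taken from the corpus).
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "positive", "negative"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are two true positives, two true negatives, one false positive, and one false negative, so all four measures come out to 2/3.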
A Django web app was developed and deployed on the PythonAnywhere server.
Project live at: https://farhadmohmand66.pythonanywhere.com
The Pashto corpus was collected from Facebook. Every sentence is stored in a CSV file with the following fields: (i) ID, (ii) Source (the link from which the comments were collected) and the topic of the comments, (iii) Pashto Text, the main text used for sentiment analysis, (iv) English translation, (v) Annotator One, and (vi) Annotator Two. Both annotators are native Pashto speakers who are well acquainted with the Pashto text. The corpus covers three genres: Politics, Sports, and Dramas and Movies combined. It contains 300 rows of Politics, 150 rows of Sports, and 150 rows of Dramas and Movies, with seven attributes. The link to the corpus is given below:
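Because each sentence carries labels from two annotators, a natural first step is to load the CSV and check how often they agree. The sketch below uses only the Python standard library. The column names, the in-memory sample rows, and the placeholder text are all hypothetical, so check them against the actual file headers. The project does not state how agreement was measured or how disagreements were resolved; Cohen's kappa is shown here only as one common choice.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with assumed column names; placeholder text stands in
# for real Pashto sentences. Replace with open("<corpus file>.csv", encoding="utf-8").
SAMPLE_CSV = """ID,Source,Topic,Pashto Text,English Translation,Annotator One,Annotator Two
1,https://example.com/a,Politics,<pashto 1>,<translation 1>,positive,positive
2,https://example.com/b,Sports,<pashto 2>,<translation 2>,negative,negative
3,https://example.com/c,Dramas and Movies,<pashto 3>,<translation 3>,positive,negative
4,https://example.com/d,Politics,<pashto 4>,<translation 4>,negative,negative
"""


def load_rows(handle):
    """Read every CSV row into a dict keyed by the header names."""
    return list(csv.DictReader(handle))


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


rows = load_rows(io.StringIO(SAMPLE_CSV))
first = [r["Annotator One"] for r in rows]
second = [r["Annotator Two"] for r in rows]

# Keep only the sentences on which both annotators gave the same label.
agreed = [r for r in rows if r["Annotator One"] == r["Annotator Two"]]
kappa = cohen_kappa(first, second)
print(len(rows), len(agreed), kappa)
```

Filtering to the agreed rows is one simple way to obtain a single label per sentence; other policies, such as adding a third annotator to break ties, are equally possible.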
Corpus Link: https://www.kaggle.com/farhadkhan66/datasets