Code Monkey home page Code Monkey logo

twitter-sentiment-analysis-1's Introduction

twitter-sentiment-analysis

An example of sentiment analysis on Twitter using Emoticons to label the sentiment of the tweet.

To complete the analysis exploits two python libraries:

Collector

The collector scripts allows to define a campaign to collect the tweets we need for the sentiment analysis.

The scripts requires three arguments:

  • sentiment_label, the kind of emoticons we are looking for in the tweets positive -> :) or negative -> :(
  • campain_name, the name of campaign we are going to define
  • keywords, the keywords we are going to look for in the tweets.

The output of the collector.py execution is a csv file named with the pattern [campaign_name]+[sentiment_label].csv (e.g. iphone6-positive.csv). The fields of such a file are TweetID,User_ID,Text,Sentiment.

Pre Processing

The preprocessing scripts modifies the tweets content in order to make possible the further analysis.

  • Any url is removed and substituted with the word "URL"
  • Any @Username is converted to "AT_USER" to make any tweet anonymous
  • Any additional white spaces is removed
  • Ant not alphanumeric symbol is removed
  • Hashtags are substituted with the corresponding word
  • The emoticon used for the labeling is stripped out from the text

Feature Extractor

Before building a classifier we need to extract all the features (word) contained in the tweet text.

Moreover, is necessary to remove any stop words. We also strip out punctuation, digits, and any symbols that might be still in the tweet text.

The file stopwords.txt contains a list of the most common English stopwords.

Classifier

The actual analysis happens by the means of the classifier.py script.

It requires as arguments (the order is relevant):

  • the file of the positive tweets,
  • the file of the negative tweets,
  • the file with the stop words

The analysis follows the following steps:

  1. The tweets dataset which are now separated will be merged and divides into a train dataset, which will be used to train the classifier, and a test dataset, used to test it. Respectively there will be the 80% of the tweets for training and the 20% for testing.

  2. From each tweet will be extracted a feature-set

  3. We use NLTK to describe each tweets in terms of the features it contains. Indeed, we create a list of words ordered by frequency.

  4. We trains a NaiveBayesClassifier with such a dataset

  5. We test the classifier using the remaining 20% of the tweets and we process in the same way of the training dataset.

Results

Here the results of the analysis for the example, train dataset of 800 instances, test dataset of 202 instances.

Most Informative Features:

  • contains check = True ; positive : negative = 200.3 : 1.0
  • contains free = True ; positive : negative = 200.3 : 1.0
  • contains bi = True ; positive : negative = 199.7 : 1.0
  • contains thx = True ; positive : negative = 199.7 : 1.0
  • contains do = True ; positive : negative = 125.7 : 1.0
  • contains hurry = True ; positive : negative = 81.7 : 1.0
  • contains hi = True ; positive : negative = 39.9 : 1.0
  • contains wanna = True ; positive : negative = 20.8 : 1.0
  • contains ios = True ; negative : positive = 12.2 : 1.0
  • contains don = True ; negative : positive = 11.7 : 1.0

Metrics

  • pos precision: 1.0
  • pos recall: 0.693069306931
  • pos F-measure: 0.818713450292
  • neg precision: 0.765151515152
  • neg recall: 1.0
  • neg F-measure: 0.8669527897

To complete this experiment I followed the guided at Link1 and Link2

twitter-sentiment-analysis-1's People

Watchers

James Cloos avatar F/C.AT avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.