
chulong-li / real-time-sentiment-tracking-on-twitter-for-brand-improvement-and-trend-recognition


A real-time interactive web app based on data pipelines using streaming Twitter data, automated sentiment analysis, and MySQL&PostgreSQL database (Deployed on Heroku)

Home Page: https://twitter-analysis-web-app.herokuapp.com/

License: MIT License

Jupyter Notebook 99.73% Python 0.27%
data-analysis streaming-data twitter-sentiment-analysis brand-improvement stream-processing tweets heroku-server topic-tracking dashboard twitter dash plotly

real-time-sentiment-tracking-on-twitter-for-brand-improvement-and-trend-recognition's Introduction

Real-time Twitter Sentiment Analysis for Brand Improvement and Topic Tracking

Web App GIF

I started this self-motivated, independent project to dive into the industry and get my hands dirty.

Try this awesome Real-Time Twitter Monitoring System on the Heroku server, and read the series of related articles below:

  • Chapter 1: Collecting Twitter Data using Streaming Twitter API with Tweepy, MySQL, & Python
  • Chapter 2: Twitter Sentiment Analysis and Interactive Data Visualization using RE, TextBlob, NLTK, and Plotly
  • Chapter 3: Deploy a Real-time Twitter Analytical Web App on Heroku using Dash & Plotly in Python
  • Chapter 4 (In Progress): Parallelize Streaming Twitter Sentiment Analysis using Scala, Kafka and Spark Streaming

Inspiration

To make better business decisions from Twitter data, a brand needs to track all relevant Twitter content about itself in real time, analyze topics and issues as they emerge, and raise alerts when anomalies are detected. By monitoring brand mentions on Twitter, brands can inform engagement and deliver better experiences for their customers across the world.

Interesting facts from exploratory data analysis

  • Fewer than 0.01% of users post tweets with their location attached.
  • Tweets grabbed from the streaming data never have more than 0 LIKES or RETWEETS, since you capture them before anyone else has a chance to press the buttons :p
  • More than 65.6% of users write a location in their profile, although, judging by what they wrote, a few of them don't live on Earth.
  • The numbers of positive and negative tweets are relatively close and stay low compared with the number of neutral tweets. Unless an emergency event happens, the lines won't fluctuate sharply.

Technical Approach - Version 2 ( ~ Sep 16)

  1. Build ETL pipelines based on stream processing using Kafka (In Progress)
  2. Perform sentiment analysis using Spark Streaming (In Progress)

Original Development - Version 1.2 (Done🎉)

  1. Extract streaming Twitter Data, preprocess data in Python, and load data into MySQL for storage
  2. Perform exploratory data analysis with Pandas & Seaborn to explore the insights
  3. Connect with Plotly for real-time interactive dashboard based on time series
  4. Deploy the real-time interactive front-end web app using Dash & Heroku PostgreSQL on Heroku server
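The labeling and time-series steps above can be pictured with a small sketch that uses only the standard library. It assumes each tweet already carries a numeric sentiment score (for example from TextBlob, which the article series mentions). The function names `label_polarity` and `count_by_minute`, the input shape, and the sample data are hypothetical and are not taken from the repository.

```python
from collections import Counter
from datetime import datetime


def label_polarity(score: float) -> int:
    """Map a raw sentiment score to -1 (negative), 0 (neutral) or 1 (positive)."""
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


def count_by_minute(tweets):
    """Count tweets per (minute, polarity label) pair.

    `tweets` is an iterable of (created_at, score) tuples; this input
    shape is an assumption made for the example.
    """
    counts = Counter()
    for created_at, score in tweets:
        # Truncate the timestamp to the start of its minute.
        minute = created_at.replace(second=0, microsecond=0)
        counts[(minute, label_polarity(score))] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    (datetime(2019, 9, 1, 12, 0, 5), 0.4),
    (datetime(2019, 9, 1, 12, 0, 40), 0.0),
    (datetime(2019, 9, 1, 12, 0, 59), 0.0),
    (datetime(2019, 9, 1, 12, 1, 10), -0.2),
]
counts = count_by_minute(sample)
```

Each key of `counts` corresponds to one point on one of the three sentiment lines (negative, neutral, positive) in a time-series chart.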

Quick Demo

Real-time Interactive Web App on Heroku server

The web app has been deployed on Heroku.

Real-time Twitter Sentiment Analysis in Jupyter Notebook

Try this interactive data visualization in Jupyter Notebook. To run it with streaming data, you need to deploy it locally.

Complex Dashboard

Get Started

Pre-installation

pip install -r requirements.txt

Set-up

Create a file called credentials.py and fill in the following content

# Go to http://apps.twitter.com and create an app.
# The consumer key and secret will be generated for you
API_KEY = "XXXXXXXXXXXXXX"
API_SECRET_KEY = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

# After the step above, you will be redirected to your app's page.
# Create an access token under the "Your access token" section
ACCESS_TOEKN = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
ACCESS_TOKEN_SECRET = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
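Before starting the stream, it can help to confirm that none of the values above is still a placeholder. The following is a minimal sketch, not part of the repository: the helper name `find_unset_credentials` is hypothetical, and it assumes placeholders consist only of the letter `X`, as in the template above.

```python
def find_unset_credentials(values: dict) -> list:
    """Return the names whose value is empty or still an all-'X' placeholder."""
    unset = []
    for name, value in values.items():
        if not value or set(value) == {"X"}:
            unset.append(name)
    return sorted(unset)


# Illustrative values only; neither is a real key.
example = {
    "API_KEY": "XXXXXXXXXXXXXX",
    "API_SECRET_KEY": "not-a-real-secret",
}
unset = find_unset_credentials(example)
```

Here `unset` lists `API_KEY` only, because the second value no longer looks like the placeholder.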

Create a local MySQL database with the settings below

host="localhost"
user="root"
passwd="password"
database_table="TwitterDB"

Track Word Setting (Optional)

You can change TRACK_WORDS in settings.py to any word, brand, or topic you're interested in.

Stream the complex visualization

To stream data into the dashboard, apply all the settings above and keep Main.ipynb running so that it continues listening.

Run

Run Main.ipynb in Jupyter Notebook to start scraping data.

Run Analysis.ipynb to perform data analysis for brand improvement after Main.ipynb starts running.

Run Trend_Analysis_Complex.ipynb to track topic trends on Twitter after Main.ipynb starts running.

Note: Since the streaming process is always on, press the STOP button to finish.

Dash Web Application

Everything related to the Dash app is in the dash_app folder.

Challenges

  • Unstructured tweet texts may contain messy code and emoji characters
  • Some brands may take a long time to collect enough data to analyze emerging issues, since they target specific groups of people
  • Some hot words uncover useful insights only after appearing more than 10k times in tweets
  • Plotly's reference is not well documented, which makes customizing a dashboard much harder
  • More challenges are on the way, but Google, Stack Overflow, Towards Data Science, and GitHub will always be your best friends
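For the first challenge, one possible way to tidy raw tweet text with Python's `re` module is sketched below. This is an illustration under stated assumptions, not the repository's preprocessing code: the function name `clean_tweet` and the patterns are hypothetical, and dropping every non-ASCII character is a blunt choice that only suits English-language streams.

```python
import re

RT_PREFIX_RE = re.compile(r"^\s*RT\s+@\w+:\s*")  # leading "RT @user: " retweet marker
URL_RE = re.compile(r"https?://\S+")             # http and https links
MENTION_RE = re.compile(r"@\w+")                 # @username mentions


def clean_tweet(text: str) -> str:
    """Remove the retweet prefix, links, @mentions and non-ASCII characters."""
    text = RT_PREFIX_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Dropping non-ASCII characters removes emoji, but also any accented letters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Collapse runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


cleaned = clean_tweet("RT @brand: Loving the new update 😍 https://t.co/abc123")
```

The cleaned string can then be passed to whichever sentiment scorer is in use.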

real-time-sentiment-tracking-on-twitter-for-brand-improvement-and-trend-recognition's Issues

Twitter app getting suspended.

I am using the access keys from my standard product track, and I have been trying to run sentiment analysis on real-time tweets using this codebase. The script runs perfectly, but after around 24 hours my app gets suspended. Twitter sent me an email that gave no specific reason for the suspension. I have been searching for a solution for 3 days, but in vain. I would appreciate your time.

KeyError

Hi there,

Apologies if this is a bit of a rookie question/request, I am new to programming and trying to learn as much as I can as I go.

I am having a problem with the Trend_Analysis_Complex.ipynb file, as I keep receiving this error:
(NOTE: I am looking at Corona instead of Facebook which you used in your example and am also trying to scrape in South Africa as opposed to the USA)

KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: "Num of 'Corona' mentions"

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in
38 fig.add_trace(go.Scatter(
39 x=time_series,
---> 40 y=result["Num of '{}' mentions".format(settings.TRACK_WORDS[0])][result['polarity']==0].reset_index(drop=True),
41 name="Neutral",
42 opacity=0.8), row=1, col=1)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: "Num of 'Corona' mentions"
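The traceback shows the plotting cell looking up a column whose name is built from `settings.TRACK_WORDS[0]`, so a KeyError like this one means that `result` has no column with exactly that name. The cause in this particular report is not established here. The sketch below only shows one way to check for the column before plotting; the helper names and the sample column list are hypothetical, and with pandas you would pass `result.columns` as `columns`.

```python
def mention_column(track_words):
    """Build the column name that the plotting cell looks up for the first track word."""
    return "Num of '{}' mentions".format(track_words[0])


def has_mention_column(columns, track_words):
    """Return True if the expected mention column is present in `columns`."""
    return mention_column(track_words) in columns


# Hypothetical situation: the data was aggregated under a different
# track word than the one currently set in settings.py.
columns = ["polarity", "Num of 'Facebook' mentions"]
expected = mention_column(["Corona"])
present = has_mention_column(columns, ["Corona"])
```

If `present` is false, printing the actual column names next to `expected` usually makes the mismatch visible.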

Table does not exist

What is the Back_Up table referenced at line 145 of the app.py file in the dash_app folder? How do I add data to this table?

Running Main file doesn't fetch data

hey, first of all this is a great notebook, thanks for putting this together :D
So I set all credentials, set up a local mysql db and tried to run the Main jupyter notebook file. All cells ran great without any issue: checked mysql workbench and the table was created with columns under twitterDB, only thing is - no data.
The cell with:
class Stream(tweepy.Stream):
    def on_status(self, status):
        print(status.text)

Stream = Stream()
myStream = tweepy.Stream()
myStream.filter(languages=['en'], track=settings.TRACK_WORDS)
is filled with the access keys and tokens, and keep on running but no data at all appears in my local db.
I also checked my twitter app and saw no requests were made out of my monthly tweet cap usage bar (elevated subscription).
Can anyone assist please?
