
RealTime-TwitterDataAnalysis

Collect and process real-time Twitter data to analyse the popularity of tweets containing specific keywords or hashtags, visualize important metrics, generate Twitter networks, and map tweets and trends geographically.


Analysing Twitter data can be useful in a wide variety of fields:

  • In industry, it can be used in marketing and product analysis to improve an organization's business decisions.
  • It can be used to measure public opinion, gauging people's mood on important topics such as political or social events.
  • It can also be used to cluster behavioural groups by identifying conversation spheres, patterns in the behaviour of different subsections of society, and the bridges or major influencers.



  • Show a real-time plot of tweet volumes and the proportion of tweets mentioning either keyword

Total Volume Of tweets ---- Average Mentions



  • Graph tweet sentiment by performing NLP

Tweet Sentiment



  • Node networks for retweets and replied tweets | Force-directed circular layout showing more-retweeted users as larger nodes



node-network.png            Circular-layout-reply-network



Functionality

  • Stream tweets containing specific keywords in real time.
  • Show volume metrics for selected tweets in real time.
  • Filter tweets by any time window.
  • Plot the prevalence of tweets about a particular topic of interest.
  • Perform sentiment analysis of tweets by keyword and chart the results in real time.
  • Graph Twitter networks: follow, favorite, retweet and reply networks.
  • Analyse Twitter networks to discover important nodes that are influencers or conversation bridges.
  • Display tweets geographically on a map.
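
As a rough illustration of the time-window filtering and keyword-prevalence items above, here is a minimal sketch. It assumes each tweet is a dict carrying the `created_at` and `text` fields of a v1.1-style status; the function names `in_window` and `mention_proportion` and the sample data are hypothetical and are not taken from this repository.

```python
from datetime import datetime, timezone

# Timestamp layout of v1.1-style statuses, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(tweet):
    """Return the tweet's creation time as a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)


def in_window(tweets, start, end):
    """Keep tweets created in the half-open interval [start, end)."""
    return [t for t in tweets if start <= parse_created_at(t) < end]


def mention_proportion(tweets, keyword):
    """Fraction of tweets whose text contains the keyword (case-insensitive)."""
    if not tweets:
        return 0.0
    needle = keyword.lower()
    hits = sum(1 for t in tweets if needle in t.get("text", "").lower())
    return hits / len(tweets)


# Made-up sample data, for illustration only.
sample_tweets = [
    {"created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "Learning #Python today"},
    {"created_at": "Wed Oct 10 20:25:00 +0000 2018", "text": "Rust is fun too"},
    {"created_at": "Wed Oct 10 21:05:00 +0000 2018", "text": "python again"},
]
start = datetime(2018, 10, 10, 20, 0, tzinfo=timezone.utc)
end = datetime(2018, 10, 10, 21, 0, tzinfo=timezone.utc)
window = in_window(sample_tweets, start, end)
share = mention_proportion(window, "python")
```

Recomputing `share` over successive windows would give the kind of series a real-time prevalence plot needs; how the project itself computes it may differ.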


Setup Locally

  • Clone the repository on your local machine.
    git clone https://github.com/kaustav202/RealTime-TwitterDataAnalysis.git
  • Go into the cloned directory
  • Run pip install -r requirements.txt to install all the dependencies.
  • Create a developer account on Twitter: https://developer.twitter.com/en
  • Get your Twitter API credentials and replace the placeholders in twitter_config.py.
    • Go to the Twitter Developer Portal Projects & Apps page at https://developer.twitter.com/en/portal/projects-and-apps
    • Find the API/consumer key and secret under the Consumer Keys section of the Keys and Tokens tab of your app
    • Your account's access token and secret for your app can be found under the Authentication Tokens section of the Keys and Tokens tab of your app
  • From inside the app/ folder, run python stream.py, which streams tweets into tweets.json.
  • Run python main.py, the application entry point, preferably after some time so that more tweets are available for analysis.
  • You can also perform sentiment analysis by running python sentiment_analysis.py and draw tweet network graphs by running python tweet_network.py.
  • Remember that streaming (writing) the tweets is a completely independent step that must be performed first by running stream.py.

Data Format

The data received from the Twitter streaming API is in JSON format.
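
A minimal sketch of reading such data, assuming the file holds one JSON status per line (as the issue on automating main.py describes for tweets.json). The helper name `load_tweets` and the sample records are hypothetical; the field names follow v1.1-style statuses and may not match every payload.

```python
import io
import json


def load_tweets(lines):
    """Parse an iterable of text lines holding one JSON status each."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between statuses
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip a line that is still being written
    return tweets


# Made-up sample: one complete status, a blank line, one truncated status.
sample = io.StringIO(
    '{"created_at": "Wed Oct 10 20:19:24 +0000 2018", "text": "hello #python", '
    '"user": {"screen_name": "alice"}}\n'
    "\n"
    '{"created_at": "Wed Oct 10 20:20:01 +0000 2018", "text": "RT @alice: hello"\n'
)
tweets = load_tweets(sample)
```

With a real file, the same function can be fed an open file handle, for example `load_tweets(open("tweets.json", encoding="utf-8"))`.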

Important Module and Object Structures

The overall structure of the project: twitter-Info-Structure.png

How to get started with contributions

Steps To Contribute

  • Fork this Repository.

  • Clone the Repository: git clone "url of this repo"

  • Check existing issues or raise a new issue of your own and ask it to be assigned to you.

  • Wait for the issue to be assigned to you.

  • Create a branch: git checkout -b <your-new-branch-name>

  • Add your code:

    • Make all necessary changes or modifications to the code in your local branch.
    • Keep necessary information such as functionalities, screenshots and a working video (if required) handy; you will need to present it when submitting the PR.
  • Push your changes to GitHub (to your forked repo): git push origin <add-your-branch-name>

  • Create a new pull request against the original repo (the main branch of this repo).

  • Submit your changes for review.

  • And Boom! You're done 🥳

  • The maintainers will review and merge your changes into the main branch of this project. You will be automatically notified via E-mail once the changes have been merged.

Note: If you want your changes to count towards Hacktoberfest, ensure that the issue you are working on has the #hacktoberfest label.


Contributors 📑

  • kaustav202
  • SegFal
  • Rishav Mitra
  • jatin00000
  • Manan Garg
  • Gokulakrishnan Shankar

Project Maintainers ⚡


Happy Contributing! 🧡

Star this repository and keep contributing as you learn!

realtime-twitterdataanalysis's Issues

Suggestion

  • I'm submitting a ...

    • bug report
    • [*] feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?
    Maybe add a sample file with tweets (sample tweets.json) because it takes time to generate tweets.

re-structure project files

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

  • What is the current behavior?

All the project files are currently at the repository root

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

  • What is the expected behavior?

Only files related to meta-data or contributing should be at the root while actual project files in separate folders

  • What is the motivation / use case for changing the behavior?

Establish proper structure and hierarchy for easy use and future expansion of the project

  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

Import issues

  • I'm submitting a ...

    • [ -] bug report
    • feature request
    • change/modification
    • [ -] design issue
  • Do you want to request a feature or report a bug?

Bug

  • What is the current behavior?

issues with imports in tweet_network.py and sentiment_analysis.py

Improve documentation [ Set-up and Usage ]

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

  • What is the current behavior?
    There are a few typos/grammatical errors in the README file. Also I think if some points in the setup steps are elaborated, it will be useful to people using the project

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

  • What is the expected behavior?
    I think there should be some more details, especially on how to set up the project for local use, including the Twitter API setup

  • What is the motivation / use case for changing the behavior?
    Helping people who use the project by improving the documentation in the README

  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

Doc Update [ Tweet JSON Structure ]

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

  • What is the current behavior?

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

  • What is the expected behavior?

Add a clear diagram showing the structure of the JSON response / tweet statuses received from the Twitter API

  • What is the motivation / use case for changing the behavior?

The whole project revolves around utilizing the tweets (json) for analysis and experimental processes, hence it's important to highlight the structure, hierarchy and fields of the respective data.

  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

Improve plots style

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

Make plots more attractive

  • What is the current behavior?

Current plots are generated as simple matplotlib plots. An improvement to their look and feel would be good and will likely increase their visibility

Doc Update [ Important data formats ]

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • What is the current behavior?

There is no reference for many of the main data structures used in the application's modules for analytics and processing. Since all of the functionality is based on the information contained in these units, it is essential to document them closely, including their structure, hierarchy, fields and types. The most important is the data extracted or mined from the raw JSON, as it is the main source.

  • What is the expected behavior?

A diagram or chart-like representation of all the primary data structures used in the different modules for analysis. If any transformation has been applied, that should be shown too.

  • What is the motivation / use case for changing the behavior?

Will be helpful in better understanding the project and gaining familiarity with it.

Doc Update [ Badges for Domain and Tech Stack ]

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

  • What is the current behavior?

The Domain and Tech Stack sections of the README are currently formatted as lists, which consume a lot of space

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

  • What is the expected behavior?

If graphic labels or badges in the form of SVG embeds were used instead, these sections would be much more concise and compact (laid out horizontally)

  • What is the motivation / use case for changing the behavior?

It will draw the reader's focus to the important parts, such as usage and technical aspects, instead.

  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

Some useful references

https://github.com/alexandresanlim/Badges4-README.md-Profile

https://github.com/Naereen/badges

https://github.com/badges/shields

https://shields.io/

automate main.py run through tweets written

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • What is the current behavior?

  • What is the expected behavior?

Running the application currently takes two independent steps:

  • Run stream.py to start writing tweets / statuses fetched from the Twitter API to the tweets.json file.
  • Once a significant number of tweets has accumulated, run main.py.

We need to move towards a single entry point that starts the main module automatically.

How it can be tackled:

  • Count the number of tweets written explicitly and trigger a run of the main file once a certain number is reached.
  • Keep monitoring tweets.json for the number of lines written (each status/tweet is written on a new line) in a continuous loop until the desired number is reached, then exit the loop and continue with execution of the main file.

If there are better alternatives, please suggest them.
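
The line-counting approach could be sketched roughly as follows. This is only an illustration under the assumption that every status occupies exactly one line of tweets.json; the names `count_lines` and `wait_for_tweets`, the polling interval and the timeout behaviour are hypothetical choices, not existing project code.

```python
import time
from pathlib import Path


def count_lines(path):
    """Count non-empty lines in the file; a missing file counts as zero."""
    p = Path(path)
    if not p.exists():
        return 0
    with p.open("r", encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def wait_for_tweets(path, minimum, poll_seconds=5.0, timeout=None):
    """Block until the file holds at least `minimum` lines, then return the count.

    Raises TimeoutError if `timeout` seconds pass first (None waits forever).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        written = count_lines(path)
        if written >= minimum:
            return written
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"only {written} of {minimum} tweets written")
        time.sleep(poll_seconds)
```

A single entry point could then call something like `wait_for_tweets("tweets.json", 500)` before handing over to the analysis in main.py. Re-reading the whole file on every poll is simple but becomes slow for large files; tracking a byte offset would avoid that.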

Integrate sentiment_analysis.py with main

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

The sentiment analysis functionality needs to run as part of main ( through import )

  • What is the current behavior?

The import in main is commented out because importing the auxiliary module sentiment_analysis raises an error.

  • What is the expected behavior?

Generate sentiment plots without errors. Possibly create two separate functions in sentiment_analysis.py, one for sentiment score generation and one for plot generation, inside a class, and call them by importing the class in main.

  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

Note : ds_twets does not need to be generated again in sentiment_analysis as it is already in main.py and also converted to time-series, so it can be utilized for time-series plotting
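
One possible shape for the class suggested above, as a hedged sketch only. The class name `SentimentAnalyzer`, its method names and the tiny word lists are hypothetical; the word-count scorer is a toy stand-in, since this page does not say which NLP method the project uses. The plot method assumes matplotlib is installed and imports it lazily so that scoring works without it.

```python
import re


class SentimentAnalyzer:
    """Keeps score generation and plot generation as two separate steps."""

    # Toy lexicon for illustration; a real analysis would use an NLP library.
    POSITIVE = {"good", "great", "love", "happy"}
    NEGATIVE = {"bad", "awful", "hate", "sad"}

    def score(self, text):
        """Return (positive - negative) word count divided by total words."""
        words = re.findall(r"[a-z']+", text.lower())
        if not words:
            return 0.0
        pos = sum(1 for w in words if w in self.POSITIVE)
        neg = sum(1 for w in words if w in self.NEGATIVE)
        return (pos - neg) / len(words)

    def scores(self, tweets):
        """Score the `text` field of each tweet dict, in order."""
        return [self.score(t.get("text", "")) for t in tweets]

    def plot(self, scores, path):
        """Save a simple line plot of the scores to `path` (needs matplotlib)."""
        import matplotlib
        matplotlib.use("Agg")  # render to a file without a display
        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        ax.plot(range(len(scores)), scores)
        ax.set_xlabel("tweet index")
        ax.set_ylabel("sentiment score")
        fig.savefig(path)
        plt.close(fig)


analyzer = SentimentAnalyzer()
example_scores = analyzer.scores(
    [{"text": "I love this great day"}, {"text": "I hate it"}]
)
```

With this split, main could import the class once, reuse the tweets it has already loaded for `scores`, and call `plot` only when a chart is wanted.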

api_key not defined error while running stream.py

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

  • What is the current behavior?

Unable to find the variables imported from twitter_config (API keys and secrets)

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

run python stream.py

  • What is the expected behavior?

No error

  • What is the motivation / use case for changing the behavior?

  • Please tell us about your environment:

    • Version: Python 3
  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. stackoverflow, gitter, etc)

incomplete networks in tweet_networks.py

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

Missing features in tweet_networks.py

  • What is the current behavior?

  • What is the expected behavior?

Need to add network plots for all the tweet categories, i.e. retweets, quotes and replies.
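
As a starting point for covering all three categories, the edges of each network could be extracted as below. This is a sketch that assumes v1.1-style status fields (`retweeted_status`, `quoted_status`, `in_reply_to_screen_name`, `user.screen_name`); the function names and sample data are hypothetical, and the actual drawing step is left out.

```python
from collections import Counter


def interaction_edges(tweets):
    """Return weighted (source, target) edges per interaction type."""
    edges = {"retweet": Counter(), "quote": Counter(), "reply": Counter()}
    for tweet in tweets:
        source = tweet.get("user", {}).get("screen_name")
        if not source:
            continue
        retweeted = tweet.get("retweeted_status")
        if retweeted:
            target = retweeted.get("user", {}).get("screen_name")
            if target:
                edges["retweet"][(source, target)] += 1
        quoted = tweet.get("quoted_status")
        if quoted:
            target = quoted.get("user", {}).get("screen_name")
            if target:
                edges["quote"][(source, target)] += 1
        reply_to = tweet.get("in_reply_to_screen_name")
        if reply_to:
            edges["reply"][(source, reply_to)] += 1
    return edges


def incoming_weight(edge_counts):
    """Total weight arriving at each target, e.g. how often a user is retweeted."""
    totals = Counter()
    for (_, target), weight in edge_counts.items():
        totals[target] += weight
    return totals


# Made-up sample data, for illustration only.
sample = [
    {"user": {"screen_name": "bob"}, "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "bob"}, "retweeted_status": {"user": {"screen_name": "alice"}}},
    {"user": {"screen_name": "carol"}, "in_reply_to_screen_name": "alice"},
    {"user": {"screen_name": "dave"}, "quoted_status": {"user": {"screen_name": "bob"}}},
    {"user": {"screen_name": "erin"}, "in_reply_to_screen_name": None},
]
edges = interaction_edges(sample)
retweeted_totals = incoming_weight(edges["retweet"])
```

Each counter can then be loaded into whichever graph library the project uses, and the incoming totals could drive node size so that more-retweeted users appear larger.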

Add tweet logger

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

Continuously log the number of new tweets being written to the tweets.json file while stream.py runs
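
A rough sketch of such a logger, assuming one status per line in tweets.json. The class name `TweetCounter`, the logger name and the polling design are hypothetical; the sketch remembers a byte offset so each poll reads only what was appended since the previous one, and it ignores a trailing line that has not been terminated yet.

```python
import logging

logger = logging.getLogger("tweet_logger")


class TweetCounter:
    """Counts complete lines appended to a file between successive polls."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # bytes already accounted for
        self.total = 0   # tweets counted so far

    def poll(self):
        """Log and return the number of new complete lines since the last poll."""
        try:
            with open(self.path, "rb") as fh:
                fh.seek(self.offset)
                chunk = fh.read()
        except FileNotFoundError:
            return 0
        last_newline = chunk.rfind(b"\n")
        if last_newline == -1:
            return 0  # nothing new, or only a partially written line
        complete = chunk[: last_newline + 1]
        self.offset += len(complete)
        new = sum(1 for line in complete.split(b"\n") if line.strip())
        self.total += new
        logger.info("%d new tweets written (%d total)", new, self.total)
        return new
```

Calling `logging.basicConfig(level=logging.INFO)` once and then `poll()` in a loop with a short `time.sleep` between calls would print a running count while stream.py writes to the file.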

Function Not found error

  • I'm submitting a ...

    • bug report
    • feature request
    • change/modification
    • design issue
  • Do you want to request a feature or report a bug?

Bug

  • What is the current behavior?
    sentiment analysis does not work due to functions not being imported from the correct files

  • If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem

run sentiment_analysis.py

  • What is the expected behavior?
    It should plot a graph of the sentiment after analysing it.

requirements.txt file is missing

from: Run pip install -r requirements.txt to install all the dependencies.

problem: requirements.txt file is missing from repo
