precogsummer17-task's Introduction

PreCog Summer Task A - US Elections textual analysis with Twitter

Project link. Please wait a few minutes for the website to load fully.

The app is built with Flask. All subtasks are carried out in the backend (querying the database), and the data is sent to the frontend as JSON in response to an AJAX request. The data is then visualised with charts plotted using chartist.js. Below is the set of subtasks, with a description of how I attempted to solve each one, its drawbacks, and its scope for improvement.

  • Collection of tweets:

    The tweets are collected using Twython and the Twitter Streaming APIs. References used: PSOSM, Twitter Streaming API documentation

  • Location of Tweets:

    The location of the user is taken from the coordinates field of the tweet object and plotted using Leaflet.js. The coordinates are stored in GeoJSON format and sent to Leaflet to mark them on the map. Since most users do not enable geolocation, only a few tweets carry coordinates, so the results are not useful for analysis.

    An approximate location can be obtained by converting the user's textual location (the location specified at the time of account creation) to coordinates using a geocoding library like geocoder and plotting the results with Leaflet. Even though this method produced good results, it was slow, and it failed to recognise imaginary user locations like 'Heaven', 'Hell', 'Somewhere I Belong', etc.

    References used: Geolocation
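
    The coordinate-to-GeoJSON step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `tweets_to_geojson` and the sample tweets are hypothetical, and it assumes each tweet is a dictionary whose `coordinates` field is either empty or a GeoJSON point in longitude, latitude order.

```python
import json


def tweets_to_geojson(tweets):
    """Build a GeoJSON FeatureCollection from tweets that carry coordinates."""
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")  # None when geolocation is disabled
        if not point:
            continue
        lon, lat = point["coordinates"]  # GeoJSON order: longitude, latitude
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {
                "user": tweet.get("user", {}).get("screen_name", ""),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data: one geotagged tweet, one without coordinates.
sample_tweets = [
    {"coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]},
     "user": {"screen_name": "alice"}},
    {"coordinates": None, "user": {"screen_name": "bob"}},
]
geojson = tweets_to_geojson(sample_tweets)
print(json.dumps(geojson, indent=2))
```

    A FeatureCollection of this shape is what Leaflet's `L.geoJSON` layer accepts, so the backend can return it as the JSON response without further conversion.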

  • List of Top 10 Hashtags being used in the stream:

    The hashtags of a tweet can be accessed under tweet['entities']['hashtags'] or with a regex pattern match. Each hashtag is encoded in UTF-8, converted to lowercase, and stored in a Python dictionary that also stores its frequency. The dictionary is later converted to a list of tuples and sorted in decreasing order of frequency to get the top 10 hashtags.
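
    A minimal sketch of the counting step, assuming tweets are dictionaries shaped like the tweet object. It swaps the plain dictionary and manual sort described above for `collections.Counter`, which gives the same result; the function name `top_hashtags` and the sample data are hypothetical.

```python
from collections import Counter


def top_hashtags(tweets, n=10):
    """Return the n most frequent hashtags as (hashtag, count) tuples."""
    counts = Counter()
    for tweet in tweets:
        for tag in tweet.get("entities", {}).get("hashtags", []):
            # Lowercase so that #MAGA and #maga are counted together.
            counts[tag["text"].lower()] += 1
    return counts.most_common(n)


# Hypothetical sample data.
sample_tweets = [
    {"entities": {"hashtags": [{"text": "MAGA"}, {"text": "ImWithHer"}]}},
    {"entities": {"hashtags": [{"text": "maga"}]}},
    {"entities": {"hashtags": []}},
]
print(top_hashtags(sample_tweets))
```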

  • Distribution of Original Tweets vs Retweeted Tweets:

    Retweets are found by searching for the word RT or an ellipsis “…” in the tweet. Their distribution vs original tweets is then plotted using chartist.js.
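
    The text-based check can be sketched as below. The names `is_retweet` and `retweet_distribution` are hypothetical, and the word-boundary regex is one possible reading of "searching for the word RT".

```python
import re

# Matches the standalone word "RT" or the ellipsis character.
RETWEET_PATTERN = re.compile(r"\bRT\b|…")


def is_retweet(text):
    """Heuristic: a tweet is a retweet if its text contains RT or an ellipsis."""
    return bool(RETWEET_PATTERN.search(text))


def retweet_distribution(texts):
    """Count retweets vs. original tweets in a list of tweet texts."""
    retweets = sum(1 for text in texts if is_retweet(text))
    return {"retweets": retweets, "original": len(texts) - retweets}


# Hypothetical sample data.
sample_texts = [
    "RT @user: Election day is here",
    "Great start to the debate tonight",
    "This quote was cut off…",
]
print(retweet_distribution(sample_texts))
```

    The heuristic can misclassify an original tweet that happens to contain an ellipsis. Retweets delivered by the API also carry a `retweeted_status` field, which may be a more reliable signal if it is stored.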

  • Distribution of favorite counts on Original Tweets:

    Since the tweets are streamed live, newly created tweets have a favorite count of zero, as expected. An alternative approach is to look up the current favorite count of each tweet by its id using the Twitter REST API. However, this lookup cannot be performed dynamically on the backend because it slows the application.

  • Distribution of Type of Tweet i.e. Text, Image, Text+Image:

    A tweet's text is assumed to be what is left after removing mentions, URLs and hashtags from the tweet. URLs are identified using urlparse; mentions and hashtags are identified in a similar way. The length of the remaining text is then calculated to find out whether the tweet contains text. Images are detected by checking the length of the media field under entities: len(tweet['entities'].get('media', [])) > 0

    References used: Stack Overflow
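
    The classification can be sketched as follows. Note that this sketch strips URLs, mentions and hashtags with a single regex instead of the urlparse-based approach described above, and the function name `classify_tweet` and the "other" category for tweets with neither text nor image are assumptions.

```python
import re

# URLs, @mentions and #hashtags are not counted as tweet text.
ENTITY_PATTERN = re.compile(r"https?://\S+|[@#]\w+")


def classify_tweet(tweet):
    """Classify a tweet as 'text', 'image', 'text+image' or 'other'."""
    remaining = ENTITY_PATTERN.sub("", tweet.get("text", "")).strip()
    has_text = len(remaining) > 0
    has_image = len(tweet.get("entities", {}).get("media", [])) > 0
    if has_text and has_image:
        return "text+image"
    if has_image:
        return "image"
    if has_text:
        return "text"
    return "other"


# Hypothetical sample data.
print(classify_tweet({"text": "Vote today! #election https://t.co/abc",
                      "entities": {}}))
```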

  • Who is more popular, Hillary or Trump?:

    Popularity is assumed to be the sum of the number of unique user tweets and retweets in popular hashtags (#trump, #maga, #draintheswamp, #trumppence2016, #clinton, #hillary, #imwithher, #strongertogether) and the number of tweets with mentions (@realDonaldTrump, @HillaryClinton).

    A tweet with hashtags like '#trump', '#clinton', etc. can be used to support, to criticize, or to make rude or sarcastic statements. A better analysis could be obtained by running sentiment analysis to get the sense of each tweet (positive or negative), while also taking sarcasm and "not" jokes into account. This would help predict popularity more accurately.
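
    A simplified sketch of the popularity count, under stated assumptions: every tweet that uses one of a candidate's hashtags adds one point, every tweet that mentions the candidate's account adds one point, and no de-duplication by user is attempted. The names `CANDIDATES` and `popularity_scores` and the sample data are hypothetical.

```python
# Hashtags and accounts taken from the lists above, lowercased for matching.
CANDIDATES = {
    "trump": {
        "hashtags": {"trump", "maga", "draintheswamp", "trumppence2016"},
        "mention": "realdonaldtrump",
    },
    "clinton": {
        "hashtags": {"clinton", "hillary", "imwithher", "strongertogether"},
        "mention": "hillaryclinton",
    },
}


def popularity_scores(tweets):
    """Sum hashtag matches and mention matches per candidate."""
    scores = {name: 0 for name in CANDIDATES}
    for tweet in tweets:
        entities = tweet.get("entities", {})
        tags = {h["text"].lower() for h in entities.get("hashtags", [])}
        mentions = {m["screen_name"].lower()
                    for m in entities.get("user_mentions", [])}
        for name, spec in CANDIDATES.items():
            if tags & spec["hashtags"]:
                scores[name] += 1
            if spec["mention"] in mentions:
                scores[name] += 1
    return scores


# Hypothetical sample data.
sample_tweets = [
    {"entities": {"hashtags": [{"text": "MAGA"}],
                  "user_mentions": [{"screen_name": "realDonaldTrump"}]}},
    {"entities": {"hashtags": [{"text": "ImWithHer"}], "user_mentions": []}},
    {"entities": {"hashtags": [],
                  "user_mentions": [{"screen_name": "HillaryClinton"},
                                    {"screen_name": "realDonaldTrump"}]}},
]
print(popularity_scores(sample_tweets))
```

    As noted above, a count like this says nothing about whether a tweet supports or criticizes the candidate, so it measures attention rather than approval.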

precogsummer17-task's People

Contributors

dependabot[bot], gowtham1997