TwitterAnalysis_Spark

Analysing UC President tweets

This problem is solved in Python. It is also solved in Java and with a Python MapReduce approach (for MapReduce, see TwitterAnalysis_Mapreduce-).

The problem statement and approach are explained in the PrezAnalysis.docx file.

At what hour of the day does @PrezOno tweet the most on average, using every day for which we have Twitter data? Include a plot of the expected number of tweets for each hour of the day, for those in which he did tweet. For example, if Ono tweeted once every day at 12:30 PM, his expected number of tweets between 12 and 1 would be 1. If he alternated between 2 and 3 tweets per day, his average would be 2.5.
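The arithmetic in the example above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function name `expected_tweets_per_hour` and the sample dates are made up, and it assumes the average is taken over the distinct days that contain at least one tweet.

```python
from collections import Counter
from datetime import datetime


def expected_tweets_per_hour(timestamps):
    """Average tweets per hour of day, over the days that have any tweet."""
    # Total tweets observed in each hour of the day (0-23).
    per_hour = Counter(ts.hour for ts in timestamps)
    # Number of distinct calendar days with at least one tweet (assumed denominator).
    num_days = len({ts.date() for ts in timestamps})
    if num_days == 0:
        return {}
    return {hour: count / num_days for hour, count in sorted(per_hour.items())}


# One tweet at 12:30 on each of four (hypothetical) days.
daily = [datetime(2016, 3, day, 12, 30) for day in range(1, 5)]
print(expected_tweets_per_hour(daily))        # {12: 1.0}

# Alternating 2 and 3 tweets per day, all within the same hour.
alternating = []
for day, n in zip(range(1, 5), [2, 3, 2, 3]):
    alternating += [datetime(2016, 3, day, 17, minute) for minute in range(n)]
print(expected_tweets_per_hour(alternating))  # {17: 2.5}
```

The second case has 10 tweets over 4 days, which gives the 2.5 from the problem statement.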

What day of the week does @PrezOno tweet the most on average? Use the same example as in #1 but for days of the week.
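One possible reading of the day-of-week question, sketched in plain Python: for each weekday, divide the tweets made on that weekday by the number of distinct dates on that weekday that have at least one tweet. The choice of denominator, the function name `expected_tweets_per_weekday`, and the sample dates are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def expected_tweets_per_weekday(tweet_dates):
    """Average tweets per weekday; tweet_dates holds one date per tweet."""
    # Total tweets on each weekday (0 = Monday ... 6 = Sunday).
    tweets = Counter(d.weekday() for d in tweet_dates)
    # Distinct calendar dates seen for each weekday (assumed denominator).
    days = Counter(d.weekday() for d in set(tweet_dates))
    return {WEEKDAYS[i]: tweets[i] / days[i] for i in sorted(tweets)}


# Hypothetical sample: 2 tweets on one Monday, 3 on the next, 1 on a Tuesday.
sample = ([date(2016, 3, 7)] * 2
          + [date(2016, 3, 14)] * 3
          + [date(2016, 3, 8)])
print(expected_tweets_per_weekday(sample))  # {'Monday': 2.5, 'Tuesday': 1.0}
```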

Aim: Generate an output file using Spark that contains PrezOno's tweets per hour. Use the data from this output file to plot PrezOno's average tweets per hour.

Approach:

• An RDD is created by reading the input tweet files from HDFS.

• A user-defined function is passed to this RDD. This function detects PrezOno's tweets and creates another RDD containing the date and time at which he tweeted. (This is done by using flatMap.)

• The total number of days is then counted by storing the days alone in a separate RDD. The total number of PrezOno tweets per hour is counted by creating another RDD using reduceByKey.

• Finally, the average number of tweets per hour is calculated by dividing the total tweets per hour by the number of days. This is stored in a separate RDD, which is written to the output file.
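The steps above could look roughly like the following PySpark sketch. It is not the repository's code. It assumes each input line is one JSON tweet with `user.screen_name` and a `created_at` field in Twitter's classic timestamp layout, and that `sc` is an existing `pyspark.SparkContext`. The function names and both paths are hypothetical.

```python
import json
from datetime import datetime

# Assumed layout of the created_at field, e.g. "Tue Mar 01 17:05:00 +0000 2016".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def prez_tweet_times(line, screen_name="PrezOno"):
    """Return [(date_string, hour)] for a PrezOno tweet, else [] (for flatMap)."""
    try:
        tweet = json.loads(line)
        if tweet["user"]["screen_name"] != screen_name:
            return []
        ts = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    except (ValueError, KeyError, TypeError):
        # Skip malformed lines and records without the assumed fields.
        return []
    return [(ts.strftime("%Y-%m-%d"), ts.hour)]


def average_tweets_per_hour(sc, input_path, output_path):
    """Write (hour, average tweets) pairs to output_path and return the RDD."""
    tweets = sc.textFile(input_path)                      # read from HDFS
    date_hours = tweets.flatMap(prez_tweet_times).cache()
    # Number of distinct days on which PrezOno tweeted.
    num_days = date_hours.map(lambda pair: pair[0]).distinct().count()
    # Total tweets per hour of day.
    totals = (date_hours.map(lambda pair: (pair[1], 1))
                        .reduceByKey(lambda a, b: a + b))
    # Average per hour = total per hour / number of days (Python 3 division).
    averages = totals.mapValues(lambda total: total / num_days).sortByKey()
    averages.saveAsTextFile(output_path)
    return averages
```

A call such as `average_tweets_per_hour(sc, "hdfs:///tweets/*.json", "hdfs:///out/prez_hours")` would run the pipeline. The hours here stay in the timestamp's own time zone (UTC for `+0000`), so converting to local time would be a separate step if the plot should use local hours.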

Plot of the output data: on average, the largest number of tweets occurs at 5 PM (17:00).
