SoccerTweetAnalysis

This is an assignment from the "Big Data Integration and Processing" course on Coursera, the third course in the Big Data Specialization.

As a Sports Analyst, you are interested in reporting on the countries that are most popular on Twitter. A good way to approach this problem is to find which countries are mentioned most often in the tweets in your dataset and to analyze which words are used most in these tweets.

In addition to the CSV file you just exported from MongoDB, we give you a small dataset with the codes and names of some countries. To see this additional dataset, open the following file:

Downloads/big-data-3/final-project/country-list.csv

To get you started, we have prepared a Jupyter notebook template and started a SparkSQL context for you. Please open the notebook in:

Downloads/big-data-3/final-project/SoccerTweetAnalysis.ipynb

You will use this notebook to answer the questions below. So let's get started.

Question 1: As a Sports Analyst, you are interested in how many different countries are mentioned in the tweets. Use Spark to calculate this number. Note that regardless of how many times a single country is mentioned, this country only contributes 1 to the total.

Question 2: Next, compute the total number of times any country is mentioned. This is different from the previous question since in this calculation, if a country is mentioned three times, then it contributes 3 to the total.
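The difference between Questions 1 and 2 can be illustrated with a minimal plain-Python sketch. It is not the notebook's Spark code: the tweets and country names below are hypothetical toy data, `count_mentions` is an illustrative helper, and matching is assumed to be on whole, case-sensitive words (so multi-word country names would not be caught).

```python
from collections import Counter

# Hypothetical toy data; the real tweets and country list come from the course files.
countries = ["France", "Kenya", "Wales", "Netherlands"]
tweets = [
    "France beat Wales and France celebrate",
    "Kenya fans watch Wales",
    "no country here",
]

def count_mentions(tweets, countries):
    """Count how often each country name appears as a whole word in the tweets."""
    country_set = set(countries)
    counts = Counter()
    for tweet in tweets:
        for word in tweet.split():
            if word in country_set:
                counts[word] += 1
    return counts

counts = count_mentions(tweets, countries)

# Question 1: each mentioned country contributes 1, however often it appears.
distinct_countries = len(counts)

# Question 2: every single mention contributes 1.
total_mentions = sum(counts.values())
```

In this toy input, France and Wales are each mentioned twice and Kenya once, so the two totals differ even though they are derived from the same per-country counts.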

Question 3: Your next task is to determine the most popular countries. You can do this by finding the three countries mentioned the most.
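Once per-country counts exist, finding the three most mentioned countries is a sort by count in descending order. The sketch below uses plain Python rather than Spark, and the counts in `mention_counts` are made-up numbers for illustration only, not results from the assignment dataset.

```python
from collections import Counter

# Hypothetical per-country mention counts (illustrative numbers, not real results).
mention_counts = Counter({"France": 5, "Kenya": 2, "Wales": 7, "Netherlands": 3})

# most_common(3) returns the three (country, count) pairs with the highest counts,
# ordered from most to least mentioned.
top_three = mention_counts.most_common(3)

for country, count in top_three:
    print(country, count)
```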

Question 4: After exploring the dataset, you are now interested in how many times specific countries are mentioned. For example, how many times was France mentioned?

Question 5: Which country has the most mentions: Kenya, Wales, or Netherlands?
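Questions 4 and 5 both reduce to lookups in the per-country counts. A minimal plain-Python sketch follows; the numbers in `mention_counts` are hypothetical and do not answer the questions for the real dataset.

```python
# Hypothetical per-country mention counts (illustrative numbers, not real results).
mention_counts = {"France": 5, "Kenya": 2, "Wales": 7, "Netherlands": 3}

# Question 4 style lookup: a country that never appears counts as 0 mentions.
france_mentions = mention_counts.get("France", 0)

# Question 5 style comparison: pick the candidate with the highest count.
candidates = ["Kenya", "Wales", "Netherlands"]
most_mentioned = max(candidates, key=lambda name: mention_counts.get(name, 0))
```

Using `get` with a default of 0 keeps the comparison working even when one of the candidates is never mentioned.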

Question 6: Finally, what is the average number of times a country is mentioned?
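One way to read Question 6 is as the total number of mentions divided by the number of distinct countries mentioned. The sketch below assumes that reading (countries with zero mentions are left out of the denominator) and again uses hypothetical counts in plain Python.

```python
# Hypothetical per-country mention counts (illustrative numbers, not real results).
mention_counts = {"France": 5, "Kenya": 2, "Wales": 7, "Netherlands": 3}

total_mentions = sum(mention_counts.values())   # Question 2 style total
distinct_countries = len(mention_counts)        # Question 1 style total

# Average mentions per mentioned country; guard against an empty result.
average_mentions = total_mentions / distinct_countries if distinct_countries else 0.0
```

If the average were meant to cover every country in the country list, including those never mentioned, the denominator would be the length of that list instead.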

Contributors

pritamd, khushbukp
