
ta_scraper_and_analysis's Introduction

TRIPADVISOR SCRAPING & ANALYSIS

This project is composed of 2 parts:

  • The TA scraper, which collects restaurant data for given cities, curates it, and aggregates the separate csv files into a single dataset (Jupyter Notebook & Python files);
  • The Analysis Notebooks, which explore the dataset and create visualizations and stories using the Matplotlib, Seaborn and Bokeh libraries.

The dataset has been uploaded to Kaggle. Feel free to create new kernels and share your results: https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw

TA SCRAPER

"TA_Scraper.py" is the command-line Python script version of the scraper; it takes the name of a city as an argument using '-c cityname'. "1.TA_scraper" is the Jupyter Notebook that contains the scripts for scraping data, scraping the European capitals' restaurant info, curating the raw datasets (raw.csv files) and aggregating all the curated datasets into one.

For each city, the scraper creates a .csv file named "TA_cityname_restaurants_raw.csv" that contains the restaurant data scraped from TA, separated by commas. The information is taken from the HTML code of pages such as https://www.tripadvisor.com/RestaurantSearch-g274772
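As an illustration of the '-c cityname' interface and the raw file naming described here, a minimal sketch could look like the following. It is not the actual content of "TA_Scraper.py": the function names `build_parser` and `raw_csv_name` and the long option `--city` are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser that accepts the city name through '-c cityname'."""
    parser = argparse.ArgumentParser(
        description="Scrape TA restaurant data for one city"
    )
    # Only '-c' is documented; the '--city' long form is an assumed alias.
    parser.add_argument("-c", "--city", required=True,
                        help="name of the city to scrape")
    return parser


def raw_csv_name(city):
    """Return the raw output file name for a city (hypothetical helper)."""
    return f"TA_{city}_restaurants_raw.csv"


# Simulates running: python TA_Scraper.py -c Krakow
args = build_parser().parse_args(["-c", "Krakow"])
print(raw_csv_name(args.city))
```

Passing an explicit argument list to `parse_args` keeps the example runnable inside a notebook cell as well as from the command line.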

The header of the csv file contains the following fields for each restaurant (in no particular order):

  • Name
  • TA_ID
  • TA_URL
  • Ranking
  • Rating
  • Cuisine style (in a list object)
  • Number of reviews
  • 2 reviews (list of 2 lists: one with 2 reviews, one with the dates of the 2 reviews)
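Because the cuisine styles and the reviews are stored as list objects, they come back as plain strings when the csv file is read again. The sketch below shows one possible way to turn them back into Python lists with the standard library. The sample row, the column names ("Cuisine Style", "Reviews") and the helper `parse_list_cell` are all invented for illustration and may differ from the real files.

```python
import ast
import csv
import io

# Hypothetical sample in the raw format; names and values are invented.
SAMPLE = '''Name,Cuisine Style,Reviews
Example Bistro,"['Polish', 'European']","[['Great food', 'Nice staff'], ['01/02/2018', '12/30/2017']]"
'''


def parse_list_cell(cell):
    """Convert a stringified list back to a list; return [] if empty or malformed."""
    if not cell:
        return []
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    return value if isinstance(value, list) else []


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cuisines = parse_list_cell(rows[0]["Cuisine Style"])
reviews = parse_list_cell(rows[0]["Reviews"])
# The reviews cell holds two lists: the review texts and their dates.
texts, dates = reviews if len(reviews) == 2 else ([], [])
print(cuisines, texts, dates)
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for data scraped from the web.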

Each curated dataset is saved as "TA_cityname_restaurants_curated.csv" in the current directory. The aggregated dataset is saved as "TA_restaurants_curated.csv". It contains the restaurant information for all the cities that have been scraped, curated so that it can be analysed and interesting information can be extracted from it.
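The aggregation step can be sketched as follows, using only the standard library. This is an assumption-laden illustration and not the notebook's actual code: the function `aggregate_curated` and the added "City" column (derived from the file name) are hypothetical.

```python
import csv
import glob
import os

SUFFIX = "_restaurants_curated.csv"


def aggregate_curated(directory, output_name="TA_restaurants_curated.csv"):
    """Merge every per-city curated csv found in `directory` into one file."""
    pattern = os.path.join(directory, "TA_*" + SUFFIX)
    rows, fieldnames = [], []
    for path in sorted(glob.glob(pattern)):
        # Recover the city from "TA_cityname_restaurants_curated.csv".
        city = os.path.basename(path)[len("TA_"):-len(SUFFIX)]
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["City"] = city  # assumed column; overwrites an existing one
                rows.append(row)
                for key in row:
                    if key not in fieldnames:
                        fieldnames.append(key)
    output_path = os.path.join(directory, output_name)
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills columns that are missing from some city files.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return output_path, len(rows)
```

The glob pattern requires a city name between "TA_" and "_restaurants_curated.csv", so the aggregated file itself is not picked up when the function is run a second time.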

ANALYSIS AND VISUALIZATION

The "2.Analysis.ipynb" notebook contains the analysis carried out in order to answer several questions about the dataset, such as:

  • What is the price range distribution per city?
  • What are the best cities for people with a special diet (gluten-free, etc.)?
  • What are the common points between the top restaurants?
  • and many others.
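As an example of how the first question could be approached, the sketch below computes the share of each price range per city. The rows, the column names ("City", "Price Range") and the function `price_range_shares` are assumptions for illustration; the notebook itself may use different names and libraries.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for the aggregated dataset.
rows = [
    {"City": "Krakow", "Price Range": "$"},
    {"City": "Krakow", "Price Range": "$"},
    {"City": "Krakow", "Price Range": "$$ - $$$"},
    {"City": "Krakow", "Price Range": ""},
    {"City": "Paris", "Price Range": "$$$$"},
]


def price_range_shares(rows):
    """Return, for each city, the fraction of restaurants in each price range."""
    counts = defaultdict(Counter)
    for row in rows:
        # Restaurants without a price range are grouped under "unknown".
        price = row.get("Price Range") or "unknown"
        counts[row["City"]][price] += 1
    shares = {}
    for city, counter in counts.items():
        total = sum(counter.values())
        shares[city] = {price: n / total for price, n in counter.items()}
    return shares


shares = price_range_shares(rows)
print(shares["Krakow"])
```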

INTERACTIVE VISUALIZATION WITH BOKEH

Interactive visualizations have been produced to give insights into the dataset at a global level.

An aggregated dataset has been produced at the scale of the cities. This allowed to

The visualizations have been embedded in the notebook called "3.Analysis_Bokeh.ipynb", which contains:

  • a scatter plot with y=number of reviews, x=ranking, color=price range
  • a Bokeh App running on a Bokeh Server that filters the previous plot according to the cities and ratings selected through a checkbox menu
  • 2 linked categorical scatter plots that display the price range and the rating horizontally, with x=number of reviews
  • a Bokeh App running on a Bokeh Server that shows, for each city selected through a dropdown menu, the distribution of the restaurants according to their price range and rating
  • an interactive map that displays the cities analysed in this dataset, showing the number of restaurants and reviews for a city when hovering over it.
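The per-city figures shown when hovering over the map (number of restaurants and number of reviews) can be derived from the restaurant-level data with a simple aggregation. The following is a minimal sketch under assumed column names ("City", "Number of Reviews"); the helper `city_totals` is hypothetical and the notebook may compute these values differently.

```python
from collections import defaultdict


def city_totals(rows):
    """Count restaurants and sum reviews for each city."""
    totals = defaultdict(lambda: {"restaurants": 0, "reviews": 0})
    for row in rows:
        entry = totals[row["City"]]
        entry["restaurants"] += 1
        # Values read from a csv are strings such as "12.0"; empty means 0.
        raw = row.get("Number of Reviews") or 0
        entry["reviews"] += int(float(raw))
    return dict(totals)


# Hypothetical rows standing in for the aggregated dataset.
sample = [
    {"City": "Krakow", "Number of Reviews": "12.0"},
    {"City": "Krakow", "Number of Reviews": ""},
    {"City": "Paris", "Number of Reviews": "30"},
]
print(city_totals(sample))
```

The resulting dictionary has one entry per city, which is the shape a plotting library typically needs for hover tooltips.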
