The ta_scraper_and_analysis from katebulaeva

TRIPADVISOR SCRAPING & ANALYSIS

This project is composed of 2 features:

The TA scraper, that got restaurants data for given cities, curated them and aggregated them all together from separate csv files to form a dataset (Jupyte Notebook & Python files)
The Analysis Notebooks to explore the dataset and create visualizations
The TA scraper, that got restaurants data for given cities, curated them and aggregated them all together from separate csv files to form a dataset (Jupyte Notebook & Python files) ;
The Analysis Notebooks to explore the dataset and create visualizations and stories, using the Matplotlib, Seaborn and Bokeh libraries.

The dataset has been uploaded on Kaggle, feel free to create new kernels and share your results: https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw

TA SCRAPER

"TA_Scraper.py" is the command line Python script version take takes as an argument the name of a city using '-c cityname'. "1.TA_scraper" is the Jupyter Notebook that contains the scripts for scraping data, scraping the European capitals restaurants info, curate the raw datasets (raw.csv files) and aggregate all the curated datasets into one.

The scraper creates a .csv file named "TA_cityname_restaurants_raw.csv" for each city that contains scraped data from TA about restaurants, separated by commas. The information are taken from the HTML code of pages such as https://www.tripadvisor.com/RestaurantSearch-g274772

The header of the csv file contains for restaurants (in random order):

Name
TA_ID
TA_URL
Ranking
Rating
Cuisine style (in a list object)
Number of reviews
2 reviews (list of 2 lists: one with 2 reviews, one with the dates of the 2 reviews)

Each curated dataset is saved as "TA_cityname_restaurants_curated.csv" in the current directory. The aggregated dataset is saved as "TA_restaurants_curated.csv". It contains the restaurants information for all the cities that have been scraped, curated in order to be able to analyse them and take interesting information out of them.

ANALYSIS AND VISUALIZATION

The "2.Analysis.ipynb" notebook contains the analysis carried out in order to answer several questions about the dataset, such as:

What is the Price Range repartition per city?
What are the best city for people that have a special diet (gluten, etc.) ?
What are the common point between the top restaurants ?
an many others.

INTERACTIVE VISUALIZATION WITH BOKEH

Interactive visualization have been prodced in order to have insights of the dataset, on a global way.

An aggregated dataset have been produced at the scale of the cities. This allowed to

The Visuaization have been embeded to the notebook called "3.Analysis_Bokeh.ipynb", that contains:

a scatter plot with y=number of reviews , x=ranking, color=price range
a Bokeh App running on a Bokeh Server that allows to filter the previous plot accordong to the cities and rates selected thanks to a checkbox menu
2 linked categorical scatter plots that displays horizontaly the price range and the rate with x=number of reviews
A Bokeh App running on a Bokeh Server that allows to visualize for each city, selected by a dropdown menu, the repartition of the restaurants according to their price range and rate
An interactive map that displays the cities analysied in this dataset, showing the number of restaurants and reviews for the city when hovering it.

katebulaeva / ta_scraper_and_analysis Goto Github PK

ta_scraper_and_analysis's Introduction

ta_scraper_and_analysis's People

Contributors

Stargazers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent