
scraping-shopee

1. Introduction

This mini project crawls basic product data from the main categories of Shopee Vietnam using Scrapy, a powerful web scraping framework for Python. The website is dynamically generated, so the data is pulled from the Shopee API (endpoints were identified via the browser's developer tools).
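Because the pages are rendered client-side, the data arrives as JSON from an API rather than as HTML to be parsed. The decoding step might look roughly like the sketch below, which uses only the standard library. The payload shape (the `items` and `name` keys) and the helper `extract_names` are invented for illustration and are not Shopee's actual response format.

```python
import json
from typing import List


def extract_names(body: str) -> List[str]:
    """Return product names from a JSON body with a hypothetical shape."""
    payload = json.loads(body)
    # "items" and "name" are placeholder keys chosen for this example only.
    return [item["name"] for item in payload.get("items", []) if "name" in item]


# A made-up response body; the third entry has no name and is skipped.
sample_body = '{"items": [{"name": "Phone case"}, {"name": "USB cable"}, {"id": 3}]}'
names = extract_names(sample_body)
```

In a Scrapy callback the same logic would be applied to the body of the response object instead of a literal string.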

2. Technology used

  • Python
  • Scrapy

3. Project Specifications

(Figure: Level-1 categories)

  • Product data from the main (level-1) categories and their child (level-2) categories is extracted into a CSV file.
    • Data from each level-2 category is extracted from a different API endpoint.
  • Output csv path: scraper/output/{execution_timestamp}.csv.
  • Logging path: scraper/logs/{execution_timestamp}.log.
  • Output fields:
    • main_cat_name: Name of level-1 category.
    • child_cat_name: Name of level-2 category.
    • shop_name: Shop name.
    • shop_location: Shop location (city).
    • product_name: Name of product.
    • product_hist_sold: Quantity of product sold.
    • product_price_min: Price of product (minimum).
    • product_price_max: Price of product (maximum).
    • product_rating_avg: Average rating of product.
    • product_rating_cnt: Total number of ratings.
    • product_url: URL of product.

(Note: a product usually has several prices depending on the product type, so the price is reported as two values, min and max. If a product has only one price, the min and max values are identical.)
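The output layout described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual pipeline: the helper names (`normalize_prices`, `output_path`, `write_rows`) are hypothetical, and the timestamp format is an assumption, since the README only specifies `{execution_timestamp}`.

```python
import csv
from datetime import datetime
from pathlib import Path

# Column order follows the "Output fields" list above.
FIELDS = [
    "main_cat_name", "child_cat_name", "shop_name", "shop_location",
    "product_name", "product_hist_sold", "product_price_min",
    "product_price_max", "product_rating_avg", "product_rating_cnt",
    "product_url",
]


def normalize_prices(prices):
    """Collapse a non-empty list of prices into (min, max).

    A product with a single price yields identical min and max values.
    """
    return min(prices), max(prices)


def output_path(base_dir, started_at):
    """Build <base_dir>/output/<timestamp>.csv (timestamp format is assumed)."""
    return Path(base_dir) / "output" / f"{started_at:%Y%m%d_%H%M%S}.csv"


def write_rows(path, rows):
    """Write dict rows to a CSV file with a header, creating folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `normalize_prices([120000])` gives the same value for both fields, while a product with several prices gets its lowest and highest price.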

4. Usage

  1. Requirements: a Python 3.8+ installation available on $PATH.
  2. Go to project folder: cd path/to/scraping-shopee.
  3. Set up the Python environment: make setup.
    • If you have already run make setup once, just activate the Python virtual environment: make venv.
  4. Initiate crawling: make crawl.
    • Argument:
      • parse_limit: Number of items per level-2 category. If parse_limit exceeds 500, only 500 items are crawled per level-2 category due to the API limit.
    • Example usage: make crawl parse_limit=100.
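The capping behaviour of parse_limit can be expressed in a few lines. The sketch below is illustrative only: `effective_limit` and `page_offsets` are hypothetical helpers, and the paging scheme with a page size of 50 is an assumption, since the README states only the 500-item ceiling.

```python
API_LIMIT = 500  # maximum items per level-2 category, as stated above


def effective_limit(parse_limit):
    """Return the number of items actually crawled per level-2 category."""
    if parse_limit < 0:
        raise ValueError("parse_limit must be non-negative")
    return min(parse_limit, API_LIMIT)


def page_offsets(parse_limit, page_size=50):
    """Split the capped limit into (offset, count) pairs.

    page_size is an assumed value for illustration, not a documented one.
    """
    limit = effective_limit(parse_limit)
    return [
        (offset, min(page_size, limit - offset))
        for offset in range(0, limit, page_size)
    ]
```

With these helpers, a request for 800 items is reduced to 500, and a request for 120 items is split into pages of 50, 50, and 20.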
