ranjan-mohanty / amazon-product-details-scraper

Scrape Amazon like a pro! Grab titles, descriptions, and download high-quality images. Save everything as organized JSON files for easy analysis. Works with single URLs or entire product lists.

License: MIT License

Python 100.00%
amazon amazon-product-image-download amazon-product-scraper scraper amazon-scraper amazon-scraping image-download python

Amazon Product Details Scraper

This script helps you scrape product details from Amazon product pages. It extracts information like title, description, and image URLs, saving them to JSON files.

Features

  • Fetches product details from a single Amazon product URL or a list of URLs in a file.
  • Writes extracted data to JSON files for easy storage and processing.
  • Optionally downloads product images along with details.
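
As a rough illustration of the extraction step, the sketch below pulls a title and image URLs out of an HTML string. It uses the standard library's html.parser instead of beautifulsoup4 so that it runs without extra installs. The "productTitle" element id, the ProductParser class, and the sample HTML are assumptions made for this example; they do not describe how this script or Amazon's pages actually work.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects the text of one element (matched by id) and all <img> src values."""

    def __init__(self, title_id="productTitle"):  # hypothetical element id
        super().__init__()
        self.title_id = title_id
        self.title_parts = []
        self.image_urls = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == self.title_id:
            self._in_title = True
        if tag == "img" and attrs.get("src"):
            self.image_urls.append(attrs["src"])

    def handle_endtag(self, tag):
        # Simplification: the title element is assumed to hold no nested tags,
        # so any closing tag ends title collection.
        self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def parse_product(html):
    """Return a dict with the title text and the list of image URLs."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return {
        "title": "".join(parser.title_parts).strip(),
        "image_urls": parser.image_urls,
    }


# Made-up HTML standing in for a downloaded product page.
sample_html = """
<html><body>
  <span id="productTitle"> Example Widget </span>
  <img src="https://example.com/widget-1.jpg">
  <img src="https://example.com/widget-2.jpg">
</body></html>
"""
details = parse_product(sample_html)
print(details)
```
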

Installation

Requirements:

  • Python 3 (tested with 3.7+)
  • Libraries:
    • requests
    • beautifulsoup4
    • urllib3

Instructions:

  1. Make sure you have Python 3 installed. You can check by running python3 --version in your terminal.

  2. Create a virtual environment (recommended):

    • Virtual environments help isolate project dependencies and avoid conflicts with other Python installations on your system.

    • Here's how to create a virtual environment using venv:

      python3 -m venv my_env  # Replace "my_env" with your desired environment name
    • Activate the virtual environment:

      source my_env/bin/activate
  3. Install:

    python3 setup.py install

    This downloads the required libraries and installs them into the activated virtual environment.
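
One way to confirm that the install step worked is to check that the libraries listed under Requirements can be found by the interpreter. This is a small sketch and is not part of the project; missing_modules is a hypothetical helper name. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util

# Import names for the libraries listed under Requirements.
# beautifulsoup4 is installed under that name but imported as "bs4".
REQUIRED_MODULES = ["requests", "bs4", "urllib3"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run it with the virtual environment activated; otherwise it reports on the system-wide Python installation.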

Usage

Basic Usage:

amazon-scraper --url https://www.amazon.com/product-1  # Replace with your product URL

This will scrape details from the provided Amazon product URL and write them to a JSON file in the "output" directory (default).
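
To make the output step concrete, here is a minimal sketch of writing a details dictionary to a JSON file inside an output directory. The save_details helper, the "details.json" file name, and the dictionary keys are hypothetical; the script's actual file layout may differ.

```python
import json
import tempfile
from pathlib import Path


def save_details(details, output_dir):
    """Write a details dict to <output_dir>/details.json and return the path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "details.json"  # hypothetical file name
    with out_path.open("w", encoding="utf-8") as fh:
        json.dump(details, fh, indent=2, ensure_ascii=False)
    return out_path


# Placeholder data; the keys are assumptions for this example.
example = {
    "title": "Example Widget",
    "description": "A placeholder description.",
    "image_urls": ["https://example.com/widget-1.jpg"],
}

# A temporary directory stands in for "output" so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_details(example, Path(tmp) / "output")
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(loaded["title"])
```
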

Using a URL List:

  1. Create a text file containing a list of Amazon product URLs (one per line).
  2. Run the script with the --url-list option and provide the file path:
amazon-scraper --url-list product_urls.txt

This will process each URL in the file and save the scraped details for each product in separate directories within "output".
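
Reading a URL list of this shape takes only a few lines of Python. The sketch below is an assumption about how such a file could be read, not the script's own code; read_url_list is a hypothetical helper, and skipping blank lines is a choice made here for robustness.

```python
import io


def read_url_list(lines):
    """Return the stripped, non-empty lines from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


# io.StringIO stands in for an opened product_urls.txt file.
sample_file = io.StringIO(
    "https://www.amazon.com/product-1\n"
    "\n"
    "  https://www.amazon.com/product-2  \n"
)
urls = read_url_list(sample_file)
print(urls)
```

With a real file, the same function works on the file object: `with open("product_urls.txt") as fh: urls = read_url_list(fh)`.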

Optional: Downloading Images:

amazon-scraper --url https://www.amazon.com/product-1 --download-image

The --download-image flag enables downloading product images along with other details.
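
When images are saved locally, each one needs a file name. The sketch below shows one possible way to derive a name from an image URL; the "image_N" naming scheme, the ".jpg" fallback, and the image_filename helper are assumptions for illustration, not the script's documented behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def image_filename(url, index):
    """Build a local file name such as 'image_1.jpg' from an image URL."""
    # Take the extension from the URL path only, ignoring any query string.
    suffix = PurePosixPath(urlparse(url).path).suffix or ".jpg"
    return f"image_{index}{suffix}"


image_urls = [
    "https://example.com/images/widget-front.png?size=large",
    "https://example.com/images/widget-back",
]
names = [image_filename(url, i) for i, url in enumerate(image_urls, start=1)]
print(names)
```

The image bytes themselves could then be fetched with the requests library (for example `requests.get(url).content`) and written to a file opened in binary mode.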

Getting Help:

The script offers a built-in help message that provides a quick overview of available options and usage instructions. To access the help, run the script with the --help option:

amazon-scraper --help

Configuration

Logging:

  • The script uses basic logging for information and error messages.
  • You can change the logging level by editing the DEFAULT_LOG_LEVEL line in config.py (refer to the Python documentation for logging configuration).
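
For orientation, this is roughly how a level constant like DEFAULT_LOG_LEVEL is typically wired into Python's logging module. It is a sketch under assumptions: the contents of config.py are not shown in this README, and the logger name and message format below are made up for the example.

```python
import logging

# Stand-in for the value defined in config.py; switch to logging.DEBUG
# for more detailed output.
DEFAULT_LOG_LEVEL = logging.INFO

# Attach a console handler to the root logger (no effect if one already exists).
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")

logger = logging.getLogger("amazon_scraper_example")  # hypothetical logger name
logger.setLevel(DEFAULT_LOG_LEVEL)

logger.info("Shown at INFO level and below")
logger.debug("Hidden unless the level is DEBUG")
```
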

Example

Scenario:

Scrape details for two products from a file named "products.txt" and download images:

  1. Create a file named "products.txt" with the following content:

    https://www.amazon.com/product-1
    https://www.amazon.com/product-2
    
  2. Run the script with the following command:

    amazon-scraper --url-list products.txt --download-image

This will process both URLs in the file, scrape details, create separate output directories for each product, and download images.

Disclaimer

This script is for educational purposes only. Please be respectful of Amazon's terms of service when using it. Consider using official APIs provided by Amazon for extensive data collection.
