web-scraper-python-library's Introduction

Web Scraping - Python Library

A Python library that lets you easily scrape product data from popular e-commerce websites using basic product information.

Overview

Collecting product data is useful for monitoring price changes or deciding which e-commerce site to buy from. However, building a web scraper from scratch can be cumbersome and time-consuming. This Python library aims to simplify that process: given basic inputs such as a product name and a store name, it returns rich product information.

Installation

To install, run the following:

pip install web-scraper-python-library

Usage

Product Search

The following code retrieves product data for an "iphone 12" search on Amazon as JSON and writes it to a file.

product: a product name, as you would type it into the search box on a company's website

company: 'eBay', 'Walmart', or 'Amazon'

Code

from web_scraper import main as m

json_product_data = m.scrape("iphone 12", "Amazon")

# write json to file
with open("amazon_product_data.json", "w") as file:
    file.write(json_product_data)

Output

[
  {
    "company": "Amazon",
    "asin": "B09HWS3VGM",
    "name": "TCL 10 5G UW 128GB Diamond Gray Smartphone (Verizon) (Renewed)",
    "price": 84.0,
    "extraction_date": "2023-04-27 16:59:56",
    "rating": 3.8,
    "num_ratings": 106.0,
    "image_url": "https://m.media-amazon.com/images/I/41e-4yZQl9L._AC_UY218_.jpg",
    "url": "https://www.amazon.com/TCL-Diamond-Smartphone-Verizon-Renewed/dp/B09HWS3VGM/ref=sr_1_42?keywords=iphone+12&qid=1682629195&sr=8-42"
  },
  ...
  {
    "company": "Amazon",
    "asin": "B0BS986JRZ",
    "name": "QIMHAI Smartphone Unlocked Cell Phones S22 Ultra 6.1in HD Screen Cheap Phones 2GB/16GB Android 10 Straight Talk Phone 5000mAh 128GB Extension Dual Sim Boost Mobile Phones Telefonos (Gold)",
    "price": 79.99,
    "extraction_date": "2023-04-27 16:59:56",
    "rating": 1.9,
    "num_ratings": 6.0,
    "image_url": "https://m.media-amazon.com/images/I/71fa-n5E69L._AC_UY218_.jpg",
    "url": "https://www.amazon.com/QIMHAI-Smartphone-Unlocked-Extension-Telefonos/dp/B0BS986JRZ/ref=sr_1_43?keywords=iphone+12&qid=1682629195&sr=8-43"
  }
]
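
Once the data is available as JSON, it can be filtered with the standard library alone. The following is a minimal sketch, assuming the records have the shape shown in the output above; the helper name `cheapest_with_min_rating` is hypothetical and not part of the library, and the sample string is an abbreviated copy of the two records above.

```python
import json

# Two abbreviated records in the shape of the output above (fields trimmed for brevity).
sample_output = """
[
  {"company": "Amazon", "asin": "B09HWS3VGM", "price": 84.0, "rating": 3.8, "num_ratings": 106.0},
  {"company": "Amazon", "asin": "B0BS986JRZ", "price": 79.99, "rating": 1.9, "num_ratings": 6.0}
]
"""


def cheapest_with_min_rating(json_text, min_rating):
    """Return the lowest-priced record rated at least min_rating, or None.

    Hypothetical helper for illustration; not part of the library.
    """
    products = json.loads(json_text)
    # Skip records that lack a price or rating instead of failing on them.
    candidates = [
        p for p in products
        if p.get("price") is not None
        and p.get("rating") is not None
        and p["rating"] >= min_rating
    ]
    return min(candidates, key=lambda p: p["price"], default=None)


best = cheapest_with_min_rating(sample_output, min_rating=3.0)
print(best["asin"], best["price"])
```

In practice the string would come from the file written in the example above, or directly from the return value of `m.scrape`.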

web-scraper-python-library's People

Contributors

keirkeenan

Forkers

nickbohm555

web-scraper-python-library's Issues

Create a parse_rating function

Create a parse_rating function that takes in a rating string as input, parses the string, and returns the parsed rating as a float.

Example

Running the following should return 4.5.

print(parse_rating("4.5 out of 5 stars"))

Important Notes

  • use the function in the scrape_amazon function, where the rating variable is set
  • update the README.md to show the rating as a float (update the docs/source/README.md too)
  • write a test for the function in web_scraper/tests/test_all.py
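
One possible shape for this function is sketched below. It is only an illustration of the request, not the project's implementation: it assumes the rating is the first number in the string, and returning `None` for input with no number is an assumption the issue does not specify.

```python
import re


def parse_rating(rating_text):
    """Return the first number in a rating string as a float.

    Returns None when no number is found (an assumption; the issue does not
    say how unparseable input should be handled).
    """
    match = re.search(r"\d+(?:\.\d+)?", rating_text or "")
    return float(match.group()) if match else None


print(parse_rating("4.5 out of 5 stars"))  # prints 4.5
```

A test in web_scraper/tests/test_all.py could then assert on this example input and on a string with no number in it.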

Create a scrape_all function

Create a scrape_all function that takes a product as input and returns a single JSON output containing the product data from all available websites (eBay, Walmart, Amazon).
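
A rough sketch of how this could work is shown below. It assumes the per-site scraper returns a JSON string containing an array of records, as the Usage example suggests. The `scrape_fn` parameter and the `fake_scrape` stand-in are hypothetical, introduced so the example runs without network access.

```python
import json

# The three stores named in the README.
COMPANIES = ("eBay", "Walmart", "Amazon")


def scrape_all(product, scrape_fn, companies=COMPANIES):
    """Merge the results of scrape_fn(product, company) for every company.

    scrape_fn is assumed to return a JSON string holding an array of records.
    The merged records are returned as a single JSON array string.
    """
    combined = []
    for company in companies:
        combined.extend(json.loads(scrape_fn(product, company)))
    return json.dumps(combined, indent=2)


def fake_scrape(product, company):
    """Stand-in scraper for demonstration; returns one dummy record per store."""
    return json.dumps([{"company": company, "name": product}])


all_data = json.loads(scrape_all("iphone 12", fake_scrape))
print([record["company"] for record in all_data])
```

With the library installed, `m.scrape` from the Usage section could be passed as `scrape_fn`, provided it returns a JSON array string as assumed here.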
