How to Scrape E-Commerce Websites With Python

This quick guide shows how to build an e-commerce scraper with Python and the Oxylabs E-Commerce Scraper API. You can follow along using a 1-week free API trial.

See this blog post for a complete tutorial with detailed insights and images.

1. Project setup

First, you'll need Python installed. If you don't have it yet, download it from the official Python website (python.org).

2. Install dependencies

pip install beautifulsoup4 requests

3. Import libraries

import requests
from bs4 import BeautifulSoup
from pprint import pprint

4. Retrieve API credentials

Next, you’ll have to log in to your Oxylabs account to retrieve API credentials. If you don’t have an account yet, you can simply sign up for a free trial and go to the dashboard. There, you’ll get the necessary credentials to replace the USERNAME and PASSWORD variables in the code below:

username, password = 'USERNAME', 'PASSWORD'
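Hardcoding credentials works for a quick test, but they can also be read from the environment so they stay out of the source file. A minimal sketch, assuming the hypothetical variable names OXYLABS_USERNAME and OXYLABS_PASSWORD (chosen for this example, not defined by Oxylabs):

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    OXYLABS_USERNAME and OXYLABS_PASSWORD are hypothetical names
    picked for this sketch; use whatever names suit your setup.
    """
    username = env.get("OXYLABS_USERNAME")
    password = env.get("OXYLABS_PASSWORD")
    if not username or not password:
        raise RuntimeError("Set OXYLABS_USERNAME and OXYLABS_PASSWORD first")
    return username, password
```

With both variables exported in your shell, `username, password = load_credentials()` can stand in for the hardcoded assignment.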

5. Prepare the payload

url = "https://sandbox.oxylabs.io/products"
payload = {
    'source': 'universal_ecommerce',
    'render': 'html',
    'url': url,
}

6. Send a POST request to the API

response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=(username, password),
    json=payload,
)
print(response.status_code)
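Printing the status code shows whether the request succeeded, but the script continues either way. The sketch below wraps the same POST call in a helper that stops on HTTP errors and sets a timeout. The helper name `post_query` and the 180-second timeout are assumptions made for this example; `session` is expected to be a `requests.Session()` or any object with a compatible `post()` method.

```python
API_URL = "https://realtime.oxylabs.io/v1/queries"


def post_query(session, payload, auth, timeout=180):
    """POST the payload to the API and return the decoded JSON body.

    `session` is anything with a requests-style .post() method,
    for example requests.Session(). The timeout value is an assumption.
    """
    response = session.post(API_URL, auth=auth, json=payload, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of continuing silently.
    response.raise_for_status()
    return response.json()
```

A call could look like `body = post_query(requests.Session(), payload, (username, password))`.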

7. Parse data

content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")
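The indexing above assumes the response always carries at least one result with a "content" field. If a request fails on the API side, that lookup would raise a KeyError or IndexError with little context. A small sketch of a more explicit check, using a hypothetical helper name `extract_content` and taking the already decoded JSON body as input:

```python
def extract_content(body):
    """Return the HTML string from a decoded API response body.

    Expects the shape used in this guide: body["results"][0]["content"].
    """
    results = body.get("results") or []
    if not results:
        raise ValueError("API response contains no results")
    content = results[0].get("content")
    if not isinstance(content, str):
        raise ValueError("First result has no HTML content")
    return content
```

It could be used as `content = extract_content(response.json())` before building the soup.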

The snippets below use BeautifulSoup's find() with tag names and class attributes to select specific elements in the HTML of https://sandbox.oxylabs.io/products, which you can inspect via Developer Tools in your web browser:

Title

title = soup.find('h4', {"class": "title"}).get_text(strip=True)

Price

price = soup.find('div', {"class": "price-wrapper"}).get_text(strip=True)
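The price comes back as display text such as "91,99 €". If you want to sort or compare prices, it can be converted to a number. A minimal sketch, assuming every price uses a comma as the decimal separator, has no thousands separator, and is in euros; `parse_price` is a name introduced for this example:

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string like '91,99 €' to Decimal('91.99').

    Assumes a comma decimal separator and a euro sign, as in the sandbox output.
    """
    number = text.replace("€", "").replace("\xa0", " ").strip()
    return Decimal(number.replace(",", "."))
```

Decimal is used instead of float to avoid rounding surprises with currency values.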

Availability

availability = soup.find('p', {"class": ["in-stock", "out-of-stock"]}).get_text(strip=True)
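The availability text is either "In stock" or "Out of Stock" in the sandbox output shown in this guide. If a boolean is more convenient, a sketch like the following could map the label; `is_in_stock` is a hypothetical helper, and any other label is treated as an error rather than guessed:

```python
def is_in_stock(text):
    """Return True for 'In stock' and False for 'Out of Stock' (case-insensitive)."""
    label = text.strip().lower()
    if label == "in stock":
        return True
    if label == "out of stock":
        return False
    raise ValueError(f"Unexpected availability label: {text!r}")
```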

All products

data = []
for elem in soup.find_all("div", {"class": "product-card"}):
    title = elem.find('h4', {"class": "title"}).get_text(strip=True)
    price = elem.find('div', {"class": "price-wrapper"}).get_text(strip=True)
    availability = elem.find('p', {"class": ["in-stock", "out-of-stock"]}).get_text(strip=True)
    data.append({
        "title": title,
        "price": price,
        "availability": availability,
    })
pprint(data)
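Note that find() returns None when nothing matches, so a product card missing one of the three elements would make the loop fail with an AttributeError. One possible way to tolerate that, sketched with the hypothetical helpers `text_or_none` and `parse_products`, is to record None for the missing field:

```python
def text_or_none(parent, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    node = parent.find(name, {"class": class_})
    return node.get_text(strip=True) if node is not None else None


def parse_products(soup):
    """Collect title, price, and availability from every product card."""
    products = []
    for card in soup.find_all("div", {"class": "product-card"}):
        products.append({
            "title": text_or_none(card, "h4", "title"),
            "price": text_or_none(card, "div", "price-wrapper"),
            "availability": text_or_none(card, "p", ["in-stock", "out-of-stock"]),
        })
    return products
```

Here `data = parse_products(soup)` would yield the same list as the loop above when every field is present.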

Full source code

import requests
from bs4 import BeautifulSoup
from pprint import pprint


username, password = 'USERNAME', 'PASSWORD'
url = "https://sandbox.oxylabs.io/products"

payload = {
    'source': 'universal_ecommerce',
    'render': 'html',
    'url': url,
}
response = requests.post(
    'https://realtime.oxylabs.io/v1/queries',
    auth=(username, password),
    json=payload,
)
print(response.status_code)


content = response.json()["results"][0]["content"]
soup = BeautifulSoup(content, "html.parser")


data = []
for elem in soup.find_all("div", {"class": "product-card"}):
    title = elem.find('h4', {"class": "title"}).get_text(strip=True)
    price = elem.find('div', {"class": "price-wrapper"}).get_text(strip=True)
    availability = elem.find('p', {"class": ["in-stock", "out-of-stock"]}).get_text(strip=True)
    data.append({
        "title": title,
        "price": price,
        "availability": availability,
    })
pprint(data)

By using the techniques described in this article, you can perform large-scale web scraping on websites that employ bot protection and CAPTCHAs.

Output

200
[
  {
    "title": "The Legend of Zelda: Ocarina of Time",
    "price": "91,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Super Mario Galaxy",
    "price": "91,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Super Mario Galaxy 2",
    "price": "91,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Metroid Prime",
    "price": "89,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Super Mario Odyssey",
    "price": "89,99 €",
    "availability": "In stock"
  },
  {
    "title": "Halo: Combat Evolved",
    "price": "87,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The House in Fata Morgana - Dreams of the Revenants Edition -",
    "price": "83,99 €",
    "availability": "In stock"
  },
  {
    "title": "NFL 2K1",
    "price": "62,99 €",
    "availability": "In stock"
  },
  {
    "title": "Uncharted 2: Among Thieves",
    "price": "88,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Tekken 3",
    "price": "91,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The Legend of Zelda: The Wind Waker",
    "price": "90,99 €",
    "availability": "In stock"
  },
  {
    "title": "Gran Turismo",
    "price": "86,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Metal Gear Solid 2: Sons of Liberty",
    "price": "88,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Grand Theft Auto Double Pack",
    "price": "81,99 €",
    "availability": "In stock"
  },
  {
    "title": "Baldur's Gate II: Shadows of Amn",
    "price": "91,99 €",
    "availability": "In stock"
  },
  {
    "title": "Tetris Effect: Connected",
    "price": "88,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The Legend of Zelda Collector's Edition",
    "price": "89,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Gran Turismo 3: A-Spec",
    "price": "84,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The Legend of Zelda: A Link to the Past",
    "price": "90,99 €",
    "availability": "In stock"
  },
  {
    "title": "The Legend of Zelda: Majora's Mask",
    "price": "91,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The Last of Us",
    "price": "92,99 €",
    "availability": "In stock"
  },
  {
    "title": "Persona 5 Royal",
    "price": "84,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The Last of Us Remastered",
    "price": "92,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "The Legend of Zelda: Ocarina of Time 3D",
    "price": "90,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Chrono Cross",
    "price": "88,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Gears of War",
    "price": "84,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Sid Meier's Civilization II",
    "price": "88,99 €",
    "availability": "In stock"
  },
  {
    "title": "Halo 3",
    "price": "81,99 €",
    "availability": "In stock"
  },
  {
    "title": "Ninja Gaiden Black",
    "price": "88,99 €",
    "availability": "In stock"
  },
  {
    "title": "Super Mario Advance 4: Super Mario Bros. 3",
    "price": "89,99 €",
    "availability": "Out of Stock"
  },
  {
    "title": "Jet Grind Radio",
    "price": "83,99 €",
    "availability": "In stock"
  },
  {
    "title": "Grim Fandango",
    "price": "91,99 €",
    "availability": "Out of Stock"
  }
]
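The listing above is formatted as JSON, with double quotes and two-space indentation, whereas pprint prints Python literals with single quotes. One way to get JSON-formatted text from the scraped list is the standard json module. This is a sketch with a hypothetical helper name `to_json`, assuming `data` is the list of dictionaries built in the loop:

```python
import json


def to_json(data):
    """Serialize the product list as indented JSON, keeping '€' unescaped."""
    return json.dumps(data, indent=2, ensure_ascii=False)
```

Calling `print(to_json(data))` in place of `pprint(data)` would print the results in that style.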
