Scraper for the website books.toscrape.com.
Requirement | Version |
---|---|
Docker | 20.10 |
Git | 2.37 |
- Clone this repository
git clone https://github.com/jxlil/scraper-books-toscrape.com.git
cd scraper-books-toscrape.com
- Deploy the scraper with docker compose
docker compose up -d
- Check scraper status
curl http://0.0.0.0:3000/v1/scrape/
# expected output: {"status":"running"}
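The status check can also be scripted, for example to wait until the container has finished starting. This is a minimal sketch, assuming the endpoint returns the JSON body shown above; the helper names `is_running` and `wait_until_running`, the number of attempts, and the delays are illustrative and not part of this project.

```python
import json
import time
import urllib.request

STATUS_URL = "http://0.0.0.0:3000/v1/scrape/"  # status endpoint from the step above


def is_running(body: str) -> bool:
    """Return True if the response body reports {"status": "running"}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "running"


def wait_until_running(url: str = STATUS_URL, attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll the status endpoint until it reports running or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_running(response.read().decode("utf-8")):
                    return True
        except OSError:
            pass  # the container may still be starting; try again
        time.sleep(delay)
    return False
```

Calling `wait_until_running()` after `docker compose up -d` returns `True` once the scraper answers, or `False` if it never becomes reachable within the configured attempts.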
Now you can run the scraper to get all the books in every category by sending the following request:
curl --silent http://0.0.0.0:3000/v1/scrape/books
In the exports/ directory you will find a JSON and a CSV file with the results.
You can also pipe the response through jq and save the result as JSON to any file:
curl --silent http://0.0.0.0:3000/v1/scrape/books | jq . > books.json
{
"book_count": 1,
"categories": [
"Biography"
],
"books": [
{
"title": "The Rise of Theodore Roosevelt (Theodore Roosevelt #1)",
"category": "Biography",
"available_stock": 3,
"price": 42.57,
"num_stars": 3,
"upc_code": "1a5044d233936b1a",
"image_url": "http://books.toscrape.com/media/cache/4c/09/4c090b85892f532210e44d84b752b64d.jpg"
}
]
}
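Once saved, the result can be processed with any JSON-aware tool. The following is a minimal sketch in Python, assuming the response has the shape shown above; the function name `summarize` and the inline sample (trimmed to the fields it uses) are illustrative only.

```python
import json
from collections import defaultdict

# Sample in the shape of the output above, trimmed to the fields used here.
SAMPLE = """
{
  "book_count": 1,
  "categories": ["Biography"],
  "books": [
    {
      "title": "The Rise of Theodore Roosevelt (Theodore Roosevelt #1)",
      "category": "Biography",
      "available_stock": 3,
      "price": 42.57,
      "num_stars": 3
    }
  ]
}
"""


def summarize(data: dict) -> dict:
    """Group books by category and report the count and average price."""
    prices = defaultdict(list)
    for book in data["books"]:
        prices[book["category"]].append(book["price"])
    return {
        category: {"count": len(values), "avg_price": round(sum(values) / len(values), 2)}
        for category, values in prices.items()
    }


summary = summarize(json.loads(SAMPLE))
print(summary)  # {'Biography': {'count': 1, 'avg_price': 42.57}}
```

To run it against a real result, replace `json.loads(SAMPLE)` with the parsed contents of the file saved in the previous step, for example `books.json`.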