Comments (11)
@AlbinLindskog is this something you want to share? :D
Yeah, sure. It's not much to share though.
import json
import requests
from time import sleep

data = []
headers = {
    'baseURL': 'https://api-systembolaget.azure-api.net/sb-api-ecommerce/v1',
    'User-Agent': 'curl/7.54.0',  # No idea why the default 'python-requests/2.28.2' doesn't work. Bad scraping protection?
}
try:
    for i in range(0, 1000):
        response = requests.get(f'https://www.systembolaget.se/api/gateway/productsearch/search/?page={i}', headers=headers)
        data.extend(response.json()['products'])
        sleep(1)
finally:
    with open('data.json', 'w', encoding='utf-8') as file:
        json.dump(data, file, ensure_ascii=False, indent=4)
from systembolaget-api.
I ended up pulling all the info I wanted from https://www.systembolaget.se/api/gateway/productsearch/search/.
It has the same functionality but doesn't require any API keys, just a 'baseURL: https://api-extern.systembolaget.se/sb-api-ecommerce/v1' header.
Hi, Albin.
Does this method still work? I'm getting a 401 error when sending GET requests to https://www.systembolaget.se/api/gateway/productsearch/search/ with the baseURL header set.
* Preparing request to https://www.systembolaget.se/api/gateway/productsearch/search/?page=1&size=30&sortBy=Score&sortDirection=Ascending&categoryLevel1=%C3%96l
* Current time is 2023-01-04T18:32:29.559Z
...
> GET /api/gateway/productsearch/search/?page=1&size=30&sortBy=Score&sortDirection=Ascending&categoryLevel1=%C3%96l HTTP/2
> Host: www.systembolaget.se
> user-agent: insomnia/2021.7.2
> baseurl: https://api-extern.systembolaget.se/sb-api-ecommerce/v1
> accept: */*
< HTTP/2 401
I've been trying to scrape /sortiment with Selenium for a couple of days, but it's... finicky, to say the least.
Regards,
It looks like they've changed the baseURL.
curl 'https://www.systembolaget.se/api/gateway/productsearch/search/' -H 'baseURL: https://api-systembolaget.azure-api.net/sb-api-ecommerce/v1'
works now.
Did you manage to solve this issue with the script, or did you go your own way, @AlbinLindskog?
I went my own way: a simple for loop that steps through all the pages on the URL above and dumps the response into a .json file. Neither pretty nor robust, but I only needed to pull the data once, so it's good enough.
@AlbinLindskog is this something you want to share? :D
@AlbinLindskog Many thanks, I really appreciate it!
@AlbinLindskog Did they change the baseURL again?
Yep, looks like it. They're using an 'Ocp-Apim-Subscription-Key' header now instead.
curl 'https://api-extern.systembolaget.se/sb-api-ecommerce/v1/productsearch/search?page=1' -H 'Ocp-Apim-Subscription-Key: cfc702aed3094c86b92d6d4ff7a54c84'
I don't know how long it's valid for, but you can get a new one simply by inspecting the network traffic when you make a search query in your browser on their site.
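For anyone who would rather do this from Python than curl, here is a minimal sketch of the same request plus paging. The URL and header name come from the curl command above. The 'products' field is borrowed from the earlier script, and stopping at the first empty page is my own assumption about how the endpoint behaves, so treat both as unverified. The names build_request, fetch_page and iter_products are just illustrative.

```python
import json
import urllib.request
from typing import Callable, Iterator

# URL and header name are taken from the curl command in this thread.
SEARCH_URL = "https://api-extern.systembolaget.se/sb-api-ecommerce/v1/productsearch/search"


def build_request(page: int, api_key: str) -> urllib.request.Request:
    """Build a GET request for one result page, carrying the subscription key header."""
    return urllib.request.Request(
        f"{SEARCH_URL}?page={page}",
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )


def fetch_page(page: int, api_key: str) -> dict:
    """Download and decode one page. This performs a real network call."""
    with urllib.request.urlopen(build_request(page, api_key), timeout=30) as response:
        return json.load(response)


def iter_products(get_page: Callable[[int], dict], max_pages: int = 1000) -> Iterator[dict]:
    """Yield products page by page.

    Assumption: each page is a JSON object with a 'products' list, and an
    empty or missing list means there are no more pages.
    """
    for page in range(1, max_pages + 1):
        products = get_page(page).get("products") or []
        if not products:
            break
        yield from products
```

With a key copied from the browser's network tab, something like list(iter_products(lambda p: fetch_page(p, api_key))) should collect everything. Adding a sleep between pages, as in the earlier script, is probably the polite thing to do.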
I've gotten around to fixing the code now. Thanks for helping out! Basically, one now has to identify the main app JS bundle from the HTML page, fetch it, and find the current API key in there. They seem to change it every now and then. It has already been updated since December.
Note: I haven't fixed the stores' assortment yet. They seem to have replaced that with a search functionality.
Closing 🥳
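The key lookup described above could be sketched roughly as follows. This is not the project's actual implementation, only an illustration of the two parsing steps. Both regular expressions are assumptions: the first simply collects every script URL ending in .js, and the second guesses that the key is a 32-character lowercase hex string like the one posted earlier in this thread. Fetching the HTML page and the bundle is left out.

```python
import re
from typing import List, Optional

# Assumption: bundles are referenced through <script ... src="...js"> tags.
SCRIPT_SRC = re.compile(r'<script[^>]+src="([^"]+\.js)"')

# Assumption: the key is 32 lowercase hex characters, matching the format
# of the key quoted earlier in this thread.
HEX_KEY = re.compile(r"\b[0-9a-f]{32}\b")


def find_script_urls(html: str) -> List[str]:
    """Return the src of every .js script tag found in the HTML, in order."""
    return SCRIPT_SRC.findall(html)


def find_api_key(bundle_source: str) -> Optional[str]:
    """Return the first 32-character hex string in the bundle, or None if absent."""
    match = HEX_KEY.search(bundle_source)
    return match.group(0) if match else None
```

In practice one would fetch each candidate URL from find_script_urls and run find_api_key on the response body until a key turns up. Since any 32-character hex string matches, a false positive is possible, so it is worth verifying the key with a test request.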