Comments (11)

AlbinLindskog avatar AlbinLindskog commented on June 28, 2024

@AlbinLindskog is this something you want to share? :D

Yeah, sure. It's not much to share though.

import json
import requests
from time import sleep

data = []
headers = {
    'baseURL': 'https://api-systembolaget.azure-api.net/sb-api-ecommerce/v1',
    # No idea why the default 'python-requests/2.28.2' doesn't work. Bad scraping protection?
    'User-Agent': 'curl/7.54.0',
}

try:
    # Step through the result pages one at a time, collecting the products from each.
    for i in range(0, 1000):
        response = requests.get(f'https://www.systembolaget.se/api/gateway/productsearch/search/?page={i}', headers=headers)
        response.raise_for_status()  # fail loudly on a 401 instead of on a missing 'products' key
        data.extend(response.json()['products'])
        sleep(1)  # be gentle with the server
finally:
    # Write whatever was collected, even if a request failed partway through.
    with open('data.json', 'w', encoding='utf-8') as file:
        json.dump(data, file, ensure_ascii=False, indent=4)

from systembolaget-api.

AlbinLindskog avatar AlbinLindskog commented on June 28, 2024

I ended up pulling all the info I wanted from https://www.systembolaget.se/api/gateway/productsearch/search/.
It has the same functionality but doesn't require any API keys, just a 'baseURL: https://api-extern.systembolaget.se/sb-api-ecommerce/v1' header.

dunderrrrrr avatar dunderrrrrr commented on June 28, 2024

Hi, Albin.

Does this method still work? I'm getting a 401 error when sending GET requests to https://www.systembolaget.se/api/gateway/productsearch/search/ with the baseURL header set.

* Preparing request to https://www.systembolaget.se/api/gateway/productsearch/search/?page=1&size=30&sortBy=Score&sortDirection=Ascending&categoryLevel1=%C3%96l
* Current time is 2023-01-04T18:32:29.559Z
...
> GET /api/gateway/productsearch/search/?page=1&size=30&sortBy=Score&sortDirection=Ascending&categoryLevel1=%C3%96l HTTP/2
> Host: www.systembolaget.se
> user-agent: insomnia/2021.7.2
> baseurl: https://api-extern.systembolaget.se/sb-api-ecommerce/v1
> accept: */*

< HTTP/2 401 

I've been trying to scrape /sortiment with Selenium for a couple of days, but it's... finicky, to say the least.

Regards,

AlbinLindskog avatar AlbinLindskog commented on June 28, 2024

It looks like they've changed the baseURL.

curl 'https://www.systembolaget.se/api/gateway/productsearch/search/' -H 'baseURL: https://api-systembolaget.azure-api.net/sb-api-ecommerce/v1'

works now.

moffepoffe avatar moffepoffe commented on June 28, 2024

Did you manage to solve this issue with the script, or did you go your own way, @AlbinLindskog?

AlbinLindskog avatar AlbinLindskog commented on June 28, 2024

I went my own way: a simple for loop that steps through all the pages at the URL above and dumps the responses into a .json file. Neither pretty nor robust, but I only needed to pull the data once, so it's good enough.

moffepoffe avatar moffepoffe commented on June 28, 2024

@AlbinLindskog is this something you want to share? :D

moffepoffe avatar moffepoffe commented on June 28, 2024

@AlbinLindskog Many thanks, I really appreciate it!

sandberghannes avatar sandberghannes commented on June 28, 2024

@AlbinLindskog Did they change the baseURL again?

AlbinLindskog avatar AlbinLindskog commented on June 28, 2024

Yep, looks like it. They're using an 'Ocp-Apim-Subscription-Key' header now instead.

curl 'https://api-extern.systembolaget.se/sb-api-ecommerce/v1/productsearch/search?page=1' -H 'Ocp-Apim-Subscription-Key: cfc702aed3094c86b92d6d4ff7a54c84'

I don't know how long it's valid for, but you can get a new one simply by inspecting the network traffic when you make a search query in your browser on their site.
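For anyone who prefers Python over curl, here is a minimal sketch of the same request using only the standard library. The endpoint and header name come from the curl command above; the 'products' key is an assumption carried over from the older endpoint, the curl-like User-Agent is an assumption that the earlier workaround still applies, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
SEARCH_URL = "https://api-extern.systembolaget.se/sb-api-ecommerce/v1/productsearch/search"


def build_search_request(page, subscription_key):
    """Build (but do not send) a GET request for one page of search results."""
    query = urllib.parse.urlencode({"page": page})
    return urllib.request.Request(
        f"{SEARCH_URL}?{query}",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            # Assumption: a curl-like User-Agent is still needed, as with the old endpoint.
            "User-Agent": "curl/7.54.0",
        },
    )


def fetch_products(page, subscription_key):
    """Send the request and return the 'products' list (assumed key) from the JSON body."""
    request = build_search_request(page, subscription_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["products"]


if __name__ == "__main__":
    # Replace the placeholder with a key copied from the browser's network tab.
    products = fetch_products(1, "<your-subscription-key>")
    print(len(products))
```

Building the request separately from sending it makes it easy to check the URL and headers without hitting the server.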

AlexGustafsson avatar AlexGustafsson commented on June 28, 2024

I've gotten around to fixing the code now. Thanks for helping out! Basically, one now has to identify the main app JS bundle from the HTML page, fetch it, and find the current API key in there. They seem to change it every now and then. It has already been updated since December.

Note: I haven't fixed the stores assortment yet. They seem to have replaced that with a search functionality.
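The key lookup described above could be sketched roughly like this. This is not the project's actual implementation: both regular expressions are hypothetical, since the thread gives neither the bundle's file name nor the key's format. The key pattern simply assumes a 32-character lowercase hex string, like the key posted earlier in the thread, and the function names are invented for illustration.

```python
import re
import urllib.parse

# Hypothetical patterns: the real bundle name and key format are not given in the thread.
SCRIPT_SRC = re.compile(r'<script[^>]+src="([^"]+\.js)"')
API_KEY = re.compile(r'\b[0-9a-f]{32}\b')


def find_script_urls(html, page_url):
    """Return absolute URLs of every external .js script referenced in the HTML."""
    return [urllib.parse.urljoin(page_url, src) for src in SCRIPT_SRC.findall(html)]


def find_key_candidates(bundle_source):
    """Return unique 32-character hex strings, in order of first appearance."""
    return list(dict.fromkeys(API_KEY.findall(bundle_source)))
```

In use, one would download the site's HTML, pass it to find_script_urls, download each script, and run find_key_candidates on the source. Any candidate still has to be verified by trying it against the API, because other hex strings of the same length (hashes, for example) would match too.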

Closing 🥳
