Data scraping solution for the PLM webpage using Python 3.9 and Scrapy 2.4
-
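(Optional) Create a virtual environment in the hidden .venv directory and activate it (on Windows, run .venv\Scripts\activate instead of the source command):
python -m venv .venv
source .venv/bin/activate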
-
Install dependencies:
pip install -r requirements.txt
-
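Run a spider and write the scraped items to a JSON Lines file (replace {spider} with the spider's name and {output_file} with the output file name):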
scrapy crawl {spider} -o {output_file}.jl
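
With the .jl extension, Scrapy exports the items as JSON Lines: one JSON object per line. As a minimal sketch of how such a file could be loaded back into Python for inspection, assuming UTF-8 output (the helper name read_jsonlines is hypothetical and not part of this project):

```python
import json
from pathlib import Path


def read_jsonlines(path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    items = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                items.append(json.loads(line))
    return items
```

For example, read_jsonlines("{output_file}.jl") would return the scraped items as a list of dictionaries.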
-
sort.py: Generates a JSON Lines file containing the records of the input .jl file, sorted by the specified field.
python sort.py {input_file}.jl {output_file}.jl {field}
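
The source of sort.py is not reproduced here, so the following is only a sketch of how such a sort might work, not the script's actual implementation. It assumes the values of the chosen field are mutually comparable (all numbers or all strings) and places records that lack the field at the end; the function name sort_jsonlines is hypothetical.

```python
import json


def sort_jsonlines(input_path, output_path, field):
    """Write the records of input_path to output_path, sorted by field.

    Illustrative sketch only; the real sort.py may behave differently.
    """
    with open(input_path, encoding="utf-8") as source:
        records = [json.loads(line) for line in source if line.strip()]

    # Separate records that have the field from those that do not,
    # so a missing key does not raise KeyError during sorting.
    present = [r for r in records if field in r]
    missing = [r for r in records if field not in r]

    # list.sort is stable: records with equal values keep their input order.
    present.sort(key=lambda record: record[field])

    with open(output_path, "w", encoding="utf-8") as target:
        for record in present + missing:
            target.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling sort_jsonlines("{input_file}.jl", "{output_file}.jl", "{field}") would mirror the command line above.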