This repository aims to improve my previous project, Linkedin Jobs Scraper, which was based on a simple web scraping process that collected job offers and saved them to a CSV file.
With this repository, I intend to improve the quality of the project by incorporating a flow based on ETL processes (extract, transform, and load).
The web scraping process now extracts more information about each job offer and prepares it to be stored in a SQLite database for later analysis.
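The load step of the ETL flow could be sketched as follows. This is a minimal illustration, not the project's actual code: the `job_offers` table name and its columns are assumptions.

```python
import sqlite3


def load_jobs(db_path: str, jobs: list[dict]) -> int:
    """Load scraped job offers into a SQLite database; returns total rows stored.

    The table name and columns here are illustrative, not the project's
    real schema. `INSERT OR IGNORE` plus a UNIQUE url column skips
    offers that were already stored on a previous run.
    """
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS job_offers (
                   title TEXT, company TEXT, location TEXT, url TEXT UNIQUE
               )"""
        )
        conn.executemany(
            "INSERT OR IGNORE INTO job_offers VALUES (?, ?, ?, ?)",
            [(j["title"], j["company"], j["location"], j["url"]) for j in jobs],
        )
        return conn.execute("SELECT COUNT(*) FROM job_offers").fetchone()[0]


sample = [{"title": "Data Engineer", "company": "Acme Corp",
           "location": "Madrid", "url": "https://example.com/jobs/1"}]
load_jobs(":memory:", sample)  # in the project this would be a file under data/
```

Deduplicating on the job URL keeps repeated scraper runs from inflating the table.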
The scraping process also handles connection errors such as HTTP 429 (Too Many Requests), and it records each run in an execution log.
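Handling HTTP 429 with a retry and logging the attempts could look something like the sketch below. It is an illustration under assumptions: the `fetch` callable (e.g. `requests.get` in practice) and the retry parameters are hypothetical, not taken from the project.

```python
import logging
import time

log = logging.getLogger("scraper")


def fetch_with_retry(fetch, url, max_retries=3, backoff=2.0, sleep=time.sleep):
    """Call fetch(url); on HTTP 429 wait and retry with exponential backoff.

    `fetch` is any callable returning an object with a `status_code`
    attribute (requests.get, for instance); it is injected as a parameter
    here so the retry logic stays easy to test without a network.
    """
    for attempt in range(max_retries + 1):
        response = fetch(url)
        if response.status_code != 429:
            return response
        wait = backoff * (2 ** attempt)  # 2s, 4s, 8s, ...
        log.warning("HTTP 429 for %s, retrying in %.1fs", url, wait)
        sleep(wait)
    raise RuntimeError(f"Still rate-limited after {max_retries} retries: {url}")
```

Routing the warnings through the `logging` module means they end up in the same execution log as the rest of the run.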
- Create a virtual environment and install the dependencies from `requirements.txt`.
- Run: `python src/main.py`
- The database and `execution_logs` will be stored in the `data/` folder.
- The `analytics/` folder contains a notebook with some analysis of the data.