ETL pipeline that monitors cryptocurrency prices and builds an analytical dashboard from the data collected in a Data Warehouse.
The following system diagram represents the project structure. As the diagram shows, the system is composed of 4 Docker containers with the following purposes:
- pipeline performs the complete ETL cycle (cron job)
- warehouse is the main storage for cleaned data (ClickHouse)
- stagedb serves as backup storage for raw data (MongoDB)
- dashboard generates and displays reports from the cleaned data (Metabase)
Additionally, dashboard uses a separate PostgreSQL database container as its internal storage.
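To make the data flow between the containers concrete, here is a minimal sketch of one ETL pass. It is an illustration, not the code in pipeline/src: the function names `transform` and `run_cycle` are hypothetical, plain Python lists stand in for MongoDB and ClickHouse, and the record fields `id`, `symbol` and `priceUsd` are assumed to match what the CoinCap API returns.

```python
from datetime import datetime, timezone
from typing import Any, Callable


def transform(raw_assets: list[dict[str, Any]], collected_at: datetime) -> list[dict[str, Any]]:
    """Turn raw API records into typed rows, skipping records without a usable price."""
    rows = []
    for asset in raw_assets:
        try:
            price = float(asset["priceUsd"])  # assumed field name; prices arrive as strings
        except (KeyError, TypeError, ValueError):
            continue  # malformed record: it survives only in the raw backup
        rows.append({
            "asset_id": asset.get("id"),
            "symbol": asset.get("symbol"),
            "price_usd": price,
            "collected_at": collected_at,
        })
    return rows


def run_cycle(extract: Callable[[], list[dict[str, Any]]], stage_store: list, warehouse: list) -> int:
    """One ETL pass: extract, back up the raw payload, transform, load."""
    collected_at = datetime.now(timezone.utc)
    raw = extract()
    # Stand-in for writing the untouched payload to stagedb (MongoDB).
    stage_store.append({"collected_at": collected_at, "payload": raw})
    rows = transform(raw, collected_at)
    # Stand-in for inserting cleaned rows into warehouse (ClickHouse).
    warehouse.extend(rows)
    return len(rows)


def fake_extract() -> list[dict[str, Any]]:
    """Hypothetical extractor used instead of a real API call."""
    return [
        {"id": "bitcoin", "symbol": "BTC", "priceUsd": "64000.5"},
        {"id": "broken", "symbol": "BRK", "priceUsd": None},
    ]


stage, warehouse = [], []
loaded = run_cycle(fake_extract, stage, warehouse)
```

Storing the raw payload before transforming it means a record rejected during cleaning can still be inspected or reprocessed later, which is the role the list above assigns to stagedb.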
├── docs
│
├── pipeline
│   ├── cron            # Scheduler configs
│   ├── docker          # Environment configs
│   ├── logs            # Logs for pipeline service
│   │
│   ├── src             # ETL source code
│   │   ├── config.py   # Environment parsers
│   │   ├── db.py       # Warehouse management
│   │   ├── etl.py      # ETL functions
│   │   ├── run.py      # Pipeline script
│   │   └── stagedb.py  # StageDB management
│   │
│   └── tests           # Unit tests for ETL source code
│
├── warehouse
│   ├── db              # Warehouse database files (ClickHouse)
│   └── logs            # Logs for warehouse service
│
├── stagedb
│   └── db              # StageDB database files (MongoDB)
│
└── dashboard
    ├── db              # Dashboard database files (PostgreSQL)
    ├── docker          # Environment configs
    └── logs            # Logs for dashboard service
To run the project, perform the following steps:
- Clone repo to your machine
git clone https://github.com/Genvekt/coincap_monitor.git
cd coincap_monitor
- Create a .env file with the following environment parameters:
  - API_KEY: key for the CoinCap API, which you must obtain yourself
  - API_URL: URL of the CoinCap API
  - STAGEDB_HOST, STAGEDB_DB, STAGEDB_USER, STAGEDB_PASSWORD, STAGEDB_PORT: MongoDB access data
  - CLICKHOUSE_HOST, CLICKHOUSE_DB, CLICKHOUSE_USER, CLICKHOUSE_PASSWORD, CLICKHOUSE_PORT: ClickHouse access data
  - POSTGRES_HOST, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_PORT: PostgreSQL access data

Example .env file:
API_KEY={YOUR_API_KEY}
API_URL=http://api.coincap.io/v2
STAGEDB_HOST=stagedb
STAGEDB_DB=stagedbdb
STAGEDB_USER=stagedbuser
STAGEDB_PASSWORD={YOUR_MONGODB_PASSWORD}
STAGEDB_PORT=27017
CLICKHOUSE_HOST=warehouse
CLICKHOUSE_DB=clickhousedb
CLICKHOUSE_USER=clickhouseuser
CLICKHOUSE_PASSWORD={YOUR_CLICKHOUSE_PASSWORD}
CLICKHOUSE_PORT=9000
POSTGRES_HOST=dashboard_db
POSTGRES_DB=postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD={YOUR_POSTGRESQL_PASSWORD}
POSTGRES_PORT=5432
- Run application:
docker network create CoinCapNet
docker-compose up --build -d
- Stop application:
docker-compose down -v
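
Each database in the .env step above is configured through the same five suffixes (HOST, DB, USER, PASSWORD, PORT), so one parser can serve all three prefixes. The sketch below shows one way such a parser might look. It is not the contents of pipeline/src/config.py: `DbSettings` and `load_db_settings` are hypothetical names, and the dictionary stands in for the real process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbSettings:
    host: str
    db: str
    user: str
    password: str
    port: int


def load_db_settings(prefix: str, env: Mapping[str, str] = os.environ) -> DbSettings:
    """Read <PREFIX>_HOST, _DB, _USER, _PASSWORD and _PORT, failing early if any is missing."""
    fields = ("host", "db", "user", "password", "port")
    names = {field: f"{prefix}_{field.upper()}" for field in fields}
    # Treat unset and empty variables the same way: both are reported as missing.
    missing = [name for name in names.values() if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return DbSettings(
        host=env[names["host"]],
        db=env[names["db"]],
        user=env[names["user"]],
        password=env[names["password"]],
        port=int(env[names["port"]]),  # ports are stored as text in .env
    )


# Values copied from the example .env file; the password is a placeholder.
example_env = {
    "CLICKHOUSE_HOST": "warehouse",
    "CLICKHOUSE_DB": "clickhousedb",
    "CLICKHOUSE_USER": "clickhouseuser",
    "CLICKHOUSE_PASSWORD": "placeholder",
    "CLICKHOUSE_PORT": "9000",
}
settings = load_db_settings("CLICKHOUSE", example_env)
```

Checking all five variables up front reports every missing name in a single error, so a misconfigured container fails at startup instead of partway through an ETL run.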