This project is about building a local database in sqlite. Using DDL and DML to create, load, query, and manipulate database tables and views. Developing a web service to expose the database table for further analysis and use.
The data provided includes 2 csv files;
20220125_Ex_ap_data.csv: Data containing details of analysis.
20220125_Ex_sp_data.csv: Data containing details of projects.
- Python- The code should run using Python versions 3.8
- sqlite3 for SQL database Manangement- Python comes with a built-in sqlite database in the form of a library to provide a complete database management system without the need for downloading an additional software. Works best for practising.
- pandas for doing transformation tasks on the csv files before loading to db.
- os- python module for interacting with operating system
- Flask- python web framework used to build a web service
The project has following three elements:
- ETL Pipeline:
- create_db.py - creates a sqlite db by normalizing the data provided in the csv files.
- transform.py- used as an import in the subsequent step, performs steps like renaming columns, eliminating unwanted or additional columns.
- load.py- connects to the db created in step 1 and loads the data. The DDL and DML functions are integrated in python code.
- DB denormalization to create a view:
- view.py - creates a combined view/ report from the original tables as sp_ap_rpt.
- create a web service:
- app.py - web service using Flask framework that hooks up to ‘sp_ap_rpt’ and take an input project id (Gpxxxxx) to return a JSON output containing the values from the underlying table
db-driven-api:
- README.md: read me file
- requirements.txt
- create_db.py
- data_ingestion_ETL: package containing the ETL pipeline preparation code
- transform.py
- load.py
- classification_mapper.py
- view.py
- app.py
To execute the project follow the steps below: NOTE: Pycharm is used as IDE
- Clone the repository
- Set up a virtual environment in the project's root directory
- Open a terminal in the project directory and run:
- python3 -m virtualenv venv
- source venv/bin/activate
- pip3 install -r requirements.txt
- Open a terminal in the project directory and run:
- To create a db and the required tables run create_db.py in the project's root directory
- To populate the db with data run load.py in the directory /data_ingestion_ETL/load.py
- To create a view of the sp_ap_rpt run view.py in the project's root directory
- To create a web service, run app.py in the project's root directory
- Go to http://127.0.0.1:5000/api/projects
- To select a project_id provide the project_gold_id in the link,