Code Monkey home page Code Monkey logo

nfz-hospitalization-data's Introduction

Analyzing 21,194,349 Hospitalization Records from Poland's National Health Fund (NFZ)

Comprehensive ETL, Data Warehousing, Processing & Visualization Project

A Low-Level Approach with SQL, Shell Scripts, SQLLoader and Apache Superset

#batch #dwh #datamart #nfz #data #etl #bi #visualisation #healthcare #dataengineering
#analyst #pipeline #dataops #batch #shell #oracle #apachesuperset #sqlcl #docker #pluggabledb

Projects status: in progress


Sneak Peek

Let's start with a quick preview to grab your attention. I've prepared comprehensive cross-sectional descriptions and installers (shell & SQL scripts) for various versions of my reports. Of course, you don't have to go through the installation process. For more details, simply scroll down.

  1. Here are two dashboard options that showcase the results of my process:
SQL> select total_hospitalization_in_dataset
  2*  from rpt_totalhosp;
  
TOTAL_HOSPITALIZATION_IN_DATASET
___________________________________
21,194,349

Example 2: SQlCl: Show Report View as Text Output


  1. Data exploration & profiling (show more results)
  • Sample: Distribution based on the count of specific modes of patient discharge per admission, according to the NFZ (National Health Fund) modes of discharge and admission
REASON_FOR_ADMISSION DIS_1 DIS_2 DIS_3 DIS_4 DIS_6 DIS_7 DIS_8 DIS_9 DIS_10 DIS_11
Emergency admission - other cases (3) 6919800 15455742 1055898 188649 709275 14814 2970 1122597 819 2382
Planned admission based on a referral (6) 8711580 18046503 407859 62466 238764 6309 2295 213738 639 2988
Emergency admission due to transfer by the medical rescue team (2) 1603881 4697712 493851 154932 203370 11586 978 1175358 480 903
Admission of a newborn as a result of childbirth in this hospital (5) 713970 191286 74289 1260 20469 72 219 11430 33 93
Admission based on an oncology diagnostics and treatment card (11) 271371 340449 5535 897 2766 75 30 9750 15
(...)

and more...


Introduction

Project Objectives

Develop a ETL pipeline and utilize BI tools to process and visualize over 21 million NFZ hospitalization records (2017-2022), with a emphasis on data processing framework.

Key Components

  • Environment Setup: Script to automate database and environment configuration
  • ETL & DWH: Automated extraction, transformation, and loading of data into a star schema data mart
  • Reports & Dashboards:
    • Data Profiling Dashboard (visualisation + text-based report)
    • Main Analytical Dashboard (visualisation + text-based report)

Used data

This project is based on real NFZ data sourced from the open-data portal, dane.gov.pl. The dataset comprises all hospitalizations covered by the NFZ (National Health Fund of Poland) in 2017-2022 *.

The dataset comprises over 20 million records, with each record representing an individual patient's hospitalization data.

It's based on the public goverment repositories hub named data.gov.pl.

Used repository is not bad quality, but I saw small differences between files. I want to make some improvements, create DWH star schema, materialized view as fact table, dimensions etc... I will clean and process the data to visualize it with Apache Superset.

Tech stack

This repository showcases my Data Engineering project, highlighting my diverse data-related skills. It includes database administration, data warehousing, and ETL development tasks. My objective is to prepare and process this data for visualization purposes, including the creation of charts and dashboards. I aim to learn modern data visualization BI tool.

Tool/Software Description
Visualization
Apache Superset Visualization tool
Data Cleaning
Oracle SQL Data cleaning and manipulation
Virtual Machines
VM 1 Oracle Linux with DB 21
VM 2 Centos, Docker, Apache Superset container with cx_oracle connector
Shell Scripting
MINGW Bash Shell scripting
Data Loading
SqlLoader Data loading
IDE
SQLDeveloper 23 IDE for DBA tasks and queries
Visual Studio Code IDE with new Oracle plugin
Database Tools
SQLPlus & SQLCli Database command-line interface
Others
PlantUml Tool for diagrams
GIT Version control system

Data source

NFZ source Data preview (CSV)

ROK;MIESIAC;OW_NFZ;NIP_PODMIOTU;KOD_PRODUKTU_KONTRAKTOWEGO;KOD_PRODUKTU_JEDNOSTKOWEGO;KOD_TRYBU_PRZYJECIA;KOD_TRYBU_WYPISU;PLEC_PACJENTA;GRUPA_WIEKOWA_PACJENTA;PRZEDZIAL_DLUGOSCI_TRWANIA_HOSPITALIZACJI;LICZBA_HOSPITALIZACJI 2022;4;"07";"1132866688";"03.4580.991.02";"5.51.01.0008013";6;2;"K";"65 i wiecej";"6 i wie™cej dni";"<5" 2022;8;"02";"5562239217";"03.4220.030.02";"5.51.01.0001087";3;2;"K";"45-64";"6 i wiecej dni";"<5" 2022;11;"03";"9462146139";"03.4580.991.02";"5.51.01.0008015";6;2;"K";"65 i wiecej";"6 i wiecej dni";"<5" 2022;9;"15";"7842008454";"03.4450.040.02";"5.51.01.0012014";3;2;"K";"65 i wiecej";"0 dni";"<5"

Domain dictionaries

The data contains some foreign keys pointing to static dictionaries:

The dictionary source for Polish HL7 implementations includes

My DWH DB model

I had to transform the source file into a star schema model for that data mart...

Diagram Figure 1: DWH Model

Database Objects & Names Explanation

As a DBA, I handle SQL scripts, database creation, structure definition, user management, and permissions.

Installation

I've prepared some bash and SQL scripts to create database, structures and objects and automate the installation process. You can use either install.sh or rebuild.sh to get started.

Installation

Figure 1: Demonstration of the script execution process (gif animations)

Dashboard & Reports Results

Of course you don't have to go through the installation process. Simply open the text-based dashboard or view the visualization screenshots. However, if you're a professional user, you can review my analysis queries (along with all other scripts) by navigating to the sql/ GitHub directory.

Data profiling

Analysis


nfz-hospitalization-data's People

Contributors

szwrk avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.