azure_e2e_data_engineering_project_1's Introduction

Analysis Project Using the AdventureWorks 2019 Database

On-Premises Database to Azure Cloud Pipeline with Data Factory, Data Lake Storage, Spark, Databricks, Synapse, and Power BI

Table of Contents

  1. Project Overview
  2. Project Architecture
    2.1. Data Ingestion
    2.2. Data Transformation
    2.3. Data Loading
    2.4. Data Reporting
  3. Technologies Used
  4. Credits
  5. Contact

🔬 Project Overview

This is an end-to-end data engineering project built on Azure. Azure Data Factory ingests raw data from an on-premises SQL Server database into the bronze layer of Azure Data Lake Storage. Azure Databricks then transforms the data with Spark: the transformed data is stored in the silver layer, and the cleansed data is kept in the gold layer. The gold data is loaded into a Synapse serverless database and visualized in a Power BI report. I also used Azure Active Directory (AAD) and Azure Key Vault for data monitoring and governance.

Project Architecture

The diagram below shows the architecture in detail:

1_project_str

📤 Data Ingestion

  • Connected the on-premises SQL Server to Azure using the Microsoft self-hosted integration runtime.

2_IR

  • Set up the resource group with the required services (Key Vault, Storage Account, Data Factory, Databricks, Synapse Analytics).

3_resource_group

  • Migrated the tables from the on-premises SQL Server to Azure Data Lake Storage Gen2.

4_containers

5_pipeline_1
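The migration step can be pictured as one copy job per source table. The sketch below is only an illustration under assumptions: the table names, the `bronze/<schema>/<table>/<table>.parquet` folder layout, and the helpers `SourceTable`, `bronze_sink_path`, and `plan_copy_jobs` are hypothetical and are not taken from the project's actual Data Factory pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTable:
    """A table in the on-premises SQL Server database."""
    schema: str
    name: str


def bronze_sink_path(table: SourceTable, container: str = "bronze") -> str:
    """Return the (assumed) Data Lake path a copy job would write to."""
    return f"{container}/{table.schema}/{table.name}/{table.name}.parquet"


def plan_copy_jobs(tables):
    """Pair each source table with a source query and a sink path."""
    return [
        {
            "source_query": f"SELECT * FROM [{t.schema}].[{t.name}]",
            "sink_path": bronze_sink_path(t),
        }
        for t in tables
    ]


if __name__ == "__main__":
    # Illustrative table names only; the real pipeline may copy a different set.
    tables = [SourceTable("Sales", "Customer"), SourceTable("Production", "Product")]
    for job in plan_copy_jobs(tables):
        print(job["source_query"], "->", job["sink_path"])
```

In Data Factory terms, each dictionary would roughly correspond to one iteration of a loop that runs a copy activity, though the actual pipeline layout may differ.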

⚙️ Data Transformation

  • Mounted Azure Blob Storage in Databricks to retrieve raw data from the Data Lake.
  • Used a Spark cluster in Azure Databricks to clean and refine the raw data.
  • Saved the cleaned data in Delta format, which is optimized for further analysis.
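As a rough sketch of the mounting step, the helpers below build the inputs a Databricks mount call would need for each layer. This assumes a service principal with OAuth and the `abfss://` scheme for Data Lake Storage Gen2, which the project may or may not have used; the storage account name, mount points, and function names are all hypothetical.

```python
def adls_source_uri(container: str, storage_account: str) -> str:
    """Build the abfss URI for a container in a Data Lake Storage Gen2 account."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/"


def oauth_mount_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Spark configs for mounting with a service principal (assumed auth method)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def mount_plan(storage_account: str, containers=("bronze", "silver", "gold")) -> dict:
    """Map each hypothetical mount point to the container it would expose."""
    return {f"/mnt/{c}": adls_source_uri(c, storage_account) for c in containers}


if __name__ == "__main__":
    # "mystorageaccount" is a placeholder, not the project's real account name.
    for mount_point, source in mount_plan("mystorageaccount").items():
        print(mount_point, "<-", source)
```

Inside a Databricks notebook, each pair could then be passed to `dbutils.fs.mount(source=..., mount_point=..., extra_configs=...)`, with the client secret read from a Key Vault-backed secret scope through `dbutils.secrets.get` rather than hard-coded.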

6_databricks_bronze_to_silver

7_databricks_silver_to_gold
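The README does not list the exact cleaning rules, so the following is just one plausible refinement step: normalizing SQL Server style column names such as `SalesOrderID` to snake_case. The functions `to_snake_case` and `rename_map` are hypothetical helpers, not code from the project's notebooks.

```python
import re

# Insert an underscore at lower-to-upper boundaries ("sOrder") and before the
# last capital of an acronym that is followed by a lowercase letter ("XMLData").
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def to_snake_case(name: str) -> str:
    """Convert a PascalCase or camelCase column name to snake_case."""
    return _BOUNDARY.sub("_", name).lower()


def rename_map(columns):
    """Map every original column name to its snake_case form."""
    return {col: to_snake_case(col) for col in columns}


if __name__ == "__main__":
    # Column names are illustrative examples in the AdventureWorks style.
    for old, new in rename_map(["SalesOrderID", "ModifiedDate", "rowguid"]).items():
        print(old, "->", new)
```

With a Spark DataFrame `df`, the mapping could be applied by calling `df.withColumnRenamed(old, new)` for each pair before writing the result in Delta format.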

📥 Data Loading

  • Used Azure Synapse Analytics to load the refined data.
  • Created a SQL database and connected it to the data lake.

8_synapse_pipeline

9_gold_db_views
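One way a serverless SQL database can expose the gold layer is through views defined over the Delta folders with `OPENROWSET`. The sketch below only generates such a statement as text; the storage URL, the folder layout, and the helper names are assumptions, and the project's real views may be defined differently.

```python
def gold_folder_url(storage_account: str, schema: str, table: str) -> str:
    """Build the (assumed) URL of a table's Delta folder in the gold container."""
    return f"https://{storage_account}.dfs.core.windows.net/gold/{schema}/{table}/"


def gold_view_sql(view_name: str, delta_folder_url: str) -> str:
    """Return T-SQL that defines a view over a Delta folder via OPENROWSET."""
    return (
        f"CREATE OR ALTER VIEW [{view_name}] AS\n"
        "SELECT *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{delta_folder_url}',\n"
        "    FORMAT = 'DELTA'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Placeholder account and table names, used for illustration only.
    url = gold_folder_url("mystorageaccount", "Sales", "Customer")
    print(gold_view_sql("Customer", url))
```

Generating one statement per gold table keeps the views in step with the lake contents, and Power BI can then query the views instead of reading the files directly.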

📊 Data Reporting

  • Connected Microsoft Power BI to Azure Synapse and used the database views to create interactive data visualizations.

10_powerbi_report

🛠️ Technologies Used

  • Data Source: SQL Server
  • Orchestration and Ingestion: Azure Data Factory
  • Storage: Azure Data Lake Storage Gen2
  • Transformation: Azure Databricks (Spark)
  • Serving: Azure Synapse Analytics
  • Authentication and Secrets Management: Azure Active Directory and Azure Key Vault
  • Data Visualization: Power BI

📋 Credits

📨 Contact Me

LinkedIn Website Gmail

Contributors

dogucanelci
