

adfbuild2018

This document explains how to understand, use, and deploy the demo assets in this repo. The assets provide a demo environment in which Azure Data Factory (ADF) loads data from flat files in Azure Blob storage and Amazon Web Services (AWS), as well as from a REST API, into Azure SQL Data Warehouse through a series of data transformation activities. They are the demo content from the Microsoft Build 2018 conference in Seattle for the session "Develop scalable analytical solutions with Azure Data Factory & Azure SQL Data Warehouse".

The theme of the demo is building a scalable water analytics solution for Azure SQL Data Warehouse that can identify areas in the US that are at risk of water shortages due to drought, weather patterns and other factors.

ARM Template

The ARM Template JSON and the associated ARM Template Parameters JSON files define the Azure Data Factory that I built and used at the Build conference. To install this factory, deploy the template together with the parameters file to Azure following these instructions.
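
Before deploying, it can help to confirm that the parameters file supplies every template parameter that has no default. The sketch below assumes only the standard ARM file layout (a top-level `parameters` object in both files, with each value wrapped as `{"value": ...}` or a Key Vault `{"reference": ...}` in the parameters file). The function names are hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def load_json(path: str) -> dict:
    """Read a JSON file such as an ARM template or parameters file (tolerates a UTF-8 BOM)."""
    return json.loads(Path(path).read_text(encoding="utf-8-sig"))


def missing_parameters(template: dict, parameters_file: dict) -> list[str]:
    """List template parameters that have neither a defaultValue nor a supplied value."""
    declared = template.get("parameters", {})
    supplied = parameters_file.get("parameters", {})
    missing = []
    for name, spec in declared.items():
        entry = supplied.get(name, {})
        # A parameters file provides either {"value": ...} or a Key Vault {"reference": ...}.
        has_value = "value" in entry or "reference" in entry
        if "defaultValue" not in spec and not has_value:
            missing.append(name)
    return sorted(missing)
```

If the list comes back empty, the deployment itself can be run with the Azure CLI, for example `az deployment group create --resource-group <your-rg> --template-file <template>.json --parameters @<parameters>.json`, where the angle-bracket values are placeholders for your own resource group and file names.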

In that factory you will see a series of ADF objects:

Pipelines

  • Water Demo Pipeline TEMPLATE

This is the template used to create the other related water demo pipelines that use the same structure.
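
The repo does not say how the related pipelines were derived from the TEMPLATE pipeline (it may simply have been cloned in the ADF UI). As a hedged illustration of the idea, the sketch below treats a pipeline as its JSON definition (`name` plus `properties.activities` and `properties.parameters`), copies it, renames it, and overrides parameter defaults. `derive_pipeline` and the example names are hypothetical.

```python
import copy


def derive_pipeline(template: dict, name: str, defaults: dict) -> dict:
    """Clone a pipeline definition under a new name and override parameter default values."""
    pipeline = copy.deepcopy(template)  # never mutate the shared template
    pipeline["name"] = name
    params = pipeline.setdefault("properties", {}).setdefault("parameters", {})
    for key, value in defaults.items():
        if key not in params:
            # Only parameters the template already declares may be overridden.
            raise KeyError(f"template has no parameter {key!r}")
        params[key]["defaultValue"] = value
    return pipeline


# Hypothetical template: one string parameter and no activities.
TEMPLATE = {
    "name": "Water Demo Pipeline TEMPLATE",
    "properties": {
        "activities": [],
        "parameters": {"folder": {"type": "string", "defaultValue": "raw"}},
    },
}
```

For example, `derive_pipeline(TEMPLATE, "Water Demo Drought", {"folder": "drought"})` returns a new definition with the same structure while leaving `TEMPLATE` unchanged.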

  • Water Demo Pipeline MAIN

This is the primary sequential data loading and transformation pipeline that you will use in this demo. It sequentially acquires data from several sources (Azure Blob, AWS, and a REST API) and lands it in Azure Blob storage, transforms it using Azure Databricks notebooks and Azure SQL DW stored procedures, and then loads it into Azure SQL Data Warehouse. At the end of the pipeline, either a success or a failure email is sent.
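
The control flow described above (run each stage in order, stop at the first failure, and send exactly one success or failure email) can be modeled in plain Python. This is a simulation, not ADF code: in the factory the ordering comes from activity dependencies, and the stage names and the `notify` callback below are hypothetical stand-ins for the copy, Databricks, stored procedure, and email activities.

```python
from typing import Callable


def run_sequential(steps: list[tuple[str, Callable[[], None]]],
                   notify: Callable[[str, str], None]) -> bool:
    """Run steps in order; stop at the first failure and send exactly one notification."""
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # A failed step means every later step is skipped.
            notify("Failure", f"{name} failed: {exc}")
            return False
    notify("Success", f"completed {len(steps)} step(s)")
    return True


if __name__ == "__main__":
    # Hypothetical no-op stand-ins for the MAIN pipeline's stages.
    stage_names = ("copy_blob", "copy_aws", "copy_rest",
                   "databricks_bucketize", "sproc_transform", "load_sqldw")
    stages = [(n, lambda: None) for n in stage_names]
    run_sequential(stages, lambda status, msg: print(f"[email] {status}: {msg}"))
```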

  • Water Demo with Params

A copy of the Water Demo pipeline that defines pipeline parameters and passes them to the datasets, so that the files and folders to load are set dynamically. It also demonstrates that activities in a pipeline do not have to depend on each other sequentially: activities with no dependency between them can run in parallel.
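
ADF expressions such as `@{pipeline().parameters.folderName}` are evaluated at run time to build the folder and file paths a dataset reads. The toy resolver below handles only that single interpolation pattern, to show the substitution idea; it is not ADF's expression engine, and the parameter names `folder` and `file` are hypothetical. In ADF itself, a dataset typically declares its own parameters (referenced as `@dataset().fileName`), and the activity passes pipeline parameter values into them.

```python
import re

# Matches string interpolations such as "@{pipeline().parameters.folder}".
_PARAM_REF = re.compile(r"@\{pipeline\(\)\.parameters\.(\w+)\}")


def resolve(template: str, parameters: dict[str, str]) -> str:
    """Replace pipeline-parameter references in a string (a toy subset of ADF expressions)."""
    def lookup(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"undefined pipeline parameter {name!r}")
        return str(parameters[name])

    return _PARAM_REF.sub(lookup, template)
```

For example, `resolve("water/@{pipeline().parameters.folder}/@{pipeline().parameters.file}", {"folder": "drought", "file": "2018.csv"})` yields `water/drought/2018.csv`, so changing the parameter values changes which file is loaded without editing the dataset.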

  • Water Pipeline Loops

The pipelines in this demo Azure Data Factory are primarily for demonstrating and trying out features. When you operationalize a production factory, however, you will build pipelines with parameters and loops, which makes them much more flexible and reusable. This pipeline shows how to use a loop for a load activity.
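
ADF's ForEach activity iterates over an `items` array and exposes `isSequential` and `batchCount` settings that control whether iterations run one at a time or concurrently. The sketch below mimics that behavior with a thread pool. `for_each` and the `load` callback are hypothetical stand-ins for a ForEach wrapping a load (Copy) activity, and the default batch size here is arbitrary rather than ADF's.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def for_each(items: list[str], load: Callable[[str], str],
             is_sequential: bool = False, batch_count: int = 4) -> list[str]:
    """Apply `load` to each item, either one at a time or up to `batch_count` at once."""
    if is_sequential:
        # Like isSequential = true: each iteration finishes before the next starts.
        return [load(item) for item in items]
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    # Like isSequential = false: up to batch_count iterations run concurrently.
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        # map() yields results in input order even though the calls may overlap.
        return list(pool.map(load, items))


if __name__ == "__main__":
    # Hypothetical file list, e.g. supplied as a pipeline parameter.
    files = ["stations.csv", "rainfall.csv", "reservoirs.csv"]
    print(for_each(files, lambda name: f"loaded {name}", batch_count=2))
```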

Databricks_Scala_Notebook

This is the notebook code for the Azure Databricks activity in the water pipelines: a very simple Spark Scala script that bucketizes (groups) data using SparkSQL. For the end-to-end pipeline to work, you will need to add this code as a notebook in your Azure Databricks workspace so that the pipeline's Databricks activity can run it on your cluster.
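
The notebook itself is Scala, and its column names and bucket boundaries are not described here, so the following is only a plain-Python illustration of the bucketize-then-group idea. The boundaries and values are hypothetical (for example, a drought severity score), and the comment shows the rough shape of an equivalent SparkSQL query.

```python
from bisect import bisect_right
from collections import Counter


def bucketize(values: list[float], boundaries: list[float]) -> list[int]:
    """Map each value to a bucket index.

    With sorted boundaries [b0, b1, ...], bucket 0 holds v < b0,
    bucket i holds boundaries[i-1] <= v < boundaries[i], and the last
    bucket holds v >= the final boundary.
    """
    return [bisect_right(boundaries, v) for v in values]


def bucket_counts(values: list[float], boundaries: list[float]) -> dict[int, int]:
    """Count values per bucket, ordered by bucket index.

    Roughly equivalent SparkSQL (table and column names hypothetical):
      SELECT CASE WHEN score < 10 THEN 0 WHEN score < 20 THEN 1 ELSE 2 END AS bucket,
             COUNT(*) AS n
      FROM readings GROUP BY 1
    """
    return dict(sorted(Counter(bucketize(values, boundaries)).items()))
```

With boundaries `[10, 20]`, the values `[5, 10, 15, 20, 25]` fall into buckets `[0, 1, 1, 2, 2]`, giving counts `{0: 1, 1: 2, 2: 2}`.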

adfbuild2018script.sql

This SQL script contains the database schema objects (tables and stored procedures) used in this demo. Stand up an Azure SQL Data Warehouse, then execute this script from SSMS while connected to your Azure SQL DW database. It creates the tables that the pipeline loads as well as the stored procedures that the pipeline's Stored Procedure activities call.
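
SSMS and `sqlcmd` treat `GO` as a batch separator; it is not a T-SQL statement, so sending a whole script through a database driver fails if it contains `GO` lines (and `CREATE PROCEDURE` must start its own batch). If you prefer to run the schema script from code rather than SSMS, and assuming it separates batches with `GO` (common for SSMS scripts, not confirmed here), a minimal splitter could look like this. Each batch would then be executed separately through your driver of choice (not shown). This sketch ignores `GO <count>` and does not detect `GO` lines inside comments or string literals.

```python
import re

# A separator is a line containing only "GO" (any case), with optional spaces/tabs and CRLF.
_GO_LINE = re.compile(r"^[ \t]*GO[ \t]*\r?$", re.IGNORECASE | re.MULTILINE)


def split_batches(script: str) -> list[str]:
    """Split a T-SQL script into batches at GO lines, dropping empty batches."""
    return [batch.strip() for batch in _GO_LINE.split(script) if batch.strip()]
```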

Contributors

kromerm
