
Azure Data Services Go Fast Codebase

Introduction

The Azure Data Services Go Fast Codebase is a combination of Microsoft components designed to shorten the "time to value" when deploying an Azure Data Platform. Key features include:

  • Infrastructure as code (IAC) deployment of MVP Azure Data Platform
  • "Out of the box" Continuous Integration and Continuous Deployment framework
  • Enterprise grade security and monitoring with full support for Key Vault, VNETS, Private Endpoints and Managed Service Identities
  • Codeless Ingestion from commonly used enterprise source systems into an enterprise data lake
  • A web front end and embedded dashboards through which users interact with the platform's capabilities

This project is composed of Microsoft components and Open-Source Software (OSS) and is provided to customers and partners at no charge.

🚩 At its core this project is intended to be an accelerator. As such, it is designed to accelerate the "time to value" in using the Microsoft components. As an accelerator, it is not for sale, nor is it a supported product.


Getting Started

Getting started is always the hardest part of any process, so to help clients and partners get started with this repository we provide a set of online onboarding and upskilling workshops. Spaces in these workshops are limited and subject to an application process. If you are interested, please nominate yourself at https://forms.office.com/r/qbQrU6jFsj.

Prerequisites

Deployment of this project requires a variety of services across Azure. Please ensure that you have access to these services before continuing on to the deployment section of this guide.

To get started you will need the following:

  • An active Azure Subscription & Empty Resource Group*
  • Owner rights on the Azure Resource Group
  • Power BI Workspace (Optional)

*Note that for a fully functioning deployment, the deployment process will create a deployment service principal and two Azure application registrations within the Azure Active Directory (AAD) tenant that is connected to your Azure subscription. It is recommended that you use an Azure subscription and AAD tenant on which you have the necessary privileges to perform these operations.
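If you want to confirm these prerequisites from the command line, a minimal check with the Azure CLI might look like the following; the resource group name and location are placeholders, not values mandated by the framework:

az login
az account set --subscription "<your-subscription-id>"
az group create --name adsgofast-rg --location australiaeast
# Confirm you hold the Owner role on the resource group:
az role assignment list --resource-group adsgofast-rg --role Owner --output table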

You can sign up for an Azure subscription at https://azure.microsoft.com/free.

Once you have your Prerequisite items, please move on to the Deployment Configuration step.


Deployment Configuration

You will also need some development tools to edit and run the deployment scripts provided. It is recommended you use the following approach:

The deployment uses the "Developing inside a Container" approach to provide all the necessary prerequisite components without requiring them to be installed on the local machine. Follow our Configuring your System for Development Containers guide.
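For orientation, a development container is defined by a devcontainer.json file inside the .devcontainer folder. The minimal, generic example below illustrates the format only; it is not the definition this repository ships:

{
  "name": "ads-go-fast-dev",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu"
}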

Once you have set up these prerequisites, you will then need to clone this repository to your local machine.

🚩 If you want a stable deployment it is highly recommended that you checkout one of the official release tags. For example, if you wish to deploy v2.0.1, run the line below from within the directory into which you cloned the repository.

git checkout tags/v2.0.1
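For reference, the full sequence from a fresh clone to a pinned checkout looks like this:

git clone https://github.com/microsoft/azure-data-services-go-fast-codebase.git
cd azure-data-services-go-fast-codebase
git checkout tags/v2.0.1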

Deployment

To deploy the solution open Visual Studio Code and carry out the following steps.

  • ✅ From the menu select "File" then "Open Folder". Navigate to the directory into which you cloned the solution; it should have a ".devcontainer" folder at its root. Open this folder in Visual Studio Code.
  • ✅ Next, from the Visual Studio Code menu, select "View", then "Command Palette". When the search box opens, type "Remote-Containers: Reopen in Container". Note that Docker Desktop needs to be running before you perform this step.
  • ✅ From the menu select "Terminal", then "New Terminal". A new PowerShell Core window will open at the bottom of your screen. You are now running within the Docker container.
  • ✅ You are now in the development and deployment environment. Within the new terminal window, navigate to the DeploymentV2 directory using the command below:
cd ./solution/DeploymentV2
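The deployment entry-point scripts can differ between releases, so it is worth listing what ships in this folder before running anything (a generic PowerShell check, not a project-specific command):

Get-ChildItem -Filter *.ps1 | Select-Object Name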

#️⃣ Code Composition

See below for a cloc-generated breakdown of the source code files by language:

Language              Files    Blank   Comment      Code
JSON                    508       52         0    367628
YAML                     20     4140      4215    186305
SQL                     180     2058      2098     38799
C#                      240     3910      1461     19565
Razor                   322     1757       268     18142
CSS                       4     2117        42      9440
HCL                      97     1090       581      9248
Jupyter Notebook         15        0      1937      3755
PowerShell               70      858       704      3568
JavaScript               12      268       218      1288
Markdown                 40      235         0       994
SVG                       7        0        18       657
MSBuild script            8       60         2       634
Bourne Shell              3       62        72       423
Python                    2       14        64        55
Dockerfile                1        6         9        34
DOS Batch                 1        4         3         1
HTML                      1        1         0         0
------------------   ------   ------   -------   -------
SUM:                   1531    16632     11692    660536
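These figures can be regenerated from the repository root with the cloc tool; the excluded directories below are an assumption, since the options used to produce the table above are not recorded:

cloc . --exclude-dir=bin,obj,node_modules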

Post Deployment Set-up and Instructions

Coming Soon.


Cost Estimator

Coming Soon.


Navigating the Source Code

The source code has the following structure:

Folder/File              Description
solution/                Primary source code folder with sub-directories for each core technology
solution/Database        Contains source code for the metadata database and sample databases
solution/DataFactory     Contains source code for Azure Data Factory artefacts (e.g. pipelines)
solution/DeploymentV2    Contains CI/CD code
solution/Diagrams        Contains a Structurizr diagramming project used for creation of architectural diagrams
solution/FunctionApp     Contains source code for the ADS Go Fast orchestration functions
solution/PowerBi         Contains source code for the Power BI files that can be used to provide reporting
solution/SampleFiles     Contains sample data files used in functional tests
solution/Synapse         Contains source code for Synapse Workspace artefacts (e.g. pipelines, notebooks, etc.)
solution/WebApplication  Contains source code for the ADS Go Fast web front end

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Recommended Maintenance activities when contributing:

  1. Check the Azure CLI for new versions - upgrade, test and remediate where necessary
  2. Check the Terraform providers for new versions - upgrade, test and remediate where necessary
  3. Check the dependency libraries for the Function Application and Web Application for new versions - upgrade, test and remediate where necessary

Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, including Microsoft, Azure, DotNet, AspNet, and Xamarin. Please review this repository's security section for more details.

Privacy

Microsoft values your privacy. See the Microsoft Privacy Statement for more information.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Contributors

alyssons-db, andreas-empired, chris-melinn, cyrilgagnaire, dependabot[bot], h-sha, hboiled, hugosharpe-insight, jrampono, leighs, moshiulrabbi, paullisto-insight, rivms


Issues

Enhance EditSettings page with validation rules

Prevent resource names that will be too long for deployment.

For example, if names are too long, then when running Deployment/workflows/CD_1a_DeployServices.ps1, we can get errors such as:

ERROR: {'code': 'InvalidTemplateDeployment', 'message': "The template deployment 'Storage_Logging' is not valid according to the validation procedure. The tracking id is '9aa1404a-2db4-4517-9070-b73696844278'. See inner errors for details."}
Inner Errors: 
{'code': 'PreflightValidationCheckFailed', 'message': 'Preflight validation failed. Please refer to the details for the specific errors.'}
Inner Errors: 
{'code': 'AccountNameInvalid', 'target': 'logstgforabcdexbtekbelurbbk', 'message': 'logstgforabcdexbtekbelurbbk is not a valid storage account name. Storage account name must be between 3 and 24 characters in length and use numbers and lower-case letters only.'}
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"VaultNameNotValid\",\r\n    \"message\": \"The vault name 'adsgfkvforabcdexbtekbelurbbk' is invalid. A vault's name must be between 3-24 alphanumeric characters. The name must begin with a letter, end with a letter or digit, and not contain consecutive hyphens. Follow this link for more information: https://go.microsoft.com/fwlink/?linkid=2147742\"\r\n  }\r\n}"}]}}

CDC Incremental Export (SQL Server Only)

  • #195
  • Add an optional IAS SQL Server to the standard deployment (this will be used for testing of the CDC-based task type below)
  • Alter the SQL Database to Azure Storage task type to support watermark-based incremental export of SQL Server tables based on the CDC information
  • Add functional tests for CDC
  • Add CDC enablement for the IAS SQL instance to the IaC (see the sketch after this list)
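For context, enabling CDC on a SQL Server source uses the standard system procedures shown in the sketch below, run here via the SqlServer PowerShell module; the server, database, schema, and table names are placeholders:

$query = @"
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'SalesLT',
    @source_name   = N'Customer',
    @role_name     = NULL;
"@
Invoke-Sqlcmd -ServerInstance "your-sql-server" -Database "AdventureWorksLT" -Query $query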

Enhance Deployment via Declarative IAC

Enhance the deployment process using declarative infrastructure as code. Core objectives:

  • Simplify Deployment
  • Decrease deployment time
  • Improve Incremental Deployment Ability
  • Reduce the PowerShell codebase

Ability to lock Tasks, Source / Target systems & Execution Engines by user role

microsoft/azure-data-services-go-fast-capability-assessment#129

The purpose of this issue is to apply security controls to the following:

  • Source & Target Systems
  • Execution Engines
  • Associated Tasks

There is currently a security implementation in place that allows securing subject areas by owner (it was implemented for WAPHA). We should extend this implementation to cover the items above.

  • Design the security model / changes
  • Replace the SubjectAreaRoleMap table with an EntityRoleMap table that supports different entity types (a hypothetical shape is sketched after this list)
  • Update the EntityRoleProvider to use the new table.
  • Update the DbUp scripts
  • Use temporal tables on the new/updated tables
  • Update controllers to add authentication checks and owner/security group assignment (same as the SubjectAreaController does now)
  • Update documentation on how to secure an item.
  • Add CRUD model (Controller & Views for maintaining the EntityRoleMaps)
  • Add a navigation/menu item to the new Security Model pages
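To make the table change concrete, below is a hypothetical shape for the proposed EntityRoleMap as a temporal table, run via the SqlServer PowerShell module; every column and server name here is an illustration, not the implemented schema:

$createEntityRoleMap = @"
CREATE TABLE dbo.EntityRoleMap (
    EntityRoleMapId INT IDENTITY(1,1) PRIMARY KEY,
    EntityType      NVARCHAR(50)     NOT NULL, -- e.g. 'Task', 'System', 'ExecutionEngine'
    EntityId        BIGINT           NOT NULL,
    AadGroupUid     UNIQUEIDENTIFIER NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EntityRoleMapHistory));
"@
Invoke-Sqlcmd -ServerInstance "your-metadata-server" -Database "your-metadata-db" -Query $createEntityRoleMap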

Synapse as Data-sink

  • Addition of Synapse Workspace as part of CICD
  • Adding ADF pipeline templates to load data into Synapse

Lockbox Extensions V2.0

  • Add Synapse Workspace as data sink #33
  • Integrate ADF pipelines into Framework specific Synapse instance #39
  • Make number of IRs configurable #32

Allow ability to add a task to the current schedule instance

Creation of a new function that the front end can call to force-regenerate tasks for a specific schedule.

The logic is in the Prepare Task script.

Changes are required to the WebApp and FunctionApp.

A warning should be displayed about potential duplication of tasks.

** Check with John and then let's re-review this one

How can I run a simple "Task Master" task

I configured ADS Go Fast and it created all the objects, and I can access the front end. I tried to run an existing task (a Task Master task called "AwSample SalesLT.Customer Extract to Data Lake"), which I simply enabled, but it does not run.

What should I do, and where should I look?
How does the TaskInstance get created?

Thanks,
Muhammad

Extend Lockbox to include H2O.ai

Note: currently added specifically in Terraform. We may alter the Terraform to make it more generic for a custom (Linux) VM image and allow parameters to control the image.

PrepareFrameworkTasks is failing

I have deployed ADS Go Fast and tried to load one table from the configured SQL source to Blob storage using a preconfigured Task Master task. After investigating, I found that TaskInstance does not contain any records, and further that PrepareFrameworkTasks is failing with the error below:
(error screenshot omitted)

Can you please assist on this?

Thanks,
Muhammad

Deploy and Configure Synapse Workspace (ARM, Powershell)

Provide the ability to deploy a Synapse Workspace, default storage account, Spark pool, and dedicated SQL pool. All configurations should be customizable via the environment.json config and should support being partially enabled/disabled.

This will allow for development of the Synapse data sink pipeline within the ADS GF ADF pipelines.

Management Zone

Currently the design of this platform builds the Data Management and Data Landing Zones into a single zone. This is beneficial for customers that are looking to rapidly get started with their analytics use cases. It would, however, be good if we could use ADS Go Fast on top of an existing Data Management Zone (https://github.com/Azure/data-management-zone). This will begin the alignment of these two open-source projects and ensure that you can use the two Microsoft repos together.

Scenario:

You have already deployed a Data Management Zone (as per https://github.com/Azure/data-management-zone) and you want to be able to use the ADS Go Fast as an Azure Data Landing Zone. This requires passing in / configuring the following:

  • Allow for disabling the deployment of Purview so that a shared Data Management Landing Zone account can be used
  • Allow for optionally providing an existing Azure Monitor (Log Analytics) workspace ID
  • Allow for optionally providing an existing VNET/subnets
  • Allow for optionally providing existing private DNS namespace prefixes
  • Allow for optionally providing existing Synapse private link hubs
  • Provide an example configuration for deploying the platform to use an existing Data Management Landing Zone (a hypothetical sketch follows this list)
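As a starting point for that example configuration, the hypothetical environment.json fragment below illustrates the kind of switches the bullets above describe; every key name here is invented for illustration and will not match the real schema:

{
  "deploy_purview": false,
  "existing_log_analytics_workspace_id": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<name>",
  "existing_vnet_id": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/<name>",
  "existing_private_dns_zone_prefix": "privatelink",
  "existing_synapse_private_link_hub_id": "<resource-id>"
}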

[COST OPTIMISATION] When running in VNET-attached mode, the solution runs two P1V2 App Service plans; it would be better to consolidate these.

When you deploy the VNET integration, it scales the app services for both the function apps and the web app up to P1V2.

It doesn't make sense to run two App Service plans at P1V2 scale given how little load the metadata web app will generate. These resources should share the same App Service plan.

It may actually make sense to combine the two from the beginning and drop the need for the consumption-based App Service plan for the function apps. You are already paying for an S1 App Service plan for the web app, so you might as well reuse that plan for the Azure Functions as well.
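As one possible remediation, the Azure CLI can move a function app onto an existing App Service plan. A sketch with placeholder resource names (this is a generic Azure CLI capability, not a command the project currently wires up):

az functionapp update --name adsgofast-func --resource-group adsgofast-rg --plan adsgofast-web-plan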
