microsoft / azure-data-services-go-fast-codebase

Code base for the Azure Data Services Go Fast Framework. A framework for rapid deployment and configuration of common Azure Data Services Architectures.

License: Other

TSQL 28.62% C# 19.59% HTML 14.62% Batchfile 0.02% CSS 0.18% JavaScript 1.09% PowerShell 5.54% Dockerfile 0.05% Shell 0.52% HCL 11.79% Jsonnet 12.69% Jupyter Notebook 4.95% Python 0.34%
azure microsoft data

azure-data-services-go-fast-codebase's Issues

Allow ability to add a task to the current schedule instance

Creation of a new function that the front end can call to force-regenerate tasks for a specific schedule.

The logic is in the Prepare Task script.

Change to the WebApp and FunctionApp.

A warning should be displayed about the potential duplication of tasks.

** Check with John and then let's re-review this one

Synapse as Data-sink

  • Addition of Synapse Workspace as part of CICD
  • Adding ADF Pipeline Templates to load data into Synapse

How can I run a simple "Task Master" task

I configured ADS Go Fast; it created all the objects and I can access the frontend. I tried to run an existing task (the Task Master called AwSample SalesLT.Customer Extract to Data Lake) by simply enabling it, but it does not run.

What should I do and where should I look?
How does the TaskInstance get created?

Thanks,
Muhammad

Deploy and Configure Synapse Workspace (ARM, Powershell)

Provide ability to deploy Synapse Workspace, default storage account, Spark Pool, Dedicated SQL Pool.
All configurations should be customizable via environment.json config and could be partly enabled/disabled.

This will allow for development of Synapse Data Sink pipeline within the ADS GF ADF Pipelines.
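
The per-component enable/disable behaviour described above could look roughly like the following. This is a minimal sketch, not the framework's actual configuration schema: the JSON key names (`synapse`, `enable`, and the component names) and the `enabled_components` helper are assumptions made for illustration only.

```python
import json
from typing import List

# Hypothetical environment.json fragment. The key names are illustrative
# assumptions and are not taken from the repository.
ENVIRONMENT_JSON = """
{
  "synapse": {
    "workspace": {"enable": true},
    "default_storage": {"enable": true},
    "spark_pool": {"enable": false},
    "dedicated_sql_pool": {"enable": false}
  }
}
"""


def enabled_components(config_text: str) -> List[str]:
    """Return the names of Synapse components whose 'enable' flag is true."""
    config = json.loads(config_text)
    synapse = config.get("synapse", {})
    # A component with no 'enable' flag is treated as disabled.
    return [name for name, settings in synapse.items()
            if settings.get("enable", False)]


if __name__ == "__main__":
    for component in enabled_components(ENVIRONMENT_JSON):
        print(f"would deploy: {component}")
```

Treating a missing flag as disabled keeps a partial configuration from deploying billable resources such as a Dedicated SQL Pool by accident.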

[COST OPTIMISATION] When running in VNET attached mode, the solution runs 2x P1V2 app service plans. It would be better to consolidate these.

When you deploy the VNET integration, it scales the app services for both the function apps and the web app up to P1V2.

It doesn't make sense to run 2 app service plans at P1V2 scale given how little use/load the Metadata web app will generate. These resources should share the same app service plan.

I think it actually makes sense to combine the two from the beginning and drop the need for the consumption-based app service plan for the function apps. You are already paying for an S1 app service plan for the web app, so you might as well reuse that plan for the Azure Functions too.

Lockbox Extensions V2.0

  • Add Synapse Workspace as data sink #33
  • Integrate ADF pipelines into Framework specific Synapse instance #39
  • Make number of IRs configurable #32

CDC Incremental Export (SQL Server Only)

  • #195
  • Add an optional IAS SQL Server to the standard deployment (this will be used for testing the CDC-based task type below)
  • Alter the SQL Database to Azure Storage Task Type to support watermark based incremental export of SQL Server tables based on the CDC information
  • Add Functional Tests for CDC
  • Add CDC enablement to IAS SQL into IAC
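
The watermark-based incremental export mentioned in the list above can be sketched as follows. This is an in-memory simulation under stated assumptions, not the framework's task type: `ChangeRow` and `export_increment` are hypothetical names, and plain integers stand in for SQL Server CDC log sequence numbers (which are `binary(10)` values in practice).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChangeRow:
    lsn: int        # integer stand-in for a CDC log sequence number
    operation: str  # "insert", "update" or "delete"
    key: int        # primary key of the changed row


def export_increment(changes: List[ChangeRow],
                     last_watermark: int,
                     max_lsn: int) -> Tuple[List[ChangeRow], int]:
    """Select changes in the window (last_watermark, max_lsn].

    Returns the batch to export and the new watermark, which the caller
    should persist only after the export has succeeded.
    """
    batch = [c for c in changes if last_watermark < c.lsn <= max_lsn]
    return batch, max_lsn


if __name__ == "__main__":
    log = [
        ChangeRow(1, "insert", 10),
        ChangeRow(2, "update", 10),
        ChangeRow(3, "insert", 11),
        ChangeRow(4, "delete", 10),
    ]
    first, watermark = export_increment(log, last_watermark=0, max_lsn=2)
    second, watermark = export_increment(log, last_watermark=watermark, max_lsn=4)
    print(len(first), len(second), watermark)
```

Using a half-open window means a change is exported exactly once across consecutive runs, provided the watermark is only advanced after a successful export.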

Ability to lock Tasks, Source / Target systems & Execution Engines by user role

microsoft/azure-data-services-go-fast-capability-assessment#129

The purpose of this issue is to apply security controls to the following:

  • Source & Target Systems
  • Execution Engines
  • Associated Tasks

There is currently a security implementation in place that allows securing subject areas by owners (it was implemented for WAPHA). We should extend this implementation to cover the required items above.

  • Design the security model / changes
  • Replace the SubjectAreaRoleMap table with an EntityRoleMap table that supports different entity types
  • Update the EntityRoleProvider to use the new table.
  • Update the DbUp scripts
  • Use temporal tables on the new/updated tables
  • Update controllers to add authentication checks & owner / security group assignment (same as the SubjectAreaController does now)
  • Update documentation on how to secure an item.
  • Add CRUD model (Controller & Views for maintaining the EntityRoleMaps)
  • Add a navigation/menu item to the new Security Model pages
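
One possible shape for a role map that supports several entity types, as proposed in the list above, is sketched here. Everything in it is an assumption made for illustration: the field names, the `EntityType` members, and the `roles_for` helper are hypothetical and do not reflect the existing SubjectAreaRoleMap schema or the EntityRoleProvider.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class EntityType(Enum):
    # Hypothetical entity types, mirroring the items this issue wants secured.
    SUBJECT_AREA = "SubjectArea"
    SOURCE_AND_TARGET_SYSTEM = "SourceAndTargetSystem"
    EXECUTION_ENGINE = "ExecutionEngine"
    TASK = "Task"


@dataclass(frozen=True)
class EntityRoleMap:
    entity_type: EntityType  # which kind of entity this row secures
    entity_id: int           # id of the entity within its own table
    group_id: str            # security group granted the role
    role: str                # role name, e.g. "Owner" or "Reader"


def roles_for(mappings: Iterable[EntityRoleMap],
              entity_type: EntityType,
              entity_id: int,
              user_groups: Set[str]) -> Set[str]:
    """Return the roles a user holds on one entity via group membership."""
    return {
        m.role
        for m in mappings
        if m.entity_type is entity_type
        and m.entity_id == entity_id
        and m.group_id in user_groups
    }


if __name__ == "__main__":
    rows = [
        EntityRoleMap(EntityType.EXECUTION_ENGINE, 1, "group-a", "Owner"),
        EntityRoleMap(EntityType.TASK, 1, "group-b", "Reader"),
    ]
    print(roles_for(rows, EntityType.EXECUTION_ENGINE, 1, {"group-a"}))
```

Keying each row on the pair of entity type and entity id lets one table cover subject areas, systems, engines, and tasks without a separate map table per entity.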

Management Zone

Currently the design of this platform builds the Data Management and Data Landing Zones into a single zone. This is beneficial for customers that are looking to get started rapidly with their analytics use cases. It would, however, be good if ADS Go Fast could be used on top of an existing Data Management Zone (https://github.com/Azure/data-management-zone). This will begin the alignment of these two open-source projects and ensure that the two Microsoft repos can be used together.

Scenario:

You have already deployed a Data Management Zone (as per https://github.com/Azure/data-management-zone) and you want to be able to use the ADS Go Fast as an Azure Data Landing Zone. This requires passing in / configuring the following:

  • Allow for disabling the deployment of Purview so that a shared Data Management Landing Zone account can be used
  • Allow for optionally providing an existing Azure Monitor (Log Analytics workspace ID)
  • Allow for optionally providing an existing vnet/subnets
  • Allow for optionally providing existing private DNS namespace prefixes
  • Allow for optionally providing existing Synapse private link hubs
  • Provide an example configuration for deploying the platform to use an existing data management landing zone.

Enhance Deployment via Declarative IAC

Enhance the deployment process using declarative infrastructure as code. Core objectives:

  • Simplify Deployment
  • Decrease deployment time
  • Improve Incremental Deployment Ability
  • Reduce Powershell codebase

Enhance EditSettings page with validation rules

Prevent resource names that will be too long for deployment.

For example, if names are too long, then when running Deployment/workflows/CD_1a_DeployServices.ps1, we can get errors such as:

ERROR: {'code': 'InvalidTemplateDeployment', 'message': "The template deployment 'Storage_Logging' is not valid according to the validation procedure. The tracking id is '9aa1404a-2db4-4517-9070-b73696844278'. See inner errors for details."}
Inner Errors: 
{'code': 'PreflightValidationCheckFailed', 'message': 'Preflight validation failed. Please refer to the details for the specific errors.'}
Inner Errors: 
{'code': 'AccountNameInvalid', 'target': 'logstgforabcdexbtekbelurbbk', 'message': 'logstgforabcdexbtekbelurbbk is not a valid storage account name. Storage account name must be between 3 and 24 characters in length and use numbers and lower-case letters only.'}
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"VaultNameNotValid\",\r\n    \"message\": \"The vault name 'adsgfkvforabcdexbtekbelurbbk' is invalid. A vault's name must be between 3-24 alphanumeric characters. The name must begin with a letter, end with a letter or digit, and not contain consecutive hyphens. Follow this link for more information: https://go.microsoft.com/fwlink/?linkid=2147742\"\r\n  }\r\n}"}]}}
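
The naming rules quoted in these error messages could be checked before deployment. The following is a minimal sketch of such a check, assuming only the constraints stated in the messages above; the function names are hypothetical and this is not the EditSettings page's implementation.

```python
import re

# Storage account: 3 to 24 characters, lower-case letters and digits only.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

# Key vault: 3 to 24 alphanumeric characters or hyphens, starting with a
# letter and ending with a letter or digit. Consecutive hyphens are
# rejected separately below.
KEY_VAULT_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]{1,22}[A-Za-z0-9]")


def is_valid_storage_account_name(name: str) -> bool:
    """Check a name against the storage account rule quoted in the error."""
    return STORAGE_ACCOUNT_RE.fullmatch(name) is not None


def is_valid_key_vault_name(name: str) -> bool:
    """Check a name against the key vault rule quoted in the error."""
    return KEY_VAULT_RE.fullmatch(name) is not None and "--" not in name


if __name__ == "__main__":
    # The two names from the failed deployment above are both too long.
    print(is_valid_storage_account_name("logstgforabcdexbtekbelurbbk"))
    print(is_valid_key_vault_name("adsgfkvforabcdexbtekbelurbbk"))
```

Because the deployed names are built from a prefix plus the user's settings, a check like this would need to run on the final composed name rather than on the user-entered fragment alone.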

PrepareFrameworkTasks is failing

I have deployed ADS Go Fast and tried to load one table from the configured SQL source to Blob storage using a preconfigured Task Master. After investigating, I found that the TaskInstance table does not have any records, and further that PrepareFrameworkTasks is failing with the error below:
[image: screenshot of the error]

Can you please assist on this?

Thanks,
Muhammad

Extend Lockbox to include H2O.ai

Note: currently added specifically in Terraform. We may alter this to make the Terraform more generic for a custom VM image and allow parameters to control the image (Linux VM).
