
azure.databricks.cicd.tools's Introduction

DEPRECATED

This code is no longer maintained.

azure.databricks.cicd.tools

PowerShell tools for Deploying & Managing Databricks Solutions in Azure. These cmdlets help you build continuous delivery pipelines and better source control for your scripts.

Overview

Supports Windows PowerShell 5 and PowerShell Core 6.1+. We generally recommend you use PowerShell Core where possible (modules load faster, and downloading large DBFS files may fail in older versions).

See the Wiki for command help.

For more detail on use cases for these commands, see https://datathirst.net/blog/2019/1/18/powershell-for-azure-databricks

Install-Module

https://www.powershellgallery.com/packages/azure.databricks.cicd.tools

Install-Module -Name azure.databricks.cicd.tools -Scope CurrentUser

Followed by:

Import-Module -Name azure.databricks.cicd.tools

To upgrade from a previous version

Update-Module -Name azure.databricks.cicd.tools

Create Service Principal

Create a new Service Principal. You will need the following:

  • ApplicationId (also known as ClientId)
  • Secret Key
  • TenantId

Make the Service Principal a Contributor on your Databricks Workspace using the Access Control (IAM) blade in the portal.
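
If you have the Az PowerShell module installed, a minimal sketch of these steps might look like the following. The display name and workspace resource ID are placeholders, and the exact property names on the returned service principal object vary between Az module versions, so verify against your installed version:

# Create the Service Principal and capture the generated secret straight away - it cannot be retrieved later
$sp = New-AzADServicePrincipal -DisplayName "databricks-cicd-sp"
$ApplicationId = $sp.AppId                              # exposed as .ApplicationId on older Az versions
$Secret = $sp.PasswordCredentials.SecretText            # older Az versions expose a SecureString via .Secret instead
$TenantId = (Get-AzContext).Tenant.Id

# Grant the Service Principal Contributor on the Databricks workspace (placeholder scope)
New-AzRoleAssignment -ApplicationId $ApplicationId `
    -RoleDefinitionName "Contributor" `
    -Scope "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Databricks/workspaces/<workspaceName>"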

Connect-Databricks

You must first create a connection to Databricks. Currently there are four methods supported:

  • Use your already logged in AAD Context from Az PowerShell (requires Az module version 5.1+) - known as AZCONTEXT
  • Provide the ApplicationId/Secret and the Databricks OrganisationId for your workspace - known as DIRECT
    • This is the o=1234567890 number in the URL when you use your workspace
  • Provide the ApplicationId/Secret and the SubscriptionID, Resource Group Name & Workspace Name - known as MANAGEMENT
  • Provide a Bearer token to connect as your own user account - known as BEARER
    • This is the classic method and is not recommended for automated processes
    • It is, however, still useful for running ad hoc commands from your desktop

NOTE: The first time a service principal connects it must use the MANAGEMENT method, as this provisions the service principal in the workspace. After that you can use the DIRECT method. Without doing this first you will receive a 403 Unauthorized response on all commands.

Examples

AZCONTEXT:

You can connect using the Az context of your PowerShell session, assuming you have already logged in with Connect-AzAccount.

Connect-Databricks -UseAzContext -DatabricksOrgId $OrgId -Region $Region

You can also use Az PowerShell to get the details using the resource name:

$Workspace = (Get-AzDatabricksWorkspace -ResourceGroupName "MyRG" -Name "MyWorkspaceName")
$Region = (($Workspace.Url).split("."))[0,1] -Join "."
Connect-Databricks -UseAzContext -DatabricksOrgId $Workspace.WorkspaceId -Region $Region

DIRECT:

Connect-Databricks -Region "westeurope" -ApplicationId "8a686772-0e5b-4cdb-ad19-bf1d1e7f89f3" -Secret "myPrivateSecret" `
            -DatabricksOrgId 1234567 `
            -TenantId "8a686772-0e5b-4cdb-ad19-bf1d1e7f89f3"

MANAGEMENT:

Connect-Databricks -Region "westeurope" -ApplicationId "8a686772-0e5b-4cdb-ad19-bf1d1e7f89f3" -Secret "myPrivateSecret" `
            -ResourceGroupName "MyResourceGroup" `
            -SubscriptionId "9a686882-0e5b-4edb-cd49-cf1f1e7f34d9" `
            -WorkspaceName "workspaceName" `
            -TenantId "8a686772-0e5b-4cdb-ad19-bf1d1e7f89f3"

You can also use this command to connect using a Bearer token so that you do not have to provide it on every command (as you did prior to version 2).

BEARER:

Connect-Databricks -BearerToken "dapi1234567890" -Region "westeurope"

You can now execute commands as required without providing further authentication in this PowerShell session:

Get-DatabricksClusters

Legacy Bearer Token Method

You can continue to execute commands using the bearer token in every request (this overrides the session connection, if any):

 Get-DatabricksClusters -BearerToken "dapi1234567890" -Region "westeurope"

This is to provide backwards compatibility with version 1 only.

Commands

For a full list of commands with help please see the Wiki.

Secrets

  • Set-DatabricksSecret
  • Add-DatabricksSecretScope

Deploys a secret value to Databricks; this can be a storage account key, a password, etc. The secret must be created within a scope, which will be created for you if it does not exist.
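
A minimal sketch, assuming you have already connected with Connect-Databricks (the scope name, secret name, and value below are placeholders):

Add-DatabricksSecretScope -ScopeName "MyScope"
Set-DatabricksSecret -ScopeName "MyScope" -SecretName "StorageAccountKey" -SecretValue "mySecretValue"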

Key Vault backed secret scopes

Please note that the Databricks REST API currently only supports adding Key Vault backed scopes using AAD user credentials (NOT Service Principals). Please use the AzContext connect method as an AAD user.
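
For example, after connecting as an AAD user (e.g. via Connect-Databricks -UseAzContext), a sketch of creating a Key Vault backed scope might look like this - the Key Vault resource ID is a placeholder:

Add-DatabricksSecretScope -ScopeName "key-vault-secrets" `
    -KeyVaultResourceId "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.KeyVault/vaults/<vaultName>"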

Cluster Management

The following commands exist:

  • Get-DatabricksClusters - Returns a list of all clusters in your workspace
  • New-DatabricksCluster - Creates/Updates a cluster
  • Start-DatabricksCluster
  • Stop-DatabricksCluster
  • Update-DatabricksClusterResize - Modify the number of scale workers
  • Remove-DatabricksCluster - Deletes your cluster
  • Get-DatabricksNodeTypes - Returns a list of valid node types (such as DS3v2)
  • Get-DatabricksSparkVersions - Returns a list of valid Spark versions

Please see the scripts for the parameters. Examples are available in the Tests folder.

These have been designed with CI/CD in mind - i.e. they should all be idempotent.
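
As a rough sketch of that pattern (assuming an existing Connect-Databricks session; only a subset of New-DatabricksCluster parameters is shown, and names such as -ClusterName are inferred rather than confirmed here, so check the Wiki for the full list):

# Look up valid values before creating a cluster
Get-DatabricksSparkVersions
Get-DatabricksNodeTypes

# Create the cluster, or update the existing cluster of the same name
New-DatabricksCluster -ClusterName "ci-cluster" -SparkVersion "7.3.x-scala2.12" -NodeType "Standard_DS3_v2" `
    -UniqueNames -Update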

DBFS

  • Add-DatabricksDBFSFile - Upload a file or folder to DBFS
  • Remove-DatabricksDBFSItem - Delete a file or folder
  • Get-DatabricksDBFSFolder - List folder contents

Add-DatabricksDBFSFile can be used as part of a CI/CD pipeline to upload your source code or dependent libraries to DBFS. You can also use it to deploy initialisation scripts for your clusters.
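
For example, a sketch of uploading cluster init scripts as part of a pipeline, assuming an existing Connect-Databricks session (the local folder and DBFS target are placeholders):

Add-DatabricksDBFSFile -LocalRootFolder "init-scripts" -FilePattern "*.sh" -TargetLocation "/databricks/init"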

Notebooks

Export-DatabricksFolder

Pull down a folder of scripts from your Databricks workspace so that you can commit the files to your Git repo. It is recommended that you set the LocalOutputPath to be inside your Git repo.

Parameters

-ExportPath: The folder inside Databricks you would like to clone, e.g. /Shared/MyETL. Must start with /.
-LocalOutputPath: The local folder to clone the files to, ideally inside a repo. Can be fully qualified or relative.
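
A minimal sketch, assuming an existing Connect-Databricks session and placeholder paths:

Export-DatabricksFolder -ExportPath "/Shared/MyETL" -LocalOutputPath ".\notebooks\MyETL"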

Import-DatabricksFolder

Deploy a folder of scripts from a local folder (Git repo) to a specific folder in your Databricks workspace.

Parameters

-LocalPath: The local folder containing the scripts to deploy. Subfolders will also be deployed.
-DatabricksPath: The folder inside Databricks you would like to deploy into, e.g. /Shared/MyETL. Must start with /.
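
A minimal sketch, assuming an existing Connect-Databricks session and placeholder paths (the -Clean switch, referenced in the issues below, is optional and removes the target folder contents before deploying):

Import-DatabricksFolder -LocalPath ".\notebooks\MyETL" -DatabricksPath "/Shared/MyETL" -Clean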

Jobs

  • Add-DatabricksNotebookJob - Schedule a job based on a Notebook.
  • Add-DatabricksPythonJob - Schedule a job based on a Python script (stored in DBFS).
  • Add-DatabricksJarJob - Schedule a job based on a Jar (stored in DBFS).
  • Add-DatabricksSparkSubmitJob - Schedule a job based on a spark-submit command.
  • Remove-DatabricksJob

Libraries

  • Add-DatabricksLibrary
  • Get-DatabricksLibraries

Missing Commands/Bugs

If a command you need is missing, the command below can be used to call the API directly - just look up the request syntax in the Databricks REST API documentation (https://docs.databricks.com/dev-tools/api/latest/index.html); a sketch is shown below.

  • Invoke-DatabricksAPI
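
For example, a sketch of starting a cluster through the REST API (the cluster ID is a placeholder; the -Method/-Body/-API parameters follow the usage shown in the issues below):

$Body = @{ cluster_id = "1234-567890-abc123" }
Invoke-DatabricksAPI -Method POST -Body $Body -API "/api/2.0/clusters/start"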

Examples

See the Wiki for help on the commands. You can also see more examples in the Tests folder.

Misc

VSTS/Azure DevOps

Deployment tasks exist here: https://marketplace.visualstudio.com/items?itemName=DataThirstLtd.databricksDeployScriptsTasks

Note that not all cmdlets are available as tasks. Instead, you may want to import the module and create PowerShell scripts that use these commands, as sketched below.
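
For instance, a pipeline PowerShell step could be sketched as follows (assuming the agent is already signed in to Azure, e.g. via an Azure PowerShell task, and all names are placeholders):

# Install and load the module on the build agent
Install-Module -Name azure.databricks.cicd.tools -Scope CurrentUser -Force
Import-Module -Name azure.databricks.cicd.tools

# Connect with the logged-in Az context, then deploy notebooks from the repo
Connect-Databricks -UseAzContext -DatabricksOrgId 1234567 -Region "westeurope"
Import-DatabricksFolder -LocalPath "$env:BUILD_SOURCESDIRECTORY\notebooks" -DatabricksPath "/Shared/MyETL"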

Contribute

Contributions are welcomed! Please create a pull request with changes/additions.

Requests

For any requests on new features please check the Databricks REST API documentation to see if it is supported first.

azure.databricks.cicd.tools's People

Contributors

bergmeister, bilalachahbar, daveruijter, delenemirouze, dsu4rez, gavincampbell, mwc360, nielsams, peter11244, richiebzzzt, scsmncao, simondmorias, sonkanit, tadziqusky


azure.databricks.cicd.tools's Issues

Get-Notebooks failing since v2.1.3771

At my organization we have run into an issue since the last change in v2.1.3771.

Randomly (9 out of 10 times), when we were running our CI/CD pipeline, we were getting an error while calling Export-DatabricksFolder. The error was pointing to Get-Notebooks.

After digging into the repo history, we found that you updated this specific script. Rolling back to v2.1.2915 solved the issue for us.

Sorry, I can't give you more hints, since I haven't been able to replicate this error on my machine.

We are making the call as follows:
Export-DatabricksFolder -BearerToken $BearerToken -Region $Region -ExportPath "/$($WORKSPACEPATH)" -LocalOutputPath "$($REL_FOLDER_OUTPUT)\$($WORKSPACEPATH)"

The error:
##[error]Get-Notebooks : Oh dear one of the jobs has failed. Check the details of the jobs above.
At C:\Program Files\WindowsPowerShell\Modules\azure.databricks.cicd.tools\2.1.3786\Private\Get-Notebooks.ps1:44 char:13
    Get-Notebooks $SubfolderContents ($Object.path + "/") $Lo ...
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
    FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Get-Notebooks

Any idea? Could anyone look into this?

Apply Permissions to Jobs

Currently when a job is created only the user that created it can access it. It should be possible to create a job or update a job and apply permissions to it.

Adding secret to Databricks scope doesn't work in Standard Tier

When I tried to add a new secret to a newly created Databricks scope:

Add-DatabricksSecretScope -BearerToken $databricksToken -Region $resourceGroupLocation -ScopeName 'def' -AllUserAccess:$true
Set-DatabricksSecret -BearerToken $databricksToken -Region $resourceGroupLocation -ScopeName 'def' -SecretName 'password' -SecretValue 'some-secret-value'

I got the following error message, even though AllUserAccess was declared above.

#{"error_code":"BAD_REQUEST","message":"Premium Tier is disabled in this workspace. Secret scopes can only be created with initial_manage_principal "users"."}

The same works with Databricks CLI.

databricks secrets create-scope --scope xyz --initial-manage-principal users --profile kn
databricks secrets put --scope xyz --key password --profile kn

Line-ending issues where repos are shared between Unix and Windows machines

We keep running into issues where notebooks pushed/pulled from workspaces pick up changes in repos where the only change is line endings. This is because engineers use Windows/Linux/macOS and their Git settings differ. The latest attempt to fix this involves fixing line endings in the local repo and, for any notebooks that already exist, overwriting the contents instead of overwriting the files.

Put Secret using AAD Authentication

If you try to create a secret or a secret scope using a service principal to authenticate you will get this error message:

{"error_code":"INTERNAL_ERROR","message":"There was an internal error handling request POST to
/api/2.0/secrets/scopes/create. Please try again later."}

This is a known issue with the Databricks REST API and is with Databricks to address.

In the meantime, as a workaround, please use a bearer token.

Pulling a secret from Azure Key Vault with special characters like : = | ; \ / ? gives error: (400) Bad Request.

Hi, in an Azure CD pipeline, if I pull a secret from the key vault and the secret value contains special characters like : = | ; \ / ?, the "Databricks Secret Deployment" task fails with an error.

System.Net.WebException: The remote server returned an error: (400) Bad Request.
Invoke-RestMethod : {"error_code":"MALFORMED_REQUEST","message":"Invalid JSON given in the body of the request - failed to parse given JSON"}

However, other special characters like #@!$ are accepted and work fine. Could you please look into it?

Errors with Add library

Hello,
I get this error when trying to install a .whl library on the cluster:
Invoke-RestMethod : {"error_code":"PERMISSION_DENIED","message":"User: SimpleUserContext{userId=1763010680948721,
2020-09-10T23:15:05.3716987Z name=......., orgId=......,
2020-09-10T23:15:05.3720575Z azurePrincipalTenantId=.......,
2020-09-10T23:15:05.3721120Z azurePrincipalObjectId=.........} is not authorized to access cluster"}
2020-09-10T23:15:05.3721684Z At C:\Program
2020-09-10T23:15:05.3722124Z Files\WindowsPowerShell\Modules\azure.databricks.cicd.tools\2.1.2915\Public\Add-DatabricksLibrary.ps1:102 char:5
2020-09-10T23:15:05.3722648Z + Invoke-RestMethod -Uri $uri -Body $BodyText -Method 'POST' -Heade ...
2020-09-10T23:15:05.3723075Z + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2020-09-10T23:15:05.3723575Z + CategoryInfo : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebExc
2020-09-10T23:15:05.3724015Z eption
2020-09-10T23:15:05.3724421Z + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand

Does anyone know why I get this error even though I use a service principal with the Contributor role?

Thank you.

Do not output the Bearer Token

We run our PowerShell scripts in Verbose mode in CI/CD pipelines to ensure we have all the information to debug in case any issues arise. What I have been seeing is that the Bearer Token is output in Verbose mode when any command from this module is executed. Since build logs are accessible to more or less everyone and the generated bearer token has a lot of permissions, is it possible to not output it at all, or to just print ****?


Thanks!
-Pranav

Updating an existing cluster does not work.

Hi,
The call to the "New-DatabricksCluster" doesn't seem to work when the cluster already exists.

I get the following error message: "Missing required field: size"

I added both the UniqueNames and Update switch, but the payload sent to Rest API is invalid. The cluster parameters are not added to the payload.

The issue is probably at line 107 of "New-DatabricksCluster.ps1". The "else" should be removed, and the cluster parameters must be added whether it is an add or an edit command.

 If ($UniqueNames)
    {
        $ClusterId = (Get-DatabricksClusters -Bearer $BearerToken -Region $Region | Where-Object {$_.cluster_name -eq $ClusterName})
        if ($ClusterId){
            if ($Update){
                $Mode = "edit"
                $Body['cluster_id'] = $ClusterId.cluster_id
                Write-Warning "Cluster already exists - it will be updated to this configuration"
            }
            # ...
        }
    }
    else { # Remove the else
        $ClusterArgs['SparkVersion'] = $SparkVersion
        $ClusterArgs['NodeType'] = $NodeType
        # ...

        $Body += GetNewClusterCluster @ClusterArgs
    }

$BodyText = $Body | ConvertTo-Json -Depth 10

Best regards,

Romain

Does this module support Azure Automation?

Hi there,

I tried to import this module from the Azure Automation module gallery, but it failed, although I could install it locally.
Then I tried to zip the locally installed module and import the zip file as an Azure Automation module; still no luck.
In a last attempt, I made an ARM template to deploy this module to Azure Automation, which also failed.
I checked that Azure Automation is using PowerShell 5.1.

Could you please confirm whether this is supported?
Thanks in advance

.ipynb files are not supported by the Import-DatabricksFolder function

Currently, those files are being skipped:

WARNING: File test.ipynb has an unknown extension - skipping file

A file with the .ipynb extension is just another Python file, but in a more complex format (Jupyter).
So, the following line:
$FileType = @{".py"="PYTHON";".scala"="SCALA";".r"="R";".sql"="SQL" }
should be extended to:
$FileType = @{".py"="PYTHON";".scala"="SCALA";".r"="R";".sql"="SQL";".ipynb"="PYTHON" }
and in that case the format in the request should be JUPYTER.

Unable to deploy Notebook with Service Principal authentication method

My Databricks notebook deployment works fine with bearer token authentication, but when I try Service Principal authentication I get the error below.

Error 400 Invalid resource ID.
HTTP ERROR 400
Problem accessing /api/2.0/workspace/mkdirs. Reason:
Invalid resource ID

Please help

Always getting 401 Unauthorized -AADSTS7000215: Invalid client secret is provided

Hi,

I'm always getting a 401 when using SPN authentication, so I debugged it from the PowerShell command:

Connect-Databricks -Region <LOCATION> -ApplicationId <APPLICATION_ID> -Secret <SECRET> -ResourceGroupName <RESOURCE_GROUP_NAME> -SubscriptionId <SUBSCRIPTION_ID> -WorkspaceName <DATABRICKS_NAME> -TenantId <TENANT_ID>

Using Fiddler I get the following:

{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: 34744c89-e0e8-431f-bc20-cbd99292f600\r\nCorrelation ID: c582c863-9180-4ba3-a3f7-aa5826c7f69c\r\nTimestamp: 2019-12-10 10:04:56Z","error_codes":[7000215],"timestamp":"2019-12-10 10:04:56Z","trace_id":"34744c89-e0e8-431f-bc20-cbd99292f600","correlation_id":"c582c863-9180-4ba3-a3f7-aa5826c7f69c","error_uri":"https://login.microsoftonline.com/error?code=7000215"}

I tried with both Standard and Premium Tier but no luck so far.

Also, I tried encoding the client secret, but that did not work either.

Using Fiddler I see that there are two requests going to the Microsoft login endpoint.

  1. grant_type=client_credentials&client_id=xxxx&resource=https://management.core.windows.net/&client_secret=xxxx

  2. grant_type=client_credentials&client_id=xxxx&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d&client_secret=xxxx

In the second request, why is the resource hardcoded as 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d?

Add additional plugin to upload Global Init script

Hi

I have developed the following PowerShell script to upload init scripts to Databricks. Right now it only works for global init scripts (not cluster-scoped).

I figure it would make sense to increase the value of an existing plugin rather than creating a competing one!

The location variable is the regular verbose "East US 2" rather than the compressed eastus2. It might be good to have that as a drop-down list with the allowed deployment regions for Databricks.

param
(
    [parameter(Mandatory = $true)] [String] $file,
    [parameter(Mandatory = $true)] [String] $location,
    [parameter(Mandatory = $true)] [String] $databricksAccessToken
)
$location = $location.replace(" ",'').toLower()
# You can write your azure powershell scripts inline here. 
# You can also pass predefined and custom variables to this script using arguments
$path = Split-path $file
$filename = Split-path -leaf $file

$uri ="https://$location.azuredatabricks.net/api/2.0/dbfs/put"
$hdrs = @{}
$hdrs.Add("Authorization","Bearer $databricksAccessToken")
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

Write-Host "Replacing CRLF with LF for $file"
$text = [IO.File]::ReadAllText($file) -replace "`r`n", "`n"
[IO.File]::WriteAllText($file, $text)

Write-Host "Encoding $file to BASE64"
$BinaryContents = [System.IO.File]::ReadAllBytes($File)
$EncodedContents = [System.Convert]::ToBase64String($BinaryContents)
$targetPath  = "/databricks/init/$filename"
$Body = @"
{
    "contents": "$EncodedContents",
    "path": "$targetPath",
    "overwrite": "true"
}
"@

Write-Output "Pushing file $File to $TargetPath to REST API: $uri"
Invoke-RestMethod -Uri $uri -Body $Body -Method 'POST' -Headers $hdrs

How is the module published to PSGallery, and could it be published to NuGet as well?

How is the module published to PSGallery? I don't see any steps in the yaml files.

The reason I ask is that I want to publish some modules to NuGet for when PSGallery becomes unavailable again, but as we rely on this module it would be a bit redundant doing that for our modules if this module is only available on PSGallery.

Upload a file to Databricks workspace

Hi, I am trying to upload a file to Databricks with a specific name specified in the target location of the command.
However, it creates a folder with the file inside it.
e.g.
Add-DatabricksDBFSFile -LocalRootFolder "fileToUpload/" -FilePattern "InitScript.sh" -TargetLocation "/databricks/init/init-script.sh"

But in Databricks the file is created at --> dbfs:/databricks/init/init-script.sh/InitScript.sh

I can do this by calling the API directly and making a PUT. Maybe it would be interesting to create a method to upload files directly with a specific name.

Thank you very much.

No option to remove whole folder/workspace

API does offer DELETE command for workspaces:
https://docs.databricks.com/api/latest/workspace.html#delete

This cmdlet is missing from this fantastic module.
Besides the simple option of removing old code from Databricks, I am seriously considering executing that delete before importing a folder of notebooks, as the import is then far faster than "updating" existing notebooks at the target.
For some reason, when you update, Databricks removes all the steps and deploys them from the source. That delete operation can last dozens of seconds or raise a timeout with a misleading message:

The service at /api/2.0/workspace/import is temporarily unavailable. Please try again later.

This happens when you have ~400 steps in the notebook.

Scope with Azure KeyVault must have userAADToken defined

Add-DatabricksSecretScope -BearerToken "dapi" -Region "westeurope" -ScopeName "key-vault-secrets" -KeyVaultResourceId "/subscriptions//resourceGroups//providers/Microsoft.KeyVault/vaults/" -AllUserAccess

Results in an error: Invoke-RestMethod : {"error_code":"INVALID_PARAMETER_VALUE","message":"Scope with Azure KeyVault must have userAADToken defined!"}

Issue using AAD in the "Databricks Deploy Notebooks" release

Via the "Databricks Deploy Notebooks" task I want to deploy various notebooks to a workspace. I use the service principal authentication instead of bearer. However nothing is deployed using AAD, but using Bearer it is deployed. Did i miss a something?

2020-01-23T14:45:58.5076945Z ##[section]Starting: Deploy Databricks Notebooks
2020-01-23T14:45:58.5169933Z ==============================================================================
2020-01-23T14:45:58.5169981Z Task : Databricks Deploy Notebooks
2020-01-23T14:45:58.5170009Z Description : Deploys a folder of Notebooks to your Databricks workspace
2020-01-23T14:45:58.5170053Z Version : 0.8.1305
2020-01-23T14:45:58.5170082Z Author : Data Thirst Ltd
2020-01-23T14:45:58.5170108Z Help :
2020-01-23T14:45:58.5170135Z ==============================================================================
2020-01-23T14:45:59.9883482Z Connecting via AAD
2020-01-23T14:46:00.6614433Z Clean false
2020-01-23T14:46:00.6616324Z Running without clean
2020-01-23T14:46:07.7127556Z
2020-01-23T14:46:09.2780923Z
2020-01-23T14:46:09.3114822Z ##[section]Finishing: Deploy Databricks Notebooks

New-DatabricksCluster not using bearertoken to authenticate

Scenario -
I want to get all the clusters from one Databricks workspace using Get-DatabricksClusters and publish them to a different workspace using New-DatabricksCluster. However, at no point does New-DatabricksCluster make use of the BearerToken parameter when calling Invoke-DatabricksAPI.

$Response = Invoke-DatabricksAPI -Method POST -Body $Body -API "/api/2.0/clusters/create"
return $Response.cluster_id
}
if ($Mode -eq "edit"){
$ExistingClusterConfig = RemoveClusterMeta $ExistingClusterConfig
$CompareBody = RemoveClusterMeta $Body
if ((HashCompare $ExistingClusterConfig $CompareBody) -gt 0){
$Response = Invoke-DatabricksAPI -Method POST -Body $Body -API "/api/2.0/clusters/edit"

This confused me a little, as BearerToken is mandatory:

[parameter(Mandatory = $true, ParameterSetName='Bearer')]

At any rate, I assume this leads Invoke-DatabricksAPI to use some other bearer token that exists in the PowerShell session instead?

Method to run existing job

I'd like to run a previously created job which doesn't contain any schedule.
I can't find that method.

Get-DatabricksServicePrincipals returns 404

Hello

I'm not sure I understand how to use Get-DatabricksServicePrincipals, because I get a 404.
I run this code:

$instanceDb = Get-AzResource `
		-Name $NAME `
		-ExpandProperties
			
$tokenConf = Connect-Databricks ..........
Get-DatabricksServicePrincipals `
		-BearerToken  $tokenConf.token_value `
		-Region $Region `
		-DatabricksId $InstanceDb.properties.workspaceId 

I checked with only:

Get-DatabricksServicePrincipals

That works fine, but it is not what I want to do.

What am I missing?

thanks

Issue with running on ADO build agents

Hello, the Get-FolderContents code seems to work locally (Windows 10) but when run on an ADO build agent it gives me the following:


Something appears to be going on with the JWTs specifically when run on vs2017-win2016 build agents. It is super confusing why this is a problem.

Deploy Databricks Job Not found

2019-10-18

I installed the extension in Azure DevOps today, but I am not able to find the task to deploy jobs. It has options for deploying notebooks but not jobs.

Set DatabricksPermission example is misleading

I think that the example listed below is misleading:
EXAMPLE 1
Set-DatabricksPermission -BearerToken $BearerToken -Region $Region -Principal "MyTestGroup" -PermissionLevel 'CAN_MANAGE' -DatabricksObjectType 'Cluster' -DatabricksObjectId "tubby-1234"

At the moment there is no way to set cluster-level permissions like Can Attach To, Can Restart or Can Manage via either the Databricks REST API or the CLI.

Please, correct me if I'm wrong.

Thanks,
Zac

Clean workspace switch not working

Issue description
When Import-DatabricksFolder is invoked with the clean switch it should delete the specified Databricks workspace folder and all files within it before copying new files. However, it does not work as documented.

Steps to reproduce

  1. Create a deployment pipeline in Azure DevOps. Ensure that the "Clean" checkbox is ticked.
  2. Manually create a notebook or folder in that workspace in Databricks
  3. Run the deployment from Azure DevOps

Expected result
Only the newly deployed files are available in the Databricks workspace

Actual result
The manually created file is still in the workspace

Additional information
I believe this issue was introduced by placing the Get-DatabricksWorkspaceFolder call inside the Trap statement in https://github.com/DataThirstLtd/azure.databricks.cicd.tools/pull/114/files. This change appears to have been added to ensure that the clean switch handles the scenario where there is no workspace to clean and the Databricks API returns a 404 error, but accidentally placing this inside the trap means it is actually never called and files and folders are never cleaned from an existing workspace.

Pester 5 cannot run tests

I know there are issues around Pester 5 running older test suites, but _InvokeTests specifies a minimum version of 4.4.2. Could we set a maximum version of 4.9.0 as well, until someone has the spare time to rewrite the tests?

Direct Download with Encoding Request Results in Incorrect Encoding

I am downloading notebooks via Export-DatabricksFolder and am seeing a weird encoding issue in some of the regular expressions used in the notebooks.

If I have a notebook with £ symbols and upload or edit it in the portal, they appear as expected.

However, when I download them with Export-DatabricksFolder I get an extra symbol.


I have noticed that if, in Set-LocalNotebook, I set direct_download to false, get the contents, and convert from base64, then the encoding issue is not there. See the example of how to get the contents and decode on this gist (I know I'm not dealing with the first comment, but this was just a test).

So if we directly download the file to a temp file, then Get-Content it, munge the content we want, and save it to the permanent file, the weird formatting error goes away.
So replace the line below

$Response = (Invoke-RestMethod -Method Get -Uri $uri -Headers $Headers) -split '\n' | Select-Object -Skip 1

With something like this -

$LocalExportPath = Join-Path $LocalOutputPath $LocalExportPath
$tempLocalExportPath = Join-Path $LocalOutputPath $tempLocalExportPath

#Databricks exports with a comment line in the header, remove this and ensure we have Windows line endings
Invoke-RestMethod -Method Get -Uri $uri -Headers $Headers -OutFile $tempLocalExportPath
$Response = (Get-Content $tempLocalExportPath -Encoding UTF8 -Raw) -split '\n' | Select-Object -Skip 1
Remove-Item $tempLocalExportPath

I've tested it with a single notebook and a repo with the initial problem and the issue has gone away, plus no other consequential changes are appearing in other notebooks.


I will create a PR with the changes; I don't think there are other consequences to changing this function.

There is no Stop-DatabricksJob

Similar to Start-DatabricksJob, it would be nice to be able to Stop-DatabricksJob (whether by job name, job ID, or the actual job run ID).

When PowerShell 5 Is Obsolete We Can Make Things Better

We can combine -SkipHeaderValidation -ContentType "charset=utf-8" to avoid having to use a temp file to save a file. So all this

Invoke-RestMethod -Method Get -Uri $uri -Headers $Headers -OutFile $tempLocalExportPath
$Response = Get-Content $tempLocalExportPath -Encoding UTF8
$Response = $response.Replace("# Databricks notebook source", " ")
Remove-Item $tempLocalExportPath
if ($Format -eq "SOURCE"){
$Response = ($Response.replace("[^`r]`n", "`r`n") -Join "`r`n")
}
Write-Verbose "Creating file $LocalExportPath"
New-Item -force -path $LocalExportPath -value $Response -type file | out-null

can be condensed into

 $Response = (Invoke-RestMethod -Method Get -Uri $uri -Headers $Headers -SkipHeaderValidation -ContentType "charset=utf-8") -split '\n' | Select-Object -Skip 1
            if ($Format -eq "SOURCE") {
                $Response = ($Response.replace("[^`r]`n", "`r`n") -Join "`r`n")
            }

            Write-Verbose "Creating file $LocalExportPath"
            New-Item -force -path $LocalExportPath -value $Response -type file

Which is what it used to look like.

Sadly, SkipHeaderValidation is not available in PowerShell 5. Ho hum.

Add FilePattern param to Import-DatabricksFolder method

It would be good to have a parameter like -FilePattern for this command, to be able to filter files in the selected path.
Example:
Import-DatabricksFolder -BearerToken $token -Region $region -LocalPath 'c:\repo\Databricks\' -DatabricksPath '/Shared/Project1' -FilePattern 'ag*.py'

Add flag to Get-DatabricksClusters for excluding "Job clusters"

We had an issue in my organization after experimenting with running notebooks on Job Clusters from Data Factory. We tried to deploy all clusters to another Databricks Workspace, and the process crashed since those Job Clusters were linked to a non-existent Cluster Pool in the target environment.

We never thought that Get-DatabricksClusters would also retrieve "Job Clusters", since they are created on the spot by Data Factory and there is no way to delete them from the UI, which gave us the impression that the "Job Clusters" section was just a "history" of job clusters used before.

I think it would be a nice addition to the module to add a flag "-ExcludeJobClusters" to the "Get-DatabricksClusters" function, to solve this issue, create some awareness for users and avoid problems.

The implementation would be super easy since it only requires filtering the retrieved clusters before returning them.

Enhancement request - Invoke-DatabricksAPIJSON ?

Hi Guys

Would it be possible to get support for Invoke-DatabricksAPI to accept a JSON string directly, or to get a new cmdlet that accepts JSON instead of a hash table?
The reasoning behind the request is that I would like to store a cluster configuration in its original JSON format in source control.
Currently, to deploy it, I would have to convert the original JSON to a hash table, then pass it to the function, which then converts it back to JSON.

Cheers
Mat
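
A minimal sketch of the conversion described above, assuming PowerShell Core 6+ (where ConvertFrom-Json supports -AsHashtable) and a placeholder cluster.json file:

$ClusterConfig = Get-Content -Raw -Path "cluster.json" | ConvertFrom-Json -AsHashtable
Invoke-DatabricksAPI -Method POST -Body $ClusterConfig -API "/api/2.0/clusters/create"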

The sleep time in Import-DatabricksFolder.ps1 is too low to prevent 429 errors when executing the function with the -Clean flag.

if ($Clean) {
    try {
        $ExistingFiles = Get-DatabricksWorkspaceFolder -Path $DatabricksPath
    }
    catch [System.Net.WebException] {
        # A 404 response is expected if the specified workspace does not exist in Databricks
        # In this case, there will be no existing files to clean so the exception can be safely ignored
    }
    foreach ($f in $ExistingFiles) {
        if ($f.object_type -eq "DIRECTORY") {
            Write-Verbose "Removing directory $($f.path)"
            Remove-DatabricksNotebook -Path $f.path -Recursive
        }
        else {
            Write-Verbose "Removing file $($f.path)"
            Remove-DatabricksNotebook -Path $f.path
        }
        Start-Sleep -Milliseconds 200 # Prevent 429 responses
    }
}

This is the execution block for the -Clean flag in the Import-DatabricksFolder.ps1 function. Start-Sleep is set to 200 milliseconds, which works fine on my local machine, but I get a 429 error running on an ADO build agent. I'm wondering if the sleep time needs to be increased here, or preferably made into an optional parameter with a default value of 200.

