terraform-databricks-examples

This repository contains the following:

  • Multiple examples of deploying Databricks workspaces and resources on Azure, AWS, and GCP using the Databricks Terraform provider.
  • Examples of implementing CI/CD pipelines to automate your Terraform deployments using Azure DevOps or GitHub Actions.

Using the repository

There are two ways to use this repository:

  1. Use the examples as a reference for your own Terraform code: please refer to the examples folder for individual examples.
  2. Reuse modules from this repository: please refer to the modules folder (see the sketch after this list).
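
For instance, a module can be consumed directly from this repository using a Git source with a subdirectory path. A minimal sketch, assuming the adb-lakehouse module and the v0.2.17 tag (supply the inputs documented in the module's README.md):

module "adb_lakehouse" {
  # Git source with a subdirectory selector, pinned to a release tag
  source = "github.com/databricks/terraform-databricks-examples//modules/adb-lakehouse?ref=v0.2.17"

  # ... module inputs as documented in modules/adb-lakehouse/README.md
}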

Repository structure

Code in the repository is organized into the following folders:

  • modules - implementations of specific Terraform modules.
  • examples - specific deployment examples that use the Terraform modules.
  • cicd-pipelines - detailed examples of implementing CI/CD pipelines to automate your Terraform deployments using Azure DevOps or GitHub Actions.

Repository content

Note
For detailed information about the examples, modules, or CI/CD pipelines, refer to the README.md file inside the corresponding folder.

Examples

The examples folder contains the following Terraform implementation examples:

Cloud | Example | Description
----- | ------- | -----------
Azure | adb-lakehouse | Lakehouse Terraform blueprints
Azure | adb-with-private-link-standard | Provisioning Databricks on Azure with Private Link (standard deployment)
Azure | adb-vnet-injection | A basic example of a VNet-injected Azure Databricks workspace
Azure | adb-exfiltration-protection | A sample implementation of Data Exfiltration Protection
Azure | adb-external-hive-metastore | Example template to implement an external Hive metastore
Azure | adb-kafka | ADB single-node Kafka template
Azure | adb-private-links | Azure Databricks Private Links
Azure | adb-splunk | ADB workspace with a single-VM Splunk integration
Azure | adb-squid-proxy | ADB clusters with an HTTP proxy
Azure | adb-teradata | ADB with a single-VM Teradata integration
Azure | adb-uc | ADB Unity Catalog process
Azure | adb-unity-catalog-basic-demo | ADB Unity Catalog end-to-end demo, including UC metastore setup, users/groups sync from AAD to the Databricks account, UC catalog, external locations, schemas, and access grants
Azure | adb-overwatch | Overwatch multi-workspace deployment on Azure
AWS | aws-workspace-basic | Provisioning AWS Databricks E2
AWS | aws-workspace-with-firewall | Provisioning AWS Databricks E2 with an AWS Firewall
AWS | aws-exfiltration-protection | An implementation of Data Exfiltration Protection on AWS
AWS | aws-workspace-with-private-link | Coming soon
AWS | aws-databricks-flat | AWS Databricks simple example
AWS | aws-databricks-modular-privatelink | Deploy multiple AWS Databricks workspaces
AWS | aws-workspace-uc-simple | Provisioning AWS Databricks E2 with Unity Catalog in a single apply
AWS | aws-databricks-uc | AWS Unity Catalog
AWS | aws-databricks-uc-bootstrap | AWS Unity Catalog bootstrap
AWS | aws-remote-backend-infra | Simple example of a remote backend
AWS | aws-workspace-config | Configure workspace objects
GCP | gcp-sa-provisionning | Provisioning of the identity with the permissions required to deploy on GCP
GCP | gcp-basic | Workspace deployment with a managed VPC
GCP | gcp-byovpc | Workspace deployment with a customer-managed VPC

Modules

The modules folder contains the following Terraform modules:

Cloud | Module | Description
----- | ------ | -----------
All | databricks-department-clusters | Terraform module that creates Databricks resources for a team
Azure | adb-lakehouse | Lakehouse Terraform blueprints
Azure | adb-lakehouse-uc | Provisioning Unity Catalog resources and account principals
Azure | adb-with-private-link-standard | Provisioning Databricks on Azure with Private Link (standard deployment)
Azure | adb-exfiltration-protection | A sample implementation of Data Exfiltration Protection
Azure | adb-with-private-links-exfiltration-protection | Provisioning Databricks on Azure with Private Link and Data Exfiltration Protection
Azure | adb-overwatch-regional-config | Overwatch regional configuration on Azure
Azure | adb-overwatch-mws-config | Overwatch multi-workspace deployment on Azure
Azure | adb-overwatch-main-ws | Main Overwatch workspace deployment
Azure | adb-overwatch-ws-to-monitor | Overwatch deployment on the Azure workspace to monitor
Azure | adb-overwatch-analysis | Overwatch analysis notebooks deployment on Azure
AWS | aws-workspace-basic | Provisioning AWS Databricks E2
AWS | aws-databricks-base-infra | Provisioning AWS infrastructure to be used for the deployment of a Databricks E2 workspace
AWS | aws-databricks-unity-catalog | Provisioning the AWS infrastructure and setting up the metastore for Databricks Unity Catalog
AWS | aws-databricks-workspace | Provisioning an AWS Databricks E2 workspace using pre-created AWS infrastructure
AWS | aws-workspace-with-firewall | Provisioning AWS Databricks E2 with an AWS Firewall
AWS | aws-exfiltration-protection | An implementation of Data Exfiltration Protection on AWS
AWS | aws-workspace-with-private-link | Coming soon
GCP | gcp-sa-provisionning | Provisions the identity (service account) with the correct permissions
GCP | gcp-workspace-basic | Provisions a workspace with a managed VPC
GCP | gcp-workspace-byovpc | Workspace with a customer-managed VPC

CI/CD pipelines

The cicd-pipelines folder contains the following pipeline implementation examples:

Tool | CI/CD Pipeline
---- | --------------
GitHub Actions | manual-approve-with-github-actions
Azure DevOps | manual-approve-with-azure-devops

Contributing

When contributing new code, please follow the structure described in the Repository structure section:

  • Reusable code should go into the modules directory so that it can be easily consumed once published to the Terraform registry. Prefer a modular design consisting of multiple smaller modules, each implementing specific functionality, over one big module that does everything. For example, a separate module for Unity Catalog objects could be used across all clouds, so the same functionality would not need to be reimplemented in cloud-specific modules.
  • Provide examples of module usage in the examples directory; each example should show how to use the given module(s).


Issues

AWS Front End Private Link [Enhancement]

It'd be awesome to add an example for AWS end-to-end Private Link, including the front end. It would be particularly awesome if it also created a test VM that could be used to test it.

TY!

adb-with-private-link-standard unable to connect to workspace unless adding a CNAME in private dns zone

I've been following the example https://github.com/databricks/terraform-databricks-examples/tree/main/modules/adb-with-private-link-standard, but I can't seem to log in to the UI.

When clicking "Launch" on the Databricks workspace, a new tab opens, my workspace URL flashes briefly in the address bar, then it changes to westeurope-c2.pl-auth.azuredatabricks.net/aad/redirect and I get an error that the website can't be reached.

If I try to resolve the DNS for westeurope-c2.pl-auth.azuredatabricks.net while connected to the VPN, it can't be resolved. If I add a CNAME record pointing westeurope-c2.pl-auth.azuredatabricks.net to westeurope-c2.azuredatabricks.net in my privatelink.azuredatabricks.net DNS zone, then I am able to log in, but it feels like a hack, since I have not seen such a requirement in the provided example.
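
For reference, the workaround described above could be expressed in Terraform roughly as follows. This is a sketch only (the record and zone names must match your own deployment, and var.rg_name is a hypothetical variable); a correct Private Link setup should not need it:

resource "azurerm_private_dns_cname_record" "pl_auth_workaround" {
  # CNAME in the privatelink zone redirecting the browser-auth endpoint (the hack described above)
  name                = "westeurope-c2.pl-auth"
  zone_name           = "privatelink.azuredatabricks.net"
  resource_group_name = var.rg_name # hypothetical variable
  ttl                 = 300
  record              = "westeurope-c2.azuredatabricks.net"
}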

aws-workspace-uc-simple example not working

Trying out aws-workspace-uc-simple from (link); just running terraform plan already gives errors:

│ Error: Unsupported argument
│
│   on .terraform/modules/examples_example_aws-workspace-uc-simple.aws_base.vpc/main.tf line 32, in resource "aws_vpc" "this":
│   32: enable_classiclink = var.enable_classiclink
│
│ An argument named "enable_classiclink" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on .terraform/modules/examples_example_aws-workspace-uc-simple.aws_base.vpc/main.tf line 33, in resource "aws_vpc" "this":
│   33: enable_classiclink_dns_support = var.enable_classiclink_dns_support
│
│ An argument named "enable_classiclink_dns_support" is not expected here.
╵
╷
│ Error: Unsupported argument
│
│   on .terraform/modules/examples_example_aws-workspace-uc-simple.aws_base.vpc/main.tf line 1306, in resource "aws_default_vpc" "this":
│   1306: enable_classiclink = var.default_vpc_enable_classiclink
│
│ An argument named "enable_classiclink" is not expected here.
╵

Add Schema Level Delta Sharing to Terraform Databricks Module

We're planning to add share automation to our system. Since this module only allows sharing tables, we need to loop over them and run Terraform again whenever there is new data, which makes sharing laborious. We're therefore requesting schema-level sharing (it is possible via the UI, but not in the module); a sketch of the current table-level approach is shown below.
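
For context, a minimal sketch of the table-level approach with the databricks_share resource, which is what forces the looping described above (the share and table names are hypothetical):

variable "shared_tables" {
  type    = list(string)
  default = ["my_catalog.my_schema.table_a", "my_catalog.my_schema.table_b"] # hypothetical tables
}

resource "databricks_share" "this" {
  name = "my_share" # hypothetical share name

  # Every table must be enumerated explicitly; new data means re-running Terraform.
  dynamic "object" {
    for_each = var.shared_tables
    content {
      name             = object.value
      data_object_type = "TABLE"
    }
  }
}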

Regards,

Please clarify license

Hello

Your LICENSE file mentions "Copyright 2022 Databricks, Inc. All rights reserved.". This is usually a way to introduce a commercial license and can be ambiguous.

I'd suggest replacing this with the license grant suggestion shown at line 192 of the LICENSE file.

Rework ADB UC modules

We need to reorganize resources a bit to make things more modular. For example, it makes sense to split modules/adb-lakehouse-uc/uc-metastore/ into metastore creation and workspace assignment, so we can easily assign multiple workspaces to the metastore. Also, look at what else we can improve.

Make creation of resource group in modules optional

In many deployments, resource groups may already be pre-created by infra teams, and the actual deployment principal may not have the ability to create new resource groups. It would be useful to make the creation of the resource group optional, allowing the resource group name and location to be passed as parameters.
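
A minimal sketch of what this could look like inside a module (the variable names are hypothetical):

variable "create_resource_group" {
  type    = bool
  default = true
}

variable "rg_name" {
  type = string
}

variable "location" {
  type = string
}

resource "azurerm_resource_group" "this" {
  count    = var.create_resource_group ? 1 : 0
  name     = var.rg_name
  location = var.location
}

locals {
  # Use the created group, or fall back to the pre-created one passed in by name.
  rg_name = var.create_resource_group ? azurerm_resource_group.this[0].name : var.rg_name
}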

Create separate modules for hub/spoke networks in Hub/Spoke examples

Right now, modules like adb-exfiltration-protection, aws-exfiltration-protection, and others contain everything: both the hub and spoke environments plus the workspace definitions. It's not very modular, so you can't deploy additional spokes, or additional workspaces into the existing spokes.

It would be really useful to create separate modules to implement something like what is shown in the diagram below:

  • Hub module - VNet/VPC, Firewall, ...
  • Spoke module - VNet/VPC, shared resources, like, subnets for private links, etc.
  • Workspace module for deployment into a specified spoke

(Diagram: proposed hub/spoke/workspace Terraform module layout)
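
A rough sketch of how such a composition could look from the caller's side (all module paths, names, and inputs here are hypothetical, not existing modules in this repository):

module "hub" {
  source = "./modules/hub" # hypothetical: VNet/VPC + firewall
  cidr   = "10.0.0.0/22"
}

module "spoke_a" {
  source      = "./modules/spoke" # hypothetical: VNet/VPC + shared resources
  cidr        = "10.1.0.0/22"
  hub_vnet_id = module.hub.vnet_id
  firewall_ip = module.hub.firewall_private_ip
}

module "workspace_a1" {
  source           = "./modules/workspace" # hypothetical: workspace deployed into a given spoke
  spoke_subnet_ids = module.spoke_a.subnet_ids
}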

cannot read databricks_group from azure devops


When issuing the command from Azure DevOps, I get the error: "cannot read group: default auth: azure-cli: cannot get access token: ERROR: Please run 'az login' to setup account."
There is no issue when issuing the command locally. Any advice is appreciated!

Architecture Diagram for 'Provisioning AWS Databricks E2 with an AWS Firewall' invalid

Hi!

I presume the goal of https://github.com/databricks/terraform-databricks-examples/blob/v0.2.17/examples/aws-workspace-with-firewall/README.md is to provision several Databricks clusters with AZ replication as distinct services across separate subnets.

The problem is that the CIDR blocks 10.4.0.0/22 and 10.4.1.0/22 collide: the first block covers everything up to 10.4.3.255, so the defined subnets overlap.

I'm not sure if this is a misinterpretation on my part, but it makes the diagram seem inaccurate. Can someone clarify the expectations here?
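
For what it's worth, non-overlapping subnets can be derived from a /22 with Terraform's cidrsubnet function; a small sketch of the arithmetic:

locals {
  base_cidr = "10.4.0.0/22"

  # Adding 2 bits to a /22 yields four /24 blocks that tile it without overlap:
  # ["10.4.0.0/24", "10.4.1.0/24", "10.4.2.0/24", "10.4.3.0/24"]
  subnet_cidrs = [for i in range(4) : cidrsubnet(local.base_cidr, 2, i)]
}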

`databricks_connection` resource is throwing `Error: cannot read connection: invalid ID:` after creating a connection for Lakehouse Federation

When creating a new external connection for Lakehouse Federation, Terraform creates the resource but fails to retrieve the ID, throwing this message:

│ Error: cannot read connection: invalid ID: 
│ 
│   with databricks_connection.postgresql,
│   on unity_catalog_management.tf line 8, in resource "databricks_connection" "postgresql":
│    8: resource "databricks_connection" "postgresql" {
│ 
╵
Releasing state lock. This may take a few moments...

After trying to apply the same config again, it throws a "Connection already exists" error.

Sample code to reproduce:

terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "1.35.0"
    }
  }
}

provider "databricks" {
  auth_type = "azure-cli"
  alias     = "workspace"
  host      = module.databricks_workspace.workspace_url
}

resource "databricks_connection" "postgresql" {
  provider        = databricks.workspace
  name            = "lakehouse_federation_postgresql_tf"
  connection_type = "POSTGRESQL"
  comment         = "PostgreSQL Lakehouse Federation Connection"
  options = {
    host     = "HOST"
    port     = "5432"
    user     = "USER"     # placeholder
    password = "PASSWORD" # placeholder
  }
}

Article 'Provisioning Azure Databricks with Private Link - Simplified Deployment' has some unnecessary references

The 'Provisioning Azure Databricks with Private Link - Simplified Deployment' article references two Azure resources that are not required for this deployment and lead to confusion for customers. In the 'Deploy Azure VNet and Subnets' section, the TF example includes the following two resources:

resource "azurerm_network_security_rule" "aad" {
  name                        = "AllowAAD"
  priority                    = 200
  direction                   = "Outbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "VirtualNetwork"
  destination_address_prefix  = "AzureActiveDirectory"
  resource_group_name         = var.rg_name
  network_security_group_name = azurerm_network_security_group.this.name
}

resource "azurerm_network_security_rule" "azfrontdoor" {
  name                        = "AllowAzureFrontDoor"
  priority                    = 201
  direction                   = "Outbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "VirtualNetwork"
  destination_address_prefix  = "AzureFrontDoor.Frontend"
  resource_group_name         = var.rg_name
  network_security_group_name = azurerm_network_security_group.this.name
}

According to some Databricks SMEs, these are left over from a preview implementation and are no longer required, since all traffic traverses the front-end and back-end Private Endpoints. There may be other errors in this TF example that should be reviewed (do we need a reference to an NSG at all, since there should be no NSG associated with private traffic?).

adb-unity-catalog-basic-demo fails creating with "Error: cannot create metastore data access"

Please refer to https://stackoverflow.com/questions/77440091/databricks-unity-catalog-error-cannot-create-metastore-data-access

I am wondering if we should create the metastore without root storage; that way we force root storage at the catalog level, enforcing strict separation between catalogs. This seems to be a best practice and a recommendation from Databricks anyway, and it would reduce the metastore creation process to a minimum.

DBFS with private link does not work

Hi, I am using the adb-with-private-links-exfiltration-protection module; the example says:

firewallfqdn = [                                                      // we don't need scc relay and dbfs fqdn since they will go to private endpoint
  "dbartifactsprodseap.blob.core.windows.net",                        //databricks artifacts
  "dbartifactsprodeap.blob.core.windows.net",                         //databricks artifacts secondary
  "dblogprodseasia.blob.core.windows.net",                            //log blob
  "prod-southeastasia-observabilityeventhubs.servicebus.windows.net", //eventhub
  "cdnjs.com",                                                        //ganglia
]

"we don't need scc relay and dbfs fqdn since they will go to private endpoint, however, the dbfs is still not accessible when creating cluster. I have to manually add the DBFS fqdn in firewall, then it works.

Is it intended behaviour ?
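
For reference, the manual workaround amounts to appending the workspace's DBFS root storage account FQDN to the allow-list (the storage account name below is a placeholder; each workspace has its own):

firewallfqdn = [
  // ... the FQDNs from the example above ...
  "dbstorageXXXXXXXX.blob.core.windows.net", // DBFS root storage account (placeholder name)
]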

Make VPC and subnet creation optional

In some AWS accounts, the creation of VPCs and subnets may be blocked by policy.

It would be great if the modules had a create_vpc variable, so that a VPC created outside of the module could be passed in instead (see the sketch below).
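
A minimal sketch of that pattern (the variable names are hypothetical):

variable "create_vpc" {
  type    = bool
  default = true
}

variable "existing_vpc_id" {
  type    = string
  default = null
}

resource "aws_vpc" "this" {
  count      = var.create_vpc ? 1 : 0
  cidr_block = "10.4.0.0/16" # example CIDR
}

locals {
  # Use the created VPC, or the one supplied from outside the module.
  vpc_id = var.create_vpc ? aws_vpc.this[0].id : var.existing_vpc_id
}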

adb-unity-catalog-basic-demo needs a workspace to begin with

adb-unity-catalog-basic-demo needs a workspace to begin with, but with the new Unity-by-default feature, a metastore is created when the first workspace is launched in Azure Databricks for a new account.

I had to delete my metastore from admin console to get this demo running.

We need to think about this flow and create a metastore + workspace as part of the example

Issue: azure storage container creation error when creating UC

Hi, by following the example https://github.com/databricks/terraform-databricks-examples/blob/main/examples/adb-uc/stage_3_spn_deploys_uc/storage.tf,

terraform apply gives the error: Error: containers.Client#GetProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailure" Message="This request is not authorized to perform this operation.\nRequestId:832e8aff-101e-00de-0b85-814f5f000000\nTime:2023-05-08T08:15:08.4533933Z"

How would I update this to use Azure VPN Gateway?

I would like to use best practices. I don't have ExpressRoute, so I was planning on using an Azure VPN Gateway to connect via the Azure VPN Client. I made a first pass by putting the gateway in the hub VNet, but ran into connectivity issues when attempting to access eastus.pl-auth.privatelink.azuredatabricks.net (I had to move all resources to eastus, closer to me). The issue I ran into is shown below. But unlike other custom apps, I don't have an app registration where I can update the redirect URI (all I see is a single enterprise app record).

Any advice is appreciated.

AADSTS50011: The redirect URI 'https://eastus.pl-auth.privatelink.azuredatabricks.net/login.html' specified in the request does not match the redirect URIs configured for the application '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'. Make sure the redirect URI sent in the request matches one added to your application in the Azure portal. Navigate to https://aka.ms/redirectUriMismatchError to learn more about how to fix this.

Below are my current scripts:

vpn_gateway.tf

# --- VPN Gateway Configuration ---

# Create a public IP address for VPN Gateway
resource "azurerm_public_ip" "vpn_gateway_ip" {
  name                = "vpnGatewayPublicIP"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  allocation_method   = "Dynamic"
  sku                 = "Basic"
}

# Create the VPN Gateway
resource "azurerm_virtual_network_gateway" "vpn_gateway" {
  name                = "vpnGateway"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  type     = "Vpn"
  vpn_type = "RouteBased"
  sku      = "VpnGw1"

  ip_configuration {
    name                          = "vpnGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.vpn_gateway_ip.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.hubvpngw.id # Referencing the GatewaySubnet here
  }

  vpn_client_configuration {
    address_space = ["172.16.0.0/24"] # Client address pool
    vpn_client_protocols = ["OpenVPN"]

    # Azure AD Configuration for Authentication
    aad_tenant   = "https://login.microsoftonline.com/${var.AZURE_TENANT_ID}" # Azure AD tenant ID
    aad_issuer   = "https://sts.windows.net/${var.AZURE_TENANT_ID}/" # Azure AD issuer URL
    aad_audience = "41b23e61-6c1e-4545-b367-cd054e0ed4b4" # Azure AD audience
  }
}

My vnet.tf script is below (edited to include the gateway subnet expected by my VPN gateway):

vnet.tf

resource "azurerm_virtual_network" "this" {
  name                = "${local.prefix}-vnet"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  address_space       = [local.cidr]
  tags                = local.tags
}

resource "azurerm_network_security_group" "this" {
  name                = "${local.prefix}-nsg"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  tags                = local.tags
}

resource "azurerm_network_security_rule" "aad" {
  name                        = "AllowAAD"
  priority                    = 200
  direction                   = "Outbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "VirtualNetwork"
  destination_address_prefix  = "AzureActiveDirectory"
  resource_group_name         = azurerm_resource_group.this.name
  network_security_group_name = azurerm_network_security_group.this.name
}

resource "azurerm_network_security_rule" "azfrontdoor" {
  name                        = "AllowAzureFrontDoor"
  priority                    = 201
  direction                   = "Outbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "443"
  source_address_prefix       = "VirtualNetwork"
  destination_address_prefix  = "AzureFrontDoor.Frontend"
  resource_group_name         = azurerm_resource_group.this.name
  network_security_group_name = azurerm_network_security_group.this.name
}

resource "azurerm_subnet" "public" {
  name                 = "${local.prefix}-public"
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = [cidrsubnet(local.cidr, 3, 0)]

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
      "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action"]
    }
  }
}

resource "azurerm_subnet_network_security_group_association" "public" {
  subnet_id                 = azurerm_subnet.public.id
  network_security_group_id = azurerm_network_security_group.this.id
}

variable "private_subnet_endpoints" {
  default = []
}

resource "azurerm_subnet" "private" {
  name                 = "${local.prefix}-private"
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = [cidrsubnet(local.cidr, 3, 1)]

  enforce_private_link_endpoint_network_policies = true
  enforce_private_link_service_network_policies  = true

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
      "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action"]
    }
  }

  service_endpoints = var.private_subnet_endpoints
}

resource "azurerm_subnet_network_security_group_association" "private" {
  subnet_id                 = azurerm_subnet.private.id
  network_security_group_id = azurerm_network_security_group.this.id
}


resource "azurerm_subnet" "plsubnet" {
  name                                           = "${local.prefix}-privatelink"
  resource_group_name                            = azurerm_resource_group.this.name
  virtual_network_name                           = azurerm_virtual_network.this.name
  address_prefixes                               = [cidrsubnet(local.cidr, 3, 2)]
  enforce_private_link_endpoint_network_policies = true // set to true to disable subnet policy
}


resource "azurerm_virtual_network" "hubvnet" {
  name                = "${local.prefix}-hub-vnet"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  address_space       = [var.hubcidr]
  tags                = local.tags
}

resource "azurerm_subnet" "hubfw" {
  //name must be fixed as AzureFirewallSubnet
  name                 = "AzureFirewallSubnet"
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.hubvnet.name
  address_prefixes     = [cidrsubnet(var.hubcidr, 3, 0)]
}

resource "azurerm_virtual_network_peering" "hubvnet" {
  name                      = "peerhubtospoke"
  resource_group_name       = azurerm_resource_group.this.name
  virtual_network_name      = azurerm_virtual_network.hubvnet.name
  remote_virtual_network_id = azurerm_virtual_network.this.id
  allow_gateway_transit = true
}

resource "azurerm_virtual_network_peering" "spokevnet" {
  name                      = "peerspoketohub"
  resource_group_name       = azurerm_resource_group.this.name
  virtual_network_name      = azurerm_virtual_network.this.name
  remote_virtual_network_id = azurerm_virtual_network.hubvnet.id
  use_remote_gateways = true
}

resource "azurerm_subnet" "hubvpngw" {
  name                 = "GatewaySubnet"
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.hubvnet.name
  address_prefixes     = [cidrsubnet(var.hubcidr, 3, 1)] # You can change the subnet range accordingly
}

And my firewall.tf updates as well:

firewall.tf

resource "azurerm_public_ip" "fwpublicip" {
  name                = "hubfirewallpublicip"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_firewall" "hubfw" {
  name                = "hubfirewall"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  sku_name            = "AZFW_VNet"
  sku_tier            = "Standard"

  ip_configuration {
    name                 = "configuration"
    subnet_id            = azurerm_subnet.hubfw.id
    public_ip_address_id = azurerm_public_ip.fwpublicip.id
  }
}

resource "azurerm_firewall_network_rule_collection" "adbfnetwork" {
  name                = "adbcontrolplanenetwork"
  azure_firewall_name = azurerm_firewall.hubfw.name
  resource_group_name = azurerm_resource_group.this.name
  priority            = 200
  action              = "Allow"

  rule {
    name = "databricks-metastore"

    source_addresses = [
      join(", ", azurerm_subnet.public.address_prefixes),
      join(", ", azurerm_subnet.private.address_prefixes),
    ]

    destination_ports = ["3306"]
    destination_addresses = [var.metastoreip]
    protocols = ["TCP"]
  }
}

resource "azurerm_firewall_application_rule_collection" "adbfqdn" {
  name                = "adbcontrolplanefqdn"
  azure_firewall_name = azurerm_firewall.hubfw.name
  resource_group_name = azurerm_resource_group.this.name
  priority            = 200
  action              = "Allow"

  rule {
    name = "databricks-control-plane-services"

    source_addresses = [
      join(", ", azurerm_subnet.public.address_prefixes),
      join(", ", azurerm_subnet.private.address_prefixes),
    ]

    target_fqdns = concat(var.firewallfqdn, ["*.azuredatabricks.net"])

    protocol {
      port = "443"
      type = "Https"
    }
  }
}

resource "azurerm_route_table" "adbroute" {
  name                = "spoke-routetable"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  route {
    name                   = "to-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = azurerm_firewall.hubfw.ip_configuration.0.private_ip_address
  }
}

resource "azurerm_subnet_route_table_association" "publicudr" {
  subnet_id      = azurerm_subnet.public.id
  route_table_id = azurerm_route_table.adbroute.id
}

resource "azurerm_subnet_route_table_association" "privateudr" {
  subnet_id      = azurerm_subnet.private.id
  route_table_id = azurerm_route_table.adbroute.id
}

Add support/example for authentication to Databricks via client ID/secret for setup via service principal OAuth

We'd like to avoid having a specific user's username/password, so we're trying to use service principal OAuth secrets instead. The desired flow is as follows (a provider configuration sketch follows the list):

  1. Admin User creates account on accounts.databricks.com
  2. Admin User's first and only action is creating an Admin Service Principal and generating oauth token
  3. The OAuth secret is added to Terraform variables (i.e. databricks_account_client_id and databricks_account_client_secret)
  4. All subsequent setup is done by terraform, authenticated as the Admin Service Principal
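
A sketch of the account-level provider configuration this flow implies, assuming OAuth M2M authentication with the service principal's client ID/secret (variable names mirror those above; the host shown is the AWS accounts endpoint):

provider "databricks" {
  alias         = "account"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id # hypothetical variable
  client_id     = var.databricks_account_client_id
  client_secret = var.databricks_account_client_secret
}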

User does not have CREATE CATALOG on Metastore 'primary' on adb-unity-catalog-basic-demo

databricks_catalog.dev: Creating...

│ Error: cannot create catalog: User does not have CREATE CATALOG on Metastore 'primary'.

│ with databricks_catalog.dev,
│ on main.tf line 140, in resource "databricks_catalog" "dev":
│ 140: resource "databricks_catalog" "dev" {

I had to issue a couple of grants to the user running terraform apply, even though this user is already federated to the workspace and is a metastore admin. He is also in the account_unity_admin group, federated down from AAD to the workspace (see the Terraform sketch after the GRANT statements below).

GRANT CREATE EXTERNAL LOCATION ON METASTORE TO user;
GRANT CREATE CATALOG ON METASTORE TO user;
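
For completeness, the same grants could be kept in Terraform with the databricks_grants resource at the metastore level (a sketch; it assumes the metastore is managed in the same configuration and uses the account_unity_admin group mentioned above):

resource "databricks_grants" "metastore" {
  metastore = databricks_metastore.this.id # assumes the metastore resource exists in this config

  grant {
    principal  = "account_unity_admin"
    privileges = ["CREATE_CATALOG", "CREATE_EXTERNAL_LOCATION"]
  }
}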

Adding a dependency on the IAM Role when creating databricks_mws_credentials

Introduction

I'm creating a new Databricks workspace as a hobby using Terraform & AWS.

Issue

The issue I've faced a couple of times during the creation of the workspace is that the databricks_mws_credentials resource may try to use the IAM role before it has been created. The terraform apply step fails due to a race condition where aws_iam_role.cross_account_role doesn't exist yet.

I've pulled out the code which caused the issue from the docs: https://docs.databricks.com/dev-tools/terraform/tutorial-aws.html

resource "databricks_mws_credentials" "this" {
  provider         = databricks.mws
  account_id       = var.databricks_account_id
  role_arn         = aws_iam_role.cross_account_role.arn
  credentials_name = "${local.prefix}-creds"
  depends_on       = [aws_iam_role_policy.this]
}

Fix

The simplest fix:

resource "databricks_mws_credentials" "this" {
  provider         = databricks.mws
  account_id       = var.databricks_account_id
  role_arn         = aws_iam_role.cross_account_role.arn
  credentials_name = "${local.prefix}-creds"
  depends_on       = [aws_iam_role_policy.this, aws_iam_role.cross_account_role]
}

I'm willing to raise a PR to update the examples in this repo; someone from @databricks will need to kindly update the docs website.

Thank you,
Pavan

Remove threading from Table ACLs

There is a bug in the Grants API: it does not support concurrent grant/revoke operations.

TableAcl grant/revoke operations are not atomic. When granting permissions, the service first gets all existing permissions, appends the new permissions, and sets the full list in the database. If there are concurrent grant requests, both might succeed and emit audit logs, but the new permission list from one request can override the other, causing permission loss.

See the ES ticket here.

Since this platform bug should be fixed in the future, it should be enough to just use one thread for the Table ACLs for now. We can increase it to more threads again once the issue is fixed.

Issue while running the adb-lakehouse example

Thank you so much for building the Terraform registry modules. I am facing some issues with the modules when I use the terraform.tfvars below. I have Contributor rights on the AD app, and I am running the command below to authenticate before running terraform apply. All the resources are getting created, but I am left with the error messages below, and I am not sure whether the deployment has completed or not.

az login --service-principal -u "ae83018d-d8d7-4198-a10f-6187ba1daec1" -p "xyz" --tenant "9f37a392-f0ae-4280-9796-f1864a10effc"

Even though all the resources get deployed in the Azure subscription, I still see the errors below.

tfvars config used: (url)


(error screenshot)

Am I missing some configuration or some role permissions?
Am I supposed to use a separate storage account name for the external location?
