harlequin-databricks

A Harlequin adapter for Databricks. Supports connecting to Databricks SQL warehouses or Databricks Runtime (DBR) interactive clusters.

Installation

harlequin-databricks depends on harlequin, so installing this package will also install Harlequin.

Using pip

To install this adapter into an activated virtual environment:

pip install harlequin-databricks

Using poetry

poetry add harlequin-databricks

Using pipx

If you do not already have Harlequin installed:

pipx install harlequin-databricks

If you would like to add the Databricks adapter to an existing Harlequin installation:

pipx inject harlequin harlequin-databricks

As an Extra

Alternatively, you can install Harlequin with the databricks extra:

pip install 'harlequin[databricks]'
poetry add 'harlequin[databricks]'
pipx install 'harlequin[databricks]'

Connecting to Databricks

To connect to Databricks, provide the following as CLI arguments:

  • server-hostname
  • http-path
  • credentials for one of the following authentication methods:
    • a personal access token (PAT)
    • a username and password
    • an OAuth U2M auth type (databricks-oauth or azure-oauth)
    • a service principal client ID and secret for OAuth M2M

Personal Access Token (PAT) authentication:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***

Username and password (basic) authentication:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --username *** --password ***

OAuth U2M authentication:

For OAuth user-to-machine (U2M) authentication supply either databricks-oauth or azure-oauth to the --auth-type CLI argument:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --auth-type databricks-oauth

OAuth M2M authentication:

For OAuth machine-to-machine (M2M) authentication, first install databricks-sdk, which is an optional dependency of harlequin-databricks (pip install databricks-sdk). Then supply the --client-id and --client-secret CLI arguments:

harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --client-id *** --client-secret ***

Store an alias for your connection string

We recommend you include an alias for your connection string in your .bash_profile/.zprofile so you can launch harlequin-databricks with a short command like hdb each time.

Run this command once to append the alias to ~/.bash_profile (use ~/.zprofile instead if your shell is zsh):

echo 'alias hdb="harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token dapi***"' >> ~/.bash_profile
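
If you would rather not write a token into your profile, one option is a small shell function used in place of the alias. The sketch below assumes three environment variables, DATABRICKS_HOST, DATABRICKS_HTTP_PATH and DATABRICKS_TOKEN. These names are hypothetical choices for this example; the function only forwards their values to the CLI flags shown above.

```shell
# Sketch of a wrapper for ~/.bash_profile or ~/.zprofile.
# The DATABRICKS_* variable names are assumptions made for this example;
# the function just forwards them to the documented CLI flags.
hdb() {
    # Stop with an error message if a required variable is unset or empty.
    : "${DATABRICKS_HOST:?set DATABRICKS_HOST to your workspace hostname}"
    : "${DATABRICKS_HTTP_PATH:?set DATABRICKS_HTTP_PATH to your HTTP path}"
    : "${DATABRICKS_TOKEN:?set DATABRICKS_TOKEN to your personal access token}"

    # "$@" forwards any extra options, e.g. hdb --no-init
    harlequin -a databricks \
        --server-hostname "$DATABRICKS_HOST" \
        --http-path "$DATABRICKS_HTTP_PATH" \
        --access-token "$DATABRICKS_TOKEN" \
        "$@"
}
```

Define either the alias or the function under the name hdb, not both, since an existing alias with the same name interferes with the function definition in an interactive shell.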

Using Unity Catalog and want fast Data Catalog indexing?

Supply the --skip-legacy-indexing command line flag if you do not care about legacy metastores (e.g. hive_metastore) being indexed in Harlequin's Data Catalog pane.

This flag will skip indexing of old non-Unity Catalog metastores (i.e. they won't appear in the Data Catalog pane with this flag).

Because of the way legacy Databricks metastores work, a separate SQL query is required to fetch the metadata of each table in a legacy metastore. This means indexing them for Harlequin's Data Catalog pane is slow.

Databricks's Unity Catalog upgrade brought Information Schema, which allows harlequin-databricks to fetch metadata for all Unity Catalog assets with only two SQL queries.

So if your Databricks instance is running Unity Catalog, and you no longer care about the legacy metastores, setting the --skip-legacy-indexing CLI flag is recommended as it will mean much faster indexing & refreshing of the assets in the Data Catalog pane.
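
Combined with PAT authentication, a Unity Catalog-only alias might look like the sketch below. The alias name hdbuc is a hypothetical choice, and the *** placeholders stand in for your own values, as in the other examples.

```shell
# Hypothetical alias name; replace the *** placeholders with your own values.
# --skip-legacy-indexing leaves legacy metastores (e.g. hive_metastore)
# out of the Data Catalog pane.
alias hdbuc='harlequin -a databricks --server-hostname ***.cloud.databricks.com --http-path /sql/1.0/endpoints/*** --access-token *** --skip-legacy-indexing'
```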

Initialization Scripts

Each time you start Harlequin, it will execute SQL commands from a Databricks initialization script. For example:

USE CATALOG my_catalog;
SET TIME ZONE 'Asia/Tokyo';
DECLARE yesterday DATE DEFAULT CURRENT_DATE - INTERVAL '1' DAY;

Multi-line SQL is allowed, but must be terminated by a semicolon.

Configuring the Script Location

By default, Harlequin will execute the script found at ~/.databricksrc. However, you can provide a different path using the --init-path option (aliased to -i or -init):

harlequin -a databricks --init-path /path/to/my/script.sql
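
As an illustration of the multi-line rule and the --init-path option, the sketch below writes a small per-project init script using a here-document. The file location, my_catalog and the last_week variable are hypothetical. The DECLARE statement spans two lines and ends with a semicolon.

```shell
# Write a per-project init script; the path and SQL names are illustrative only.
init_script="${TMPDIR:-/tmp}/databricks-init.sql"

# The quoted 'SQL' delimiter stops the shell from expanding anything inside the script.
cat > "$init_script" <<'SQL'
USE CATALOG my_catalog;
DECLARE last_week DATE
    DEFAULT CURRENT_DATE - INTERVAL '7' DAY;
SQL
```

Harlequin can then be started with harlequin -a databricks --init-path "$init_script" plus your usual connection arguments.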

Disabling Initialization

If you would like to open Harlequin without running the script you have at ~/.databricksrc, you can either pass a nonexistent path (or /dev/null) to the option above, or start Harlequin with the --no-init option:

harlequin -a databricks --no-init

Other CLI options:

For more details on other command line options, run:

harlequin --help

For more information, see the harlequin-databricks Docs.

Issues, Contributions and Feature Requests

Please report bugs/issues with this adapter via the GitHub issues page. You are welcome to attempt fixes yourself by forking this repo then opening a PR.

For feature suggestions, please post in the discussions.

Special thanks to...

Ted Conbeer, Josh Temple & Tyler Hillery.

Contributors

alexmalins, zashirah

harlequin-databricks's Issues

Codec can't decode byte error with Unity Catalog

What is the issue?

When loading the Data Catalog for a Unity Catalog connection, I receive this error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2481: character maps to <undefined>

Steps to reproduce

  1. Load a unity catalog connection

Environment

This error does not occur on WSL, but it does occur in PowerShell and Command Prompt.

OS version: Windows 10
Python version: 3.12
harlequin-databricks version: 0.3.0
harlequin version: 1.21.0
Installed via: pip

Additional context

Unable to load catalog

What is the issue?

The catalog never loads if skip_legacy_indexing is not set in the config.
If skip_legacy_indexing is set to true in the config, Harlequin freezes
after loading the catalog.
This happens with both token and OAuth authentication.
We have Unity Catalog.

Steps to reproduce

I don't know if I can provide a minimal example; it happens when I try
to connect.

Environment

OS version: Windows 11
Python version: 3.11.9
harlequin-databricks version: 0.3.0
harlequin version: 1.21.0
Installed via: pip

Additional context
