branch-check-dbx

This is a sample project for Databricks, generated via cookiecutter.

To use this project, you need Python 3.X and either pip or conda for package management.

Local environment setup

  1. Instantiate a local Python environment with a tool of your choice. This example uses conda, but any environment management tool works:
conda create -n branch_check_dbx python=3.9
conda activate branch_check_dbx
  2. If you don't have a JDK installed on your local machine, install one (this example uses a conda-based installation):
conda install -c conda-forge openjdk=11.0.15
  3. Install the project locally (this also installs the dev requirements):
pip install -e ".[local,test]"

Running unit tests

For unit testing, please use pytest:

pytest tests/unit --cov

See the tests/unit directory for more details on unit testing. In tests/unit/conftest.py you'll also find useful testing primitives, such as a local Spark instance with Delta support, a local MLflow instance, and a DBUtils fixture.

Running integration tests

There are two options for running integration tests:

  • On an all-purpose cluster via dbx execute
  • On a job cluster via dbx launch

For quicker startup of the job clusters we recommend using instance pools (AWS, Azure, GCP).

For an integration test on an all-purpose cluster, use the following command:

dbx execute <workflow-name> --cluster-name=<name of all-purpose cluster>

To execute a single task inside a multitask job, use the following command:

dbx execute <workflow-name> \
    --cluster-name=<name of all-purpose cluster> \
    --job=<name of the job to test> \
    --task=<task-key-from-job-definition>

For a test on a job cluster, deploy the job assets and then launch a run from them:

dbx deploy <workflow-name> --assets-only
dbx launch <workflow-name> --from-assets --trace
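
If you want to script the deploy-then-launch sequence above (for example in a local helper), one possible sketch is shown below. It only wraps the two commands already listed; the function names and the workflow name `my-workflow` are hypothetical, and it assumes `dbx` is on your PATH.

```python
import subprocess


def job_cluster_test_commands(workflow):
    """Build the two dbx commands for a job-cluster test, in run order."""
    return [
        # Step 1: upload the assets without creating or updating the job.
        ["dbx", "deploy", workflow, "--assets-only"],
        # Step 2: launch a run from those assets and follow it to completion.
        ["dbx", "launch", workflow, "--from-assets", "--trace"],
    ]


def run_job_cluster_test(workflow):
    """Run the commands in order; check=True stops at the first failure."""
    for command in job_cluster_test_commands(workflow):
        subprocess.run(command, check=True)
```

Calling `run_job_cluster_test("my-workflow")` would run both steps; keeping command construction separate from execution makes it possible to inspect the commands without touching a workspace.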

Interactive execution and development on Databricks clusters

  1. dbx expects the cluster used for interactive execution to support the %pip and %conda magic commands.
  2. Please configure your workflow (and the tasks inside it) in the conf/deployment.yml file.
  3. To execute the code interactively, provide either --cluster-id or --cluster-name.
dbx execute <workflow-name> \
    --cluster-name="<some-cluster-name>"

Multiple users can also use the same cluster for development. Libraries are isolated per user execution context.

Working with notebooks and Repos

To start working with your notebooks from Repos, follow these steps:

  1. Add your git provider token to your user settings in Databricks.
  2. Add your repository to Repos. This can be done via the UI or via the CLI command below:
databricks repos create --url <your repo URL> --provider <your-provider>

This command will create your personal repository under /Repos/<username>/branch_check_dbx.
  3. Use git_source in your job definition as described here

CI/CD pipeline settings

Please set the following secrets or environment variables for your CI provider:

  • DATABRICKS_HOST
  • DATABRICKS_TOKEN
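
As a minimal sketch, a CI step could fail fast when either variable is missing. The helper name `missing_databricks_vars` is hypothetical; only the two variable names come from the list above.

```python
import os

# The two settings this project expects from the CI provider.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A pipeline script might call `missing_databricks_vars()` before any dbx command and exit with an error message if the returned list is non-empty.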

Testing and releasing via CI pipeline

  • To trigger the CI pipeline, simply push your code to the repository. If the CI provider is configured correctly, the push triggers the general testing pipeline.
  • To trigger the release pipeline, get the current version from the branch_check_dbx/__init__.py file and tag the current code version:
git tag -a v<your-project-version> -m "Release tag for version <your-project-version>"
git push origin --tags
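
The version lookup and tag command can also be derived programmatically. The sketch below assumes that branch_check_dbx/__init__.py contains a line of the form `__version__ = "..."`; that layout, and both function names, are assumptions rather than something this README states.

```python
import re


def read_version(init_text):
    """Extract the version string from the text of an __init__.py file."""
    # Assumption: the file has a line such as  __version__ = "0.1.0"
    match = re.search(
        r"""^__version__\s*=\s*["']([^"']+)["']""", init_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def release_tag_command(version):
    """Build the annotated-tag command shown above for a given version."""
    return [
        "git", "tag", "-a", f"v{version}",
        "-m", f"Release tag for version {version}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`, followed by `git push origin --tags` as in the step above.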
