Dremio Module

This module configures a Dremio cluster for data access in Fybrik. The module can either deploy a new Dremio cluster or use an existing one. It first adds an S3 bucket to Dremio's catalog and promotes it to an Iceberg dataset. It then registers a governed virtual dataset based on the governance policy.

Before you begin

Ensure that you have the following:

  • Helm 3.3 or newer, installed and configured on your machine.
  • Kubectl 1.18 or newer, installed on your machine.
  • Cluster administrator access to a Kubernetes cluster, such as Kind.
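
As a quick sanity check, the version requirements above can be compared programmatically. The sketch below is illustrative only: the helper names parse_version and meets_minimum are hypothetical, and it assumes you pass in a version string yourself, for example the output of helm version --short.

```python
import re

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {"helm": (3, 3), "kubectl": (1, 18)}


def parse_version(text):
    """Extract the first MAJOR.MINOR pair from a string such as 'v3.12.1'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text, minimum):
    """Return True if the version found in `text` is at least `minimum`."""
    return parse_version(text) >= minimum
```

For example, meets_minimum("v3.12.1", MINIMUMS["helm"]) evaluates to True, while meets_minimum("v1.17.4", MINIMUMS["kubectl"]) evaluates to False.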

Install Fybrik

Install Fybrik v1.1 by following the Quick Start, skipping the "Install modules" section.

Install Dremio (Optional)

You can install a Dremio cluster using the following command:

helm install <chart-name> charts/dremio-module/charts/dremio-cluster

Register the Fybrik module

In dremio-module.yaml, you can specify the host and port of an existing Dremio cluster. If you deployed Dremio in the previous step, set the Dremio parameters in dremio-module.yaml as follows:

dremio.host: "dremio-client.<namespace of the Dremio chart>.svc.cluster.local"
dremio.port: "9047"

Alternatively, you can ask Fybrik to deploy a new Dremio cluster. To do so, set the Dremio parameters in dremio-module.yaml as follows:

dremio.host: "dremio-client.fybrik-blueprints.svc.cluster.local"
dremio.port: "9047"
dremio.enabled: "true"

Either way, apply the Fybrik module using the following command:

kubectl apply -f dremio-module.yaml -n fybrik-system
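
The two configurations described above share the same host pattern and differ only in the namespace and the dremio.enabled flag. A minimal sketch that derives the values, assuming the dremio-client service name and the cluster.local domain shown in the examples (the dremio_settings helper is hypothetical and not part of the module):

```python
def dremio_settings(namespace, deploy_new=False, port=9047):
    """Return the dremio.* parameters as a flat dict of strings.

    `namespace` is the namespace where the dremio-client service runs.
    When Fybrik deploys the cluster itself, the example above uses
    the fybrik-blueprints namespace.
    """
    settings = {
        "dremio.host": f"dremio-client.{namespace}.svc.cluster.local",
        "dremio.port": str(port),
    }
    if deploy_new:
        settings["dremio.enabled"] = "true"
    return settings


# Reproduces the values for a Fybrik-deployed cluster.
for key, value in dremio_settings("fybrik-blueprints", deploy_new=True).items():
    print(f'{key}: "{value}"')
```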

Create Iceberg asset

TBD

Create namespace

kubectl create namespace fybrik-sample
kubectl config set-context --current --namespace=fybrik-sample

Register Iceberg asset

Replace the values of endpoint, bucket, and object_key in the sample/asset-iceberg.yaml file to match the asset you created. Then, add the asset to the internal catalog using the following command:

kubectl apply -f sample/asset-iceberg.yaml

The asset is marked as finance data, and the column _c1 is tagged as PII.

Register Iceberg access secret

First, create a Kubernetes secret holding the credentials for accessing the Iceberg table. Assuming the credentials are stored in the environment variables ACCESS_KEY and SECRET_KEY, respectively, run:

kubectl create secret generic iceberg-dataset --from-literal=access_key="${ACCESS_KEY}" --from-literal=secret_key="${SECRET_KEY}"
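
To make clear what this command stores, the sketch below builds an equivalent Secret manifest by hand. It relies on two standard Kubernetes facts: generic secrets have type Opaque, and values under data are base64-encoded. The opaque_secret helper and the placeholder credential values are hypothetical; the fybrik-sample namespace is the one selected earlier.

```python
import base64
import json


def opaque_secret(name, namespace, values):
    """Build a Secret manifest with base64-encoded values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


# Placeholder credentials for illustration only; never commit real keys.
manifest = opaque_secret(
    "iceberg-dataset",
    "fybrik-sample",
    {"access_key": "EXAMPLE_ACCESS_KEY", "secret_key": "EXAMPLE_SECRET_KEY"},
)
print(json.dumps(manifest, indent=2))
```

Note that base64 is an encoding, not encryption: anyone who can read the secret can recover the credentials.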

You should also create a secret for accessing the Dremio cluster:

kubectl apply -f sample/secret-dremio.yaml

Define data access policy

Register a policy. The example policy removes columns tagged as PII from datasets marked as finance.

kubectl -n fybrik-system create configmap sample-policy --from-file=sample/sample-policy.rego
kubectl -n fybrik-system label configmap sample-policy openpolicyagent.org/policy=rego
while [[ $(kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]; do echo "waiting for policy to be applied" && sleep 5; done
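
The effect of the example policy can be illustrated in plain Python. This is not the Rego in sample/sample-policy.rego, only a sketch of the rule as described above: for a dataset marked as finance, columns tagged PII are dropped. The visible_columns helper and the tag representation are hypothetical.

```python
def visible_columns(asset_tags, column_tags):
    """Return the column names that remain visible under the example policy.

    asset_tags:  set of tags on the dataset, e.g. {"finance"}
    column_tags: dict mapping column name -> set of tags on that column
    """
    if "finance" not in asset_tags:
        # The rule only applies to finance datasets.
        return list(column_tags)
    return [name for name, tags in column_tags.items() if "PII" not in tags]


# The sample asset: marked as finance, with _c1 tagged PII.
columns = {"_c0": set(), "_c1": {"PII"}}
print(visible_columns({"finance"}, columns))
```

Under these assumptions only _c0 remains visible for the sample asset.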

Deploy Fybrik application

The following FybrikApplication deploys a Dremio cluster (if the Dremio module is configured to do so) and configures it via a Kubernetes job, which registers the Iceberg asset in Dremio and applies the policy to create a virtual dataset.

kubectl apply -f sample/fybrikapplication.yaml

Wait for the FybrikApplication to be ready (this can take a few minutes):

while [[ ($(kubectl get fybrikapplication fybrik-iceberg-sample -o 'jsonpath={.status.ready}') != "true") || ($(kubectl get jobs fybrik-iceberg-sample-fybrik-sample-dremio-module -n fybrik-blueprints -o 'jsonpath={.status.conditions[0].type}') != "Complete") ]]; do echo "waiting for FybrikApplication" && sleep 5; done
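
The shell loop above polls indefinitely. If you prefer a bounded wait, the same idea can be sketched in Python with a timeout. The wait_until and fybrikapplication_ready helpers are hypothetical; the readiness check reuses the kubectl query from the loop above and, for brevity, covers only the FybrikApplication status, not the job condition.

```python
import subprocess
import time


def wait_until(check, timeout=600.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def fybrikapplication_ready(name="fybrik-iceberg-sample"):
    """Return True when kubectl reports .status.ready as true."""
    result = subprocess.run(
        ["kubectl", "get", "fybrikapplication", name,
         "-o", "jsonpath={.status.ready}"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() == "true"
```

A call such as wait_until(fybrikapplication_ready, timeout=600) returns False instead of hanging if the application never becomes ready.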

Use port-forward to access Dremio

kubectl port-forward svc/dremio-client -n <ns-of-Dremio> 9047:9047 &

You can access Dremio in the browser at http://localhost:9047/ with the following credentials: "name": "newUser", "password": "testpassword123"

Open the Space-api space, then select the sample-iceberg-vds virtual dataset that the module created according to the policies.

You can also query the dataset using sample/query_sample.py, for instance:

python sample/query.py --query '{"sql": "SELECT _c0 FROM \"Space-api\".\"sample-iceberg-vds\""}'
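
Hand-escaping the nested quotes in the --query argument is error-prone. A minimal sketch that builds the same JSON payload with json.dumps and quotes it for the shell with shlex.quote; the helper names are hypothetical, and the sketch makes no assumption about how the sample script processes the argument.

```python
import json
import shlex


def quote_identifier(name):
    """Wrap a SQL identifier in double quotes."""
    if '"' in name:
        raise ValueError(f"unsupported character in identifier {name!r}")
    return f'"{name}"'


def build_query_argument(space, dataset, columns):
    """Return the JSON string passed to --query."""
    table = f"{quote_identifier(space)}.{quote_identifier(dataset)}"
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    return json.dumps({"sql": sql})


payload = build_query_argument("Space-api", "sample-iceberg-vds", ["_c0"])
# shlex.quote adds the single quotes needed on a POSIX shell command line.
print(shlex.quote(payload))
```

Because _c1 is removed by the policy, listing it in columns would be expected to fail against the governed virtual dataset.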

Cleanup

  1. Stop kubectl port-forward processes (e.g., using pkill kubectl)
  2. Delete the fybrikapplication:
    kubectl delete -f sample/fybrikapplication.yaml
  3. Delete the fybrik-sample namespace:
    kubectl delete namespace fybrik-sample
  4. Delete the policy created in the fybrik-system namespace:
    NS="fybrik-system"; kubectl -n $NS get configmap | awk '/sample/{print $1}' | xargs  kubectl delete -n $NS configmap
