agirish / drill-helm-charts

16.0 2.0 12.0 196 KB

Helm Charts to Deploy Apache Drill on Kubernetes

License: Apache License 2.0

Dockerfile 31.10% Shell 68.90%

apache-drill drill kubernetes helm helm-charts helm-chart zookeeper apache-zookeeper drill-helm-charts drill-pods

drill-helm-charts's Issues

Override Drill Config not working

I updated the drill-override.conf with the following content:

store: {
  parquet: {
    reader: {
      int96_as_timestamp: true
    }
  }
}

(I tried several variations, including plain store.parquet.reader.int96_as_timestamp on one line.)

I ran the command to create the ConfigMap in the namespace (before deploying the chart).
The resulting ConfigMap looks like this:

Name:         drill-config-cm
Namespace:    drill
Labels:       <none>
Annotations:  <none>

Data
====
drill-env.sh:
----

drill-override.conf:
----
store: {
  parquet: {
    reader: {
      int96_as_timestamp: true
    }
  }
}

BinaryData
====

Events:  <none>

I then changed values.yaml to allow the override.
It seems to be picked up, because inside the Drill pods I see this:

sh-4.2# cat /opt/drill/conf/drill-override.conf
store: {
  parquet: {
    reader: {
      int96_as_timestamp: true
    }
  }
}sh-4.2#

However, when I connect to the Drill web UI, the option still shows the value false.
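
One possible explanation, offered as an assumption rather than a confirmed diagnosis: store.parquet.reader.int96_as_timestamp is a system/session option, which Drill normally changes with ALTER SYSTEM or ALTER SESSION, while drill-override.conf holds boot-time settings under drill.exec. The sketch below builds the JSON body for Drill's REST endpoint POST /query.json so the option can be set at runtime. The helper name drill_option_payload, the port-forwarded address, and the use of curl are illustrative assumptions, not part of the chart.

```shell
# drill_option_payload is a hypothetical helper, not part of the chart.
# $1 = option name, $2 = value (an unquoted literal such as true or false).
# Prints the JSON body expected by Drill's REST endpoint POST /query.json.
drill_option_payload() {
  printf '{"queryType":"SQL","query":"ALTER SYSTEM SET `%s` = %s"}\n' "$1" "$2"
}

# Example (assumes web port 8047 is reachable, e.g. via kubectl port-forward):
# drill_option_payload store.parquet.reader.int96_as_timestamp true |
#   curl -s -X POST -H 'Content-Type: application/json' -d @- http://localhost:8047/query.json
```

Running the same ALTER SYSTEM statement from the web UI's query page should have the same effect, assuming the option is indeed a system option in the deployed Drill version.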

resource mapping not found for name: "zk-role" : invalid RBAC api version

Hello,

I cloned the repo, changed the default namespace to "drill", and set the environment type to 'on-prem'.
I also changed the service to use ClusterIP instead of NodePort.

When I run it against the namespace 'drill' I just created, I get:

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "zk-role" namespace: "drill" from "": no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "drill-role" namespace: "drill" from "": no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "zk-rb" namespace: "drill" from "": no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "drill-rb" namespace: "drill" from "": no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first]

This happens when running helm install drill1 drill/

I found out (I'm not yet fluent in Helm) that this is related to the RBAC API version (thanks to this fix): my Kubernetes version serves rbac.authorization.k8s.io/v1 and no longer serves v1beta1.

After changing the RBAC API version in four locations, it works.

I don't yet know the right Helm way to do this; otherwise I would propose a fix. (Simply switching to v1 is not a solution, as some people still use v1beta1.)
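
For context, rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22, which matches the 1.23.6 server reported below. Within a chart, Helm's built-in .Capabilities.APIVersions.Has check is the usual way to choose between the two versions at render time. The same decision can be sketched outside Helm in shell. The helper name pick_rbac_api_version is hypothetical; it only inspects the text that kubectl api-versions prints.

```shell
# pick_rbac_api_version is a hypothetical helper for illustration.
# It reads the output of `kubectl api-versions` on stdin and prints the
# newest RBAC group/version that the cluster serves.
pick_rbac_api_version() {
  versions=$(cat)
  if printf '%s\n' "$versions" | grep -qxF 'rbac.authorization.k8s.io/v1'; then
    echo 'rbac.authorization.k8s.io/v1'
  elif printf '%s\n' "$versions" | grep -qxF 'rbac.authorization.k8s.io/v1beta1'; then
    echo 'rbac.authorization.k8s.io/v1beta1'
  else
    echo 'no RBAC API found' >&2
    return 1
  fi
}

# Usage against a live cluster:
# kubectl api-versions | pick_rbac_api_version
```

Preferring v1 when both are served keeps the manifests working on older clusters that still offer v1beta1 as well as on newer ones that do not.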

FYI:

Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.23.6

drill-env.sh configs doesn't seem to be in effect.

Hi.

First off, I want to thank you for this contribution. It's hard to find guidelines for running Drill on k8s.

I have deployed your helm chart on my on-prem cluster, which consists of 1 master node and 3 worker nodes.

Two of the worker nodes have sufficient memory (32 GB), so I set nodeAffinity to deploy drillbit pods only on those two nodes.

Our main purpose for Drill is to serve as a SQL query engine over an RDBMS (Oracle-like) for our own BI tool.

I set drill.memory in values.yaml to 13Gi and left cpu at the default (4000m).

I chose that value because, after reading through the Apache Drill docs, I set drill-env.sh as follows:

export DRILL_PID_DIR="/opt/drill"
export DRILLBIT_MAX_PROC_MEM=13G
export DRILL_HEAP=4G
export DRILL_MAX_DIRECT_MEMORY=8G
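
As a sanity check on those numbers, the sketch below adds up heap, direct memory, and code cache and compares the sum with the process cap. The helper name check_drill_memory is hypothetical, and the 1G code-cache figure is an assumption to verify against your Drill version; it is not taken from this issue.

```shell
# check_drill_memory is a hypothetical helper; all values are whole GiB.
# $1 = heap, $2 = direct memory, $3 = process cap,
# $4 = code cache (assumed to be 1 when omitted).
check_drill_memory() {
  heap=$1; direct=$2; cap=$3; code_cache=${4:-1}
  total=$((heap + direct + code_cache))
  if [ "$total" -le "$cap" ]; then
    echo "ok: ${total}G of ${cap}G used"
  else
    echo "over: ${total}G exceeds ${cap}G"
    return 1
  fi
}

check_drill_memory 4 8 13   # heap, direct, and cap from the drill-env.sh above
```

Under that assumption the settings above fill the 13G cap exactly, which also equals the 13Gi pod memory value and so leaves no headroom for anything else in the container.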

However, there is no difference in the performance of a query on a large data set (27 Gi).

When everything was at the default (no custom drill-env.sh, and drill.memory in values.yaml at only 5Gi), the query took a little over 4 minutes, and it still takes the same time now.

When I look at the 'Metrics' tab while the query is running, it looks like this:

Also, here is the node status with the drillbit pod running a query. The memory request does not change even after my SQL query starts running.

So I am guessing that the drill-env.sh settings (DRILL_HEAP, DRILL_MAX_DIRECT_MEMORY, etc.) are not taking effect.
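
One way to test that guess is to look at the JVM flags of the running drillbit, since Drill's startup scripts turn DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY into -Xms/-Xmx and -XX:MaxDirectMemorySize. Note that the memory request shown in the node status is a static scheduling value and is not expected to change while a query runs. The helper name jvm_memory_flags is hypothetical, and the usage line assumes that ps is available in the container image.

```shell
# jvm_memory_flags is a hypothetical helper: it reads a Java command line
# on stdin and prints the heap and direct-memory flags, one per line.
jvm_memory_flags() {
  tr ' ' '\n' | grep -E '^-(Xms|Xmx|XX:MaxDirectMemorySize=)'
}

# Usage (the pod name is a placeholder; assumes ps exists in the image):
# kubectl exec -n drill <drillbit-pod> -- sh -c 'ps -ef | grep [D]rillbit' | jvm_memory_flags
```

If the printed values match drill-env.sh (for example 4G heap and 8G direct memory), the file is being applied and the unchanged query time has another cause.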

Also, for our cluster setup, are 3 zookeeper pods enough, one for each worker node?

In my experience, when only one zookeeper pod was deployed, queries could not run concurrently: a new query had to wait until the query in progress completed. After I increased the zookeeper replicas in the statefulset, I could run two queries concurrently.

I am also curious whether one drillbit pod handles a query all by itself, even when the query load is large, or whether the drillbit pods split the workload and run it in parallel.

Lastly, I am curious about the rule of thumb for how many drillbit and zookeeper pods to run. For example, the worker nodes in our cluster have these specs:
1:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 6 Cores 12 Threads
32 GiB RAM

2:
Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz 4 Cores 4 Threads
8 GiB RAM

3:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 6 Cores 12 Threads
32 GiB RAM

What should the memory-to-CPU ratio be for drillbit pods?

Thanks so much for reading!!

How do you configure "Actively Used Direct (Estimate)" memory in "Metrics" tab of Drill Web UI?

Cross-posted from the apache drill official repo.

link: apache/drill#2789

drillbits showing "Not Available" for memory

Hey Agirish,

Thanks for your hard work on Drill and k8s support. I noticed that, for me, the drillbits always show "Not Available" under Heap Memory Usage and the other memory columns in the Drill web UI.

On yours it says Not available for only one of the drillbits, but on mine it shows that for all the drillbits in the cluster. Do you know how to get the memory to display like you did in the first drillbit?

Also, how did you get the hostname to report the FQDN? I know it's only resolvable within the cluster, but getting the namespace.svc.cluster.local part would be good.

Thanks again for any input.
