agirish / drill-helm-charts
Helm Charts to Deploy Apache Drill on Kubernetes
License: Apache License 2.0
I updated the drill-override.conf with the following content:
store: {
  parquet: {
    reader: {
      int96_as_timestamp: true
    }
  }
}
(I tried several variations, including plain store.parquet.reader.int96_as_timestamp
on one line.)
I ran the command to create the ConfigMap in the namespace (before deploying the chart).
I have the following ConfigMap:
Name: drill-config-cm
Namespace: drill
Labels: <none>
Annotations: <none>
Data
====
drill-env.sh:
----
drill-override.conf:
----
store: {
  parquet: {
    reader: {
      int96_as_timestamp: true
    }
  }
}
BinaryData
====
Events: <none>
I also changed values.yaml to allow the override,
and it seems to be taken into account, since inside the Drill pods I see this:
sh-4.2# cat /opt/drill/conf/drill-override.conf
store: {
  parquet: {
    reader: {
      int96_as_timestamp: true
    }
  }
}
sh-4.2#
However, when I connect to the Drill web UI, the option still has the value false.
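The value shown in the web UI can also be read from sys.options through Drill's REST API, which makes it easy to re-check after each change. The following is a minimal sketch, assuming the web port 8047 is reachable at http://localhost:8047 (for example through a port-forward) and that authentication is disabled; the function names and the DRILL_URL variable are hypothetical. Since store.parquet.reader.int96_as_timestamp is a system/session option, it may also be worth trying ALTER SYSTEM SET `store.parquet.reader.int96_as_timestamp` = true from the query page.

```shell
# Build the JSON body for Drill's REST query endpoint (POST /query.json).
# $1: option name to look up in sys.options.
build_query_payload() {
  printf '{"queryType":"SQL","query":"SELECT * FROM sys.options WHERE name = '\''%s'\''"}' "$1"
}

# Send the query to Drill and print the raw JSON response.
# DRILL_URL is an assumed variable; it defaults to a local port-forward.
query_drill_option() {
  curl -s -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_query_payload "$1")" \
    "${DRILL_URL:-http://localhost:8047}/query.json"
}

# Usage (requires a reachable drillbit):
#   query_drill_option store.parquet.reader.int96_as_timestamp
```

If the response still reports false after the pods restart, that would suggest the override file is being read but this particular option is not picked up from it in that form.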
Hello,
I cloned the repo, changed the name of the default namespace to "drill" and the type of environment to 'on-prem'.
I changed the service to use ClusterIP instead of NodePort.
When I run it against the namespace 'drill' that I just created, I get:
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "zk-role" namespace: "drill" from "": no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "drill-role" namespace: "drill" from "": no matches for kind "Role" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "zk-rb" namespace: "drill" from "": no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first, resource mapping not found for name: "drill-rb" namespace: "drill" from "": no matches for kind "RoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
ensure CRDs are installed first]
This happens when running helm install drill1 drill/.
I found out (I am not yet fluent in Helm) that it is related to the RBAC API version (thanks to this fix): my Kubernetes version serves rbac.authorization.k8s.io/v1 and no longer serves v1beta1.
After changing the RBAC API version in 4 locations, it works.
I don't yet know the right Helm way to handle this, otherwise I would propose a fix (and simply switching to v1 is not a solution, as some people still use v1beta1).
FYI:
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.23.6
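For anyone hitting the same error, one way to see which RBAC API version a cluster serves is to inspect the output of kubectl api-versions. The following is a minimal sketch, assuming kubectl is configured for the target cluster; the function name pick_rbac_api_version is hypothetical. Inside a chart, Helm's built-in .Capabilities.APIVersions.Has check can make the same decision at render time.

```shell
# Read a list of served API versions (one per line) on stdin and print
# the preferred RBAC apiVersion: v1 if available, otherwise v1beta1.
# Returns non-zero if neither is served.
pick_rbac_api_version() {
  versions="$(cat)"
  if printf '%s\n' "$versions" | grep -qxF 'rbac.authorization.k8s.io/v1'; then
    echo 'rbac.authorization.k8s.io/v1'
  elif printf '%s\n' "$versions" | grep -qxF 'rbac.authorization.k8s.io/v1beta1'; then
    echo 'rbac.authorization.k8s.io/v1beta1'
  else
    return 1
  fi
}

# Usage (requires cluster access):
#   kubectl api-versions | pick_rbac_api_version
```

Preferring v1 whenever it is served keeps the chart working on clusters that have dropped v1beta1, while older clusters that only serve v1beta1 still get a usable value.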
Hi.
First off, I want to thank you for this contribution. It's hard to find guidelines for Drill on k8s.
I have deployed your Helm chart on my on-prem cluster, consisting of 1 master node and 3 slave nodes.
2 of the slave nodes have sufficient memory (32 GB), so I have set nodeAffinity to deploy drillbit pods only on those two nodes.
Our main purpose for using Drill is as a SQL query engine over an RDBMS (Oracle-like) for a BI tool of our own.
I have set drill.memory in values.yaml to 13Gi and left the CPU at the default (4000m).
The reason is that I read through the Apache Drill docs and set drill-env.sh as:
export DRILL_PID_DIR="/opt/drill"
export DRILLBIT_MAX_PROC_MEM=13G
export DRILL_HEAP=4G
export DRILL_MAX_DIRECT_MEMORY=8G
However, my problem is that this makes no difference to the performance of a query on a large data set (27 Gi).
When everything was at the defaults, with no custom drill-env.sh and drill.memory in values.yaml at only 5Gi, the query took a little over 4 minutes, and it still takes the same time.
When I look at the 'Metrics' tab, it looks like this while the query is running:
Also, here is the node status with the drillbit pod running a query. The memory request does not change even after I start running my SQL query.
So I am guessing the drill-env.sh settings (DRILL_HEAP, DRILL_MAX_DIRECT_MEMORY, etc.) are not taking effect.
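One way to test this guess is to look at the flags the drillbit JVM was actually started with. The following is a minimal sketch, assuming the Drill launch scripts pass DRILL_HEAP as -Xmx and DRILL_MAX_DIRECT_MEMORY as -XX:MaxDirectMemorySize (worth confirming against your Drill version) and that ps is available in the container; the function name and the pod name drillbit-0 in the usage comment are hypothetical.

```shell
# Read process listings or a JVM command line on stdin and print only the
# maximum heap and maximum direct memory flags, one per line.
extract_jvm_memory_flags() {
  tr ' ' '\n' | grep -E '^(-Xmx|-XX:MaxDirectMemorySize=)'
}

# Usage (requires cluster access; pod name is a placeholder):
#   kubectl exec -n drill drillbit-0 -- ps -ef | extract_jvm_memory_flags
```

If the printed values match the 4G heap and 8G direct memory set above, the file is in effect and the unchanged query time has another cause. Note also that the memory request reported for a pod is a fixed value from the pod spec, so it is not expected to change while a query runs.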
Also, for our cluster setup, are 3 ZooKeeper pods enough, one for each slave node?
In my experience, when only one ZooKeeper pod was deployed, queries could not run concurrently: a new query had to wait until the query in progress was complete. However, after I increased the replicas of ZooKeeper pods in the StatefulSet, I could run two queries concurrently.
I am also curious whether one drillbit pod handles a query all by itself, even when the query load is large, or whether the drillbit pods split the workload and run it in parallel.
Lastly, I am curious about the rule of thumb for how many drillbit and ZooKeeper pods to deploy. For example, in our cluster the slave nodes have the following specs:
1:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 6 Cores 12 Threads
32 GiB RAM
2:
Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz 4 Cores 4 Threads
8 GiB RAM
3:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 6 Cores 12 Threads
32 GiB RAM
What should be the pod memory to CPU ratio for drillbit pods?
Thanks so much for reading!!
Cross-posted from the official Apache Drill repo.
link: apache/drill#2789
Hey Agirish,
Thanks for your hard work on Drill and k8s support. I noticed that, for me, the drillbits always show Not Available under Heap Memory Usage, etc. in the Drill web UI.
On yours it says Not available for only one of the drillbits, but on mine it shows that for all the drillbits in the cluster. Do you know how to get the memory to display like you did in the first drillbit?
Also, how did you get the hostname to report the FQDN? I know it's only resolvable within the cluster, but getting the namespace.svc.cluster.local part would be good.
Thanks again for any input.