nuodb / nuodb-helm-charts Goto Github PK

View Code? Open in Web Editor NEW

8.0 17.0 9.0 14.51 MB

NuoDB Helm Chart for Kubernetes & OpenShift

License: BSD 3-Clause "New" or "Revised" License

Shell 9.60% Smarty 8.84% Go 79.92% HCL 0.01% Python 1.63%

nuodb database helm helm-charts kubernetes k8s

nuodb-helm-charts's People

Contributors

Stargazers

Watchers

Forkers

kmabda ksrinitemenos natarajan-r niktj777 zlecsos akiros001 bthuntercn adriansuarez

nuodb-helm-charts's Issues

Fixed an issue preventing Helm Chart release upgrade

UPGRADE FAILED: StatefulSet.apps "admin-o4z7ys-nuodb-cluster0" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

TE/SM statefulset/deployment names

Hi,

When deploying nuodb helm admin/database charts we override/set (among other things):
admin.fullnameOverride=nuodbsvc-admin
database.fullnameOverride=nuodbsvc-inventory-database
admin.domain=nuodbsvc

We would like to see the pods prefixed with the admin domain name in order to segregate pod that belong to the same admin domain based on the prefix, to make them easy to spot when having other pods in kubernetes that's don't belong to nuodb and integrate well with the way we deploy nuodb in our system.

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nuodbsvc-admin-0 1/1 Running 0 1h
nuodbsvc-inventory-database-sm-0 1/1 Running 0 2m
nuodbsvc-inventory-database-te-6d476c6c6-5fltd 1/1 Running 0 2m
nuodbsvc-job-lb-policy-nearest-nzdmd 0/1 Completed 0 1h

Currently:
SM statefulset name is defined as following
name: sm-{{ template "database.fullname" . }}
TE deployment name:
name: te-{{ template "database.fullname" . }}

Would it be okay to change the above to:
name: {{ template "database.fullname" . }}-sm
name: {{ template "database.fullname" . }}-te

respectively in order to get the desired pod name, opinion?

Thanks

Forbidden!Configured service account doesn't have access. Service account may have been revoked. daemonsets.apps is forbidden

NuoDB version: 4.0.7
NuoDB Helm charts: 2.4.0

Failure Description: NuoDB admin KAA module does not start

Log:

2020-08-27T14:48:33.553+0000 WARN  io.fabric8.kubernetes.client.informers.cache.Controller informer-controller-DaemonSet Reflector list-watching job exiting because the thread-pool is shutting down
java.util.concurrent.RejectedExecutionException: Error while starting ReflectorRunnable watch
        at io.fabric8.kubernetes.client.informers.cache.Reflector.listAndWatch(Reflector.java:85)
        at io.fabric8.kubernetes.client.informers.cache.Controller.run(Controller.java:112)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.RejectedExecutionException: Error while doing ReflectorRunnable list
        at io.fabric8.kubernetes.client.informers.cache.Reflector.getList(Reflector.java:73)
        at io.fabric8.kubernetes.client.informers.cache.Reflector.reListAndSync(Reflector.java:94)
        at io.fabric8.kubernetes.client.informers.cache.Reflector.listAndWatch(Reflector.java:80)
        ... 2 common frames omitted
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://10.96.0.1/apis/apps/v1/namespaces/testadminscaledown-riqkzt/daemonsets. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. daemonsets.apps is forbidden: User "system:serviceaccount:testadminscaledown-riqkzt:nuodb" cannot list resource "daemonsets" in API group "apps" in the namespace "testadminscaledown-riqkzt".
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:505)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:412)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.listRequestHelper(BaseOperation.java:166)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.list(BaseOperation.java:640)
        at io.fabric8.kubernetes.client.informers.SharedInformerFactory$1.list(SharedInformerFactory.java:161)
        at io.fabric8.kubernetes.client.informers.SharedInformerFactory$1.list(SharedInformerFactory.java:154)
        at io.fabric8.kubernetes.client.informers.cache.Reflector.getList(Reflector.java:67)
        ... 4 common frames omitted

admin does not allow topologySpreadConstraints

database has values options to set topologySpreadConstraints which is better for managing spreading nodes across availability zones.

YCSB: Number of threads can not be selected

Currently, the YCSB chart hard-codes the number of threads to 2. We wanna make it customizable.

YCSB: Isolation Level can not be selected

We should enable users of the YCSB demo/workload to use either Read Committed or Consistent Read (default) via a helm parameter

TLS Custom Key mount does not work in OCP 3.11

NuoDB admin does not start up with "Shutting down server due to mismatching initial membership" during upgrade to NuoDB 4.0.7

Describe the bug
The bootstrapServers label and nuodb.com/bootstrap-servers annotation, which allow an initial membership to be defined containing an arbitrary number of members, breaks upgrade from deployments prior to 2.4.0 of nuodb-helm-charts or from NuoDB releases prior to 4.0.7 to 4.0.7 with 2.4.0 of the nuodb-helm-charts.

Context
NuoDB Version: 4.0.6 to 4.0.7
Helm Charts Version: 2.4.0
Kubernetes Version: any
Environment: any

To Reproduce
Steps to reproduce the behavior:

Create domain with NuoDB 4.0.7 and helm charts 2.3.1
upgrade to newer helm charts

Expected behavior
NuoDB upgrades as expected

database.persistence should be database.sm.persistence

This property in the values.yaml of the database chart is misleading as it only applies to SMs.

A comment that it is defining the archive volume would be useful also, it is not IMHO obvious what it is for given its current location.

Helm chart repository seems to be broken

This seems to be the reason why TestUpgradeHelmFullDB in the continuous tests is broken. None of the past versions of the NuoDB Helm charts are available in the NuoDB repository:

$ helm repo add nuodb http://storage.googleapis.com/nuodb-charts
"nuodb" has been added to your repositories

$ helm search repo nuodb
NAME                      	CHART VERSION	APP VERSION	DESCRIPTION                                       
nuodb/admin               	3.0.0        	4.0.0      	Administration tier for NuoDB.                    
nuodb/database            	3.0.0        	4.0.0      	NuoDB distributed SQL database.                   
nuodb/restore             	3.0.0        	4.0.0      	On-demand restore a NuoDB SQL database.           
nuodb/storage-class       	3.0.0        	4.0.0      	Storage classes for NuoDB.                        
nuodb/transparent-hugepage	3.0.0        	4.0.0      	Disable disables transparent_hugepage on Linux ...

$ helm install nuodb/admin --version=2.4.1 --generate-name 
Error: failed to download "nuodb/admin" (hint: running `helm repo update` may help)

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nuodb" chart repository
Update Complete. ⎈ Happy Helming!⎈ 

$ helm install nuodb/admin --version=2.4.1 --generate-name 
Error: failed to download "nuodb/admin" (hint: running `helm repo update` may help)

Small change to README

This paragraph is misleading:

If you already have an OpenShift project you want to use, select it as follows:

oc project ${TILLER_NAMESPACE}

I think it should be:

If you already have an OpenShift project you want to use, select it as follows:

export TILLER_NAMESPACE=<your-existing-namespace>
oc project ${TILLER_NAMESPACE}

Document Vault integration

This is a request for a standalone howto document for the Vault integration.

What about non OpenShift users?

This section assumes you have OpenShift and the oc utility.

You do not need oc to use Helm and we should show how to setup and remove Tiller for generic Kubernetes, which doesn't have Projects either.

Catastrophic loss of admin-0 disk leads to a disjoint domain

When you delete the disk (PV/PVC) of the RAFT log of admin-0 in a multi-admin domain, the newly recreated admin pod will not join the existing RAFT domain. It will instead create a fresh domain consisting of self.

Error in admin helm chart?

The admin helm chart creates two containers. One disables THP, the other runs watch on the THP settings file every 10mins:

Why is the watch container useful? What is is for?
The watch container does not mount /sys so it cannot access the THP settings file and the log contains file not found errors.

Engine does not reconnect to a restarted admin container

Describe the bug
After a few seconds the admin gives up on waiting for the engine processes to reconnect. Logging looks like this

2021-08-12T18:06:23.390+0000 INFO  [admin-31wu94-nuodb-cluster0-0:processManagerScheduled29-1] ProcessManager Removing process with connectKey=1647498425067523578, startId=0, reason=Timed out (60000ms) awaiting connection from process: connectedState=PENDING_RECONNECT, removeAction=NONE, exitCode=null
2021-08-12T18:06:23.394+0000 INFO  [admin-31wu94-nuodb-cluster0-0:processManagerScheduled29-1] ProcessManager Removing process with connectKey=1026677207646886664, startId=1, reason=Timed out (60000ms) awaiting connection from process: connectedState=PENDING_RECONNECT, removeAction=NONE, exitCode=null

2021-08-12T18:06:23.484+0000 WARN  [:RetryingRaftClientExec4-1] DomainProcessStateMachine admin-31wu94-nuodb-cluster0-0: No entry found for process with startId=1

Also

2021-08-12T18:07:15.393+0000 WARN  [admin-31wu94-nuodb-cluster0-0:tagServerExecutor31-2] ProcessManager Evicting unknown reconnecting process EngineNode{databaseName=demo, address=te-database-klkfbv-nuodb-cluster0-demo-69d99ff6bd-fkn7j, port=48006, type=TE, pid=44, state=RUNNING, nodeId=2, version=4.2.1.vee-3, ipAddress=10.1.5.157, hostname=te-database-klkfbv-nuodb-cluster0-demo-69d99ff6bd-fkn7j}
java.lang.NullPointerException: Expected to find durable process with startId=1
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:987)
	at com.nuodb.host.ProcessManager$AdminConnectionImpl.setAltAddress(ProcessManager.java:952)
	at com.nuodb.host.ProcessManager$AdminConnectionImpl.reconnect(ProcessManager.java:910)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.nuodb.server.TagMessageDispatcher.dispatch(TagMessageDispatcher.java:74)
	at com.nuodb.server.TagMessageDispatcher.dispatch(TagMessageDispatcher.java:44)
	at com.nuodb.server.BoundMessageDispatcher.dispatch(BoundMessageDispatcher.java:28)
	at com.nuodb.host.requests.RequestDispatcher.dispatch(RequestDispatcher.java:97)
	at com.nuodb.host.requests.RequestDispatcher.dispatch(RequestDispatcher.java:34)
	at com.nuodb.server.BoundMessageDispatcher.dispatch(BoundMessageDispatcher.java:28)
	at com.nuodb.server.Server.consumeMessages(Server.java:152)
	at com.nuodb.server.Server.lambda$acceptConnections$1(Server.java:121)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at com.nuodb.util.Threading$1.lambda$wrapTarget$0(Threading.java:29)
	at java.base/java.lang.Thread.run(Thread.java:829)

Context
NuoDB Version: any
Helm Charts Version: any
Kubernetes Version: 1.15+
Environment: any

To Reproduce
Steps to reproduce the behavior:

Install the NuoDB admin
install the NuoDB database
restart one or more admin pods

Expected behavior
The engine reconnects successfully.

Additional context
Add any other context about the problem here.

External Reference
ZenDesk/JIRA numbers if available
JIRA: DB-33760

Longer readiness timeout in admin chart

Change readinessTimeoutSeconds to 5s (currently is 1) to match same value in database charts.

Teaching Temenos today we had exactly this issue. Extending the timeout fixed it.

Example commands are using Helm 2 syntax

Describe the bug

All the examples are using helm 2 syntax.

The helm 3 syntax is helm install [NAME] [CHART] [flags]. Your examples are all of the form helm install [CHART] ... and use --name to specify the installation name (which is helm 2 syntax).

Context
NuoDB Version: any 4.x
Helm Charts Version: all - I have no idea when (or if) these charts are supposed to target helm 3.
Kubernetes Version: any
Environment: any

To Reproduce
Just look at any of the README.md documents. In particular the README.md in the root of project and the README.md in each chart. Several charts have multiple example commands.

Expected behavior
Either explicitly mention that you are using helm 2 syntax, or upgrade to helm 3 syntax or show both. Your call.

External Reference
Is a Jira required?

Database-wide options are missing in the database chart

  ## database-wide options.
  # These are applied using the --database-options on the startup command
  # change these to values appropriate for this database
  # these options are applied to all processes in the database.
  options:
    ping-timeout: 60
    max-lost-archives: 0

At startup, "job-lb-policy-nearest-*" pods sometimes fail the first few tries

K8s v1.15.5 running in minikube.

tgates@tgu19:~/nuodb/nuodb-helm-operator/deploy$ kc get pods 
NAME                                   READY   STATUS      RESTARTS   AGE
example-admin-nuodb-cluster0-0         1/1     Running     0          98s
job-lb-policy-nearest-6wlwc            0/1     Error       0          73s
job-lb-policy-nearest-rgnjx            0/1     Error       0          98s
job-lb-policy-nearest-xxsl9            0/1     Completed   0          63s
nuodb-helm-operator-7cf7c858d5-462mj   1/1     Running     0          2m58s
tgates@tgu19:~/nuodb/nuodb-helm-operator/deploy$

One of the failed tries:

tgates@tgu19:~/nuodb/nuodb-helm-operator/deploy$ kc logs job-lb-policy-nearest-rgnjx
Unable to connect to https://nuodb.nuodb.svc:8888: HTTPSConnectionPool(host='nuodb.nuodb.svc', port=8888): Max retries exceeded with url: /api/1/databases/loadBalancerPolicy/nearest (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7efd2d1ad410>: Failed to establish a new connection: [Errno -2] Name or service not known',))
tgates@tgu19:~/nuodb/nuodb-helm-operator/deploy$

Maybe there is some kind of race condition before the service is created?

Enabling SM backup pod without backup SMs causes job crash loop

Enabling the SM backup pod without providing any backup SMs will cause the initial backup job to fail and be rescheduled instantly, so hundreds of failed jobs will start piling up. This is probably not what we want.