starrocks-kubernetes-operator's Introduction

StarRocks-Kubernetes-Operator

License

English | 中文

Overview

StarRocks Kubernetes Operator is a project that implements the deployment and operation of StarRocks, a next-generation sub-second MPP OLAP database, on Kubernetes. It facilitates the deployment of StarRocks' Frontend (FE), Backend (BE), and Compute Node (CN) components within your Kubernetes environment. It also includes Helm charts for easy installation and configuration. With StarRocks Kubernetes Operator, you can easily manage the lifecycle of StarRocks clusters, such as installing, scaling, and upgrading.

Note

The StarRocks k8s operator was designed to be a level 2 operator. See https://sdk.operatorframework.io/docs/overview/operator-capabilities/ to understand more about the capabilities of a level 2 operator.

Prerequisites

  1. Kubernetes version >= 1.18
  2. Helm version >= 3.0

Features

Operator Features

  • Support deploying StarRocks FE, BE, and CN components separately. The FE component is required; BE and CN components are optional
  • Support multiple StarRocks clusters in one Kubernetes cluster
  • Support external clients outside the Kubernetes network loading data into StarRocks using STREAM LOAD
  • Support automatic scaling for CN nodes based on CPU and memory usage (see the sketch after this list)
  • Support mounting persistent volumes for StarRocks containers
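
A minimal sketch of the CN autoscaling mentioned above, expressed as StarRocksCluster CR fields. The autoScalingPolicy shape follows what api.md documents, but verify it against the operator version you run; the numbers are illustrative only.

  starRocksCnSpec:
    image: starrocks/cn-ubuntu:3.2.1
    requests:
      cpu: 1
      memory: 2Gi
    # Assumed field shape; check api.md for your operator version.
    autoScalingPolicy:
      minReplicas: 1
      maxReplicas: 10
      hpaPolicy:
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60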

Helm Chart Features

  • Support Helm charts for easy installation and configuration
    • using the kube-starrocks Helm chart to install both the operator and a StarRocks cluster
    • using the operator Helm chart to install the operator, and the starrocks Helm chart to install the StarRocks cluster
  • Support initializing the root password of your StarRocks cluster during installation (see the sketch after this list).
  • Support integration with other components in the Kubernetes ecosystem, such as Prometheus, Datadog, etc.
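
A hedged values.yaml fragment for the root-password initialization mentioned above; the key names (initPassword.enabled, initPassword.password) are taken from recent chart versions and should be verified against your chart's values.yaml.

  initPassword:
    enabled: true
    # Inline password for the root account (assumption: a plain value is accepted here) ...
    password: "ChangeMe123!"
    # ... or, if your chart version supports it, reference an existing Kubernetes Secret instead:
    # passwordSecret: starrocks-root-pass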

Installation

In order to use StarRocks in Kubernetes, you need to install:

  1. StarRocksCluster CRD
  2. StarRocks Operator
  3. StarRocksCluster CR

There are two ways to install the Operator and a StarRocks cluster:

  1. Install the Operator and StarRocks cluster by YAML manifest.
  2. Install the Operator and StarRocks cluster by Helm chart.

Note: In every release, we provide the latest version of the YAML manifests and Helm charts. You can find them at https://github.com/StarRocks/starrocks-kubernetes-operator/releases

Installation by YAML Manifest

Please see Deploy StarRocks With Operator document for more details.

1. Apply the StarRocksCluster CRD

kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml

2. Apply the Operator manifest

Apply the Operator manifest. By default, the Operator is configured to install in the starrocks namespace. To use the Operator in a custom namespace, download the Operator manifest and edit all instances of namespace: starrocks to specify your custom namespace. Then apply this version of the manifest to the cluster with kubectl apply -f {local-file-path} instead of using the command below.

kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml
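
If you need a custom namespace, a minimal sketch of the download-and-edit flow described above (the namespace my-starrocks is just an example):

curl -sLO https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml
# Replace every occurrence of the default namespace (on macOS use: sed -i '' ...)
sed -i 's/namespace: starrocks/namespace: my-starrocks/g' operator.yaml
kubectl apply -f operator.yaml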

3. Deploy the StarRocks cluster

You need to prepare a separate YAML file to deploy StarRocks. The StarRocksCluster CRD fields are explained in api.md. The examples directory contains some simple examples for reference.

You can use any of the template YAML files as a starting point and add further configuration by following the deployment documentation.

For demonstration purposes, we use the starrocks-fe-and-be.yaml example template to start a StarRocks cluster with 3 FE and 3 BE nodes.

Here's an example YAML for Docker Desktop with local desktop access, pinned to StarRocks 3.2.1 so you can upgrade it in a later step.

atwong@Albert-CelerData sroperatortest % cat starrocks-fe-and-be.yaml
apiVersion: starrocks.com/v1
kind: StarRocksCluster
metadata:
  name: starrockscluster-sample
  namespace: starrocks
spec:
  starRocksFeSpec:
    image: starrocks/fe-ubuntu:3.2.1
    replicas: 3
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 4
      memory: 16Gi
    service:            
      type: LoadBalancer
  starRocksBeSpec:
    image: starrocks/be-ubuntu:3.2.1
    replicas: 3
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 4
      memory: 8Gi

Apply the manifest to create the cluster:

kubectl apply -f starrocks-fe-and-be.yaml

4. Connect to the StarRocks cluster

To connect, use the mysql client and connect to the StarRocks cluster on port 9030. An example connection is shown below.

Note

If you want to connect remotely or from your desktop, you will need to enable the Kubernetes LoadBalancer service.

atwong@Albert-CelerData sroperatortest % kubectl -n starrocks get svc
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                       AGE
starrockscluster-sample-be-search    ClusterIP      None            <none>        9050/TCP                                                      5m2s
starrockscluster-sample-be-service   ClusterIP      10.103.248.52   <none>        9060/TCP,8040/TCP,9050/TCP,8060/TCP                           5m2s
starrockscluster-sample-fe-search    ClusterIP      None            <none>        9030/TCP                                                      6m22s
starrockscluster-sample-fe-service   LoadBalancer   10.99.14.222    localhost     8030:32326/TCP,9020:32578/TCP,9030:30774/TCP,9010:32505/TCP   6m22s
atwong@Albert-CelerData sroperatortest % mysql -h 127.0.0.1 -P 9030 -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.0 3.2.1-79ee91d

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

5. Upgrade the StarRocks cluster

To upgrade, just patch the StarRocks cluster.

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksFeSpec":{"image":"starrocks/fe-ubuntu:latest"}}}'
kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"image":"starrocks/be-ubuntu:latest"}}}'
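
After patching, you can watch the rolling update complete. The StatefulSet names below are assumed from this example cluster (starrockscluster-sample); adjust them to your own cluster name:

kubectl -n starrocks rollout status statefulset/starrockscluster-sample-fe
kubectl -n starrocks rollout status statefulset/starrockscluster-sample-be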

6. Resize the StarRocks cluster

To resize, just patch the StarRocks cluster.

Important

Once you deploy with 3 FE nodes, you are in HA mode. Do not resize FE nodes below 3 since that will affect cluster quorum. This rule doesn't apply to CN nodes.

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"replicas":9}}}'
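
To verify the new size, check the pods and, over the same mysql connection used earlier, list the backends with StarRocks' SHOW BACKENDS statement:

kubectl -n starrocks get pods
mysql -h 127.0.0.1 -P 9030 -uroot -e 'SHOW BACKENDS;'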

7. Delete/stop the StarRocks cluster

To delete/stop the StarRocks cluster, just execute the delete command.

kubectl delete -f starrocks-fe-and-be.yaml

or

kubectl delete starrockscluster starrockscluster-sample -n starrocks

8. Delete/stop the StarRocks Operator

To delete/stop the StarRocks Operator, just execute the delete command.

kubectl delete -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml

Installation by Helm Chart

Please see kube-starrocks for how to install both the operator and a StarRocks cluster by Helm chart.

If you want more flexibility in managing your StarRocks clusters, you can deploy the operator using the operator Helm chart and StarRocks using the starrocks Helm chart separately.
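
A minimal sketch of both approaches. The chart repository URL below is the one published in the project's Helm documentation at the time of writing; verify it and the chart versions against the current release notes:

helm repo add starrocks https://starrocks.github.io/starrocks-kubernetes-operator
helm repo update

# Option 1: operator + StarRocks cluster in one chart
helm install starrocks starrocks/kube-starrocks -n starrocks --create-namespace

# Option 2: operator and cluster as separate releases
helm install operator starrocks/operator -n starrocks --create-namespace
helm install mycluster starrocks/starrocks -n starrocks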

Other Documents


starrocks-kubernetes-operator's Issues

[Bug] Updating config should trigger restarting the pod

At present, modifying starrocksBeSpec.config and starrocksFeSpec.config merely updates the ConfigMap without restarting the pod, so config changes fail to take effect.
The correct approach is to update the ConfigMap and restart the pod so that the updated configuration takes effect.
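
A common Kubernetes pattern for this, shown here only as a sketch and not necessarily how the operator will implement it, is to place a hash of the rendered config on the pod template, so any config change modifies the StatefulSet template and triggers a rolling restart. The annotation key below is hypothetical:

  spec:
    template:
      metadata:
        annotations:
          # Hypothetical annotation: sha256 of the ConfigMap data, computed by the controller
          starrocks.com/config-checksum: "<sha256 of fe.conf / be.conf>"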

[Bug] helm install cannot take sa for FE/BE deployment

Version: 1.3.5

Problem
It seems that the chart tries to create a ServiceAccount instead of accepting an existing one.

Reproduce Steps:

  1. Create a ServiceAccount in a namespace, e.g. celostar
  2. Set these two fields to the same SA name in values.yaml: starrocksFESpec.serviceAccount and starrocksBeSpec.serviceAccount
  3. Run helm install with that values.yaml; it fails with this error:

Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: ServiceAccount "celostar" in namespace "celostar" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "starrocks"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "celostar"
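
One possible workaround (not a fix for the underlying chart behavior) is to let Helm adopt the pre-existing ServiceAccount, which Helm 3.2+ allows once the ownership metadata it complains about is present:

kubectl -n celostar label serviceaccount celostar app.kubernetes.io/managed-by=Helm
kubectl -n celostar annotate serviceaccount celostar \
  meta.helm.sh/release-name=starrocks \
  meta.helm.sh/release-namespace=celostar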

deployed CRD error

k8s version: v1.24.8
An error occurred while adding custom resources
kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml
The error message is:
The CustomResourceDefinition "starrocksclusters.starrocks.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
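
The "Too long" error comes from the client-side kubectl.kubernetes.io/last-applied-configuration annotation that kubectl apply adds. A commonly used workaround is server-side apply, or kubectl create instead of apply:

kubectl apply --server-side -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml
# or
kubectl create -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml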

pod label update affects frontend service pod selector

We need to differentiate built-in labels from user-defined labels. Allow users to add whatever user-defined labels they want, but only update the Service's pod selector with built-in (operator-determined, fixed) labels. That way the Service selector is always stable, is not affected by user-defined labels, and does not cause rolling-upgrade issues.

kubectl delete starrocks-fe-and-be.yaml

Running the delete command in OpenShift hangs and never returns the cursor.

atwong@Alberts-MBP ~ % kubectl delete -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/examples/starrocks/starrocks-fe-and-be.yaml
starrockscluster.starrocks.com "starrockscluster-sample" deleted
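
Stuck deletes like this are often, though not always, caused by a finalizer whose controller is no longer running. A generic check using the names from this example (this is a diagnostic sketch, not a confirmed root cause):

kubectl -n starrocks get starrockscluster starrockscluster-sample -o jsonpath='{.metadata.finalizers}'
# Only if the owning controller is gone and you accept skipping its cleanup work:
kubectl -n starrocks patch starrockscluster starrockscluster-sample --type=merge -p '{"metadata":{"finalizers":[]}}'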

[Feature] Support configure imagePullSecrets and annotation for the global SA

We need these two fields to be configurable via the Helm chart values.yaml:

  • imagePullSecrets: to pull image from private registry
  • annotations: to assume a role from an AWS account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-starrocks
  namespace: celostar
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/yyy <=========
imagePullSecrets:
  - name: docker-ecr-pull <==============
secrets:
  - name: kube-starrocks-dockercfg-h5wzz

[Bug] ValidationError when setting feEnvVars

@shileifu @kevincai
Can you take a look at this issue?
Thank you!

When setting starrocksFESpec.feEnvVars in the values.yaml, I got this error, e.g.:

helm install starrocks . -n celostar --create-namespace \
  -f values-eu-dev-1-cs.yaml --render-subchart-notes
Error: INSTALLATION FAILED: 
unable to build kubernetes objects from release manifest: error validating "": 
error validating data: 
ValidationError(StarRocksCluster.spec.starRocksFeSpec): 
unknown field "feEnvVars" in com.starrocks.v1alpha1.StarRocksCluster.spec.starRocksFeSpec

[Bug] operator wrongly uses `storageSpec.name` from values.yaml, setting the wrong field in the manifest

When setting storageSpec.name in the values.yaml file, the operator uses it as the volume name instead of the storageClassName in the eventual manifest.

values.yaml

    # fe storageSpec for persistent meta data.
    storageSpec:
      # the storageClass name for fe meta persistent volume. if name is empty used emptyDir.
      name: "gp2" <====================

Generated StatefulSet manifest:

  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: gp2 <===================== mistakenly set it as the volume name
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      volumeMode: Filesystem
      storageClassName: gp2 <=================== should set spec.storageClassName

StarRocks Operator on Red Hat OpenShift: SCC issues when creating a new service account and granting permissions

How did we get here. StarRocks/starrocks#22767

executed

oc create sa starrocks-sa
oc adm policy add-scc-to-user privileged -z starrocks-sa
oc set sa deploy starrocks-controller starrocks-sa

now I get this error in the starrocks-controller pod

E0428 23:37:25.428177       1 leaderelection.go:330] error retrieving resource lock starrocks/c6c79638.starrocks.com: leases.coordination.k8s.io "c6c79638.starrocks.com" is forbidden: User "system:serviceaccount:starrocks:starrocks-sa" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "starrocks"
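
One way to unblock leader election when running the controller under a custom ServiceAccount is to grant it access to coordination.k8s.io leases in its namespace. The operator's bundled RBAC normally covers this; the sketch below is only for the custom-SA case, with names matching this report:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: starrocks-sa-leader-election
  namespace: starrocks
rules:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: starrocks-sa-leader-election
  namespace: starrocks
subjects:
- kind: ServiceAccount
  name: starrocks-sa
  namespace: starrocks
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: starrocks-sa-leader-election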

[Enhancement] Re-design the failure and recovery mechanism in k8s environment

The failure and recovery mechanism needs to be re-designed for the k8s environment for the following reasons:

  1. StarRocks includes a clone process to address missing replicas in situations where the number of alive replicas is insufficient, such as when a BackEnd (BE) node fails. However, in a Kubernetes (k8s) environment, the failure mode of a BE node is different. Specifically, if a BE fails a k8s liveness probe check, which may be due to reasons like VM node failure, the BE will not be permanently dead. Instead, the k8s control plane will schedule the BE on a different VM node in the form of a stateful set pod.

  2. Data tablets are not lost when a BE is detected as dead. This is because the data tablets associated with the BE are typically stored in a Persistent Volume that is backed by a separate block storage service, such as EBS or Azure Disk Storage. These block storage services have much higher durability than a commodity disk, and they do not necessarily fail when the BE fails. Consequently, when a BE stateful set pod is rescheduled to a different node, the data tablet will still be available via the Persistent Volume.

  3. It is important to note that the end-to-end time that k8s takes to reschedule a new BE might be longer than the 5-minute threshold that StarRocks uses to consider a BE as dead. Please check the timeline graph of how a pod is rescheduled to a new node on node failure.

Ref:

Kubernetes starrocks 2.5.4 be errors

env: Kubernetes v1.26.1
operator: starrocks/operator:latest
deployment: starrocks/fe-ubuntu:2.5.4 starrocks/be-ubuntu:2.5.4

FE works well; BE can't be started.

[Mon Apr 17 08:00:50 UTC 2023] Empty $CONFIGMAP_MOUNT_PATH env var, skip it!
[Mon Apr 17 08:00:50 UTC 2023] Add myself (starrockscluster-be-0.starrockscluster-be-search.open.svc.k8s.com:9050) into FE ...
ERROR 1064 (HY000) at line 1: Unexpected exception: Same backend already exists[starrockscluster-be-0.starrockscluster-be-search.open.svc.k8s.com:9050]
[Mon Apr 17 08:00:50 UTC 2023] run start_be.sh
/opt/starrocks/be/bin/start_backend.sh: line 185: 807 Illegal instruction (core dumped) ${START_BE_CMD} "$@" >> ${LOG_FILE} 2>&1 < /dev/null
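
The "Illegal instruction (core dumped)" from start_backend.sh is commonly a CPU instruction-set mismatch: the standard BE images generally expect AVX2. This is a generic diagnostic, not a confirmed root cause for this report; run it on the node (or in any pod scheduled on it):

grep -c avx2 /proc/cpuinfo   # 0 means the CPU lacks AVX2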

[Enhancement] Use HTTP endpoint for Readiness/Liveness Probes

The current implementation is using tcpSocket for livenessProbe/readinessProbe/startupProbe
This is a less preferred way of implementing Readiness/Liveness probe.
Detailed discussion can be found in this tech blog:
Stop Using TCP Health Checks for Kubernetes Application

Suggest using the http endpoint for k8s probes
Starrocks Official Doc: Check the health status of a cluster

      livenessProbe:
        tcpSocket:
          port: 9050
        timeoutSeconds: 1
        periodSeconds: 5
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        tcpSocket:
          port: 9060
        timeoutSeconds: 1
        periodSeconds: 5
        successThreshold: 1
        failureThreshold: 3
      startupProbe:
        tcpSocket:
          port: 8060
        timeoutSeconds: 1
        periodSeconds: 5
        successThreshold: 1
        failureThreshold: 60
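
A hedged sketch of what httpGet-based probes could look like for a BE pod, using the health endpoints from the StarRocks documentation (BE /api/health on the 8040 HTTP port; the FE equivalent is /api/bootstrap on 8030). Thresholds below are illustrative only:

      livenessProbe:
        httpGet:
          path: /api/health
          port: 8040
        periodSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /api/health
          port: 8040
        periodSeconds: 5
        failureThreshold: 3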

Fix the crash in unit test

Way to reproduce the error:
run go test ./... from the repo root dir to reproduce it.

$ go test ./... 
?       github.com/StarRocks/starrocks-kubernetes-operator/cmd  [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/common       [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/components/offline_job       [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/components/register_container        [no test files]
ok      github.com/StarRocks/starrocks-kubernetes-operator/internal/fe  (cached)
I0131 08:29:18.092066 3563771 starrockscluster_controller.go:80] StarRocksClusterReconciler reconciler the update crd name starrockscluster-sample namespace default
I0131 08:29:18.092842 3563771 k8sutils.go:56] Creating resource service namespace default name fe-domain-search kind 
I0131 08:29:18.092858 3563771 k8sutils.go:56] Creating resource service namespace default name starrockscluster-sample-fe-service kind 
I0131 08:29:18.092912 3563771 fe_statefulset.go:68] FeController buildStatefulSetParams [{"metadata":{"name":"fe-storage","creationTimestamp":null},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"shard-data"},"status":{}}]
I0131 08:29:18.093138 3563771 fe_pod.go:75] FeController buildPodTemplate vols [{"name":"fe-storage","persistentVolumeClaim":{"claimName":"fe-storage"}}]
I0131 08:29:18.093281 3563771 k8sutils.go:56] Creating resource service namespace default name starrockscluster-sample-fe kind 
I0131 08:29:18.093305 3563771 statefulset.go:128] the statefulset new hash value 1286400549 old have value 4246122645 oldGeneration 0 new Generation 0
I0131 08:29:18.093309 3563771 fe_controller.go:93] FeController Sync exist statefulset not equals to new statefuslet
I0131 08:29:18.093313 3563771 k8sutils.go:64] Updating resource service namespace default name starrockscluster-sample-fe kind l&TypeMeta{Kind:,APIVersion:,}
I0131 08:29:18.094360 3563771 cn_controller.go:51] CnController Sync the cn component is not needed namespace default starrocks cluster name starrockscluster-sample
I0131 08:29:18.094366 3563771 starrockscluster_controller.go:141] StarRocksClusterReconciler reconcile status.
I0131 08:29:18.094368 3563771 starrockscluster_controller.go:178] StarRocksClusterReconciler reconciler namespace default name starrockscluster-sample
I0131 08:29:18.094440 3563771 starrockscluster_controller.go:80] StarRocksClusterReconciler reconciler the update crd name starrockscluster-sample namespace default
I0131 08:29:18.094542 3563771 k8sutils.go:56] Creating resource service namespace default name fe-domain-search kind 
I0131 08:29:18.094555 3563771 k8sutils.go:56] Creating resource service namespace default name starrockscluster-sample-fe-service kind 
--- FAIL: TestStarRocksClusterReconciler_FeReconcileSuccess (0.00s)
panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$' [recovered]
        panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

goroutine 14 [running]:
testing.tRunner.func1.2({0x16736e0, 0xc0003ffae0})
        /snap/go/10030/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
        /snap/go/10030/src/testing/testing.go:1399 +0x39f
panic({0x16736e0, 0xc0003ffae0})
        /snap/go/10030/src/runtime/panic.go:884 +0x212
k8s.io/apimachinery/pkg/api/resource.MustParse({0x0, 0x0})
        /home/d.liu/go/pkg/mod/k8s.io/[email protected]/pkg/api/resource/quantity.go:139 +0x1ce
github.com/StarRocks/starrocks-kubernetes-operator/pkg/fe_controller.(*FeController).buildStatefulSetParams(_, _, _)
        /home/d.liu/git/dengliu/starrocks-kubernetes-operator/pkg/fe_controller/fe_statefulset.go:60 +0x8fc
github.com/StarRocks/starrocks-kubernetes-operator/pkg/fe_controller.(*FeController).Sync(0xc000a73920, {0x1b025e8, 0xc000044158}, 0xc000338160)
        /home/d.liu/git/dengliu/starrocks-kubernetes-operator/pkg/fe_controller/fe_controller.go:75 +0x7aa
github.com/StarRocks/starrocks-kubernetes-operator/pkg.(*StarRocksClusterReconciler).Reconcile(0xc000a73590, {0x1b025e8, 0xc000044158}, {{{0x188475d, 0x7}, {0x189baa7, 0x17}}})
        /home/d.liu/git/dengliu/starrocks-kubernetes-operator/pkg/starrockscluster_controller.go:119 +0x522
github.com/StarRocks/starrocks-kubernetes-operator/pkg.TestStarRocksClusterReconciler_FeReconcileSuccess(0x0?)
        /home/d.liu/git/dengliu/starrocks-kubernetes-operator/pkg/starrockscluster_controller_test.go:171 +0x9bc
testing.tRunner(0xc000583380, 0x196d180)
        /snap/go/10030/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
        /snap/go/10030/src/testing/testing.go:1493 +0x35f
FAIL    github.com/StarRocks/starrocks-kubernetes-operator/pkg  0.019s
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/apis     [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/apis/starrocks/v1alpha1  [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/be_controller    [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/cn_controller    [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/common   [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/common/hash      [no test files]
ok      github.com/StarRocks/starrocks-kubernetes-operator/pkg/common/resource_utils    (cached)
ok      github.com/StarRocks/starrocks-kubernetes-operator/pkg/fe_controller    (cached)
?       github.com/StarRocks/starrocks-kubernetes-operator/pkg/k8sutils [no test files]
?       github.com/StarRocks/starrocks-kubernetes-operator/scripts/docs/template        [no test files]
FAIL

Support Red Hat UBI-8 for base container image

The issue is around container support. Support isn't provided at the container level; it's done at the host level. So SUSE will not support (provide bug fixes, etc.) a Red Hat container, only best effort. When there are issues between the container (one Linux distro) and the host (another Linux distro), you get finger-pointing over who should fix what. The biggest on-prem market is Red Hat.

https://www.redhat.com/en/blog/limits-compatibility-and-supportability-containers

[Bug] After deleting all FE pods, FE fails to start

action

  • k delete pod starrockscluster-fe-0 starrockscluster-fe-1 starrockscluster-fe-2

  • fe version 2.4.3

  • fe.log
    2023-03-22 03:26:52,488 INFO (main|1) [StarRocksFE.start():123] StarRocks FE starting... 2023-03-22 03:26:52,644 INFO (main|1) [FrontendOptions.initAddrUseFqdn():211] Use FQDN init local addr, FQDN: starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local, IP: 10.116.28.181 2023-03-22 03:26:52,876 INFO (main|1) [Auth.grantRoleInternal():822] grant operator to 'root'@'%', isReplay = true 2023-03-22 03:26:52,897 INFO (main|1) [PrivilegeManager.initBuiltinRoleUnlocked():285] create built-in role root[-1] 2023-03-22 03:26:52,902 INFO (main|1) [PrivilegeManager.initBuiltinRoleUnlocked():285] create built-in role db_admin[-2] 2023-03-22 03:26:52,903 INFO (main|1) [PrivilegeManager.initBuiltinRoleUnlocked():285] create built-in role cluster_admin[-3] 2023-03-22 03:26:52,903 INFO (main|1) [PrivilegeManager.initBuiltinRoleUnlocked():285] create built-in role user_admin[-4] 2023-03-22 03:26:52,904 INFO (main|1) [PrivilegeManager.initBuiltinRoleUnlocked():285] create built-in role public[-5] 2023-03-22 03:26:52,904 INFO (main|1) [GlobalStateMgr.initAuth():976] using new privilege framework.. 2023-03-22 03:26:53,094 INFO (main|1) [NodeMgr.getHelperNodes():589] get helper nodes: [starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local:9010] 2023-03-22 03:26:53,144 INFO (main|1) [NodeMgr.getClusterIdAndRoleOnStartup():389] finished to get cluster id: 1639540152, role: FOLLOWER and node name: starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038 2023-03-22 03:26:53,145 INFO (main|1) [BDBEnvironment.ensureHelperInLocal():357] skip check local environment because helper node and local node are identical. 2023-03-22 03:26:53,169 INFO (main|1) [BDBEnvironment.setupEnvironment():267] start to setup bdb environment for 1 times 2023-03-22 03:26:53,553 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [BDBEnvironment.setupEnvironment():277] add helper[starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local:9010] as ReplicationGroupAdmin 2023-03-22 03:26:53,560 WARN (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [StateChangeExecutor.notifyNewFETypeTransfer():62] notify new FE type transfer: UNKNOWN 2023-03-22 03:26:53,578 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [BDBEnvironment.setupEnvironment():297] replicated environment is all set, wait for state change... 
2023-03-22 03:27:03,580 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [BDBEnvironment.setupEnvironment():305] state change done, current role UNKNOWN 2023-03-22 03:27:03,585 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [BDBEnvironment.setupEnvironment():309] end setup bdb environment after 1 times 2023-03-22 03:27:03,588 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.loadImage():1293] image does not exist: /opt/starrocks/fe/meta/image/image.0 2023-03-22 03:27:03,589 INFO (stateChangeExecutor|77) [StateChangeExecutor.runOneCycle():85] begin to transfer FE type from INIT to UNKNOWN 2023-03-22 03:27:03,590 INFO (stateChangeExecutor|77) [StateChangeExecutor.runOneCycle():179] finished to transfer FE type to INIT 2023-03-22 03:27:05,589 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false 2023-03-22 03:27:07,589 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false 2023-03-22 03:27:09,590 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false 2023-03-22 03:27:11,590 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false 2023-03-22 03:27:13,591 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false 2023-03-22 03:27:15,592 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false 2023-03-22 03:27:17,592 INFO (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [GlobalStateMgr.waitForReady():1019] wait globalStateMgr to be ready. FE type: INIT. is ready: false

  • fe.warn.log
    2023-03-22 03:26:53,560 WARN (UNKNOWN starrockscluster-fe-0.fe-domain-search.tmp-mgr.svc.cluster.local_9010_1679455382038(-1)|1) [StateChangeExecutor.notifyNewFETypeTransfer():62] notify new FE type transfer: UNKNOWN

Add annotations config for service

Hi team,

Can I set service annotations in config, like:

apiVersion: starrocks.com/v1alpha1
kind: StarRocksCluster
metadata:
  name: example
  namespace: starrocks
spec:
  starRocksFeSpec:
    service:
      type: LoadBalancer
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-internal: "true"
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
        service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: 'true'
        service.beta.kubernetes.io/aws-load-balancer-type: nlb

Thanks.

[Feature] Support datadog Log collection from file configured in an annotation

StarRocks FE/BE services have multiple sources of logs, and it is not appropriate to redirect all of them to stdout and stderr.
A more proper way is to set up log collection from each log file. This requires adding an annotation to the pod spec and mounting the log files as hostPath volumes.
It would be great to add a config to the Helm template (e.g. values.yaml values.log.enableDatadog) that transparently sets up these annotations and hostPath volumes.

Ref: https://docs.datadoghq.com/containers/kubernetes/log/?tab=helm#examples---log-collection-from-file-configured-in-an-annotation

e.g.

ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: |
  [
    {"type":"file","path":"/opt/starrocks/be/be.INFO","source":"file","service":"starrocks-be"},
    {"type":"file","path":"/opt/starrocks/be/be.WARNING","source":"file","service":"starrocks-be"},
  ]

ad.datadoghq.com/<CONTAINER_IDENTIFIER>.logs: |
  [
    {"type":"file","path":"/opt/starrocks/fe/fe.log","source":"file","service":"starrocks-fe"},
    {"type":"file","path":"/opt/starrocks/fe/fe.out","source":"file","service":"starrocks-fe"},
    {"type":"file","path":"/opt/starrocks/fe/fe.warn.log","source":"file","service":"starrocks-fe"},
    {"type":"file","path":"/opt/starrocks/fe/fe.audit.log","source":"file","service":"starrocks-fe"},
    {"type":"file","path":"/opt/starrocks/fe/fe.dump.log","source":"file","service":"starrocks-fe"},
  ]    

Path of config files in configmap does not match the real path

In the operator source file pkg/fe_controller/fe_pod.go, the variable fe_config_path is defined as "/etc/starrocks/fe/conf". However, when I downloaded the image starrocks/fe-ubuntu:2.5.2, its configuration path was located at /opt/starrocks/fe/conf, which does not match the value of fe_config_path.
I am curious to know why fe_config_path is not defined as "/opt/starrocks/fe/conf".

FE and BE deployments do not support nodeSelector

Current deployment environment:

  • K8s: 1.18.10
  • starrocks/operator:v1.1, starrocks/fe:2.4.1, starrocks/be:2.4.1

According to the StarRocksCluster API documentation, adding a nodeSelector is not supported.
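
For reference, later operator/CRD releases document a per-component nodeSelector field. A hedged sketch of what that looks like; verify against the api.md of the version you run, since the 1.1 operator in this report predates it:

spec:
  starRocksFeSpec:
    nodeSelector:
      kubernetes.io/arch: amd64   # example selector only
  starRocksBeSpec:
    nodeSelector:
      kubernetes.io/arch: amd64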

[Enhancement] use emptyDir for be/log

@kevincai @shileifu @imay

This is found on Chart V1.3.5

Currently both be/storage and be/log are mounted on the same PV. This has a negative impact on performance and unnecessarily consumes be/storage space for logging.

Suggest mounting be/log to an emptyDir, as is already done in the FE config.

be storage manifest sample:

spec:
  volumes:
    - name: be-storage
      persistentVolumeClaim:
        claimName: be-storage-kube-starrocks-be-0

      volumeMounts:
        - name: be-storage
          mountPath: /opt/starrocks/be/storage
        - name: be-storage
          mountPath: /opt/starrocks/be/log

fe storage manifest sample:

spec:
  volumes:
    - name: fe-meta
      persistentVolumeClaim:
        claimName: fe-meta-kube-starrocks-fe-0
    - name: fe-log
      emptyDir: {}
      
      volumeMounts:
        - name: fe-meta
          mountPath: /opt/starrocks/fe/meta
        - name: fe-log
          mountPath: /opt/starrocks/fe/log

Clean up the values.yaml file.

It would be great for the StarRocks team to clean up the unused values in the values.yaml file.
This would help reduce unnecessary questions and distraction for both the chart authors and Helm users.

E.g.

starrocksCluster:
  # Provide a name for starprocks cluster
  name: ""
  namespace: ""
  # annotations for starrocks cluster
  annotations: {} <======

Failure when deploying to an OpenShift cluster

@kevincai
Deploying the FE/BE StatefulSets fails due to security context constraints (SCC) in OpenShift.
A workaround is provided below, but it would be nice to solve this issue in a better way.

mv: cannot move '/opt/starrocks/fe/conf/fe.conf' to '/opt/starrocks/fe/conf/fe.conf.bak': Permission denied
ln: failed to create symbolic link '/opt/starrocks/fe/conf/fe.conf': Permission denied
ERROR 2003 (HY000): Can't connect to MySQL server on 'xxxx-fe-service.xxxx:9030' (111)


Workaround

  1. Add fsGroup: 1000 to both starrocksBeSpec and starrocksFESpec, e.g. in the Helm values.yaml:
  starrocksFESpec:
    # Number of replicas to deploy for a fe statefulset.
    replicas: 1
...
    fsGroup: 1000 <======= add this line

  2. Bind the service account to system:openshift:scc:anyuid by deploying the following RoleBinding:

> kubectl apply -f rolebinding.yml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: system:openshift:scc:anyuid
  namespace: you-name-space
  managedFields:
    - manager: oc
      operation: Update
      apiVersion: rbac.authorization.k8s.io/v1

subjects:
  - kind: ServiceAccount
    name: your-sa  <============ replace it with the service account you use for starrocks cluster
    namespace: you-name-space <============ replace it with the name space you use for starrocks cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:anyuid 

cc @imay @hagonzal

Default username password when BE join FE and when FE electing itself as leader

Is there a way to pass the default username and password for FE and BE? Inside fe_entrypoint.sh and be_entrypoint.sh, username root and an empty password are used as defaults. When I change the root password and then increase the number of BE replicas, the new BE fails when it tries to join the FE.

add_self()
{
    local svc=$1
    start=`date +%s`
    local timeout=$PROBE_TIMEOUT

    while true
    do
        log_stderr "Add myself ($MY_SELF:$HEARTBEAT_PORT) into FE ..."
        timeout 15 mysql --connect-timeout 2 -h $svc -P $FE_QUERY_PORT -u root --skip-column-names --batch -e "ALTER SYSTEM ADD BACKEND \"$MY_SELF:$HEARTBEAT_PORT\";"
        memlist=`show_backends $svc`
        if echo "$memlist" | grep -q -w "$MY_SELF" &>/dev/null ; then
            break;
        fi

        let "expire=start+timeout"
        now=`date +%s`
        if [[ $expire -le $now ]] ; then
            log_stderr "Time out, abort!"
            exit 1
        fi

        sleep $PROBE_INTERVAL

    done
}

This line
timeout 15 mysql --connect-timeout 2 -h $svc -P $FE_QUERY_PORT -u root --skip-column-names --batch -e "ALTER SYSTEM ADD BACKEND \"$MY_SELF:$HEARTBEAT_PORT\";"

volume mapping TLS keystore

Support mapping volumes in the operator, e.g. mapping a TLS trust store / keystore to the BE nodes in the case of mTLS.

how to persist data in PV

Mirae Kim
  [3 hours ago](https://starrocks.slack.com/archives/C02FACZSNJV/p1683146831354529)
hi! we’re deploying starrocks to a kubernetes cluster using the operator, and wondering how to persist data that we load into starrocks, even if the BE pods are restarted ? I tried using a storagevolume in the [starrocksBeSpec](https://github.com/StarRocks/starrocks-kubernetes-operator/blob/main/doc/api.md#starrocksbespec) but it doesnt seem to be where the data is stored



Kevin Cai
  [2 hours ago](https://starrocks.slack.com/archives/C02FACZSNJV/p1683150963829999?thread_ts=1683146831.354529&cid=C02FACZSNJV)
can you paste the snippets of storagevolume under starrocksBeSpec section?


Mirae Kim
  [1 hour ago](https://starrocks.slack.com/archives/C02FACZSNJV/p1683153936209309?thread_ts=1683146831.354529&cid=C02FACZSNJV)
storageVolumes:
  - name: starrocks-be-storage
    storageSize: 1000Gi
    mountPath: '/data'


Kevin Cai
  [1 hour ago](https://starrocks.slack.com/archives/C02FACZSNJV/p1683154359413199?thread_ts=1683146831.354529&cid=C02FACZSNJV)
mount path should be /opt/starrocks/be/storage by default. if you want a non-default data path, you have to change corresponding storage path in be.conf together.


Mirae Kim
  [1 hour ago](https://starrocks.slack.com/archives/C02FACZSNJV/p1683154544815049?thread_ts=1683146831.354529&cid=C02FACZSNJV)
Ah ok got it - so is that the issue, that the mountpath was not correctly configured ? otherwise we should expect the data to persist in this even when the BEs restart ?


Kevin Cai
  [1 hour ago](https://starrocks.slack.com/archives/C02FACZSNJV/p1683154691370679?thread_ts=1683146831.354529&cid=C02FACZSNJV)
yes. with correct mountpath, BE should be able to write its data to the persistent volume.
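
Putting the thread together, a corrected storageVolumes snippet that persists BE data across restarts by mounting the PV at the default BE data path (storageClassName is optional and shown only as an example value):

starRocksBeSpec:
  storageVolumes:
  - name: be-storage
    # storageClassName: gp2              # optional; the cluster's default StorageClass is used if omitted
    storageSize: 1000Gi
    mountPath: /opt/starrocks/be/storage # default BE data path, per the discussion above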

Add support to annotate operator pod in helm chart

This is needed for collecting logs and metrics from datadog agent.

  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/default-container: manager
      labels:
        app: {{ template "kube-starrocks.name" . }}-operator
        version: {{ $.Chart.Version }}

Multiple FE pods cannot enter the ready state after being restarted.

I started 3 FE pods with NFS storage. Everything works normally if I do not restart any pod.
When I restart one of the FE pods, it recovers to the ready state.
But when I restart all FE pods simultaneously by executing
[kubectl delete -f starrocks-fe.yaml] (at this point the data in the NFS storage is retained)
and then [kubectl apply -f starrocks-fe.yaml], fe-0 always stays in an unready state.
I found that the FE pods must be started simultaneously, and the number of FE pods started must match the number started in the previous run for the FE pods to come up normally. Otherwise, if only a single FE pod is started, it remains in an unready state.
So, I tried to fix the source code of the operator. In [func NewStatefulset], under [Spec: appv1.StatefulSetSpec], I added PodManagementPolicy: appv1.ParallelPodManagement,
and I also modified the startup command in the operator as follows:
Before:
Command: []string{"/opt/starrocks/fe_entrypoint.sh"},
After:
var command = []string{
    "/bin/bash",
    "-c",
    "if [ -f \"/opt/starrocks/fe/meta/image/ROLE\" ]; then /opt/starrocks/fe/bin/start_fe.sh; else /opt/starrocks/fe_entrypoint.sh starrockscluster-sample-fe-service.starrocks; fi",
}
Command: command,

After the above changes, the FE pods start up simultaneously and the problem disappears.
Can the authors of the StarRocks Operator tell me whether my modification is correct?
