
etcd-druid's Introduction


Gardener implements the automated management and operation of Kubernetes clusters as a service and provides a fully validated extensibility framework that can be adjusted to any programmatic cloud or infrastructure provider.

Gardener is 100% Kubernetes-native and exposes its own Cluster API to create homogeneous clusters on all supported infrastructures. This API differs from SIG Cluster Lifecycle's Cluster API, which only harmonizes how to get to clusters, while Gardener's Cluster API goes one step further and also harmonizes the make-up of the clusters themselves. That means Gardener gives you homogeneous clusters with exactly the same bill of materials, configuration and behavior on all supported infrastructures, as you can see further down in the section on our K8s Conformance Test Coverage.

In 2020, SIG Cluster Lifecycle's Cluster API made a huge step forward with v1alpha3 and the newly added support for declarative control plane management. This made it possible to integrate managed services like GKE or Gardener. We would be more than happy to contribute a Gardener control plane provider if the community is interested. For more information on the relation between the Gardener API and SIG Cluster Lifecycle's Cluster API, please see here.

Gardener's main principle is to leverage Kubernetes concepts for all of its tasks.

In essence, Gardener is an extension API server that comes along with a bundle of custom controllers. It introduces new API objects in an existing Kubernetes cluster (which is called garden cluster) in order to use them for the management of end-user Kubernetes clusters (which are called shoot clusters). These shoot clusters are described via declarative cluster specifications which are observed by the controllers. They will bring up the clusters, reconcile their state, perform automated updates and make sure they are always up and running.

To accomplish these tasks reliably and to offer a high quality of service, Gardener controls the main components of a Kubernetes cluster (etcd, API server, controller manager, scheduler). These so-called control plane components are hosted in Kubernetes clusters themselves (which are called seed clusters). This is the main difference compared to many other OSS cluster provisioning tools: the shoot clusters do not have dedicated master VMs. Instead, the control plane is deployed as a native Kubernetes workload into the seeds (an architecture commonly referred to as kubeception or inception design). This not only effectively reduces the total cost of ownership but also allows easier implementation of "day-2 operations" (like cluster updates or robustness) by relying on all the mature Kubernetes features and capabilities.

Gardener reuses the identical Kubernetes design to span a scalable multi-cloud and multi-cluster landscape. Such familiarity with known concepts has proven to quickly ease the initial learning curve and accelerate developer productivity:

  • Kubernetes API Server = Gardener API Server
  • Kubernetes Controller Manager = Gardener Controller Manager
  • Kubernetes Scheduler = Gardener Scheduler
  • Kubelet = Gardenlet
  • Node = Seed cluster
  • Pod = Shoot cluster

Please find more information regarding the concepts and a detailed description of the architecture in our Gardener Wiki and our blog posts on kubernetes.io: Gardener - the Kubernetes Botanist (17.5.2018) and Gardener Project Update (2.12.2019).


K8s Conformance Test Coverage

Gardener takes part in the Certified Kubernetes Conformance Program to attest its compatibility with the K8s conformance test suite. Currently, Gardener is certified for K8s versions up to v1.29; see the conformance spreadsheet.

Continuous conformance test results of the latest stable Gardener release are uploaded regularly to the CNCF test grid:

Provider/K8s    v1.29   v1.28   v1.27   v1.26   v1.25
AWS             ✓       ✓       ✓       ✓       ✓
Azure           ✓       ✓       ✓       ✓       ✓
GCP             ✓       ✓       ✓       ✓       ✓
OpenStack       ✓       ✓       ✓       ✓       ✓
Alicloud        ✓       ✓       ✓       ✓       ✓
Equinix Metal   N/A     N/A     N/A     N/A     N/A
vSphere         N/A     N/A     N/A     N/A     N/A

(✓ = conformance test results for that Gardener/K8s combination are published to the CNCF test grid)

Get an overview of the test results at testgrid.

Start using or developing Gardener locally

See our documentation in the /docs repository; you can find the index here.

Setting up your own Gardener landscape in the Cloud

The quickest way to test drive Gardener is to install it virtually onto an existing Kubernetes cluster, just like you would install any other Kubernetes-ready application. You can do this with our Gardener Helm Chart.

Alternatively you can use our garden setup project to create a fully configured Gardener landscape which also includes our Gardener Dashboard.

Feedback and Support

Feedback and contributions are always welcome!

All channels for getting in touch or learning about our project are listed under the community section. We are cordially inviting interested parties to join our bi-weekly meetings.

Please report bugs or suggestions about our Kubernetes clusters as such or Gardener itself as GitHub issues, or join our Slack channel #gardener (please invite yourself to the Kubernetes workspace here).

Learn More!

Please find further resources about our project here:

etcd-druid's Issues

[Feature] Auto-recover if database directory is locked

Feature (What you would like to be added):
In some infrastructures (e.g. Azure), abnormal termination of the etcd container/pod can leave the database directory lock unreleased, which causes the backup-restore sidecar to hang while opening the database for verification on etcd container restart.

We should try to detect this scenario and try to recover from it automatically.

Motivation (Why is this needed?):
This happens rarely (so far only a couple of times in Azure) but requires manual intervention. Typically, a pod restart resolves the issue. But we should try and automate this.

Approach/Hint to implement the solution (optional):
Typically, a pod restart resolves the issue.

[Feature] Deploy/maintain the correct `PodDisruptionBudget` configuration according to the Etcd resource status conditions

Feature (What you would like to be added):

Deploy/maintain the correct PodDisruptionBudget configuration according to the Etcd resource status conditions.

Motivation (Why is this needed?):

Pick individually executable pieces of the multi-node proposal.

Approach/Hint to implement the solution (optional):

The deployment of PodDisruptionBudget resource is probably best done in the main controller and the dynamic modification of the resource based on the Etcd resource status is best done in the custodian controller.

It is probably better to deploy the PodDisruptionBudget only for the multi-node case (spec.replicas > 1) because deploying it for the single-node case might block node drain.
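A minimal sketch of that approach in Go, assuming illustrative names (desiredPDB, podLabels) and the policy/v1beta1 API; it only shows how the desired object could be derived from the replica count, with the custodian controller later adjusting it based on the Etcd resource status conditions:

package example

import (
    policyv1beta1 "k8s.io/api/policy/v1beta1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// desiredPDB returns nil for a single-node etcd (where a PDB might block node drain) and a
// PodDisruptionBudget with minAvailable set to the quorum size for the multi-node case.
func desiredPDB(name, namespace string, replicas int32, podLabels map[string]string) *policyv1beta1.PodDisruptionBudget {
    if replicas <= 1 {
        return nil
    }
    minAvailable := intstr.FromInt(int(replicas)/2 + 1) // etcd quorum size
    return &policyv1beta1.PodDisruptionBudget{
        ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace, Labels: podLabels},
        Spec: policyv1beta1.PodDisruptionBudgetSpec{
            MinAvailable: &minAvailable,
            Selector:     &metav1.LabelSelector{MatchLabels: podLabels},
        },
    }
}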

[Feature] Standalone Helm Chart for Etcd Druid

Feature (What you would like to be added):
It would be great to have a standalone Helm chart for druid that can be used outside Gardener.

Motivation (Why is this needed?):
Using druid instead of the etcd-operator and utilizing the backup and restore capabilities.

Approach/Hint to implement the solution (optional):
It would be great to have a chart in the charts repository, released from this GitHub project via https://github.com/helm/chart-releaser-action to the repository's gh-pages branch.

[Feature] Schedule regular backup compaction

Feature (What you would like to be added):

With the compaction command newly introduced in etcd-backup-restore (gardener/etcd-backup-restore#301) to asynchronously compact backups (the latest full snapshot and its subsequent incremental snapshots into a single full snapshot), we should enhance etcd-druid to schedule backup compaction at regular intervals. This limits the number of incremental snapshots at any point in time and hence improves backup restoration performance.

Motivation (Why is this needed?):

Complete the functionality for the issue #88.

Approach/Hint to implement the solution (optional):

etcd-druid's main controller may create a CronJob as part of its reconciliation cycle, as sketched below. There is no need to include logic for selecting existing CronJobs based on spec.selector (of the Etcd resources) because of #186.
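A rough sketch of such a CronJob creation as a helper in the reconciliation flow; the schedule, image and the etcdbrctl compact invocation are placeholders for whatever etcd-backup-restore (gardener/etcd-backup-restore#301) actually expects:

package example

import (
    "context"

    batchv1 "k8s.io/api/batch/v1"
    batchv1beta1 "k8s.io/api/batch/v1beta1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// createCompactionCronJob sketches what the main controller could create as part of its
// reconciliation cycle. Image and command are placeholders.
func createCompactionCronJob(ctx context.Context, c client.Client, etcdName, namespace, image string) error {
    cronJob := &batchv1beta1.CronJob{
        ObjectMeta: metav1.ObjectMeta{
            Name:      etcdName + "-backup-compaction", // derived from the Etcd resource name, no spec.selector lookup needed
            Namespace: namespace,
        },
        Spec: batchv1beta1.CronJobSpec{
            Schedule:          "0 */6 * * *", // illustrative compaction interval
            ConcurrencyPolicy: batchv1beta1.ForbidConcurrent,
            JobTemplate: batchv1beta1.JobTemplateSpec{
                Spec: batchv1.JobSpec{
                    Template: corev1.PodTemplateSpec{
                        Spec: corev1.PodSpec{
                            RestartPolicy: corev1.RestartPolicyNever,
                            Containers: []corev1.Container{{
                                Name:    "compact-backup",
                                Image:   image,
                                Command: []string{"etcdbrctl", "compact"}, // placeholder invocation
                            }},
                        },
                    },
                },
            },
        },
    }
    return c.Create(ctx, cronJob)
}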

[BUG] Etcd-sts should block owner etcd resource deletion

Currently, druid deploys the etcd StatefulSet with an ownerReference pointing to the Etcd resource, but the blockOwnerDeletion field is set to false, as you can see here. Deletion of the Etcd resource should instead be blocked until all the resources deployed by it (StatefulSet, Service, ConfigMap) are deleted.
Ideally, deletion of the Etcd resource should guarantee that the etcd server is completely down.
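A small sketch of the intended ownerReference, with blockOwnerDeletion set to true (function and parameter names are illustrative):

package example

import (
    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
)

func boolPtr(b bool) *bool { return &b }

// setBlockingOwnerReference sets the same owner reference as today, but with
// blockOwnerDeletion true, so a foreground cascading deletion of the Etcd resource
// waits until the StatefulSet (and the etcd pods it manages) is gone.
func setBlockingOwnerReference(sts *appsv1.StatefulSet, etcdName string, etcdUID types.UID) {
    sts.OwnerReferences = []metav1.OwnerReference{{
        APIVersion:         "druid.gardener.cloud/v1alpha1",
        Kind:               "Etcd",
        Name:               etcdName,
        UID:                etcdUID,
        Controller:         boolPtr(true),
        BlockOwnerDeletion: boolPtr(true),
    }}
}

Note that controller-runtime's controllerutil.SetControllerReference sets both controller and blockOwnerDeletion to true, so switching to it would achieve the same effect.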

[Feature] Simplify reconciliation by simplifying claim logic for StatefulSet, Service and ConfigMap based on Etcd resource name

Feature (What you would like to be added):

The reconciliation flow of etcd-druid includes claiming from potentially multiple pre-existing StatefulSet, Service and ConfigMap objects if they exist. This is done by selecting the objects based on spec.selector in the Etcd resource, claiming one of the matching objects (if any) and deleting the rest of the objects (if any). If no matching objects are found then a new object is created.

The logic of claiming from multiple pre-existing objects based on spec.selector was introduced for the following reasons.

  1. Migration from the time before etcd-druid was introduced, i.e. adopting objects created before etcd-druid existed minimised and simplified clean-up.
  2. Keeping options open before the multi-node ETCD design was finalised to use a single StatefulSet for all the members of an ETCD cluster. The alternative of using one StatefulSet for each member of an ETCD cluster was still open at that time.

Now that neither the migration scenario nor the multi-node design needs the functionality of claiming from multiple pre-existing objects, we can simplify the claim logic to just pick the object with the same name as the Etcd resource. We will still need the claim functionality to mark it as claimed, of course.
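A minimal sketch of the simplified lookup for the StatefulSet case (Service and ConfigMap would be analogous); the function name and the claim-marking step are illustrative:

package example

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// getStatefulSetToClaim looks up the single object that shares the Etcd resource's
// name/namespace instead of listing candidates via spec.selector. A nil result means a
// fresh object should be created.
func getStatefulSetToClaim(ctx context.Context, c client.Client, etcdName, namespace string) (*appsv1.StatefulSet, error) {
    sts := &appsv1.StatefulSet{}
    err := c.Get(ctx, types.NamespacedName{Name: etcdName, Namespace: namespace}, sts)
    if apierrors.IsNotFound(err) {
        return nil, nil // nothing to claim, create a new StatefulSet
    }
    if err != nil {
        return nil, err
    }
    return sts, nil // still needs to be marked as claimed (owner reference / annotations)
}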

Motivation (Why is this needed?):

Approach/Hint to implement the solution (optional):

[Feature] Automated gardener integration tests

Feature (What you would like to be added):

We should create a suite of automated tests that test gardener integration.

Motivation (Why is this needed?):

We should detect as many regression and backward-compatibility issues as possible before merging a PR, to keep the master branch release-ready at any point in time.

Approach/Hint to implement the solution (optional):

[Feature] Modularize and enhance status management according to multi-node ETCD proposal

Feature (What you would like to be added):

  1. Enhance the Etcd resource status structure according to the changes proposed here
  2. Enhance etcd-backup-restore and etcd-druid to update the corresponding member's health status in the Etcd resource status. This need not include the task of cutting off traffic in case of backup failure yet as the evaluation/decision on that is pending for the scenario of multi-node ETCD with ephemeral persistence. The etcd-backup-restore needs to consider the following scenarios in the implementation.

Motivation (Why is this needed?):

Pick individually executable pieces of the multi-node proposal.

Approach/Hint to implement the solution (optional):

Also, it would be preferable to use StatusWriter.Patch() to avoid race conditions.
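A minimal sketch of such a status patch, assuming the Etcd API import path shown in the comment; the mutate callback stands in for whatever status fields the controller actually updates:

package example

import (
    "context"

    druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1" // import path assumed for illustration
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// patchStatus uses the status sub-resource with a merge patch instead of Update(), so
// concurrent writers (main controller, custodian controller, backup-restore) do not trip
// over each other's resourceVersion.
func patchStatus(ctx context.Context, c client.Client, etcd *druidv1alpha1.Etcd, mutate func(*druidv1alpha1.Etcd)) error {
    original := etcd.DeepCopy()
    mutate(etcd)
    // client.MergeFrom produces a minimal patch against the original object.
    return c.Status().Patch(ctx, etcd, client.MergeFrom(original))
}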

☂️ Gardener ETCD Operator a.k.a. ETCD Druid

Feature (What you would like to be added):
Summarise the roadmap for etcd-druid with links to the corresponding issues.

Motivation (Why is this needed?):
A central place to collect the roadmap as well as the progress.

Approach/Hint to implement the solution (optional):

  • Basic Controller
    • Define CRD types
    • Implement basic controller to deploy StatefulSet (with replicas: 1) with the containers for etcd and etcd-backup-restore the same way it is being done now.
    • Unit tests
    • Integration tests
  • Propagate etcd defragmentation schedule from the CRD to etcd-backup-restore sidecar container.
  • Trigger full snapshot before hibernation/scale down.
  • Backup compaction
    • Incremental/continuous backup is used for finer-granularity backup (in the order of minutes), with full snapshots being taken at much larger intervals (in the order of hours). This makes the backup efficient in terms of disk, network bandwidth and backup storage space utilisation as well as compute resource utilisation during backup.
    • If the proportion of changes in the incremental backups is large, this impacts restoration times because incremental backups can only be restored in sequence.
    • #61@etcd-backup-restore.
  • Multi-node etcd cluster
    • All etcd nodes within the same Kubernetes cluster.
      • I.e., one CRD instance would provision multiple etcd nodes in the same Kubernetes cluster/namespace as the CRD instance.
      • Enhance CRD types to address the use-case
      • Scale sub-resource implementation for the current CRD
      • Add/promote etcd learners/members during scale up, including quorum adjustment.
      • Remove etcd members during scale down, including quorum adjustment.
      • Handle backup/restore in the different states of the etcd cluster
      • Multi-AZ support
        • I.e. etcd nodes distributed across availability zones in the hosting Kubernetes cluster
    • Each etcd node in a different Kubernetes cluster.
      • I.e. each etcd node will be provisioned via a separate CRD instance in a different Kubernetes cluster but these nodes will be configured to find each other to form an etcd cluster.
      • There will be as many CRD instances as the number of nodes in the etcd cluster.
      • #233@gardener.
      • Enhance CRD types to address the use-case
      • Add/promote etcd learners/members during scale up, including quorum adjustment.
      • Remove etcd members during scale down, including quorum adjustment.
      • Handle backup/restore in the different states of the etcd cluster
  • Non-disruptive Autoscaling
    • The VerticalPodAutoscaler supports multiple update policies including recreate, initial and off.
    • The recreate policy is clearly not suitable for single-node etcd instances because of the implications of frequent, unpredictable and unmanaged downtime.
    • The initial policy does not make sense for etcd considering the longer database verification time for non-graceful shutdown.
    • For a single-node etcd instance, vertical scaling via the VerticalPodAutoscaler would always be disruptive because of the way scaling is done by VPA. It gives no opportunity to take action before the etcd pod(s) are disrupted for scaling.
    • A controller can co-ordinate the etcd-specific steps to mitigate the disruption during (vertical) scaling if an alternative way is used to vertically scale a CRD instead of the individual pods directly.
  • Non-disruptive Updates
    • For a single-node etcd instance, updates would be disruptive.
    • A controller can co-ordinate the etcd-specific steps to mitigate the disruption during updates.
  • Database Restoration
    • Database restoration is currently done on startup (or restart), if database verification fails, within the backup-restore sidecar's main process.
    • Introducing a controller enables the option to perform database restoration as a separate job.
    • The main advantage of this approach is to decouple the memory requirement of a database restoration from the regular backup (full and delta) tasks.
    • This could be especially of interest because the delta snapshot restoration requires an embedded etcd instance which might mean that the memory requirement for database restoration is almost certain to be proportionate to the database size. However, the memory requirement for backup (full and delta) need not be proportionate to the database size at all. In fact, it is very realistic to expect that the memory requirement for backup be more or less independent of the database size.
  • Migration for major updates
    • Data and/or backup migration during major updates which change the data and/or backup format or location.
  • Backup Health Verification
    • Currently, we rely on the database backups in the storage provider to remain healthy. There are no additional checks to verify if the backups are still healthy after upload.
    • A controller can be used to perform such backup health verification asynchronously.
  • #505

[Feature] Leader election settings should be increased and made configurable in chart manifests

Feature (What you would like to be added):
Leader election settings should be increased and made configurable in chart manifests.

Motivation (Why is this needed?):
The default leader election settings in controller-runtime seem to create too much load on the apiserver. It should be possible to configure them to reduce load on the apiserver without having to make any changes to etcd-druid.

Approach/Hint to implement the solution (optional):
We can introduce command-line flags and chart manifest flags along the lines of gardener/gardener#2667.

Also, it would be desirable to switch to Lease for leader election rather than ConfigMap. But controller-runtime still uses ConfigMap. So, for this, we either have to wait till controller-runtime moves to Lease or we override with a custom newResourceLock factory function in the options.
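A sketch of what the configurable settings could look like on the etcd-druid side, with illustrative flag names and defaults (larger than client-go's 15s/10s/2s defaults) and the Lease resource lock; the LeaderElectionResourceLock option requires a controller-runtime version that supports it:

package example

import (
    "flag"
    "time"

    "k8s.io/client-go/tools/leaderelection/resourcelock"
    ctrl "sigs.k8s.io/controller-runtime"
)

// managerOptions exposes the leader-election timings as command-line flags (to be
// surfaced in the chart manifests as well) and switches the resource lock to Leases.
func managerOptions() ctrl.Options {
    var leaseDuration, renewDeadline, retryPeriod time.Duration
    flag.DurationVar(&leaseDuration, "leader-election-lease-duration", 40*time.Second, "Duration non-leaders wait before trying to acquire leadership.")
    flag.DurationVar(&renewDeadline, "leader-election-renew-deadline", 30*time.Second, "Duration the leader retries refreshing leadership before giving up.")
    flag.DurationVar(&retryPeriod, "leader-election-retry-period", 10*time.Second, "Duration between leader-election retries.")
    flag.Parse()

    return ctrl.Options{
        LeaderElection:             true,
        LeaderElectionID:           "druid-leader-election", // illustrative lock name
        LeaderElectionResourceLock: resourcelock.LeasesResourceLock,
        LeaseDuration:              &leaseDuration,
        RenewDeadline:              &renewDeadline,
        RetryPeriod:                &retryPeriod,
    }
}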

[Feature] Enhance the Etcd resource status structure as proposed in the multi-node proposal

Feature (What you would like to be added):

Enhance the Etcd resource status structure according to the changes proposed here while maintaining backward compatibility for the consumers of Etcd resource status (such as the gardenlet).

Motivation (Why is this needed?):

Pick individually executable pieces of the multi-node proposal.

Approach/Hint to implement the solution (optional):

For backward compatibility, the existing status fields and the values in them need to be maintained as they are, in both the main etcd-druid controller (especially here and here) as well as the newly separated custodian controller.

[Feature] Expose the monitoring configuration as per gardener extensions contract

Feature (What you would like to be added):
Expose the associated monitoring and logging configuration as per https://github.com/gardener/gardener/blob/master/docs/extensions/logging-and-monitoring.md.

Motivation (Why is this needed?):
Though druid is a standalone component, it is designed to adhere to the gardener extension contract as well. As a result, it might have to take responsibility for exposing its monitoring configuration to gardener-like projects.

Approach/Hint to implement the solution (optional):

[BUG] HVPA not able to scale etcd.

Describe the bug:
The VPA recommender lacks the permission to get the scale subresource, because of which VPA on etcd is not happening.

Expected behavior:
As load increases, etcd should be scaled based on VPA recommendations.

How To Reproduce (as minimally and precisely as possible):

Logs:

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]: All

Anything else we need to know?:

[Feature] Provision for multiple instances of ETCD through ETCD CRD

Feature (What you would like to be added): ETCD druid should create multiple ETCD instances (along with ETCDBR instances) as specified in the ETCD CRD.

Motivation (Why is this needed?): To allow bootstrapping of a multi-node ETCD cluster for a shoot cluster.

Approach/Hint to implement the solution (optional):

Refer: #107

[BUG] Fix failing test after

Describe the bug:
The following test case is failing after the commit 72ec7a0.

Expected behavior:
No test cases should fail.

How To Reproduce (as minimally and precisely as possible):

Run make test on the commit 72ec7a0.

Logs:

• Failure [6.027 seconds]
Druid when etcd resource is created [It] if fields are set in etcd.Spec and TLS enabled, the resources should reflect the spec changes 
/tmp/build/a94a8fe5/pull-request-gardener.etcd-druid-pr.master/tmp/src/github.com/gardener/etcd-druid/controllers/etcd_controller_test.go:482

  Expected
      <string>: ConfigMap
  to match fields: {
  .Data."bootstrap.sh":
  	Expected
  	    <string>: "...tus = '143'..."
  	to equal               |
  	    <string>: "...tus == '143..."
  }
  

  /tmp/build/a94a8fe5/pull-request-gardener.etcd-druid-pr.master/tmp/src/github.com/gardener/etcd-druid/controllers/etcd_controller_test.go:882

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

[Feature] The latest snapshot info in etcd status

Feature (What you would like to be added):
The status in etcd resource does not reflect the current snapshot [full/delta]. Update the etcd resource status to reflect the latest snapshot information.

Motivation (Why is this needed?):
It would help in control plane migration to fetch the latest snapshot for update.
Approach/Hint to implement the solution (optional):

Update Operator

Feature (What you would like to be added):
Run multi-node ETCD during maintenance operations, so that it can quickly fail over.

Motivation (Why is this needed?):
Shorter ETCD (= API server = cluster) downtimes during maintenance operations that affect ETCD, like rolling the seed node it runs on or updating the ETCD spec.

Approach/Hint to implement the solution (optional):
An operator that scales out (with node anti-affinity) and later scales in again. The main question will be how to orchestrate that with Gardener, as there are hooks and means missing for that at present.

[Feature] Avoid locks during reconciliation

Feature (What you would like to be added):

#163 introduced locking the main and custodian controller for every update to the Etcd resource and its status.

This should be avoided and the race conditions in the tests should be solved in a different way.

Motivation (Why is this needed?):

Such synchronisation will lead to performance bottlenecks.

Approach/Hint to implement the solution (optional):

Use StatusWriter.Patch()?

Credit: @timuthy

[Feature] Move to etcd v3.3.23 or higher

Feature (What you would like to be added):
Move to etcd v3.3.23 or the latest v3.3.x patch release.

Motivation (Why is this needed?):

Approach/Hint to implement the solution (optional):

[Feature] Enhance reconciliation to handle multi-node scenario in etcd-druid

Feature (What you would like to be added):

Enhance reconciliation to handle multi-node scenario in etcd-druid. This should include the following topics.

  • Maintain the Ready and AllMembersReady conditions based on the contents of the members section of the Etcd resource status.
  • Create member Lease objects for every member pod.
  • Create separate Services for client and etcd peer (ref) -> TBD (#147)

Motivation (Why is this needed?):

Pick individually executable pieces of the multi-node proposal.

Approach/Hint to implement the solution (optional):
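For the member Lease item in the list above, a minimal sketch of creating one Lease per expected member pod; the naming scheme and the gardener.cloud/owned-by label are assumptions borrowed from the Lease-informer issue elsewhere in this list, and renewing holderIdentity/renewTime would be left to the backup-restore sidecar:

package example

import (
    "context"
    "fmt"

    coordinationv1 "k8s.io/api/coordination/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// createMemberLeases creates one Lease per expected member pod, labelled so it can be
// tied back to the owning Etcd resource.
func createMemberLeases(ctx context.Context, c client.Client, etcdName, namespace string, replicas int32) error {
    for i := int32(0); i < replicas; i++ {
        lease := &coordinationv1.Lease{
            ObjectMeta: metav1.ObjectMeta{
                Name:      fmt.Sprintf("%s-%d", etcdName, i), // matches the member pod name
                Namespace: namespace,
                Labels:    map[string]string{"gardener.cloud/owned-by": etcdName},
            },
        }
        if err := c.Create(ctx, lease); err != nil && !apierrors.IsAlreadyExists(err) {
            return err
        }
    }
    return nil
}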

[Feature] Make it possible to have smaller auto-compaction-retention period for etcd

Feature (What you would like to be added):

Make it possible to have smaller auto-compaction-retention period for etcd (both main etcd and the embedded ETCD during restoration).

Motivation (Why is this needed?):

A high update rate can overflow memory and storage if the auto-compaction-retention period is long. The current value is 24h.

auto-compaction-mode: periodic
auto-compaction-retention: "24"

Approach/Hint to implement the solution (optional):

We can either change the value to be smaller by default (5m?) and/or we can make it configurable via the Etcd resource spec.

[Feature] Improve Lease informer

Feature (What you would like to be added):
Use an optimized informer for Lease resources, concretely only for objects which contain a gardener.cloud/owned-by label.

Motivation (Why is this needed?):
#214 fetches Lease objects for performing health checks on etcd members. It uses the standard Controller-Runtime client, which is backed by a cache, so all Lease objects will be considered in the informer's ListWatch function. Since Controller-Runtime v0.9.0 it is possible to set up this cache in a more fine-granular way (see here).

Approach/Hint to implement the solution (optional):
Controller-Runtime has been updated to v0.10.2, so the label-based optimization of the Lease informer is supported with the current version.
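A sketch of such a restricted cache, written against the controller-runtime v0.10.x cache options (newer releases expose the same idea under different option names); the label key matches the one mentioned above:

package example

import (
    coordinationv1 "k8s.io/api/coordination/v1"
    "k8s.io/apimachinery/pkg/labels"
    "sigs.k8s.io/controller-runtime/pkg/cache"
)

// newCacheFunc restricts the cached Lease informer to objects that carry the
// gardener.cloud/owned-by label (an existence requirement), so the cache does not
// list/watch every Lease in the cluster.
func newCacheFunc() (cache.NewCacheFunc, error) {
    ownedBy, err := labels.Parse("gardener.cloud/owned-by")
    if err != nil {
        return nil, err
    }
    return cache.BuilderWithOptions(cache.Options{
        SelectorsByObject: cache.SelectorsByObject{
            &coordinationv1.Lease{}: {Label: ownedBy},
        },
    }), nil
}

The returned NewCacheFunc would then be wired into the manager via its Options.NewCache field.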

[Feature] Prepare Druid for Server-Side Apply

Feature (What you would like to be added):
Druid should use and support Server-Side Apply where applicable once Gardener dropped the support for seed clusters with K8s <= 1.17 (gardener/gardener#4083).

Motivation (Why is this needed?):
Server-Side Apply makes working with the etcd resource more efficient when there is more than one actor (motivated here).

Tasks to be done:

  • Add markers for merge strategy to etcd resource
  • Use SSA for etcd status updates

[Feature] Automatic Update-PR script

Feature (What you would like to be added):
This repository should also benefit from automatic update PRs of dependent components. etcd-druid deploys etcd-backup-restore, hence, when a new version of it is released then automatic update PRs should be opened by CI, similar to gardener/gardener#2260.

Motivation (Why is this needed?):
Less manual actions.

Approach/Hint to implement the solution (optional):
You need such a script: https://github.com/gardener/gardener/blob/master/hack/.ci/set_dependency_version
You don't need to copy it but can also call it like the extensions as you already vendor gardener/gardener: https://github.com/gardener/gardener-extension-provider-aws/blob/master/.ci/set_dependency_version, https://github.com/gardener/gardener-extension-provider-aws/blob/master/hack/tools.go#L23

[Feature] Start a job which will copy ETCD backups between backup buckets via etcd-backup-restore

Feature (What you would like to be added):
Functionality to start a job which will use etcd-backup-restore to copy ETCD backups between backup buckets during the restore phase of Control Plane Migration (you can check the revised GEP here).

The ETCD druid can find out whether it should start such a job via an additional field in the Etcd resource, providing information about the source backup bucket. All necessary secrets will be handled by the BackupEntry controller and an additional "source" BackupEntry resource.

Motivation (Why is this needed?):
This is needed to start an etcd-backup-restore copy operation which will be used to copy etcd backups between backup buckets. You can check issue 356 on the etcd-backup-restore repo

Approach/Hint to implement the solution (optional):
A PoC was already developed for this; however, we did not start the etcd-backup-restore copy operation as a job. Still, the main functionality and idea are present in the PoC. It is outlined here: gardener/gardener#3875

[Feature] Make Etcd CRD's `spec.backup.store` section immutable

Feature (What you would like to be added):
Make Etcd CRD's spec.backup.store section immutable.

Motivation (Why is this needed?):
Make the Etcd CRD's spec.backup.store section immutable so that the storage container location isn't allowed to change mid-usage of an etcd. A mismatch between snapshotting and restoration locations could allow restoration from a different etcd's backup, rendering the shoot cluster unusable. Refer to gardener/gardener#4454 for a fix already made on the Gardener side, although we still want druid to be resilient to potential undesirable changes to the Etcd resource.

Approach/Hint to implement the solution (optional):
Since CRD field immutability is yet to be supported (refer to kubernetes/kubernetes#65973), it might make more sense to use something like a validating webhook on Etcd resource updates.

/cc @amshuman-kr
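A minimal sketch of the immutability check such a validating webhook could perform; the Etcd API import path is assumed and the wiring into an admission webhook server is omitted:

package example

import (
    "context"
    "fmt"

    "k8s.io/apimachinery/pkg/api/equality"
    "k8s.io/apimachinery/pkg/runtime"

    druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1" // import path assumed for illustration
)

// EtcdValidator rejects changes to spec.backup.store after creation.
type EtcdValidator struct{}

func (v *EtcdValidator) ValidateUpdate(ctx context.Context, oldObj, newObj runtime.Object) error {
    oldEtcd, ok1 := oldObj.(*druidv1alpha1.Etcd)
    newEtcd, ok2 := newObj.(*druidv1alpha1.Etcd)
    if !ok1 || !ok2 {
        return fmt.Errorf("expected Etcd objects")
    }
    // spec.backup.store must not change once set, to avoid snapshotting to one
    // location and restoring from another.
    if !equality.Semantic.DeepEqual(oldEtcd.Spec.Backup.Store, newEtcd.Spec.Backup.Store) {
        return fmt.Errorf("spec.backup.store is immutable")
    }
    return nil
}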

Multi-Node/Clustered ETCD

Stories

Please provide stories that we plan to tackle:

  • As operator ...
  • As provider ...

Motivation

We should support provisioning and management of multi-node etcd clusters via etcd-druid to serve the following goals:

  • Generally, better robustness, resilience, and high availability (HA)
  • Zero downtime maintenance (ZDM)
  • Non-disruptive scaling
  • Single-zone outage/fault tolerance
  • Ephemeral (or in-memory) persistence for better performance

Acceptance Criteria

Enhancement/Implementation Proposal (optional)

[Feature] Unit tests for etcd-druid reconciliation cycle

Feature (What you would like to be added):
Unit tests for etcd-druid reconciliation cycle.

Motivation (Why is this needed?):
We should have both positive and negative scenarios covered in the unit tests to improve our own productivity and to avoid regression.

Approach/Hint to implement the solution (optional):
Replace the kubebuilder way of tests (running kube-apiserver and etcd) with mock APIs.
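A sketch of the general test shape with controller-runtime's fake client instead of envtest; the Etcd API import path is assumed and the actual reconciler setup and assertions are elided:

package example

import (
    "context"
    "testing"

    "k8s.io/apimachinery/pkg/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/client/fake"

    druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1" // import path assumed for illustration
)

// TestReconcileSketch replaces the kube-apiserver + etcd of envtest with a fake client
// seeded with the objects under test.
func TestReconcileSketch(t *testing.T) {
    scheme := runtime.NewScheme()
    _ = clientgoscheme.AddToScheme(scheme)
    _ = druidv1alpha1.AddToScheme(scheme)

    etcd := &druidv1alpha1.Etcd{}
    etcd.Name, etcd.Namespace = "etcd-test", "default"

    c := fake.NewClientBuilder().WithScheme(scheme).WithObjects(etcd).Build()

    // Positive case: the Etcd object is readable without a running API server; negative
    // cases (missing backup secret, invalid spec, ...) can be seeded the same way.
    if err := c.Get(context.TODO(), client.ObjectKeyFromObject(etcd), etcd); err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
}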

[BUG] panic: "invalid memory address or nil pointer dereference"

Describe the bug:
etcd-druid panics with nil pointer.

Expected behavior:

How To Reproduce (as minimally and precisely as possible):

Logs:

E0416 09:08:10.455443       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 317 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x14e9740, 0x23fb600)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x14e9740, 0x23fb600)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).getMapFromEtcd(0xc0000d4050, 0xc00086c500, 0x3fb999999999999a, 0x4, 0x0)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:881 +0x1276
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).reconcileEtcd(0xc0000d4050, 0xc00086c500, 0xc00086c500, 0x0, 0x0, 0x0)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:724 +0x4d
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).reconcile(0xc0000d4050, 0x18da080, 0xc000048248, 0xc00086c500, 0xc000845c40, 0x2, 0x2, 0x18a9ec0)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:227 +0x27c
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).Reconcile(0xc0000d4050, 0xc000187300, 0x16, 0xc0005c4d00, 0xb, 0xc000758c00, 0x1, 0xc000758cc8, 0x478588)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:189 +0x30b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000184840, 0x15400e0, 0xc00061a400, 0x43eb00)
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000184840, 0x0)
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc000184840)
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00069c530)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00069c530, 0x3b9aca00, 0x0, 0x1, 0xc00015a0c0)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00069c530, 0x3b9aca00, 0xc00015a0c0)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:193 +0x328
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1347ff6]

goroutine 317 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x105
panic(0x14e9740, 0x23fb600)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).getMapFromEtcd(0xc0000d4050, 0xc00086c500, 0x3fb999999999999a, 0x4, 0x0)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:881 +0x1276
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).reconcileEtcd(0xc0000d4050, 0xc00086c500, 0xc00086c500, 0x0, 0x0, 0x0)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:724 +0x4d
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).reconcile(0xc0000d4050, 0x18da080, 0xc000048248, 0xc00086c500, 0xc000845c40, 0x2, 0x2, 0x18a9ec0)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:227 +0x27c
github.com/gardener/etcd-druid/controllers.(*EtcdReconciler).Reconcile(0xc0000d4050, 0xc000187300, 0x16, 0xc0005c4d00, 0xb, 0xc000758c00, 0x1, 0xc000758cc8, 0x478588)
	/go/src/github.com/gardener/etcd-druid/controllers/etcd_controller.go:189 +0x30b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000184840, 0x15400e0, 0xc00061a400, 0x43eb00)
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:256 +0x162
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000184840, 0x0)
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232 +0xcb
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc000184840)
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00069c530)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00069c530, 0x3b9aca00, 0x0, 0x1, 0xc00015a0c0)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc00069c530, 0x3b9aca00, 0xc00015a0c0)
	/go/src/github.com/gardener/etcd-druid/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/go/src/github.com/gardener/etcd-druid/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:193 +0x328

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:
etcd-druid: v0.1.14

[BUG] Validation that selector matches labels in etcd.Spec

Describe the bug:
The StatefulSet controller prevents creation of StatefulSets where the labels in the pod template do not match the selector. Etcd, however, does not have a similar validation.
Expected behavior:
Etcd resource creation should throw an error when the labels in the template do not match the selector.
How To Reproduce (as minimally and precisely as possible):
Set the selector field so that it does not match the template in the statefulset.
Logs:

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

[Feature] Move status update to a separate controller in druid

Feature (What you would like to be added):
The main reconciliation loop in etcd-druid takes care of everything from updating the owned resources to updating the status in the Etcd resource. We should create a separate controller (still part of the etcd-druid controller manager) which reconciles only the status section of the Etcd resource.

Credit: @rfranzke ❤️

Motivation (Why is this needed?):
The main reconciliation loop is triggered only if the watch events pass some predicates. If the status update during the main reconciliation fails for any reason, the status in the Etcd resource might not be updated until the next gardener reconciliation event that matches the predicates.

Approach/Hint to implement the solution (optional):

☂️ [Feature] Add EtcdMember resource

Feature (What you would like to be added):
A new resource EtcdMember should be added to the druid.gardener.cloud/v1alpha1 API group.

Example:

apiVersion: druid.gardener.cloud/v1alpha1
kind: EtcdMember
metadata:
  labels:
    gardener.cloud/owned-by: etcd-test
  name: etcd-test-0 # pod name
  namespace: default
  ownerReferences:
  - apiVersion: druid.gardener.cloud/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: etcd
    name: etcd-test
    uid: <UID>
status:
  id: "1"
  lastTransitionTime: "2021-07-20T10:34:04Z"
  lastUpdateTime: "2021-07-20T10:34:04Z"
  name: member1
  reason: up and running
  role: Member
  status: Ready

Every etcd member in a cluster should have a corresponding EtcdMember resource which contains the shown status information. The EtcdMember resource ought to be created and maintained by the backup-restore sidecar. Etcd-Druid may set status: Unknown after heartbeatGracePeriod (ref).

Motivation (Why is this needed?):
The original proposal intended the status information of each etcd member to be part of a []members list in the etcd.status resource. However, this will lead to update conflicts as multiple clients try to update the same resource at nearly the same time and we cannot use any adequate patch technique (SSA failed for K8s versions <= 1.21, strategic-merge not supported for CRDs) to prevent that.

Subtasks

/cc @shreyas-s-rao @amshuman-kr

[Feature] Integrate backup compression feature from etcd-backup-restore with etcd-druid and gardener

Feature (What you would like to be added):

Integrate backup compression feature from etcd-backup-restore with etcd-druid (by enabling configuration via Etcd resource spec) and then integrate with gardener.

Motivation (Why is this needed?):

The backup compression feature will be used primarily in the etcd-druid and gardener context.

Approach/Hint to implement the solution (optional):

Keep the default configuration in etcd-druid to be uncompressed backups (for backward compatibility) and the default configuration in gardener integration to be compressed backups.

[Feature] Control etcd + backup-restore sidecar versions

Feature (What you would like to be added):
etcd-druid should control the versions of both etcd and the backup-restore sidecar.

Motivation (Why is this needed?):
It controls the manifests and configuration for the statefulset, and the used versions must fit them. Hence, it makes sense for it to control the versions as well.

Approach/Hint to implement the solution (optional):
Please use the image vector approach (https://github.com/gardener/gardener/blob/master/charts/images.yaml) with the use of https://github.com/gardener/gardener/tree/master/pkg/utils/imagevector.
It must be possible to overwrite the image vector during deployment time.

[Feature] Move out bootstrap script to etcd image

Feature (What you would like to be added):
Move out etcd bootstrap script to etcd custom image.

Motivation (Why is this needed?):
To avoid the issue of out-of-sync configmap and statefulset spec (and hence etcd image version) during etcd version updates on Gardener landscapes.

Approach/Hint to implement the solution (optional):

Compacting Incremental Snapshot Files

Question: How are you guys dealing with the incremental backup files when restoring a cluster? I am asking because in Kubify we expect one full snapshot file to trigger a restore operation. What is the best way to compact all the incremental backup files into one? Do you already have something handy?

[Enhancement] let's not hardcode "cluster-autoscaler.kubernetes.io/safe-to-evict=false" for all etcd

Feature (What you would like to be added):
etcd-druid should not hardcode the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict=false" on etcd pods and should let the user configure it using the annotations field in the CRD.

Motivation (Why is this needed?):
Not every etcd is critical to the system. The above-mentioned annotation is specific to cluster-autoscaler, not etcd. Depending on the use of the etcd, the CRD creator should have the choice to add this annotation.
From gardener's point of view, etcd-main is critical but etcd-events is not as critical, so the annotation should be set for etcd-main but not for etcd-events. If, in the future, we deploy etcd for the cilium networking extension, this annotation would probably not be required there either.

Approach/Hint to implement the solution (optional):
Remove the annotation from https://github.com/gardener/etcd-druid/blob/master/charts/etcd/templates/etcd-statefulset.yaml#L30.

[BUG] Automatic recovery from informer errors

Feature (What you would like to be added):
If there is any issue with watch connections used by the informers, etcd-druid should detect this and try to automatically recover from it.

Motivation (Why is this needed?):
Manual intervention is required without such detection and automatic recovery.

Approach/Hint to implement the solution (optional):
We need to revendor client-go and possibly controller-runtime to include the fix (kubernetes/kubernetes#87329) that propagates informer errors to the caller and then possibly react to it.

See also:

[BUG] Etcd failed to flush changes from WAL to the DB before shutting down/crashing.

Describe the bug:
After a single-node etcd instance provisioned via etcd-druid terminated abnormally (non-zero exit code), the etcd container restarted and the backup-restore sidecar container (on data directory verification) had the following logs.

current etcd revision (2314180238) is less than latest snapshot revision (2314180239): possible data loss

On circumventing the backup restoration triggered because of this, it was found that the WAL directory (not checked by the backup-restore sidecar) contained more recent revisions which were applied after the restart (without the backup restoration).

Expected behavior:
etcd-druid should try to configure etcd instances to shut down safely (and flush the WAL changes to the database) wherever possible.

How To Reproduce (as minimally and precisely as possible):
Not known yet.

Logs:

current etcd revision (2314180238) is less than latest snapshot revision (2314180239): possible data loss

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

[Feature] Performance/load test for etcd instances

Feature (What you would like to be added):
We should have performance/load tests for etcd instances integrated with the etcd-druid CI/CD pipelines, which should test at least the following aspects.

  1. Create an etcd database close to 8Gi.
  2. Create a high rate of updates (>500/s) into etcd and high rate of delta snapshots (>4/m of >100Mi snapshots).
  3. The changes should include large sized individual changes that are close to the configured max request bytes size.
  4. Restart etcd under such active load.
  5. Restore a large DB.

Motivation (Why is this needed?):
This will help us understand the limits and help us configure the alert thresholds.

Approach/Hint to implement the solution (optional):

[BUG] Remove operation annotation before reconciling etcd, in accordance with the gardener extension contract

Describe the bug:
The etcd controller removes the operation annotation from the Etcd resource after reconciling it, which goes against the gardener extension contract.

Expected behavior:
The etcd controller should remove the operation annotation from the Etcd resource before reconciling it here.
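A minimal sketch of the expected ordering, assuming the Etcd API import path shown; the annotation is removed and persisted first, and only then is the actual reconciliation flow executed:

package example

import (
    "context"

    druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1" // import path assumed for illustration
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// removeOperationAnnotation persists the annotation removal before reconciliation, so a
// new annotation added while reconciliation is running triggers another run afterwards.
// The annotation key follows the gardener extension contract.
func removeOperationAnnotation(ctx context.Context, c client.Client, etcd *druidv1alpha1.Etcd) error {
    original := etcd.DeepCopy()
    delete(etcd.Annotations, "gardener.cloud/operation")
    return c.Patch(ctx, etcd, client.MergeFrom(original))
}

// In Reconcile(): if the annotation is present, call removeOperationAnnotation first and
// only then run the actual reconciliation flow.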

How To Reproduce (as minimally and precisely as possible):

Logs:

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

[BUG] Fix reconciliation predicates to fully comply with gardener extension contract

Describe the bug:
The main controller reconciles changes to Etcd resource spec even if gardener.cloud/reconcile annotation is not added. This is against the gardener extension contract.

Expected behavior:
The main controller should use the predicates in such a way that changes to the Etcd resource spec are reconciled only when the resource is also annotated appropriately. For example, see here.
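A sketch of such a predicate against current controller-runtime event types, assuming the contract's gardener.cloud/operation=reconcile annotation; create, delete and generic events are still let through:

package example

import (
    "sigs.k8s.io/controller-runtime/pkg/event"
    "sigs.k8s.io/controller-runtime/pkg/predicate"
)

// hasOperationAnnotation lets spec updates through only when the object carries the
// reconcile operation annotation from the gardener extension contract.
func hasOperationAnnotation() predicate.Predicate {
    hasAnnotation := func(annotations map[string]string) bool {
        return annotations["gardener.cloud/operation"] == "reconcile"
    }
    return predicate.Funcs{
        CreateFunc:  func(e event.CreateEvent) bool { return true },
        UpdateFunc:  func(e event.UpdateEvent) bool { return hasAnnotation(e.ObjectNew.GetAnnotations()) },
        DeleteFunc:  func(e event.DeleteEvent) bool { return true },
        GenericFunc: func(e event.GenericEvent) bool { return true },
    }
}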

How To Reproduce (as minimally and precisely as possible):

Logs:

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID : v0.5.2
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

[Feature] Move to etcd v3.4.10 or higher

Feature (What you would like to be added):
Move to etcd v3.4.10 or the latest v3.4.x patch release.

Motivation (Why is this needed?):

Approach/Hint to the implement solution (optional):
We might have to build our own custom image for etcd to package the dependencies of the bootstrap script (wget).

[Feature] Failed backups to not block incoming traffic and trigger high prio alert instead

Feature (What you would like to be added):
Currently, the health check of the etcd pods is linked to the backup health (last backup upload succeeded) in addition to etcd health itself. But as long as etcd data is backed by persistent volumes (it is now), we can afford for etcd to continue serving incoming requests even when backup upload fails, as long as high-priority alerts are triggered when backup upload fails and follow-up is done to resolve the issue.

Motivation (Why is this needed?):
Avoid bringing down the whole shoot cluster control plane when backup upload fails, as that basically brings the cluster to a grinding halt. This might be affordable if etcd data is backed by persistent volumes, because for data loss to occur a further data corruption in the persistent volumes is required (while backup upload is failing).

See also https://github.tools.sap/kubernetes-canary/issues-canary/issues/599

Approach/Hint to implement the solution (optional):
The following tasks might have to be checked/evaluated.

[BUG] Validations for etcd spec

Describe the bug:
Validations for the below scenarios:

  • backup secretRef nil-check
  • volumeTemplateName not provided
  • StorageClass not provided

Expected behavior:

How To Reproduce (as minimally and precisely as possible):

Logs:

Screenshots (if applicable):

Environment (please complete the following information):

  • Etcd version/commit ID :
  • Etcd-druid version/commit ID :
  • Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:

Anything else we need to know?:

[Feature] Add validation for etcd resources

Feature (What you would like to be added):

Please add validation code for etcd resources, similar to the validation code that already exists for other Gardener extension resources, even though this is technically still dead code.

We are currently working on a new validating webhook in seed-admission-controller for such extension resources (see gardener/gardener#4293); I think we could include the validation of etcd resources there as well. Alternatively, etcd-druid could introduce its own validating webhook if for whatever reason the above option is not good enough.

Motivation (Why is this needed?):

We recently had a rather severe issue that could have been prevented if we had such validation in place, see gardener/gardener-extension-provider-azure#328 (comment). In this particular case, gardenlet was generating an etcd resource with a spec.backup.store.prefix set to -- due to a data race. With validation in place, we could have detected -- as an invalid spec.backup.store.prefix and prevented the reconciliation from continuing. This particular issue is already fixed in gardenlet (see gardener/gardener#4459 and gardener/gardener#4454), but similar issues may occur in the future.

Approach/Hint to implement the solution (optional):
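A minimal sketch of the kind of check that would have caught the "--" prefix above; the Etcd API import path and field names are assumed, and a real implementation would validate the full spec with field.ErrorList as other Gardener extensions do:

package example

import (
    "fmt"
    "strings"

    druidv1alpha1 "github.com/gardener/etcd-druid/api/v1alpha1" // import path assumed for illustration
)

// validateBackupStore rejects obviously broken backup store prefixes before the
// reconciliation is allowed to continue.
func validateBackupStore(etcd *druidv1alpha1.Etcd) error {
    if etcd.Spec.Backup.Store == nil {
        return nil // backups disabled, nothing to validate
    }
    prefix := etcd.Spec.Backup.Store.Prefix
    if prefix == "" || strings.HasPrefix(prefix, "-") {
        return fmt.Errorf("spec.backup.store.prefix %q is invalid", prefix)
    }
    return nil
}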
