
TiDB Operator

TiDB Operator manages TiDB clusters on Kubernetes and automates tasks related to operating a TiDB cluster. It makes TiDB a truly cloud-native database.

TiDB Operator Architecture

Features

  • Safely scaling the TiDB cluster

    TiDB Operator empowers TiDB with horizontal scalability on the cloud.

  • Rolling update of the TiDB cluster

    Gracefully perform rolling updates of the TiDB cluster components in order, achieving zero downtime for the cluster.

  • Multi-tenant support

    Users can deploy and manage multiple TiDB clusters on a single Kubernetes cluster easily.

  • Automatic failover

    TiDB Operator automatically performs failover for your TiDB cluster when node failures occur.

  • Kubernetes package manager support

    By embracing the Kubernetes package manager Helm, users can easily deploy a TiDB cluster with a single command.

  • Automatic monitoring of the TiDB cluster at creation

    Automatically deploy Prometheus and Grafana for TiDB cluster monitoring, supporting the following features:

    • Monitoring multiple clusters across multiple namespaces.
    • Multiple replicas.
    • Targets sharding.
    • Updating configurations and rules dynamically.
    • Thanos framework integration.
  • Heterogeneous cluster

    Users can deploy a heterogeneous cluster that joins an existing cluster.

Quick Start

You can follow our Get Started guide to quickly start a testing Kubernetes cluster and play with TiDB Operator on your own machine.

Documentation

You can see our documentation at the PingCAP website for more in-depth installation and operation instructions for production.

All the TiDB Operator documentation is maintained in the docs-tidb-operator repository.

Blog

Community

Feel free to reach out if you have any questions. The maintainers of this project are reachable via:

Pull Requests are welcome! Check the issue tracker for status/help-wanted issues if you're unsure where to start.

If you're planning a new feature, please file an issue or join the #sig-k8s channel to discuss it first.

Contributing

Contributions are welcome and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

License

TiDB Operator is under the Apache 2.0 license. See the LICENSE file for details.

tidb-operator's Issues

add e2e test to ensure tidbcluster's status is properly synchronized

At present, the synchronization of tidbcluster status involves the processing of pd/tikv/tidb members. The correctness of this status is the basis for the execution of all other functions, so test logic should be added to ensure that the status is correct.
Check points:

  • pd/tikv/tidb statefulSet status is synced to the tidbcluster
  • the real status of pd and tikv is fetched through pdclient

The tidbcluster status struct is defined at
https://github.com/pingcap/tidb-operator/blob/master/pkg/apis/pingcap.com/v1alpha1/types.go#L102
The test logic should be added at
https://github.com/pingcap/tidb-operator/blob/master/tests/e2e/create.go#L72
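
As a minimal sketch of the kind of assertion such a test could make (the helper name is hypothetical, and apps/v1 types plus the exact status fields are assumptions based on types.go):

package e2e

import (
    "fmt"
    "reflect"

    apps "k8s.io/api/apps/v1"
)

// checkStatefulSetStatusSynced verifies that the StatefulSet status recorded
// in the TidbCluster status matches the status reported by the API server.
func checkStatefulSetStatusSynced(recorded *apps.StatefulSetStatus, actual apps.StatefulSetStatus) error {
    if recorded == nil {
        return fmt.Errorf("not synced yet: recorded StatefulSet status is nil")
    }
    if !reflect.DeepEqual(*recorded, actual) {
        return fmt.Errorf("not synced: recorded %+v != actual %+v", *recorded, actual)
    }
    return nil
}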

#158

Add imagePullPolicy support

Users should be able to customize the imagePullPolicy for every container managed by the operator.

So we should add a corresponding imagePullPolicy option for every image in charts/tidb-cluster/values.yaml.
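
A rough sketch of the operator side, assuming the chart value is surfaced to the controller as a corev1.PullPolicy (the helper below is hypothetical):

package member

import corev1 "k8s.io/api/core/v1"

// setImagePullPolicy applies the user-configured pull policy to every
// container in a pod spec built by the operator.
func setImagePullPolicy(spec *corev1.PodSpec, policy corev1.PullPolicy) {
    for i := range spec.Containers {
        spec.Containers[i].ImagePullPolicy = policy
    }
}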

pkg/controller PDControl's PDClient http connection cache bug

Currently, GetPDClient returns the cached PDClient by default:

func (pdc *defaultPDControl) GetPDClient(tc *v1alpha1.TidbCluster) PDClient {
	pdc.mutex.Lock()
	defer pdc.mutex.Unlock()
	namespace := tc.GetNamespace()
	tcName := tc.GetName()
	key := pdClientKey(namespace, tcName)
	if _, ok := pdc.pdClients[key]; !ok {
		pdc.pdClients[key] = NewPDClient(pdClientURL(namespace, tcName), timeout)
	}
	return pdc.pdClients[key]
}

But in Kubernetes the HTTP connection may become unavailable when an upgrade or failover happens; the cached PDClient's API calls then fail and block for a long time.
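
One possible direction (a sketch only, not the project's actual fix) is to let callers evict a cached client whose requests keep failing, so that the next GetPDClient call builds a fresh connection; the ResetPDClient method below is hypothetical:

package controller

import "github.com/pingcap/tidb-operator/pkg/apis/pingcap.com/v1alpha1"

// ResetPDClient evicts the cached PDClient of a cluster so that the next
// GetPDClient call creates a fresh one after an upgrade or failover has
// invalidated the old connection.
func (pdc *defaultPDControl) ResetPDClient(tc *v1alpha1.TidbCluster) {
    pdc.mutex.Lock()
    defer pdc.mutex.Unlock()
    delete(pdc.pdClients, pdClientKey(tc.GetNamespace(), tc.GetName()))
}

Another option would be to construct the underlying http.Client with keep-alives disabled so that stale connections are never reused.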

INITIAL_REPLICAS environment variable not set

My pd servers are failing to start with the following error:

2018-08-16T15:49:35.090319525Z /usr/local/bin/pd_start_script.sh: line 49: INITIAL_REPLICAS: parameter not set

It looks like this environment variable is set in pd_member_manager.go:

morgo@ryzen:~/kube/tidb-operator$ grep -rFi INITIAL_REPLICAS
pkg/manager/member/pd_member_manager.go:			newSet.Spec.Template.Spec.Containers[i].Env = append(newSet.Spec.Template.Spec.Containers[i].Env, corev1.EnvVar{Name: "INITIAL_REPLICAS", Value: fmt.Sprintf("%d", initialReplicas)})
charts/tidb-cluster/templates/pd-configmap.yaml:    if [[ ${ORDINAL} -lt ${INITIAL_REPLICAS} ]]
charts/tidb-cluster/templates/pd-configmap.yaml:        TOP=$((INITIAL_REPLICAS-1))

I'm following the Infoworld tutorial, Ubuntu 18.04 desktop:

morgo@ryzen:~/kube/tidb-operator$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.5", GitCommit:"32ac1c9073b132b8ba18aa830f46b77dcceb0723", GitTreeState:"clean", BuildDate:"2018-06-21T11:34:22Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
morgo@ryzen:~/kube/tidb-operator$ helm version
Client: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.9.1", GitCommit:"20adb27c7c5868466912eebdf6664e7390ebe710", GitTreeState:"clean"}
morgo@ryzen:~/kube/tidb-operator$ cat /etc/*release*
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

graceful upgrade feature proposal

Why graceful upgrade is needed

The current upgrade feature is the simple rolling upgrade provided by StatefulSet, which has the following problems:

  • During the upgrade process, database traffic is affected. For example, if a tikv pod is the leader of multiple data regions, upgrading it makes that part of the data temporarily unavailable until the cluster re-elects the leaders.
  • The current upgrade cannot ensure high availability. Although the StatefulSet guarantees that pods are upgraded one by one, it bases each step on pod state, and a pod in the Running state does not mean that tikv or pd is actually available. Therefore, multiple instances may be unavailable at once during the upgrade.

How to implement graceful upgrade

  1. Use RollingUpdateStatefulSetStrategy as the StatefulSet upgrade strategy and set Partition to control the upgrade (see the sketch after this list).
  2. Use the status provided by pd, instead of the pod status, to determine whether a pod can be upgraded.
  3. Call pd's API to complete the leader transfer or other operations before upgrading a pod.
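
A minimal sketch of step 1, assuming apps/v1 types (the helper is illustrative): pods with an ordinal greater than or equal to Partition are rolled to the new revision, so decrementing the partition one ordinal at a time, gated on pd's view of member health, gives the operator full control over the pace of the upgrade.

package member

import apps "k8s.io/api/apps/v1"

// setUpgradePartition rolls only the pods whose ordinal is >= ordinal to the
// new revision; all other pods keep the current revision.
func setUpgradePartition(set *apps.StatefulSet, ordinal int32) {
    set.Spec.UpdateStrategy = apps.StatefulSetUpdateStrategy{
        Type: apps.RollingUpdateStatefulSetStrategyType,
        RollingUpdate: &apps.RollingUpdateStatefulSetStrategy{
            Partition: &ordinal,
        },
    }
}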

This feature will be split into multiple PRs in which more details will be added:

  • PD graceful upgrade #94
  • TiKV graceful upgrade #93
  • TiDB graceful upgrade #89

Dynamically configure TiKV according to CPU and memory limits

TiKV auto-configures its block cache and other parameters based on the CPU and memory it sees when users don't set them explicitly. But the CPU and memory upper bounds visible inside a Docker container are those of the host machine, regardless of the resource limits set on the container. So if users set CPU and memory limits for TiKV, TiKV misconfigures itself.
We need to configure TiKV according to the actual CPU and memory limits. For example, the Raft and defaultcf block cache sizes need to be set to 30%~50% of TiKV's memory limit.
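
For illustration, a sketch of the sizing rule (the helper and the 40% ratio are assumptions within the 30%~50% range mentioned above):

package member

import "k8s.io/apimachinery/pkg/api/resource"

// blockCacheSize derives the block cache size from the container's actual
// memory limit instead of the host machine's total memory.
func blockCacheSize(memoryLimit resource.Quantity) resource.Quantity {
    // Use 40% of the memory limit, within the 30%~50% range suggested above.
    bytes := memoryLimit.Value() * 40 / 100
    return *resource.NewQuantity(bytes, resource.BinarySI)
}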

panic during startup

I saw this during startup. Note that it was able to restart successfully without seeing this error again. So I think it is an issue that occurs during cluster startup.

It seems like the line oldSet, err := tkmm.setLister.StatefulSets(ns).Get(controller.TiKVMemberName(tcName)) must have returned a nil pointer to oldSet?

kubectl logs -n test tidb-controller-manager-7b9bbb46d9-7pkrl
I1018 20:39:12.242431       1 version.go:37] Welcome to TiDB Operator.
I1018 20:39:12.242464       1 version.go:38] Git Commit Hash: e29b757b88848ef318a39ac34627e222b7f16b2e
I1018 20:39:12.242477       1 version.go:39] UTC Build Time:  2018-10-15 11:33:08
I1018 20:39:12.292318       1 leaderelection.go:175] attempting to acquire leader lease  test/tidb-controller-manager...
I1018 20:39:12.405856       1 leaderelection.go:184] successfully acquired lease test/tidb-controller-manager
I1018 20:39:12.405974       1 tidb_cluster_controller.go:200] Starting tidbcluster controller
E1018 20:39:14.591821       1 pd_member_manager.go:206] failed to sync TidbCluster: [test/db]'s status, error: Get http://db-pd.test:2379/pd/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I1018 20:39:14.611189       1 pod_control.go:78] Pod: [test/db-pd-0] updated successfully, TidbCluster: [test/db]
I1018 20:39:14.611934       1 event.go:218] Event(v1.ObjectReference{Kind:"TidbCluster", Namespace:"test", Name:"db", UID:"dd88a87b-d315-11e8-8232-42010a8a0035", APIVersion:"pingcap.com/v1alpha1", ResourceVersion:"17675", FieldPath:""}): type: 'Normal' reason: 'SuccessfulUpdate' update Pod db-pd-0 in TidbCluster db successful
I1018 20:39:14.612303       1 tidb_cluster_control.go:100] tidbcluster: [test/db]'s pd cluster is not running.
I1018 20:39:14.626988       1 tidbcluster_control.go:66] TidbCluster: [test/db] updated successfully
I1018 20:39:14.689320       1 event.go:218] Event(v1.ObjectReference{Kind:"TidbCluster", Namespace:"test", Name:"db", UID:"dd88a87b-d315-11e8-8232-42010a8a0035", APIVersion:"pingcap.com/v1alpha1", ResourceVersion:"17675", FieldPath:""}): type: 'Normal' reason: 'SuccessfulUpdate' update TidbCluster db successful
E1018 20:39:16.691178       1 pd_member_manager.go:206] failed to sync TidbCluster: [test/db]'s status, error: Get http://db-pd.test:2379/pd/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I1018 20:39:16.691879       1 tidb_cluster_control.go:100] tidbcluster: [test/db]'s pd cluster is not running.
E1018 20:39:37.108371       1 pd_member_manager.go:206] failed to sync TidbCluster: [test/db]'s status, error: Get http://db-pd.test:2379/pd/health: dial tcp 10.59.245.234:2379: connect: connection refused
I1018 20:39:37.109280       1 tidb_cluster_control.go:100] tidbcluster: [test/db]'s pd cluster is not running.
I1018 20:39:37.189549       1 tidbcluster_control.go:66] TidbCluster: [test/db] updated successfully
I1018 20:39:37.190368       1 event.go:218] Event(v1.ObjectReference{Kind:"TidbCluster", Namespace:"test", Name:"db", UID:"dd88a87b-d315-11e8-8232-42010a8a0035", APIVersion:"pingcap.com", ResourceVersion:"17694", FieldPath:""}): type: 'Normal' reason: 'SuccessfulUpdate' update TidbCluster db successful
E1018 20:39:37.192546       1 pd_member_manager.go:206] failed to sync TidbCluster: [test/db]'s status, error: Get http://db-pd.test:2379/pd/health: dial tcp 10.59.245.234:2379: connect: connection refused
I1018 20:39:37.193548       1 tidb_cluster_control.go:100] tidbcluster: [test/db]'s pd cluster is not running.
I1018 20:39:42.412132       1 event.go:218] Event(v1.ObjectReference{Kind:"TidbCluster", Namespace:"test", Name:"db", UID:"dd88a87b-d315-11e8-8232-42010a8a0035", APIVersion:"pingcap.com", ResourceVersion:"17750", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Service db-tikv-peer in TidbCluster db successful
I1018 20:39:42.425299       1 tidb_cluster_control.go:112] tidbcluster: [test/db]'s tikv cluster is not running.
I1018 20:39:42.425793       1 event.go:218] Event(v1.ObjectReference{Kind:"TidbCluster", Namespace:"test", Name:"db", UID:"dd88a87b-d315-11e8-8232-42010a8a0035", APIVersion:"pingcap.com", ResourceVersion:"17750", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create StatefulSet db-tikv in TidbCluster db successful
I1018 20:39:42.443350       1 tidbcluster_control.go:66] TidbCluster: [test/db] updated successfully
I1018 20:39:42.445731       1 event.go:218] Event(v1.ObjectReference{Kind:"TidbCluster", Namespace:"test", Name:"db", UID:"dd88a87b-d315-11e8-8232-42010a8a0035", APIVersion:"pingcap.com", ResourceVersion:"17750", FieldPath:""}): type: 'Normal' reason: 'SuccessfulUpdate' update TidbCluster db successful
E1018 20:39:42.450730       1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:388
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/utils.go:68
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/tikv_member_manager.go:446
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/tikv_member_manager.go:173
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/tikv_member_manager.go:107
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_control.go:105
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_control.go:77
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:266
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:262
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:229
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:216
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:208
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1309ff8]

goroutine 229 [running]:
github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x14fa1c0, 0x2227f70)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/pingcap/tidb-operator/pkg/manager/member.statefulSetIsUpgrading(...)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/utils.go:68
github.com/pingcap/tidb-operator/pkg/manager/member.(*tikvMemberManager).syncTidbClusterStatus(0xc4203c5760, 0xc420575c00, 0xc42022fb00, 0xc42022fb00, 0x0)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/tikv_member_manager.go:446 +0xf48
github.com/pingcap/tidb-operator/pkg/manager/member.(*tikvMemberManager).syncStatefulSetForTidbCluster(0xc4203c5760, 0xc420575c00, 0x1707f39, 0x4)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/tikv_member_manager.go:173 +0x2a6
github.com/pingcap/tidb-operator/pkg/manager/member.(*tikvMemberManager).Sync(0xc4203c5760, 0xc420575c00, 0x1, 0x0)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/manager/member/tikv_member_manager.go:107 +0x178
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*defaultTidbClusterControl).updateTidbCluster(0xc4200383f0, 0xc420575c00, 0x0, 0x0)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_control.go:105 +0xbb
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*defaultTidbClusterControl).UpdateTidbCluster(0xc4200383f0, 0xc420575c00, 0xc420575c00, 0xc420575c00)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_control.go:77 +0x79
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).syncTidbCluster(0xc420038460, 0xc420575c00, 0x0, 0xc4205b7880)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:266 +0x3e
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).sync(0xc420038460, 0xc420938840, 0x7, 0x0, 0x0)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:262 +0x1be
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).processNextWorkItem(0xc420038460, 0xc4204d4a00)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:229 +0xec
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).worker(0xc420038460)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:216 +0x2b
github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).(github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.worker)-fm()
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:208 +0x2a
github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc42056eee0)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc42056eee0, 0x3b9aca00, 0x0, 0x1, 0xc420731c20)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc42056eee0, 0x3b9aca00, 0xc420731c20)
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/pingcap/tidb-operator/pkg/controller/tidbcluster.(*Controller).Run
	/home/jenkins/workspace/build_tidb_operator_master/go/src/github.com/pingcap/tidb-operator/pkg/controller/tidbcluster/tidb_cluster_controller.go:208 +0x1af

Automatic failover feature proposal

@tennix @onlymellb @xiaojingchen and @weekface discussed automatic failover for PD, TiKV, and TiDB yesterday; this is a summary of that discussion.

Failure definition

When a Kubernetes node fails, or a TiDB component (PD/TiKV/TiDB) can't work properly for other reasons (as observed by querying the PD API), TiDB Operator needs to detect the situation, intervene actively, and increase the replicas. These scenarios include (a sketch of the first check follows the list):

  1. a pd member is not healthy for a while, which can be obtained from PD's /pd/health API
  2. a tikv store state is not Up for a while, which can be obtained from PD's /pd/api/v1/stores API
  3. tidb doesn't respond for a while, which can be obtained from the tidb instance's /status API
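
A sketch of the first check, decoding PD's /pd/health response (the JSON field names are assumptions about PD's output, and the helper is hypothetical):

package member

import "encoding/json"

// memberHealth mirrors one entry of PD's /pd/health response (assumed shape).
type memberHealth struct {
    Name   string `json:"name"`
    Health bool   `json:"health"`
}

// unhealthyMembers returns the names of PD members reported as unhealthy; the
// operator would then watch these against a LastTransitionTime deadline
// before triggering a failover.
func unhealthyMembers(body []byte) ([]string, error) {
    var members []memberHealth
    if err := json.Unmarshal(body, &members); err != nil {
        return nil, err
    }
    var bad []string
    for _, m := range members {
        if !m.Health {
            bad = append(bad, m.Name)
        }
    }
    return bad, nil
}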

PD automatic failover

When a PD peer has been observed as failed for a while (add a LastTransitionTime attribute to PDStatus; this can be implemented as an Informer, as in Kubernetes), the operator should (see the timing sketch after the list):

  1. mark this peer as failed
  2. invoke the deleteMember API to delete this member from the pd cluster
  3. increase the replicas to add a new PD peer
  4. repeatedly try to delete the PVC and Pod of this PD peer, letting the StatefulSet create a new PD peer with the same ordinal but without reusing the tombstone PV
  5. decrease the replicas when all members are ready again
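
The "failed for a while" condition above could be as simple as the following sketch (the helper and its deadline parameter are assumptions):

package member

import "time"

// shouldFailover reports whether a peer has stayed unhealthy past the
// deadline, measured from the LastTransitionTime recorded in its status.
func shouldFailover(lastTransitionTime time.Time, deadline time.Duration) bool {
    return time.Since(lastTransitionTime) > deadline
}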

TiKV automatic failover

When a TiKV peer has been observed as failed for a while (add a LastTransitionTime attribute to TiKVStatus; this can also be implemented as an Informer), for example when its store state is not Up, the operator simply increases the replicas to add a new TiKV peer; that is all.

TiDB automatic failover

When a TiDB peer has been observed as failed for a while (add a LastTransitionTime attribute to TiDBStatus; this can also be implemented as an Informer), the operator should:

  1. increase the replicas to add a new TiDB peer
  2. decrease the replicas when all members are ready again

This feature will be split into multiple PRs, and any suggestions are welcome.

  • add Informer function to pdControl
  • PD automatic failover, issue #57 PR #74
  • TiKV automatic failover, issue #58 PR #77
  • TiDB automatic failover, issue #59 PR #86

Allow user to specify pd/tikv/tidb configuration file in values.yaml

Currently, when users want to modify the pd/tikv/tidb configuration files, they have to modify files in the templates directory. Since these files contain template variables, configuration files cannot simply be copied and pasted, and customizing them programmatically is also impossible. So we should allow users to specify configuration files in values.yaml.
By default, users don't need to provide configuration files in values.yaml, and the configuration files under the templates folder are used. If advanced users provide files in values.yaml, those are used instead.
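
The selection rule itself is simple; a Go sketch of the intended fallback behavior (the helper is illustrative, and in the chart this would be a template conditional):

package member

import "strings"

// chooseConfig returns the user-provided configuration from values.yaml when
// present, otherwise the default configuration under the templates folder.
func chooseConfig(userConfig, defaultConfig string) string {
    if strings.TrimSpace(userConfig) != "" {
        return userConfig
    }
    return defaultConfig
}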

e2e: failed when upgrade a tidb cluster

Smoke
  should create a tidb cluster
  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:56
STEP: Then all members should running
Oct 24 06:04:24.922: INFO: statefulsets.apps "demo-cluster-pd" not found
Oct 24 06:04:29.891: INFO: statefulsets.apps "demo-cluster-pd" not found
Oct 24 06:04:34.894: INFO: statefulsets.apps "demo-cluster-pd" not found
Oct 24 06:04:39.899: INFO: pdSet.Status: {ObservedGeneration:0xc420204788 Replicas:1 ReadyReplicas:0 CurrentReplicas:1 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420204798 Conditions:[]}
Oct 24 06:04:39.899: INFO: pdSet.Spec.Replicas(1) != tc.Spec.PD.Replicas(3)
Oct 24 06:04:44.892: INFO: pdSet.Status: {ObservedGeneration:0xc420205c08 Replicas:1 ReadyReplicas:0 CurrentReplicas:1 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420205c18 Conditions:[]}
Oct 24 06:04:44.893: INFO: pdSet.Spec.Replicas(1) != tc.Spec.PD.Replicas(3)
Oct 24 06:04:49.893: INFO: pdSet.Status: {ObservedGeneration:0xc420651138 Replicas:1 ReadyReplicas:0 CurrentReplicas:1 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420651148 Conditions:[]}
Oct 24 06:04:49.893: INFO: pdSet.Spec.Replicas(1) != tc.Spec.PD.Replicas(3)
Oct 24 06:04:54.892: INFO: pdSet.Status: {ObservedGeneration:0xc420684668 Replicas:1 ReadyReplicas:1 CurrentReplicas:1 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420684678 Conditions:[]}
Oct 24 06:04:54.892: INFO: pdSet.Spec.Replicas(1) != tc.Spec.PD.Replicas(3)
Oct 24 06:04:59.893: INFO: pdSet.Status: {ObservedGeneration:0xc420685c18 Replicas:2 ReadyReplicas:1 CurrentReplicas:2 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420685c28 Conditions:[]}
Oct 24 06:04:59.893: INFO: pdSet.Spec.Replicas(2) != tc.Spec.PD.Replicas(3)
Oct 24 06:05:04.892: INFO: pdSet.Status: {ObservedGeneration:0xc420685ff8 Replicas:2 ReadyReplicas:1 CurrentReplicas:2 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420632108 Conditions:[]}
Oct 24 06:05:04.892: INFO: pdSet.Spec.Replicas(2) != tc.Spec.PD.Replicas(3)
Oct 24 06:05:09.894: INFO: pdSet.Status: {ObservedGeneration:0xc4205e3048 Replicas:2 ReadyReplicas:1 CurrentReplicas:2 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc4205e3058 Conditions:[]}
Oct 24 06:05:09.894: INFO: pdSet.Spec.Replicas(2) != tc.Spec.PD.Replicas(3)
Oct 24 06:05:14.893: INFO: pdSet.Status: {ObservedGeneration:0xc42055e6f8 Replicas:2 ReadyReplicas:2 CurrentReplicas:2 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc42055e708 Conditions:[]}
Oct 24 06:05:14.893: INFO: pdSet.Spec.Replicas(2) != tc.Spec.PD.Replicas(3)
Oct 24 06:05:19.892: INFO: pdSet.Status: {ObservedGeneration:0xc420515b98 Replicas:3 ReadyReplicas:2 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420515ba8 Conditions:[]}
Oct 24 06:05:19.892: INFO: pdSet.Status.ReadyReplicas(2) != 3
Oct 24 06:05:24.892: INFO: pdSet.Status: {ObservedGeneration:0xc420204988 Replicas:3 ReadyReplicas:2 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420204998 Conditions:[]}
Oct 24 06:05:24.893: INFO: pdSet.Status.ReadyReplicas(2) != 3
Oct 24 06:05:29.893: INFO: pdSet.Status: {ObservedGeneration:0xc42071a778 Replicas:3 ReadyReplicas:2 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc42071a788 Conditions:[]}
Oct 24 06:05:29.894: INFO: pdSet.Status.ReadyReplicas(2) != 3
Oct 24 06:05:34.893: INFO: pdSet.Status: {ObservedGeneration:0xc42076e5e8 Replicas:3 ReadyReplicas:2 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc42076e5f8 Conditions:[]}
Oct 24 06:05:34.893: INFO: pdSet.Status.ReadyReplicas(2) != 3
Oct 24 06:05:39.894: INFO: pdSet.Status: {ObservedGeneration:0xc4207ca4c8 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc4207ca4d8 Conditions:[]}
Oct 24 06:05:39.894: INFO: tc.Status.PD.Members count(2) != 3
Oct 24 06:05:44.893: INFO: pdSet.Status: {ObservedGeneration:0xc420684d08 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420684d18 Conditions:[]}
Oct 24 06:05:44.905: INFO: tikvSet.Status: {ObservedGeneration:0xc420204ec0 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-599f9688 CollisionCount:0xc420204f20 Conditions:[]}
Oct 24 06:05:44.905: INFO: store(1) state != Up
Oct 24 06:05:49.894: INFO: pdSet.Status: {ObservedGeneration:0xc420017928 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420017938 Conditions:[]}
Oct 24 06:05:49.906: INFO: tikvSet.Status: {ObservedGeneration:0xc4205157a0 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-599f9688 CollisionCount:0xc420515890 Conditions:[]}
Oct 24 06:05:49.906: INFO: store(7) state != Up
Oct 24 06:05:54.894: INFO: pdSet.Status: {ObservedGeneration:0xc420632d88 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc420632d98 Conditions:[]}
Oct 24 06:05:54.906: INFO: tikvSet.Status: {ObservedGeneration:0xc42074e410 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-599f9688 CollisionCount:0xc42074e470 Conditions:[]}
Oct 24 06:05:54.906: INFO: store(7) state != Up
Oct 24 06:05:59.893: INFO: pdSet.Status: {ObservedGeneration:0xc4206f4268 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-75998c4fb4 UpdateRevision:demo-cluster-pd-75998c4fb4 CollisionCount:0xc4206f4278 Conditions:[]}
Oct 24 06:05:59.903: INFO: tikvSet.Status: {ObservedGeneration:0xc4206f51d0 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-599f9688 CollisionCount:0xc4206f5230 Conditions:[]}
Oct 24 06:05:59.909: INFO: tidbSet.Status: {ObservedGeneration:0xc4206f5c70 Replicas:2 ReadyReplicas:2 CurrentReplicas:2 UpdatedReplicas:0 CurrentRevision:demo-cluster-tidb-76d745768b UpdateRevision:demo-cluster-tidb-76d745768b CollisionCount:0xc4206f5cd0 Conditions:[]}
Oct 24 06:05:59.923: INFO: pv: pvc-b12506c4-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:05:59.926: INFO: pv: pvc-bb53b427-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:05:59.934: INFO: pv: pvc-c7c457b6-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:05:59.938: INFO: pv: pvc-c7d0bc3e-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:05:59.941: INFO: pv: pvc-c7d5a8a3-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:06:00.098: INFO: pv: pvc-c7daa839-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
STEP: When create a table and add some data to this table
Oct 24 06:06:05.102: INFO: can't create table to mysql: dial tcp 10.63.241.14:4000: getsockopt: connection refused
Oct 24 06:06:10.102: INFO: can't create table to mysql: dial tcp 10.63.241.14:4000: getsockopt: connection refused
STEP: Then the data is correct

• [SLOW TEST:120.689 seconds]
Smoke
/local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:55
  should create a tidb cluster
  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:56
------------------------------
Smoke
  should upgrade a tidb cluster
  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:59
STEP: When upgrade TiDB cluster to newer version
Oct 24 06:06:25.593: INFO: Images after upgraded: PD: gcr.io/smooth-tendril-207212/pingcap/pd:v2.0.58, TiKV: pingcap/tikv:v2.0.5, TiDB: pingcap/tidb:v2.0.5
STEP: Then members should be upgrade in order: pd ==> tikv ==> tidb
Oct 24 06:06:30.610: INFO: pd is upgrading
Oct 24 06:06:35.610: INFO: pd is upgrading
Oct 24 06:06:40.610: INFO: pd is upgrading
Oct 24 06:06:45.609: INFO: pd is upgrading
Oct 24 06:06:50.608: INFO: pd is upgrading
Oct 24 06:06:55.611: INFO: pd is upgrading
Oct 24 06:07:00.610: INFO: pd is upgrading
Oct 24 06:07:05.610: INFO: pd is upgrading
Oct 24 06:07:10.608: INFO: pd is upgrading
Oct 24 06:07:15.612: INFO: pd is upgrading
Oct 24 06:07:20.608: INFO: pd is upgrading
Oct 24 06:07:25.724: INFO: pd is upgrading
Oct 24 06:07:30.609: INFO: pd is upgrading
Oct 24 06:07:35.616: INFO: pd is upgrading
Oct 24 06:07:40.610: INFO: pd is upgrading
Oct 24 06:07:45.609: INFO: pd is upgrading
Oct 24 06:07:50.610: INFO: pd is upgrading
Oct 24 06:07:55.622: INFO: pd is upgrading
Oct 24 06:08:00.612: INFO: pd is upgrading
Oct 24 06:08:05.942: INFO: pd is upgrading

• Failure [105.368 seconds]
Smoke
/local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:55
  should upgrade a tidb cluster [It]
  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:59

  Expected
      <bool>: true
  to be false

  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/upgrade.go:109
------------------------------
Smoke
  should scale in/out a tidb cluster
  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:62
STEP: When scale out TiDB cluster: pd ==> [5], tikv ==> [5], tidb ==> [3]
Oct 24 06:08:11.489: INFO: Replicas after scaled out: PD: 5 , TiKV: 5, TiDB: 3
STEP: Then TiDB cluster should scale out successfully
Oct 24 06:08:16.497: INFO: pdSet.Status: {ObservedGeneration:0xc420493128 Replicas:5 ReadyReplicas:3 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420493138 Conditions:[]}
Oct 24 06:08:16.497: INFO: pdSet.Status.ReadyReplicas(3) != 5
Oct 24 06:08:21.498: INFO: pdSet.Status: {ObservedGeneration:0xc4207a3568 Replicas:5 ReadyReplicas:3 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207a3578 Conditions:[]}
Oct 24 06:08:21.499: INFO: pdSet.Status.ReadyReplicas(3) != 5
Oct 24 06:08:26.497: INFO: pdSet.Status: {ObservedGeneration:0xc4208cedb8 Replicas:5 ReadyReplicas:3 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4208cedc8 Conditions:[]}
Oct 24 06:08:26.498: INFO: pdSet.Status.ReadyReplicas(3) != 5
Oct 24 06:08:31.503: INFO: pdSet.Status: {ObservedGeneration:0xc420684bd8 Replicas:5 ReadyReplicas:4 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420684be8 Conditions:[]}
Oct 24 06:08:31.503: INFO: pdSet.Status.ReadyReplicas(4) != 5
Oct 24 06:08:36.499: INFO: pdSet.Status: {ObservedGeneration:0xc420514d88 Replicas:5 ReadyReplicas:4 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420514d98 Conditions:[]}
Oct 24 06:08:36.499: INFO: pdSet.Status.ReadyReplicas(4) != 5
Oct 24 06:08:41.501: INFO: pdSet.Status: {ObservedGeneration:0xc4202f73c8 Replicas:5 ReadyReplicas:4 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4202f73d8 Conditions:[]}
Oct 24 06:08:41.502: INFO: pdSet.Status.ReadyReplicas(4) != 5
Oct 24 06:08:46.496: INFO: pdSet.Status: {ObservedGeneration:0xc4209a99a8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4209a99b8 Conditions:[]}
Oct 24 06:08:46.496: INFO: tc.Status.PD.Members count(4) != 5
Oct 24 06:08:51.498: INFO: pdSet.Status: {ObservedGeneration:0xc4206e5e78 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4206e5e88 Conditions:[]}
Oct 24 06:08:51.498: INFO: tc.Status.PD.Members count(4) != 5
Oct 24 06:08:56.498: INFO: pdSet.Status: {ObservedGeneration:0xc420758278 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420758288 Conditions:[]}
Oct 24 06:08:56.509: INFO: tikvSet.Status: {ObservedGeneration:0xc4207591e0 Replicas:3 ReadyReplicas:2 CurrentReplicas:2 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420759250 Conditions:[]}
Oct 24 06:08:56.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:01.497: INFO: pdSet.Status: {ObservedGeneration:0xc4209db608 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4209db618 Conditions:[]}
Oct 24 06:09:01.506: INFO: tikvSet.Status: {ObservedGeneration:0xc4209a8570 Replicas:3 ReadyReplicas:2 CurrentReplicas:2 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4209a85e0 Conditions:[]}
Oct 24 06:09:01.507: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:06.497: INFO: pdSet.Status: {ObservedGeneration:0xc4207eb938 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207eb948 Conditions:[]}
Oct 24 06:09:06.510: INFO: tikvSet.Status: {ObservedGeneration:0xc4202f68f0 Replicas:3 ReadyReplicas:2 CurrentReplicas:2 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4202f6960 Conditions:[]}
Oct 24 06:09:06.511: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:11.498: INFO: pdSet.Status: {ObservedGeneration:0xc420514b78 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420514b98 Conditions:[]}
Oct 24 06:09:11.509: INFO: tikvSet.Status: {ObservedGeneration:0xc420016bd0 Replicas:3 ReadyReplicas:3 CurrentReplicas:2 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420016cc0 Conditions:[]}
Oct 24 06:09:11.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:16.504: INFO: pdSet.Status: {ObservedGeneration:0xc420685ee8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420685ef8 Conditions:[]}
Oct 24 06:09:16.516: INFO: tikvSet.Status: {ObservedGeneration:0xc4207914a0 Replicas:3 ReadyReplicas:3 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420791510 Conditions:[]}
Oct 24 06:09:16.516: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:21.498: INFO: pdSet.Status: {ObservedGeneration:0xc4207c8d98 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207c8dc8 Conditions:[]}
Oct 24 06:09:21.507: INFO: tikvSet.Status: {ObservedGeneration:0xc4207580d0 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420758140 Conditions:[]}
Oct 24 06:09:21.508: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:26.498: INFO: pdSet.Status: {ObservedGeneration:0xc4209ef018 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4209ef028 Conditions:[]}
Oct 24 06:09:26.510: INFO: tikvSet.Status: {ObservedGeneration:0xc4209eff80 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4209efff0 Conditions:[]}
Oct 24 06:09:26.510: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:31.498: INFO: pdSet.Status: {ObservedGeneration:0xc42072afd8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42072afe8 Conditions:[]}
Oct 24 06:09:31.509: INFO: tikvSet.Status: {ObservedGeneration:0xc420790820 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420790890 Conditions:[]}
Oct 24 06:09:31.510: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:36.498: INFO: pdSet.Status: {ObservedGeneration:0xc42027a7e8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42027a818 Conditions:[]}
Oct 24 06:09:36.509: INFO: tikvSet.Status: {ObservedGeneration:0xc4200168c0 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420016960 Conditions:[]}
Oct 24 06:09:36.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:41.496: INFO: pdSet.Status: {ObservedGeneration:0xc4202f6478 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4202f6488 Conditions:[]}
Oct 24 06:09:41.507: INFO: tikvSet.Status: {ObservedGeneration:0xc4202f7400 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4202f7470 Conditions:[]}
Oct 24 06:09:41.507: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:46.497: INFO: pdSet.Status: {ObservedGeneration:0xc4207eeb18 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207eeb28 Conditions:[]}
Oct 24 06:09:46.508: INFO: tikvSet.Status: {ObservedGeneration:0xc4207efa80 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207efaf0 Conditions:[]}
Oct 24 06:09:46.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:51.496: INFO: pdSet.Status: {ObservedGeneration:0xc4207ebdb8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207ebdc8 Conditions:[]}
Oct 24 06:09:51.511: INFO: tikvSet.Status: {ObservedGeneration:0xc4202f6d30 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:1 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4202f6da0 Conditions:[]}
Oct 24 06:09:51.511: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:09:56.496: INFO: pdSet.Status: {ObservedGeneration:0xc420515168 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420515178 Conditions:[]}
Oct 24 06:09:56.504: INFO: tikvSet.Status: {ObservedGeneration:0xc4200176d0 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420017890 Conditions:[]}
Oct 24 06:09:56.505: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:01.497: INFO: pdSet.Status: {ObservedGeneration:0xc420790538 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420790548 Conditions:[]}
Oct 24 06:10:01.509: INFO: tikvSet.Status: {ObservedGeneration:0xc420791860 Replicas:3 ReadyReplicas:2 CurrentReplicas:1 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207918d0 Conditions:[]}
Oct 24 06:10:01.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:06.497: INFO: pdSet.Status: {ObservedGeneration:0xc4207ef018 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207ef028 Conditions:[]}
Oct 24 06:10:06.513: INFO: tikvSet.Status: {ObservedGeneration:0xc420956410 Replicas:3 ReadyReplicas:3 CurrentReplicas:1 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4209564b0 Conditions:[]}
Oct 24 06:10:06.513: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:11.496: INFO: pdSet.Status: {ObservedGeneration:0xc4208b7f08 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4208b7f18 Conditions:[]}
Oct 24 06:10:11.509: INFO: tikvSet.Status: {ObservedGeneration:0xc420750e70 Replicas:3 ReadyReplicas:3 CurrentReplicas:1 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420750ee0 Conditions:[]}
Oct 24 06:10:11.510: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:16.497: INFO: pdSet.Status: {ObservedGeneration:0xc4208e32c8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4208e32d8 Conditions:[]}
Oct 24 06:10:16.508: INFO: tikvSet.Status: {ObservedGeneration:0xc420974230 Replicas:3 ReadyReplicas:3 CurrentReplicas:1 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4209742a0 Conditions:[]}
Oct 24 06:10:16.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:21.497: INFO: pdSet.Status: {ObservedGeneration:0xc420791a48 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420791a58 Conditions:[]}
Oct 24 06:10:21.509: INFO: tikvSet.Status: {ObservedGeneration:0xc420684c00 Replicas:3 ReadyReplicas:3 CurrentReplicas:1 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420684c70 Conditions:[]}
Oct 24 06:10:21.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:26.497: INFO: pdSet.Status: {ObservedGeneration:0xc420514d78 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420514d88 Conditions:[]}
Oct 24 06:10:26.506: INFO: tikvSet.Status: {ObservedGeneration:0xc42055e900 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc42055eab0 Conditions:[]}
Oct 24 06:10:26.506: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:31.498: INFO: pdSet.Status: {ObservedGeneration:0xc4207ea798 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207ea7a8 Conditions:[]}
Oct 24 06:10:31.508: INFO: tikvSet.Status: {ObservedGeneration:0xc420750250 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207502b0 Conditions:[]}
Oct 24 06:10:31.508: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:36.498: INFO: pdSet.Status: {ObservedGeneration:0xc4209750a8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4209750b8 Conditions:[]}
Oct 24 06:10:36.509: INFO: tikvSet.Status: {ObservedGeneration:0xc42075a010 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc42075a070 Conditions:[]}
Oct 24 06:10:36.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:41.497: INFO: pdSet.Status: {ObservedGeneration:0xc4208a4498 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4208a44a8 Conditions:[]}
Oct 24 06:10:41.508: INFO: tikvSet.Status: {ObservedGeneration:0xc4208a5400 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4208a5460 Conditions:[]}
Oct 24 06:10:41.508: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:46.496: INFO: pdSet.Status: {ObservedGeneration:0xc420750df8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420750e08 Conditions:[]}
Oct 24 06:10:46.505: INFO: tikvSet.Status: {ObservedGeneration:0xc420751d60 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420751dc0 Conditions:[]}
Oct 24 06:10:46.506: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:51.497: INFO: pdSet.Status: {ObservedGeneration:0xc4205a9898 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4205a98b8 Conditions:[]}
Oct 24 06:10:51.506: INFO: tikvSet.Status: {ObservedGeneration:0xc42055f580 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc42055f6c0 Conditions:[]}
Oct 24 06:10:51.506: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:10:56.496: INFO: pdSet.Status: {ObservedGeneration:0xc420205dc8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420205dd8 Conditions:[]}
Oct 24 06:10:56.506: INFO: tikvSet.Status: {ObservedGeneration:0xc420684d40 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420684da0 Conditions:[]}
Oct 24 06:10:56.507: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:01.496: INFO: pdSet.Status: {ObservedGeneration:0xc4208cf148 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4208cf158 Conditions:[]}
Oct 24 06:11:01.506: INFO: tikvSet.Status: {ObservedGeneration:0xc4208a40b0 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4208a4110 Conditions:[]}
Oct 24 06:11:01.506: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:06.496: INFO: pdSet.Status: {ObservedGeneration:0xc420491a48 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420491a58 Conditions:[]}
Oct 24 06:11:06.505: INFO: tikvSet.Status: {ObservedGeneration:0xc4208de9b0 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4208dea10 Conditions:[]}
Oct 24 06:11:06.505: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:11.497: INFO: pdSet.Status: {ObservedGeneration:0xc420966e38 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420966e48 Conditions:[]}
Oct 24 06:11:11.508: INFO: tikvSet.Status: {ObservedGeneration:0xc420967da0 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420967e00 Conditions:[]}
Oct 24 06:11:11.508: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:16.498: INFO: pdSet.Status: {ObservedGeneration:0xc4208ce7b8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4208ce7c8 Conditions:[]}
Oct 24 06:11:16.509: INFO: tikvSet.Status: {ObservedGeneration:0xc4208cf720 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4208cf780 Conditions:[]}
Oct 24 06:11:16.509: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:21.498: INFO: pdSet.Status: {ObservedGeneration:0xc420685828 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420685838 Conditions:[]}
Oct 24 06:11:21.507: INFO: tikvSet.Status: {ObservedGeneration:0xc420204ad0 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:2 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420204b30 Conditions:[]}
Oct 24 06:11:21.507: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:26.496: INFO: pdSet.Status: {ObservedGeneration:0xc42055f7f8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42055f808 Conditions:[]}
Oct 24 06:11:26.506: INFO: tikvSet.Status: {ObservedGeneration:0xc4205a9810 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:3 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4205a98d0 Conditions:[]}
Oct 24 06:11:26.506: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:31.498: INFO: pdSet.Status: {ObservedGeneration:0xc4209769a8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4209769b8 Conditions:[]}
Oct 24 06:11:31.509: INFO: tikvSet.Status: {ObservedGeneration:0xc420977930 Replicas:3 ReadyReplicas:2 CurrentReplicas:0 UpdatedReplicas:3 CurrentRevision:demo-cluster-tikv-599f9688 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420977990 Conditions:[]}
Oct 24 06:11:31.510: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:36.498: INFO: pdSet.Status: {ObservedGeneration:0xc4207b3d28 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207b3d38 Conditions:[]}
Oct 24 06:11:36.511: INFO: tikvSet.Status: {ObservedGeneration:0xc4208b6c90 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4208b6cf0 Conditions:[]}
Oct 24 06:11:36.511: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:41.498: INFO: pdSet.Status: {ObservedGeneration:0xc4207ebda8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207ebdb8 Conditions:[]}
Oct 24 06:11:41.514: INFO: tikvSet.Status: {ObservedGeneration:0xc4202f6d10 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4202f6d70 Conditions:[]}
Oct 24 06:11:41.514: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:46.497: INFO: pdSet.Status: {ObservedGeneration:0xc420515158 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420515168 Conditions:[]}
Oct 24 06:11:46.507: INFO: tikvSet.Status: {ObservedGeneration:0xc4200176c0 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420017870 Conditions:[]}
Oct 24 06:11:46.507: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:51.497: INFO: pdSet.Status: {ObservedGeneration:0xc420790538 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420790548 Conditions:[]}
Oct 24 06:11:51.508: INFO: tikvSet.Status: {ObservedGeneration:0xc420791860 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207918c0 Conditions:[]}
Oct 24 06:11:51.508: INFO: tikvSet.Spec.Replicas(3) != tc.Spec.TiKV.Replicas(5)
Oct 24 06:11:56.497: INFO: pdSet.Status: {ObservedGeneration:0xc4207b20f8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207b2108 Conditions:[]}
Oct 24 06:11:56.507: INFO: tikvSet.Status: {ObservedGeneration:0xc4207b3060 Replicas:5 ReadyReplicas:3 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207b30c0 Conditions:[]}
Oct 24 06:11:56.508: INFO: tikvSet.Status.ReadyReplicas(3) != 5
Oct 24 06:12:01.497: INFO: pdSet.Status: {ObservedGeneration:0xc42079ba48 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42079ba58 Conditions:[]}
Oct 24 06:12:01.506: INFO: tikvSet.Status: {ObservedGeneration:0xc4206f09b0 Replicas:5 ReadyReplicas:3 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4206f0a10 Conditions:[]}
Oct 24 06:12:01.506: INFO: tikvSet.Status.ReadyReplicas(3) != 5
Oct 24 06:12:06.498: INFO: pdSet.Status: {ObservedGeneration:0xc42093d7a8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42093d7b8 Conditions:[]}
Oct 24 06:12:06.509: INFO: tikvSet.Status: {ObservedGeneration:0xc4207b2710 Replicas:5 ReadyReplicas:3 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207b2770 Conditions:[]}
Oct 24 06:12:06.509: INFO: tikvSet.Status.ReadyReplicas(3) != 5
Oct 24 06:12:11.507: INFO: pdSet.Status: {ObservedGeneration:0xc42072a398 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42072a3a8 Conditions:[]}
Oct 24 06:12:11.521: INFO: tikvSet.Status: {ObservedGeneration:0xc42072bad0 Replicas:5 ReadyReplicas:4 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc42072bb60 Conditions:[]}
Oct 24 06:12:11.522: INFO: tikvSet.Status.ReadyReplicas(4) != 5
Oct 24 06:12:16.496: INFO: pdSet.Status: {ObservedGeneration:0xc42027b858 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42027b868 Conditions:[]}
Oct 24 06:12:16.504: INFO: tikvSet.Status: {ObservedGeneration:0xc4205143c0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420514420 Conditions:[]}
Oct 24 06:12:16.505: INFO: store(1002) state != Up
Oct 24 06:12:21.498: INFO: pdSet.Status: {ObservedGeneration:0xc4202f7df8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4202f7e08 Conditions:[]}
Oct 24 06:12:21.507: INFO: tikvSet.Status: {ObservedGeneration:0xc4207ebd50 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207ebdb0 Conditions:[]}
Oct 24 06:12:21.507: INFO: store(1001) state != Up
Oct 24 06:12:26.498: INFO: pdSet.Status: {ObservedGeneration:0xc4207ebe08 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4207ebe18 Conditions:[]}
Oct 24 06:12:26.508: INFO: tikvSet.Status: {ObservedGeneration:0xc4202f6dc0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4202f6e20 Conditions:[]}
Oct 24 06:12:26.508: INFO: store(1002) state != Up
Oct 24 06:12:31.497: INFO: pdSet.Status: {ObservedGeneration:0xc42027a2b8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42027a2d8 Conditions:[]}
Oct 24 06:12:31.508: INFO: tikvSet.Status: {ObservedGeneration:0xc42027bdb0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc42027be20 Conditions:[]}
Oct 24 06:12:31.508: INFO: store(1002) state != Up
Oct 24 06:12:36.505: INFO: pdSet.Status: {ObservedGeneration:0xc42072b5e8 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc42072b608 Conditions:[]}
Oct 24 06:12:36.516: INFO: tikvSet.Status: {ObservedGeneration:0xc420734df0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420734e60 Conditions:[]}
Oct 24 06:12:36.516: INFO: store(1002) state != Up
Oct 24 06:12:41.498: INFO: pdSet.Status: {ObservedGeneration:0xc420771128 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420771138 Conditions:[]}
Oct 24 06:12:41.512: INFO: tikvSet.Status: {ObservedGeneration:0xc4207a4090 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4207a40f0 Conditions:[]}
Oct 24 06:12:41.512: INFO: store(1002) state != Up
Oct 24 06:12:46.497: INFO: pdSet.Status: {ObservedGeneration:0xc4209e7638 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc4209e7648 Conditions:[]}
Oct 24 06:12:46.515: INFO: tikvSet.Status: {ObservedGeneration:0xc4208ceaf0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4208ceb50 Conditions:[]}
Oct 24 06:12:46.516: INFO: store(1002) state != Up
Oct 24 06:12:51.503: INFO: pdSet.Status: {ObservedGeneration:0xc420204028 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420204038 Conditions:[]}
Oct 24 06:12:51.514: INFO: tikvSet.Status: {ObservedGeneration:0xc420204fe0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc420205040 Conditions:[]}
Oct 24 06:12:51.514: INFO: store(1002) state != Up
Oct 24 06:12:56.503: INFO: pdSet.Status: {ObservedGeneration:0xc420205f98 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-pd-6cdcbcc6c6 UpdateRevision:demo-cluster-pd-6cdcbcc6c6 CollisionCount:0xc420205fa8 Conditions:[]}
Oct 24 06:12:56.521: INFO: tikvSet.Status: {ObservedGeneration:0xc4205e3ef0 Replicas:5 ReadyReplicas:5 CurrentReplicas:5 UpdatedReplicas:0 CurrentRevision:demo-cluster-tikv-746bdd7954 UpdateRevision:demo-cluster-tikv-746bdd7954 CollisionCount:0xc4205e3f50 Conditions:[]}
Oct 24 06:12:56.528: INFO: tidbSet.Status: {ObservedGeneration:0xc4206330e0 Replicas:3 ReadyReplicas:3 CurrentReplicas:3 UpdatedReplicas:0 CurrentRevision:demo-cluster-tidb-8596bb8786 UpdateRevision:demo-cluster-tidb-8596bb8786 CollisionCount:0xc420633190 Conditions:[]}
Oct 24 06:12:56.540: INFO: pv: pvc-b12506c4-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:56.544: INFO: pv: pvc-bb53b427-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:56.547: INFO: pv: pvc-c7c457b6-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:56.551: INFO: pv: pvc-2fb4176c-d753-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:56.708: INFO: pv: pvc-2fef7d20-d753-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:56.907: INFO: pv: pvc-c7d0bc3e-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:57.106: INFO: pv: pvc-c7d5a8a3-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:57.306: INFO: pv: pvc-c7daa839-d752-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:57.507: INFO: pv: pvc-b31725b4-d753-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
Oct 24 06:12:57.707: INFO: pv: pvc-b32721b1-d753-11e8-8d3b-42010a80027e's persistentVolumeReclaimPolicy is Retain
STEP: And should scaled out correctly

• Failure [296.776 seconds]
Smoke
/local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:55
  should scale in/out a tidb cluster [It]
  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:62

  Expected error:
      <*errors.errorString | 0xc420468f60>: {
          s: "pod: [demo-cluster-tikv-0] have be recreated",
      }
      pod: [demo-cluster-tikv-0] have be recreated
  not to have occurred

  /local/go/src/github.com/pingcap/tidb-operator/tests/e2e/scale.go:51
------------------------------


Summarizing 2 Failures:

[Fail] Smoke [It] should upgrade a tidb cluster
/local/go/src/github.com/pingcap/tidb-operator/tests/e2e/upgrade.go:109

[Fail] Smoke [It] should scale in/out a tidb cluster
/local/go/src/github.com/pingcap/tidb-operator/tests/e2e/scale.go:51

Ran 3 of 3 Specs in 599.677 seconds
FAIL! -- 1 Passed | 2 Failed | 0 Pending | 0 Skipped
--- FAIL: TestE2E (599.68s)
FAIL

Ginkgo ran 1 suite in 9m59.741046613s
Test Suite Failed

TiKV automatic failover

When a TiKV peer's state is not Up for a while, the operator should simply increase the replicas to add a new TiKV peer; that is all.

part of #47

SSD with AWS EKS Kubernetes

Hi there,

I am trying to follow this guide to set up TiDB on a Kubernetes cluster started via the AWS EKS service (via Terraform). (You may see more info in #71, where I am noting down the steps.)

However, at the step Deploy your first TiDB cluster, with the command helm install ./charts/tidb-cluster -n tidb --namespace=tidb --set pd.storageClassName=pd-ssd,tikv.storageClassName=pd-ssd

I get an error from the Kube Dashboard showing that the pd pods did not start successfully:

0/2 nodes are available: 2 node(s) didn't find available persistent volumes to bind.

Questions:
For demo purposes, I guess I can get by with --set pd.storageClassName=pd,tikv.storageClassName=pd ?

And more importantly, if I would like to set up TiDB to use SSDs with AWS, do you have a guide on how to achieve that? Thanks :)

Need a new error type to requeue the items

Now the TidbCluster controller has a queue attribute which is used for tracking the items that should be processed.

If the sync method returns an error, we log the error message and requeue the item; otherwise we forget it.

So if we have a task that needs several syncs, the item needs to be requeued after the first sync without logging an error (it is not an error; we just want to process the item again). We should define a new error type (maybe named RequeueError): if the error returned from the sync method is a RequeueError, don't log the error message.
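
A minimal sketch of what such a type could look like (all names here are illustrative, not a committed API):

package controller

import "fmt"

// RequeueError signals that the item should be requeued,
// but its message should not be logged as an error.
type RequeueError struct {
	s string
}

func (re *RequeueError) Error() string {
	return re.s
}

// RequeueErrorf returns a RequeueError with a formatted message.
func RequeueErrorf(format string, a ...interface{}) error {
	return &RequeueError{fmt.Sprintf(format, a...)}
}

// IsRequeueError reports whether err is a RequeueError.
func IsRequeueError(err error) bool {
	_, ok := err.(*RequeueError)
	return ok
}

The worker loop would then requeue the key silently when IsRequeueError(err) is true, and log and requeue as before for any other error.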

Setting timezone correctly for TiDB cluster

We're using Alpine Linux as the base image for the pd/tikv/tidb Docker images. Currently, in Kubernetes we mount the host machine's /etc/localtime into the pod, so the log timestamps are the same as the host machine's.

But tidb/tikv also need to know the timezone name, which cannot be retrieved from /etc/localtime alone. Normally on a real machine, /etc/localtime is a soft link to /usr/share/zoneinfo/<timezone-name>. With the soft link, TiDB and TiKV can retrieve the timezone name. Also, some Linux distros set the timezone name in /etc/timezone or /etc/TZ, and Alpine Linux uses the TZ environment variable.

So on Kubernetes, we need to use a soft link for /etc/localtime instead of just mounting the host's /etc/localtime. Besides, we'd better set the TZ environment variable and /etc/TZ too, to make programs function well in containers.

Note: Previously released Docker images on DockerHub don't contain the tzdata package, so this will never work with them. Only newer versions (>= v2.0.7) contain the tzdata package. So if users want to use an older version of TiDB, they have to build the PD/TiKV/TiDB Docker images themselves to include the tzdata package.
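
For illustration only, a tiny helper the operator could use to inject TZ into each container spec (assuming client-go's corev1 types; the helper name and default value are hypothetical):

package member

import corev1 "k8s.io/api/core/v1"

// timezoneEnv builds the TZ environment variable for a container.
// With tzdata present in the image, setting TZ lets TiDB/TiKV resolve
// the timezone name without relying on a host mount of /etc/localtime.
func timezoneEnv(tz string) []corev1.EnvVar {
	if tz == "" {
		tz = "UTC" // hypothetical default when the user sets nothing
	}
	return []corev1.EnvVar{{Name: "TZ", Value: tz}}
}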

TiDB cluster full backup feature

TiDB uses mydumper to achieve the full backup.

With TiDB Operator, we need to use Helm to create a PVC and a CronJob; the CronJob will create Pods which all use the same PVC and PV.

In every Pod created by the CronJob, we should use mydumper to back up the whole TiDB cluster's data to a different, unique data directory in the PV.
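
A sketch of how each pod could derive its unique directory on the shared PV (the path layout here is an assumption for illustration, not a fixed design):

package backup

import (
	"fmt"
	"time"
)

// backupDir returns a per-run directory on the shared PV, keyed by
// cluster name and start time, so successive scheduled backups
// never overwrite each other.
func backupDir(clusterName string, t time.Time) string {
	return fmt.Sprintf("/data/%s-%s", clusterName, t.UTC().Format("20060102-150405"))
}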

Manage tidb service by charts

Services of a TiDB cluster include two types: headless services and external access services. Now they are all maintained by TiDB Operator.

The headless services are used for network communication between internal pods and are invisible to users, so they should be maintained by the operator itself. But the tidb service is exposed to the user and should be modifiable by the user if needed, so we should move the tidb service to the charts and manage it with Helm.

TiKV graceful upgrade

TiKV graceful upgrade steps (a sketch of the PD API calls follows the list):

  1. call PD's API pd/api/v1/schedulers with the value {"evict-leader-scheduler":storeID}; PD will then start a scheduler to evict the leaders in this TiKV
  2. wait until all leaders in this TiKV have been evicted
  3. call PD's API pd/api/v1/schedulers/{storeID} to delete the scheduler
  4. check the TiKV stores' status until they are all available, then continue to upgrade the next TiKV pod
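
A minimal sketch of steps 1 and 3, assuming the scheduler endpoint accepts a JSON body naming the scheduler and the store, and that the delete path embeds the store ID (illustrative client code, not the operator's actual implementation):

package upgrader

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// beginEvictLeader asks PD to start an evict-leader scheduler for storeID.
func beginEvictLeader(pdAddr string, storeID uint64) error {
	body, err := json.Marshal(map[string]interface{}{
		"name":     "evict-leader-scheduler",
		"store_id": storeID,
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(fmt.Sprintf("http://%s/pd/api/v1/schedulers", pdAddr),
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("begin evict leader for store %d: %s", storeID, resp.Status)
	}
	return nil
}

// endEvictLeader deletes the scheduler once this pod's upgrade is done.
func endEvictLeader(pdAddr string, storeID uint64) error {
	url := fmt.Sprintf("http://%s/pd/api/v1/schedulers/evict-leader-scheduler-%d", pdAddr, storeID)
	req, err := http.NewRequest(http.MethodDelete, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("end evict leader for store %d: %s", storeID, resp.Status)
	}
	return nil
}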

Google Cloud TiDB cluster deployment fails on kubernetes versions >1.9.7-gke.6

Following through the google-kubernetes-tutorial on a Kubernetes cluster running version 1.10.7-gke.6, everything works fine until the point of deploying the cluster:

helm install ./charts/tidb-cluster -n tidb --namespace=tidb --set pd.storageClassName=pd-ssd,tikv.storageClassName=pd-ssd

Then,

watch kubectl get pods --namespace tidb -o wide
NAME                              READY     STATUS      RESTARTS   AGE       IP            NODE
demo-monitor-5bc85fdb7f-cllxw     2/2       Running     0          5m        10.16.2.81    gke-xxxxx-default-pool-4f916ebe-lss5
demo-monitor-configurator-zssv6   0/1       Completed   0          5m        10.16.0.164   gke-xxxxx-default-pool-4f916ebe-1mlb
demo-pd-0                         0/1       Pending     0          5m        <none>        <none>

It hangs like this indefinitely, waiting for the PVC to bind. The pd-demo-pd-0 PVC is stuck in a "Pending" state with the message "waiting for first consumer to be created before binding".

This may be an issue with kubernetes itself.

PD graceful upgrade

PD graceful upgrade must transfer the leader to another member before upgrading that member.
The major logic of the PD upgrade:

for i := replicas - 1; i >= 0; i-- {
	if !canUpgrade(i) {
		continue
	}
	currentLeader := queryLeader()
	if pd[i].memberID != currentLeader {
		// not the leader: safe to upgrade this member directly
		upgrade(i)
	} else {
		// the leader is upgraded last: transfer leadership away first
		if i == replicas-1 {
			transferTo(0)
		} else {
			transferTo(replicas - 1)
		}
	}
}

PD's leader API:

  • query current leader: /pd/api/v1/leader
  • transfer leader to a specific member: /pd/api/v1/leader/transfer/{memberName}
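
A sketch of calling these two endpoints, assuming the leader response is a JSON member object with a name field (illustrative only):

package upgrader

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// queryLeaderName returns the name of the current PD leader.
func queryLeaderName(pdAddr string) (string, error) {
	resp, err := http.Get(fmt.Sprintf("http://%s/pd/api/v1/leader", pdAddr))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	var leader struct {
		Name string `json:"name"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&leader); err != nil {
		return "", err
	}
	return leader.Name, nil
}

// transferLeader asks PD to hand leadership to the named member.
func transferLeader(pdAddr, memberName string) error {
	url := fmt.Sprintf("http://%s/pd/api/v1/leader/transfer/%s", pdAddr, memberName)
	resp, err := http.Post(url, "application/json", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("transfer leader to %s: %s", memberName, resp.Status)
	}
	return nil
}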

label's key names are not uniform between monitor and pd/tidb/tikv

The monitor's deployments and pods are created by Helm, and their label keys differ from PD/TiKV/TiDB's label keys.

monitor's labels:
app=tidb-cluster,chart=tidb-cluster-0.1.0,component=monitor,heritage=Tiller,release=tidb-cluster-e2e
pd's labels:
cluster.pingcap.com/app=pd,cluster.pingcap.com/owner=tidbCluster,cluster.pingcap.com/tidbCluster=demo-cluster

We need to unify their label key naming conventions for management and querying.

  1. should decide whether to use long or short names
    long name: cluster.pingcap.com/app
    short name: app
  2. should define the unified label key names
    owner=tidb-operator
    cluster=tidb cluster name
    app=pd/tidb/tikv/monitor
  3. should limit keys to lowercase characters and '-'

The above needs to be discussed; please make your suggestions.

PD and TiKV upgrade in parallel when the PD status sync fails

e2e log:

• Failure [10.027 seconds]
Smoke
/var/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:55
  should upgrade a tidb cluster [It]
  /var/src/github.com/pingcap/tidb-operator/tests/e2e/e2e_test.go:59

  Expected
      <v1alpha1.MemberPhase>: Upgrade
  not to equal
      <v1alpha1.MemberPhase>: Upgrade

  /var/src/github.com/pingcap/tidb-operator/tests/e2e/upgrade.go:105

The cause:
When the PD status sync fails, we have not set the PD phase to Upgrade, so TiKV starts to upgrade. After a while the PD sync succeeds and PD starts to upgrade. Then PD and TiKV upgrade in parallel.

Add tolerations onto pods of tidb cluster

Pods of TiDB clusters need to be deployed on dedicated nodes under certain scenarios. Some taints may be added to these dedicated nodes to ensure that other pods will not be scheduled onto them. So maybe we should apply corresponding tolerations to the TiDB pods?
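
For example, if the dedicated nodes were tainted with dedicated=tidb:NoSchedule (a hypothetical taint), the pod templates could carry a matching toleration (a sketch assuming client-go's corev1 types):

package member

import corev1 "k8s.io/api/core/v1"

// tidbToleration matches the hypothetical dedicated=tidb:NoSchedule taint,
// allowing TiDB pods onto the dedicated nodes while other pods stay off.
var tidbToleration = corev1.Toleration{
	Key:      "dedicated",
	Operator: corev1.TolerationOpEqual,
	Value:    "tidb",
	Effect:   corev1.TaintEffectNoSchedule,
}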

configurator job is unstable

I am seeing a lot of variance in the configurator job. Sometimes it completes successfully in a minute. Other times it takes much longer to the point that I assume it is failing.

@tennix identified that this separate job is not necessary with Grafana version 5.3 if we update our chart configuration to work with 5.3.

The usages of retry.RetryOnConflict are wrong

Now the usages of retry.RetryOnConflict in the operator are wrong. If the server returns a conflicting write, we must get the latest object from kube-apiserver again and re-apply the changes before another update.

The correct usage is:

err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
	tc.Status = *status
	tc.Spec.PD.Replicas = pdReplicas
	var updateErr error
	updateTC, updateErr = rtc.cli.PingcapV1alpha1().TidbClusters(ns).Update(tc)
	if updateErr == nil {
		glog.Infof("TidbCluster: [%s/%s] updated successfully", ns, tcName)
		return nil
	}
	glog.Errorf("failed to update TidbCluster: [%s/%s], error: %v", ns, tcName, updateErr)
	if updated, err := rtc.tcLister.TidbClusters(ns).Get(tcName); err == nil {
		// make a copy so we don't mutate the shared cache
		tc = updated.DeepCopy()
	} else {
		utilruntime.HandleError(fmt.Errorf("error getting updated TidbCluster %s/%s from lister: %v", ns, tcName, err))
	}
	return updateErr
})

The wrong ones are:

err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	// TODO: verify if StatefulSet identity(name, namespace, labels) matches TidbCluster
	_, updateErr := sc.kubeCli.AppsV1beta1().StatefulSets(tc.Namespace).Update(set)
	if updateErr == nil {
		return nil
	}
	if updated, err := sc.setLister.StatefulSets(tc.Namespace).Get(set.Name); err != nil {
		// make a copy so we don't mutate the shared cache
		set = updated.DeepCopy()
	} else {
		utilruntime.HandleError(fmt.Errorf("error getting updated StatefulSet %s/%s from lister: %v", tc.Namespace, set.Name, err))
	}
	return updateErr
})

err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	_, err := rpc.kubeCli.CoreV1().Pods(ns).Update(pod)
	if err != nil {
		glog.Errorf("failed to update pod %s/%s with cluster labels %v, TidbCluster: %s, err: %v", ns, podName, labels, tcName, err)
		return err
	}
	glog.V(4).Infof("update pod %s/%s with cluster labels %v successfully, TidbCluster: %s", ns, podName, labels, tcName)
	return nil
})

err = retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	_, err = rpc.kubeCli.CoreV1().PersistentVolumes().Update(pv)
	if err != nil {
		glog.Errorf("failed to update PV: [%s], TidbCluster %s/%s, error: %v", pvName, ns, tcName, err)
		return err
	}
	glog.V(4).Infof("PV: [%s] updated successfully, TidbCluster: %s/%s", pvName, ns, tcName)
	return nil
})

err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	_, err := rpc.kubeCli.CoreV1().PersistentVolumeClaims(ns).Update(pvc)
	if err != nil {
		glog.Errorf("failed to update PVC: [%s/%s], TidbCluster: %s, error: %v", ns, pvcName, tcName, err)
		return err
	}
	glog.V(4).Infof("update PVC: [%s/%s] successfully, TidbCluster: %s", ns, pvcName, tcName)
	return nil
})

err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
	_, updateErr := sc.kubeCli.CoreV1().Services(tc.Namespace).Update(svc)
	if updateErr == nil {
		return nil
	}
	if updated, err := sc.svcLister.Services(tc.Namespace).Get(svc.Name); err != nil {
		svc = updated.DeepCopy()
	} else {
		utilruntime.HandleError(fmt.Errorf("error getting updated Service %s/%s from lister: %v", tc.Namespace, svc.Name, err))
	}
	return updateErr
})

Scaling does not work with tidb:latest

I deployed a GKE cluster using the tutorial, with one exception:

helm install ./charts/tidb-cluster -n tidb --namespace=tidb --set pd.storageClassName=pd-ssd,tikv.storageClassName=pd-ssd,pd.image=pingcap/pd:latest,tidb.image=pingcap/tidb:latest,tikv.image=pingcap/tikv:latest

This step no longer works:

helm upgrade tidb charts/tidb-cluster --set pd.storageClassName=pd-ssd,tikv.storageClassName=pd-ssd,tikv.replicas=5

The new pods are in CrashLoopBackOff status.

Remove noisy error log message

When we don't set the resources for TidbCluster, tidb-operator will print these noisy error messages:

E0824 07:53:15.726431       1 util.go:161] failed to parse CPU resource  to quantity: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0824 07:53:15.726524       1 util.go:166] failed to parse memory resource  to quantity: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0824 07:53:15.772255       1 util.go:161] failed to parse CPU resource  to quantity: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
E0824 07:53:15.773356       1 util.go:166] failed to parse memory resource  to quantity: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

We should remove these noisy error messages when the resource is empty.

Repeated creation of namespace tidb-operator-e2e when following CONTRIBUTING.md

I am told to create the namespace tidb-operator-e2e in CONTRIBUTING.md (https://github.com/pingcap/tidb-operator/blob/master/docs/CONTRIBUTING.md#run-e2e-tests), but the namespace tidb-operator-e2e has already been created by dind-cluster-v1.10.sh (https://github.com/pingcap/tidb-operator/blob/master/manifests/local-dind/dind-cluster-v1.10.sh#L1885), so manual creation may report an AlreadyExists error.
The namespace creation is redundant. Should one of them be removed?

Add revive linter to CI

We introduced Revive in #12, but since the lint is broken, we haven't added it to our CI. So we should fix the lint problems and add the linter to CI to enforce coding-style checks.

PD can't be added to existing cluster.

When I scaled out the PD cluster, the new PD member sometimes could not be added to the existing cluster.

2018/10/17 07:03:02.799 util.go:62: [info] Welcome to Placement Driver (PD).
2018/10/17 07:03:02.799 util.go:63: [info] Release Version: v2.0.5
2018/10/17 07:03:02.799 util.go:64: [info] Git Commit Hash: b64716707b7279a4ae822be767085ff17b5f3fea
2018/10/17 07:03:02.800 util.go:65: [info] Git Branch: release-2.0
2018/10/17 07:03:02.800 util.go:66: [info] UTC Build Time:  2018-09-07 12:34:46
2018/10/17 07:03:02.800 metricutil.go:83: [info] disable Prometheus push client
2018/10/17 07:03:02.835 main.go:73: [fatal] join error etcdserver: unhealthy cluster
/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/pkg/etcdutil/etcdutil.go:82:
/home/jenkins/workspace/build_pd_2.0/go/src/github.com/pingcap/pd/server/join.go:119:

There is a no-name member:

{
  "header": {
    "cluster_id": 6612926505300202376
  },
  "members": [
    {
      "member_id": 1429482795436666386,
      "peer_urls": [
        "http://demo-pd-3.demo-pd-peer.tidb.svc:2380"
      ]
    },
    {
      "name": "demo-pd-2",
      "member_id": 3630075772193212308,
      "peer_urls": [
        "http://demo-pd-2.demo-pd-peer.tidb.svc:2380"
      ],
      "client_urls": [
        "http://demo-pd-2.demo-pd-peer.tidb.svc:2379"
      ]
    },
    {
      "name": "demo-pd-0",
      "member_id": 13229517333287924650,
      "peer_urls": [
        "http://demo-pd-0.demo-pd-peer.tidb.svc:2380"
      ],
      "client_urls": [
        "http://demo-pd-0.demo-pd-peer.tidb.svc:2379"
      ]
    },
    {
      "name": "demo-pd-1",
      "member_id": 13812728483101200181,
      "peer_urls": [
        "http://demo-pd-1.demo-pd-peer.tidb.svc:2380"
      ],
      "client_urls": [
        "http://demo-pd-1.demo-pd-peer.tidb.svc:2379"
      ]
    }
  ],
  "leader": {
    "name": "demo-pd-2",
    "member_id": 3630075772193212308,
    "peer_urls": [
      "http://demo-pd-2.demo-pd-peer.tidb.svc:2380"
    ],
    "client_urls": [
      "http://demo-pd-2.demo-pd-peer.tidb.svc:2379"
    ]
  },
  "etcd_leader": {
    "name": "demo-pd-2",
    "member_id": 3630075772193212308,
    "peer_urls": [
      "http://demo-pd-2.demo-pd-peer.tidb.svc:2380"
    ],
    "client_urls": [
      "http://demo-pd-2.demo-pd-peer.tidb.svc:2379"
    ]
  }
}

When I deleted this no-name member, it worked:

$ curl -XDELETE 10.103.241.52:2379/pd/api/v1/members/id/1429482795436666386

Can't find demo-tidb and demo-tikv

I followed the tutorial https://github.com/pingcap/tidb-operator/blob/master/docs/local-dind-tutorial.md to deploy a demo cluster, but in the end I can't find the two resources demo-tidb and demo-tikv. How can I create them manually?
Here are my resources:

root@kube-master:/tidb-operator# kubectl get po -o wide --all-namespaces
NAMESPACE     NAME                                      READY     STATUS      RESTARTS   AGE       IP           NODE
kube-system   etcd-kube-master                          1/1       Running     0          32m       172.18.0.2   kube-master
kube-system   kube-apiserver-kube-master                1/1       Running     0          32m       172.18.0.2   kube-master
kube-system   kube-controller-manager-kube-master       1/1       Running     0          32m       172.18.0.2   kube-master
kube-system   kube-dns-64d6979467-kxlp2                 3/3       Running     0          33m       10.244.0.4   kube-master
kube-system   kube-flannel-ds-amd64-bjsmh               1/1       Running     0          33m       172.18.0.5   kube-node-3
kube-system   kube-flannel-ds-amd64-ctg7s               1/1       Running     0          33m       172.18.0.2   kube-master
kube-system   kube-flannel-ds-amd64-flzrw               1/1       Running     0          33m       172.18.0.4   kube-node-2
kube-system   kube-flannel-ds-amd64-w6nzt               1/1       Running     0          33m       172.18.0.3   kube-node-1
kube-system   kube-proxy-c7j87                          1/1       Running     0          33m       172.18.0.2   kube-master
kube-system   kube-proxy-dgxqs                          1/1       Running     0          33m       172.18.0.3   kube-node-1
kube-system   kube-proxy-jwwb7                          1/1       Running     0          33m       172.18.0.4   kube-node-2
kube-system   kube-proxy-w95hg                          1/1       Running     0          33m       172.18.0.5   kube-node-3
kube-system   kube-scheduler-kube-master                1/1       Running     0          32m       172.18.0.2   kube-master
kube-system   kubernetes-dashboard-68ddc89549-wjdz8     1/1       Running     0          33m       10.244.0.3   kube-master
kube-system   local-volume-provisioner-6kc27            1/1       Running     0          31m       10.244.2.3   kube-node-2
kube-system   local-volume-provisioner-d9d2q            1/1       Running     0          31m       10.244.1.3   kube-node-1
kube-system   local-volume-provisioner-gkh2q            1/1       Running     0          31m       10.244.3.3   kube-node-3
kube-system   registry-proxy-6snfq                      1/1       Running     0          31m       172.18.0.5   kube-node-3
kube-system   registry-proxy-gp67x                      1/1       Running     0          33m       172.18.0.4   kube-node-2
kube-system   registry-proxy-ssfw5                      1/1       Running     0          33m       172.18.0.2   kube-master
kube-system   registry-proxy-vw5g5                      1/1       Running     0          31m       172.18.0.3   kube-node-1
kube-system   tiller-deploy-7db5f8577d-stpdp            1/1       Running     0          27m       10.244.3.4   kube-node-3
tidb-admin    tidb-controller-manager-bcc66f746-mpkvj   1/1       Running     0          26m       10.244.2.4   kube-node-2
tidb          demo-monitor-6d549d487b-t6hsb             2/2       Running     0          23m       10.244.3.6   kube-node-3
tidb          demo-monitor-configurator-t7d2t           0/1       Completed   0          23m       10.244.2.6   kube-node-2
tidb          demo-pd-0                                 1/1       Running     1          23m       10.244.2.7   kube-node-2
tidb          demo-pd-1                                 1/1       Running     0          22m       10.244.1.6   kube-node-1
root@kube-master:/tidb-operator# helm list
NAME         	REVISION	UPDATED                 	STATUS  	CHART              	NAMESPACE
demo-tidb    	1       	Sat Sep 29 12:42:11 2018	DEPLOYED	tidb-cluster-0.1.0 	tidb
tidb-operator	1       	Sat Sep 29 12:38:54 2018	DEPLOYED	tidb-operator-0.1.0	tidb-admin

Google Cloud Shell tools may be out of date

It might make sense to add the following command to the start of the GKE tutorial:

sudo gcloud components update

In my case this updated me from kubectl 1.9.7 to 1.10.7. I didn't realize updates were available.

TiKV pod sidecar container pushgateway image in values.yaml is never used

TiKV pod sidecar container pushgateway in charts/tidb-cluster/values.yaml is not rendered in charts/tidb-cluster/templates/tidb-cluster.yaml.

tikvPromGateway:
  image: prom/pushgateway:v0.3.1
  # limits:
  #   cpu: 100m
  #   memory: 100Mi
  # requests:
  #   cpu: 50m
  #   memory: 50Mi

This section needs to be added in charts/tidb-cluster/templates/tidb-cluster.yaml.

Besides, the pushgateway image is never used in the TiKV pod spec because the controller.GetPushgatewayImage function is wrong.

Image: controller.GetPushgatewayImage(tc),

TiDB automatic failover

When a TiDB peer can't respond for a while (this can be observed from the tidb instance's /status API; see the sketch below), the operator should:

  1. increase the replicas to add a new TiDB peer
  2. decrease the replicas when all members are ready again

part of #47
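
A sketch of the liveness check, assuming only reachability of the /status endpoint matters here (a real check would inspect the response body):

package failover

import (
	"fmt"
	"net/http"
)

// tidbResponding reports whether the tidb instance answers its /status API.
func tidbResponding(addr string) bool {
	resp, err := http.Get(fmt.Sprintf("http://%s/status", addr))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}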

PD automatic failover

When a PD peer is not healthy for a while (this can be obtained from the PD cluster's /pd/health API; see the sketch after this list), the operator should:

  1. mark this peer as failed
  2. invoke the deleteMember API to delete this member from the PD cluster
  3. increase the replicas to add a new PD peer
  4. try to delete the PVC and Pod of this PD peer over and over, letting the StatefulSet create a new PD peer with the same ordinal but without the tombstone PV
  5. decrease the replicas when all members are ready again

part of #47
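
A sketch of the health query from step 1, assuming /pd/health returns a JSON array of per-member health entries (the field set modeled here is an assumption for illustration):

package failover

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// memberHealth loosely mirrors one entry of PD's /pd/health response.
type memberHealth struct {
	Name   string `json:"name"`
	Health bool   `json:"health"`
}

// unhealthyMembers collects the members that should be marked as failed
// if they stay unhealthy beyond the failover deadline.
func unhealthyMembers(pdAddr string) ([]string, error) {
	resp, err := http.Get(fmt.Sprintf("http://%s/pd/health", pdAddr))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var hs []memberHealth
	if err := json.NewDecoder(resp.Body).Decode(&hs); err != nil {
		return nil, err
	}
	var bad []string
	for _, h := range hs {
		if !h.Health {
			bad = append(bad, h.Name)
		}
	}
	return bad, nil
}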

pd never upgrades when the pd server is unavailable

Sometimes, because of an image version error or other misconfiguration, the PD members fail to start up and the entire PD server is unavailable.

At this time, we hope to recover the PD server by an automatic upgrade. But in the current PD management process, the PD status synchronization returns an error and exits directly. As a result, the subsequent upgrade cannot be executed and the cluster stays blocked; it cannot be restored automatically unless the PD StatefulSet is deleted manually.

To avoid the above situation, we must modify the PD process: do not exit directly when the PD status synchronization fails, but save and pass a flag indicating whether the synchronization succeeded, and continue to execute the following logic.

[Discussion] Potential name conflict `controller-manager`

Hi,

I am a TiDB contributor and interested in the development of TiDB Operator. When I looked through the code, I found that the package name in cmd/ is controller-manager, while controller-manager has a special meaning in the Kubernetes community, IMO. cloud-controller-manager and controller-manager in K8s manage multiple controllers, while our CLI only creates one controller for the tidbcluster CRD. So I think it is better to name it tidb-operator or tidb.

Failover feature improvement

Now, we increase spec.Replicas when a failure occurs. This is not the best practice. So we should:

  • not modify spec.PD.Replicas to achieve the failover feature
  • use RequeueError instead of updating the TidbCluster immediately

PR opened: #95

TiDB auto failover always adds members when Pump fails

Currently the TiDB auto-failover always adds a member if Pump fails, but TiDB is still unavailable.

demo-tidb-0                       1/1       Running            2          4h
demo-tidb-1                       1/1       Running            2          4h
demo-tidb-10                      1/1       Running            2          2h
demo-tidb-11                      1/1       Running            2          2h
demo-tidb-12                      1/1       Running            2          2h
demo-tidb-13                      1/1       Running            2          2h
demo-tidb-14                      1/1       Running            2          2h
demo-tidb-15                      1/1       Running            2          2h
demo-tidb-16                      1/1       Running            2          2h
demo-tidb-17                      1/1       Running            2          2h
demo-tidb-18                      1/1       Running            2          2h
demo-tidb-19                      1/1       Running            2          2h
demo-tidb-2                       1/1       Running            2          3h
demo-tidb-20                      1/1       Running            2          1h
demo-tidb-21                      1/1       Running            2          1h
demo-tidb-22                      1/1       Running            2          1h
demo-tidb-23                      1/1       Running            2          1h
demo-tidb-24                      1/1       Running            2          1h
demo-tidb-25                      1/1       Running            2          1h
demo-tidb-26                      1/1       Running            2          1h
demo-tidb-27                      1/1       Running            0          20m
demo-tidb-28                      1/1       Running            0          20m
demo-tidb-29                      1/1       Running            0          15m
demo-tidb-3                       1/1       Running            2          3h
demo-tidb-30                      1/1       Running            0          15m
demo-tidb-31                      1/1       Running            0          10m
demo-tidb-32                      1/1       Running            1          10

The operator needs to identify the situation where TiDB is unavailable due to Pump failure, and stop the auto-failover feature in that situation.
On the other hand, tidb/tikv/pd should limit the number of auto-failover members, to avoid an unrestricted increase of members.

Why not create the tidbclusters CRD in code automatically?

During the process of deploying tidb-operator according to the docs, I found that the tidbclusters CRD must be applied manually. Why not create the CRD automatically in code, just like etcd-operator and prometheus-operator do?
