HA Cockroach DB

OCP 4.9.9 on AWS Based on HA Cockroach DB

Considerations

Because this test involves stopping instances in AWS, the pod that sends the requests must not run on an instance that goes down. For that reason, the crdb-tester pod is deployed with tolerations and a nodeSelector so that it is scheduled on a master node.
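Before stopping any instance, it may be worth confirming that the tester pod really landed on a master node. The sketch below is one way to check; the function name pod_on_master is hypothetical, and it assumes master nodes carry the standard node-role.kubernetes.io/master label and that the pod lives in the current namespace.

```shell
# pod_on_master <pod-name>
# Succeeds only if the pod is scheduled on a node that carries the
# node-role.kubernetes.io/master label (assumed to mark master nodes).
pod_on_master() {
  pod="$1"
  # Name of the node the pod was scheduled on.
  node=$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}') || return 1
  [ -n "$node" ] || return 1
  # "-o name" prints one "node/<name>" entry per line; match it exactly.
  kubectl get nodes -l node-role.kubernetes.io/master -o name \
    | grep -qxF "node/$node"
}

# Usage (not run here):
#   pod_on_master crdb-tester && echo "tester pod is on a master node"
```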

Test 0:

OCP Install with CockroachDB, fail nodes

Scenario: When we deploy a StatefulSet of 3 replicas and stop one node, are we still able to read from and write to the database?

Scenario: When we deploy a StatefulSet of 3 replicas and stop two nodes, are we still able to read from and write to the database?
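As background for these two scenarios: CockroachDB replicates data with Raft, which stays available only while a majority of replicas is reachable. The arithmetic below is a small sketch of that rule; the helper names are illustrative, and it assumes each of the 3 replicas runs on a different instance.

```shell
# Majority needed for a group of N replicas.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of replicas that can be lost while a majority remains.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

quorum 3               # prints 2
tolerated_failures 3   # prints 1
```

With 3 replicas, losing one node leaves a majority of 2, while losing two nodes does not.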

Setup:

Install CockroachDB

helm install cockroachdb -n cockroachdb charts/cockroachdb

Insert data into CockroachDB

kubectl exec -it pod/cockroachdb-0 -n cockroachdb -c cockroachdb -- cockroach sql --insecure --execute="CREATE TABLE roaches (name STRING, country STRING); INSERT INTO roaches VALUES ('American Cockroach', 'United States'), ('Brownbanded Cockroach', 'United States')"

Create a pod to communicate to the cockroach service

helm install crdb-tester charts/tester-pod

Stop 1 Instance in AWS
Write Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="INSERT INTO roaches VALUES ('A', 'Apple'), ('B', 'Banana')"

output:

INSERT 2


Time: 860ms
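Right after an instance is stopped, the first attempt may fail or be slow while the cluster settles. A small retry wrapper can make the write and read tests less sensitive to timing. This is a hypothetical helper, not part of the charts in this repository.

```shell
# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while :; do
    "$@" && return 0                      # success: stop retrying
    [ "$i" -ge "$attempts" ] && return 1  # out of attempts: give up
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (not run here): up to 5 attempts, 10 seconds apart.
#   retry 5 10 kubectl exec crdb-tester -- cockroach sql --insecure \
#     --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 \
#     --execute="SELECT * FROM roaches;"
```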

Read Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="SELECT * FROM roaches;"

output:

          name          |    country
------------------------+----------------
  American Cockroach    | United States
  Brownbanded Cockroach | United States
  A                     | Apple
  B                     | Banana
(4 rows)


Time: 7ms

Stop Another Instance in AWS
Write Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="INSERT INTO roaches VALUES ('C', 'Candy'), ('D', 'Donut')"

output

ERROR: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.

dial tcp 172.30.252.79:26257: connect: no route to host
Failed running "sql"
command terminated with exit code 1

Read Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="SELECT * FROM roaches;"

output

ERROR: cannot dial server.
Is the server running?
If the server is running, check --host client-side and --advertise server-side.

dial tcp 172.30.252.79:26257: connect: no route to host
Failed running "sql"
command terminated with exit code 1

Cleanup:

Start all instances.

Uninstall CockroachDB

helm uninstall cockroachdb -n cockroachdb 

Make sure everything is gone

kubectl get pod,pvc -n cockroachdb 
kubectl delete pods,pvc --all -n cockroachdb
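To confirm the cleanup in a script rather than by eye, the remaining pods and PVCs can be counted. The function name leftover_count is hypothetical; the sketch relies on `kubectl get ... -o name` printing one resource per line on standard output and nothing there when no resources exist.

```shell
# leftover_count <namespace>
# Prints how many pods and PVCs still exist in the namespace.
leftover_count() {
  kubectl get pod,pvc -n "$1" -o name | awk 'NF { n++ } END { print n + 0 }'
}

# Usage (not run here):
#   [ "$(leftover_count cockroachdb)" -eq 0 ] && echo "namespace is clean"
```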

Delete tester pod

helm uninstall crdb-tester

Test 1

OCP Install with Node Health Check and Poison Pill

Scenario: When a node is lost, a poison pill remediation is created and the node is marked as SchedulingDisabled.

Setup:

Install CockroachDB

helm install cockroachdb -n cockroachdb charts/cockroachdb

Insert data into CockroachDB

kubectl exec -it pod/cockroachdb-0 -n cockroachdb -c cockroachdb -- cockroach sql --insecure --execute="CREATE TABLE roaches (name STRING, country STRING); INSERT INTO roaches VALUES ('American Cockroach', 'United States'), ('Brownbanded Cockroach', 'United States')"

Install NodeHealthCheck and PoisonPill

helm install nhc -n openshift-operators charts/nhc-operator 

Install Node Health Check and Poison Pill Configuration

helm install node-health-check -n openshift-operators charts/node-health-check

Create a pod to communicate to the cockroach service

helm install crdb-tester charts/tester-pod

Replace autogenerated PoisonPillConfig

kubectl replace -f -<<EOF
apiVersion: poison-pill.medik8s.io/v1alpha1
kind: PoisonPillConfig
metadata:
  name: poison-pill-config
  namespace: openshift-operators
spec:
  apiCheckInterval: 15s
  apiServerTimeout: 5s
  isSoftwareRebootEnabled: true
  maxApiErrorThreshold: 3
  peerApiServerTimeout: 5s
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  peerUpdateInterval: 15m
  safeTimeToAssumeNodeRebootedSeconds: 10
  watchdogFilePath: /dev/watchdog
EOF
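The values above give a rough idea of how quickly a node notices that it has lost the API server. The sketch below assumes that maxApiErrorThreshold counts consecutive failed checks, made every apiCheckInterval, before the node starts contacting its peers; that reading is an assumption and ignores the individual timeouts. The helper names are illustrative.

```shell
# Strip a trailing "s" from a duration such as "15s" (seconds only).
to_seconds() { echo "${1%s}"; }

# seconds_until_peer_check <apiCheckInterval> <maxApiErrorThreshold>
# Approximate seconds of failed API checks before peers are contacted.
seconds_until_peer_check() {
  interval=$(to_seconds "$1")
  echo $(( interval * $2 ))
}

seconds_until_peer_check 15s 3   # prints 45
```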

Stop Instances in AWS
Write Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="INSERT INTO roaches VALUES ('C', 'Candy'), ('D', 'Donut')"

output

INSERT 2


Time: 22ms

Read Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="SELECT * FROM roaches;"

output

          name          |    country
------------------------+----------------
  American Cockroach    | United States
  Brownbanded Cockroach | United States
  C                     | Candy
  D                     | Donut
(4 rows)


Time: 352ms

Cleanup:

Start all instances.

Uninstall CockroachDB

helm uninstall cockroachdb -n cockroachdb 

Make sure everything is gone

kubectl get pod,pvc -n cockroachdb 

kubectl delete pod,pvc -n cockroachdb --all

Uninstall Node Health Check and Poison Pill Configuration

helm uninstall node-health-check -n openshift-operators

Uninstall NodeHealthCheck and PoisonPill

helm uninstall nhc -n openshift-operators 

Delete tester pod

helm uninstall crdb-tester

Uninstall CSVs for NodeHealthCheck and PoisonPill

kubectl delete csv -n openshift-operators node-healthcheck-operator.v0.1.0 
kubectl delete csv -n openshift-operators poison-pill.v0.2.0 

kubectl delete ds poison-pill-ds -n openshift-operators

Test 2

OCP Install with Machine Health Check and Poison Pill

Scenario: A node cannot reach the API server because of a transient failure. The poison pill remediation marks the node as SchedulingDisabled. Once the node is back, it is eventually marked as Ready, and the pod is rerun on the same node.

Setup:

Install CockroachDB

helm install cockroachdb -n cockroachdb charts/cockroachdb

Insert data into CockroachDB

kubectl exec -it pod/cockroachdb-0 -n cockroachdb -c cockroachdb -- cockroach sql --insecure --execute="CREATE TABLE roaches (name STRING, country STRING); INSERT INTO roaches VALUES ('American Cockroach', 'United States'), ('Brownbanded Cockroach', 'United States')"

Install MachineHealthCheck and PoisonPill

helm install mch -n openshift-operators charts/machine-health-check 

Install PoisonPillRemediationTemplate

kubectl apply -f -<<EOF
apiVersion: poison-pill.medik8s.io/v1alpha1
kind: PoisonPillRemediationTemplate
metadata:
  namespace: openshift-machine-api
  name: poison-pill-default-template
spec:
  template:
    spec: {}
EOF

Create a pod to communicate to the cockroach service

helm install crdb-tester charts/tester-pod

Replace autogenerated PoisonPillConfig

kubectl replace -f -<<EOF
apiVersion: poison-pill.medik8s.io/v1alpha1
kind: PoisonPillConfig
metadata:
  name: poison-pill-config
  namespace: openshift-operators
spec:
  apiCheckInterval: 15s
  apiServerTimeout: 5s
  isSoftwareRebootEnabled: true
  maxApiErrorThreshold: 3
  peerApiServerTimeout: 5s
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  peerUpdateInterval: 15m
  safeTimeToAssumeNodeRebootedSeconds: 10
  watchdogFilePath: /dev/watchdog
EOF

Stop Instance in AWS
Write Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="INSERT INTO roaches VALUES ('C', 'Candy'), ('D', 'Donut')"

output

INSERT 2


Time: 22ms

Read Test

kubectl exec -it crdb-tester -- cockroach sql --insecure --host=cockroachdb-public.cockroachdb.svc.cluster.local:26257 --execute="SELECT * FROM roaches;"

output

          name          |    country
------------------------+----------------
  American Cockroach    | United States
  Brownbanded Cockroach | United States
  C                     | Candy
  D                     | Donut
(4 rows)


Time: 2ms

Cleanup:

Start all instances.

Uninstall CockroachDB

helm uninstall cockroachdb -n cockroachdb 

Make sure everything is gone

kubectl get pod,pvc -n cockroachdb 

kubectl delete pod,pvc -n cockroachdb --all

Uninstall MachineHealthCheck and PoisonPill

helm uninstall mch -n openshift-operators 

Delete tester pod

helm uninstall crdb-tester

Uninstall CSV for PoisonPill

kubectl delete csv -n openshift-operators poison-pill.v0.2.0 

kubectl delete ds poison-pill-ds -n openshift-operators

Uninstall PoisonPillRemediationTemplate

kubectl delete -f -<<EOF
apiVersion: poison-pill.medik8s.io/v1alpha1
kind: PoisonPillRemediationTemplate
metadata:
  namespace: openshift-machine-api
  name: poison-pill-default-template
spec:
  template:
    spec: {}
EOF

Remove PoisonPillConfig

kubectl delete -f -<<EOF
apiVersion: poison-pill.medik8s.io/v1alpha1
kind: PoisonPillConfig
metadata:
  name: poison-pill-config
  namespace: openshift-operators
spec:
  apiCheckInterval: 15s
  apiServerTimeout: 5s
  isSoftwareRebootEnabled: true
  maxApiErrorThreshold: 3
  peerApiServerTimeout: 5s
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  peerUpdateInterval: 15m
  safeTimeToAssumeNodeRebootedSeconds: 10
  watchdogFilePath: /dev/watchdog
EOF

Commands

Watch pod name, node of pod, and pod status in cockroachdb

kubectl get pod -o=custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase -n cockroachdb -w

Watch PoisonPillRemediation in all namespaces

kubectl get ppr -A -w

Watch nodes (nodes are cluster-scoped, so no namespace flag is needed)

kubectl get nodes -w
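When watching is not practical, the node list can be filtered for anything that is not plainly Ready, which covers both NotReady and Ready,SchedulingDisabled. The function name unhealthy_nodes is hypothetical; it assumes the default `kubectl get nodes` table layout, with NAME in the first column and STATUS in the second.

```shell
# Reads `kubectl get nodes` output on stdin and prints the names of nodes
# whose STATUS column is anything other than plain "Ready".
unhealthy_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage (not run here):
#   kubectl get nodes | unhealthy_nodes
```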

Debug

Quickly check the controller managers

kubectl logs deploy/poison-pill-controller-manager -n openshift-operators -c manager

kubectl logs deploy/node-healthcheck-operator-controller-manager -n openshift-operators -c manager
