gluster / gcs
Check github.com/heketi, github.com/gluster/gluster-containers, or github.com/kadalu/kadalu as active alternatives.
Home Page: https://gluster.org
License: Apache License 2.0
Failed to get peers in the cluster
Wednesday 17 October 2018 15:10:05 +0530 (0:00:00.864) 0:12:36.698 *****
changed: [kube1]
TASK [GCS | GD2 Cluster | Set gd2_client_endpoint] *****************************
Wednesday 17 October 2018 15:10:06 +0530 (0:00:00.533) 0:12:37.232 *****
ok: [kube1]
TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] **********
Wednesday 17 October 2018 15:10:06 +0530 (0:00:00.206) 0:12:37.439 *****
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (50 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (49 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (48 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (47 retries left).
ok: [kube1]
TASK [GCS | GD2 Cluster | Get peers in cluster] ********************************
Wednesday 17 October 2018 15:10:52 +0530 (0:00:45.728) 0:13:23.168 *****
fatal: [kube1]: FAILED! => {"changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://10.233.32.64:24007/v1/peers"}
-> Created one PVC.
-> Mounted that PVC to 100 app pods.
-> App pod creation was taking a very long time.
-> After observing for around 15 minutes, most of the app pods were still in the Pending state.
-> Then I logged in to the GCS setup again after one day.
-> I observed the status below: all the app pods placed on the kube1 node had gone into the Evicted state after one day. I suspect this is because of low resources on kube1; if that is the case, we have to increase the storage space for the kube nodes.
-> I will try to reproduce this on another setup.
NAME READY STATUS RESTARTS AGE
pod/csi-attacher-glusterfsplugin-0 2/2 Running 0 2d18h
pod/csi-nodeplugin-glusterfsplugin-4m9vj 0/2 Evicted 0 7m34s
pod/csi-nodeplugin-glusterfsplugin-clqcf 2/2 Running 0 2d18h
pod/csi-nodeplugin-glusterfsplugin-msn95 2/2 Running 0 2d18h
pod/csi-provisioner-glusterfsplugin-0 2/2 Running 0 2d18h
pod/etcd-2jb6scnxhr 0/1 Evicted 0 20h
pod/etcd-896q4plsbx 0/1 Error 0 20h
pod/etcd-jzw57nfncl 0/1 Evicted 0 20h
pod/etcd-nbcpdpgx6k 0/1 Evicted 0 20h
pod/etcd-operator-7cb5bd459b-cstkq 1/1 Running 0 2d18h
pod/etcd-xpqldh767q 0/1 Completed 0 20h
pod/gluster-kube1-0 0/1 Pending 0 20h
pod/gluster-kube2-0 1/1 Running 224 2d18h
pod/gluster-kube3-0 1/1 Running 0 2d18h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/etcd ClusterIP None <none> 2379/TCP,2380/TCP 2d18h
service/etcd-client ClusterIP 10.233.33.39 <none> 2379/TCP 2d18h
service/glusterd2 ClusterIP None <none> 24007/TCP,24008/TCP 2d18h
service/glusterd2-client ClusterIP 10.233.12.238 <none> 24007/TCP 2d18h
service/glusterd2-client-nodeport NodePort 10.233.5.36 <none> 24007:31007/TCP 2d18h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-nodeplugin-glusterfsplugin 3 3 2 3 2 <none> 2d18h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/etcd-operator 1 1 1 1 2d18h
NAME DESIRED CURRENT READY AGE
replicaset.apps/etcd-operator-7cb5bd459b 1 1 1 2d18h
NAME DESIRED CURRENT AGE
statefulset.apps/csi-attacher-glusterfsplugin 1 1 2d18h
statefulset.apps/csi-provisioner-glusterfsplugin 1 1 2d18h
statefulset.apps/gluster-kube1 1 1 2d18h
statefulset.apps/gluster-kube2 1 1 2d18h
statefulset.apps/gluster-kube3 1 1 2d18h
[vagrant@kube1 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/atomicos-root 6.8G 6.3G 515M 93% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 1.9G 4.0M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/vda1 297M 95M 202M 33% /boot
tmpfs 379M 0 379M 0% /run/user/1000
Now that people are starting to try out GCS, it would be helpful to have a document that walks them through various troubleshooting steps in case they encounter problems.
Topics:
TASK [GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready] ***
Tuesday 13 November 2018 11:09:55 +0530 (0:00:01.169) 0:20:22.186 ******
FAILED - RETRYING: GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready (50 retries left).
FAILED - RETRYING: GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready (49 retries left).
changed: [kube1]
TASK [GCS | Prometheus Objects | Deploy services, ServiceMonitor and Prometheus Object] ***
Tuesday 13 November 2018 11:10:17 +0530 (0:00:22.256) 0:20:44.443 ******
fatal: [kube1]: FAILED! => {"changed": false, "msg": "error running kubectl (/usr/local/bin/kubectl apply --force --filename=/tmp/gcs-manifestsIsZBxI/gcs-prometheus-bundle.yml) command (rc=1), out='serviceaccount/prometheus created\nclusterrole.rbac.authorization.k8s.io/prometheus created\nclusterrolebinding.rbac.authorization.k8s.io/prometheus created\nservice/prometheus created\n', err='unable to recognize \"/tmp/gcs-manifestsIsZBxI/gcs-prometheus-bundle.yml\": no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"\nunable to recognize \"/tmp/gcs-manifestsIsZBxI/gcs-prometheus-bundle.yml\": no matches for kind \"Prometheus\" in version \"monitoring.coreos.com/v1\"\n'"}
to retry, use: --limit @/home/github.com/gluster/gcs/deploy/vagrant-playbook.retry
PLAY RECAP *********************************************************************
We need to document how we plan to integrate gluster-block w/ GCS in order to identify gaps that need to be closed for GCS-1.0.
Gluster pods are no longer DaemonSets, nor do they use host networking. Once we switch to block PVs, we can remove the nodeAffinity that locks the pods to a specific node. My understanding is that movable pods will not currently work for gluster-block.
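For reference, a minimal sketch of the kind of nodeAffinity stanza in question (the field values are illustrative, not taken from the actual GCS manifests); removing it is what would make the pods movable:
```
# Illustrative pod-spec fragment: pins a gluster pod to one node by hostname.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kube1
```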
Steps performed:
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-45snb 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-pgp2w 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-s8g76 2/2 Running 0 2d21h
csi-provisioner-glusterfsplugin-0 2/2 Running 0 2d21h
etcd-4m7wv5fqk2 1/1 Running 0 2d21h
etcd-6mf2nsl2p4 1/1 Running 0 2d21h
etcd-lbmh9xjxm8 1/1 Running 0 2d21h
etcd-operator-7cb5bd459b-tddxt 1/1 Running 0 2d21h
gluster-kube1-0 1/1 Running 1 2d21h
gluster-kube2-0 1/1 Running 0 2d21h
gluster-kube3-0 1/1 Running 0 2d21h
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl delete pods -n gcs gluster-kube1-0
pod "gluster-kube1-0" deleted
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-45snb 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-pgp2w 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-s8g76 2/2 Running 0 2d21h
csi-provisioner-glusterfsplugin-0 2/2 Running 0 2d21h
etcd-4m7wv5fqk2 1/1 Running 0 2d21h
etcd-6mf2nsl2p4 1/1 Running 0 2d21h
etcd-lbmh9xjxm8 1/1 Running 0 2d21h
etcd-operator-7cb5bd459b-tddxt 1/1 Running 0 2d21h
gluster-kube1-0 0/1 ContainerCreating 0 5s
gluster-kube2-0 1/1 Running 0 2d21h
gluster-kube3-0 1/1 Running 0 2d21h
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-45snb 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-pgp2w 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-s8g76 2/2 Running 0 2d21h
csi-provisioner-glusterfsplugin-0 2/2 Running 0 2d21h
etcd-4m7wv5fqk2 1/1 Running 0 2d21h
etcd-6mf2nsl2p4 1/1 Running 0 2d21h
etcd-lbmh9xjxm8 1/1 Running 0 2d21h
etcd-operator-7cb5bd459b-tddxt 1/1 Running 0 2d21h
gluster-kube1-0 1/1 Running 0 43s
gluster-kube2-0 1/1 Running 0 2d21h
gluster-kube3-0 1/1 Running 0 2d21h
[vagrant@kube1 ~]$
command terminated with exit code 1
[vagrant@kube1 ~]$ kubectl get pods -n gcs -w
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-45snb 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-pgp2w 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-s8g76 2/2 Running 0 2d21h
csi-provisioner-glusterfsplugin-0 2/2 Running 0 2d21h
etcd-4m7wv5fqk2 1/1 Running 0 2d21h
etcd-6mf2nsl2p4 1/1 Running 0 2d21h
etcd-lbmh9xjxm8 1/1 Running 0 2d21h
etcd-operator-7cb5bd459b-tddxt 1/1 Running 0 2d21h
gluster-kube1-0 1/1 Running 0 2m52s
gluster-kube2-0 1/1 Running 0 2d21h
gluster-kube3-0 1/1 Running 0 2d21h
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl -n gcs -it exec gluster-kube1-0 -- /bin/bash
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli peer list --endpoints="http://gluster-kube1-0.glusterd2.gcs:24007"
Failed to get Peers list
Failed to connect to glusterd. Please check if
- Glusterd is running(http://gluster-kube1-0.glusterd2.gcs:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube1-0.glusterd2.gcs:24007"
Error getting volumes list
Failed to connect to glusterd. Please check if
- Glusterd is running(http://gluster-kube1-0.glusterd2.gcs:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube2-0.glusterd2.gcs:24007"
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| ID | NAME | TYPE | STATE | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 6cd58524-5172-4d9e-89ae-414bc338eba6 | pvc-f603ac47dcdc11e8 | Replicate | Started | tcp | 3 |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube3-0.glusterd2.gcs:24007"
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| ID | NAME | TYPE | STATE | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 6cd58524-5172-4d9e-89ae-414bc338eba6 | pvc-f603ac47dcdc11e8 | Replicate | Started | tcp | 3 |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli volume list -command terminated with exit code 137usterd2.gcs:24007"
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pods -n gcs -w
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-45snb 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-pgp2w 2/2 Running 0 2d21h
csi-nodeplugin-glusterfsplugin-s8g76 2/2 Running 0 2d21h
csi-provisioner-glusterfsplugin-0 2/2 Running 0 2d21h
etcd-4m7wv5fqk2 1/1 Running 0 2d21h
etcd-6mf2nsl2p4 1/1 Running 0 2d21h
etcd-lbmh9xjxm8 1/1 Running 0 2d21h
etcd-operator-7cb5bd459b-tddxt 1/1 Running 0 2d21h
gluster-kube1-0 1/1 Running 1 4m
gluster-kube2-0 1/1 Running 0 2d21h
gluster-kube3-0 1/1 Running 0 2d21h
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl describe pods -n gcs gluster-kube1-0
Name: gluster-kube1-0
Namespace: gcs
Priority: 0
PriorityClassName: <none>
Node: kube1/192.168.121.7
Start Time: Fri, 02 Nov 2018 05:39:18 +0000
Labels: app.kubernetes.io/component=glusterfs
app.kubernetes.io/name=glusterd2
app.kubernetes.io/part-of=gcs
controller-revision-hash=gluster-kube1-55bc79f94
statefulset.kubernetes.io/pod-name=gluster-kube1-0
Annotations: <none>
Status: Running
IP: 10.233.64.7
Controlled By: StatefulSet/gluster-kube1
Containers:
glusterd2:
Container ID: docker://a261c3bcb84f993948b0691e199396109985d1bd9d547250476168cfd01a9520
Image: docker.io/gluster/glusterd2-nightly
Image ID: docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:06e42f3354bff80a724007dbc5442349c3a53d31eceb935fd6b3776d6cdcb0fa
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 02 Nov 2018 05:43:08 +0000
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Fri, 02 Nov 2018 05:39:48 +0000
Finished: Fri, 02 Nov 2018 05:43:04 +0000
Ready: True
Restart Count: 1
Liveness: http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
Environment:
GD2_ETCDENDPOINTS: http://etcd-client.gcs:2379
GD2_CLUSTER_ID: 27056e19-500a-4e7a-b5a9-71f461679196
GD2_CLIENTADDRESS: gluster-kube1-0.glusterd2.gcs:24007
GD2_ENDPOINTS: http://gluster-kube1-0.glusterd2.gcs:24007
GD2_PEERADDRESS: gluster-kube1-0.glusterd2.gcs:24008
GD2_RESTAUTH: false
Mounts:
/dev from gluster-dev (rw)
/run/lvm from gluster-lvm (rw)
/sys/fs/cgroup from gluster-cgroup (ro)
/usr/lib/modules from gluster-kmods (ro)
/var/lib/glusterd2 from glusterd2-statedir (rw)
/var/log/glusterd2 from glusterd2-logdir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8s2lg (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
gluster-dev:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
gluster-cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
gluster-lvm:
Type: HostPath (bare host directory volume)
Path: /run/lvm
HostPathType:
gluster-kmods:
Type: HostPath (bare host directory volume)
Path: /usr/lib/modules
HostPathType:
glusterd2-statedir:
Type: HostPath (bare host directory volume)
Path: /var/lib/glusterd2
HostPathType: DirectoryOrCreate
glusterd2-logdir:
Type: HostPath (bare host directory volume)
Path: /var/log/glusterd2
HostPathType: DirectoryOrCreate
default-token-8s2lg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-8s2lg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned gcs/gluster-kube1-0 to kube1
Warning Unhealthy 27m (x3 over 29m) kubelet, kube1 Liveness probe failed: Get http://10.233.64.7:24007/ping: dial tcp 10.233.64.7:24007: connect: connection refused
Normal Pulling 26m (x2 over 30m) kubelet, kube1 pulling image "docker.io/gluster/glusterd2-nightly"
Normal Killing 26m kubelet, kube1 Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 26m (x2 over 30m) kubelet, kube1 Successfully pulled image "docker.io/gluster/glusterd2-nightly"
Normal Created 26m (x2 over 30m) kubelet, kube1 Created container
Normal Started 26m (x2 over 30m) kubelet, kube1 Started container
[vagrant@kube1 ~]$
If we want to deploy multiple GCS setups on the same hypervisor, we need to edit the Vagrantfile to provide new VM names.
I think we could have a better way of providing VM names when deploying multiple GCS setups on the same hypervisor.
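One possible approach, sketched below as a hypothetical Vagrantfile fragment (the GCS_VM_PREFIX variable does not exist today), would be to derive the VM names from an environment variable so that each cluster gets a unique prefix:
```
# Hypothetical Vagrantfile sketch: prefix VM names so that multiple GCS
# clusters can be brought up on the same hypervisor without name clashes.
prefix = ENV.fetch("GCS_VM_PREFIX", "kube")

Vagrant.configure("2") do |config|
  (1..3).each do |i|
    config.vm.define "#{prefix}#{i}" do |node|
      node.vm.hostname = "#{prefix}#{i}"
    end
  end
end
```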
We need to add the DRIVER_REG_SOCK_PATH field to the node plugin CSI deployment template to work with Kubernetes 1.12.
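A rough sketch of the kind of change meant here, following the upstream CSI driver-registrar sidecar pattern (the image tag, driver name, and socket path below are assumptions, not the exact GCS template):
```
# Registrar sidecar in the csi-nodeplugin DaemonSet: with Kubernetes 1.12 the
# per-driver registration socket path has to be passed explicitly.
- name: csi-driver-registrar
  image: quay.io/k8scsi/driver-registrar:v0.4.1
  args:
    - "--v=5"
    - "--csi-address=$(ADDRESS)"
    - "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
  env:
    - name: ADDRESS
      value: /plugin/csi.sock
    - name: DRIVER_REG_SOCK_PATH
      value: /var/lib/kubelet/plugins/org.gluster.glusterfs/csi.sock
```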
Description:
When a gd2 container is deleted from the GCS setup, the deleted gd2 container's status shows as offline in gluster peer status.
How to reproduce:
-> Once the GCS setup is ready, all peers are in the online state.
-> Delete any one gd2 container from the GCS setup, or reboot any worker node.
-> The gd2 container is deleted and a new gd2 container gets created automatically.
-> Then verify gluster peer status; the newly created gd2 container shows its status as online.
-> But the deleted gd2 container still shows as offline in gluster peer status, and if I try to remove that peer, it cannot be removed because it is in the offline state.
output:
+--------------------------------------+-------------------------+-------------------+-------------------+--------+-----+
| ID | NAME | CLIENT ADDRESSES | PEER ADDRESSES | ONLINE | PID |
+--------------------------------------+-------------------------+-------------------+-------------------+--------+-----+
| 00b3b28e-f945-4f54-8b28-5d3af879716c | glusterd2-cluster-sqwrd | 127.0.0.1:24007 | 10.244.2.7:24008 | no | |
| | | 10.244.2.7:24007 | | | |
| 125d50df-10fc-4b1d-97fb-ff0da39f5370 | glusterd2-cluster-59gjl | 127.0.0.1:24007 | 10.244.1.13:24008 | no | |
| | | 10.244.1.13:24007 | | | |
| 1b099128-3374-4582-a692-1a94f69e874a | glusterd2-cluster-cbqbx | 127.0.0.1:24007 | 10.244.3.5:24008 | no | |
| | | 10.244.3.5:24007 | | | |
| 2b6d33c7-64f7-4a65-a3b8-90dec34f39a2 | glusterd2-cluster-2qcgw | 127.0.0.1:24007 | 10.244.3.8:24008 | no | |
| | | 10.244.3.8:24007 | | | |
| 56205100-47d2-4762-9706-c1094c2bff34 | glusterd2-cluster-4wkgd | 127.0.0.1:24007 | 10.244.1.12:24008 | no | |
| | | 10.244.1.12:24007 | | | |
| 5e0eebab-3978-4a1d-b9e9-08bc7917a329 | glusterd2-cluster-sqwrd | 127.0.0.1:24007 | 10.244.2.13:24008 | yes | 23 |
| | | 10.244.2.13:24007 | | | |
| 7c217032-6c70-4964-8ec1-b25b006b4530 | glusterd2-cluster-59gjl | 127.0.0.1:24007 | 10.244.1.13:24008 | yes | 23 |
| | | 10.244.1.13:24007 | | | |
| 8dc69b03-d4f6-4b17-abea-4588da0ba844 | glusterd2-cluster-2qcgw | 127.0.0.1:24007 | 10.244.3.8:24008 | yes | 24 |
| | | 10.244.3.8:24007 | | | |
| 8f1df06d-131c-4827-9e01-0dc70e051901 | glusterd2-cluster-28tw6 | 127.0.0.1:24007 | 10.244.1.5:24008 | no | |
| | | 10.244.1.5:24007 | | | |
| e6dffc73-4aa3-4f77-b999-341c10d5b126 | glusterd2-cluster-8gxnz | 127.0.0.1:24007 | 10.244.2.4:24008 | no | |
| | | 10.244.2.4:24007 | | | |
+--------------------------------------+-------------------------+-------------------+-------------------+--------+-----+
Currently, we are using kubespray for the deployment of Kubernetes.
The cons of using this are:
If we make use of the current deployment scripts used for heketi-gd1 (https://github.com/gluster/gluster-kubernetes):
pros:
@atinmu @kshlm @JohnStrunk @obnoxxx want to know your thoughts on this.
-> Created two PVCs (PVC1, PVC2).
-> Mounted the two PVCs to one app pod and ran I/O on the mount points.
-> Mounted the same two PVCs to 3-replica replication controller app pods and ran I/O on both mount points.
-> Deleted one replication controller app pod; the RC app pod came up automatically with the same mount points and no data loss was found.
-> Then tried to delete PVC1, which was in the mounted state; PVC1 went into the Terminating state.
-> Deleted all the app pods; PVC1 was then deleted successfully.
-> After that, one of my worker nodes went into a bad state (reason unknown), and all the pods placed on that worker node went into the state below.
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 3d
csi-nodeplugin-glusterfsplugin-btllk 2/2 Running 0 3d
csi-nodeplugin-glusterfsplugin-j6w8j 2/2 NodeLost 0 3d
csi-nodeplugin-glusterfsplugin-mvthq 2/2 Running 0 3d
csi-provisioner-glusterfsplugin-0 2/2 Unknown 0 3d
etcd-47vhqc75rl 1/1 Running 0 1h
etcd-bvvmmb7kzn 1/1 Unknown 0 4d
etcd-m4lskms5fb 1/1 Running 0 4d
etcd-njxn5qsr7h 1/1 Running 0 4d
etcd-operator-989bf8569-kctcd 1/1 Running 0 4d
glusterd2-cluster-2qcgw 1/1 Running 1 3d
glusterd2-cluster-59gjl 1/1 Running 1 3d
glusterd2-cluster-sqwrd 1/1 NodeLost 0 3d
-> Then I rebooted the worker node; the node came up in a proper condition, and all the pods placed on that worker node came back to the Running state.
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 3d
csi-nodeplugin-glusterfsplugin-btllk 2/2 Running 0 3d
csi-nodeplugin-glusterfsplugin-j6w8j 2/2 Running 2 3d
csi-nodeplugin-glusterfsplugin-mvthq 2/2 Running 0 3d
csi-provisioner-glusterfsplugin-0 2/2 Running 0 21m
etcd-47vhqc75rl 1/1 Running 0 2h
etcd-m4lskms5fb 1/1 Running 0 4d
etcd-njxn5qsr7h 1/1 Running 0 4d
etcd-operator-989bf8569-kctcd 1/1 Running 0 4d
glusterd2-cluster-2qcgw 1/1 Running 1 3d
glusterd2-cluster-59gjl 1/1 Running 1 3d
glusterd2-cluster-sqwrd 1/1 Running 6 3d
-> Logged in to one of the gd2 pods and verified the status of the existing volume; all bricks are in the offline state.
-> Then verified the PVC status:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
glusterfs-csi-pv2 Bound pvc-0f903cb5bfe111e8 2Gi RWX glusterfs-csi 1d
-> Then deleted the PVC successfully.
persistentvolumeclaim "glusterfs-csi-pv2" deleted
No resources found.
-> Verified the PV status; the PV was not deleted.
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-0f903cb5bfe111e8 2Gi RWX Delete Released default/glusterfs-csi-pv2 glusterfs-csi 1d
-> Logged in to the gd2 container again to verify whether the volume still exists.
-> The volume is still listed and its state is Started, as shown below:
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| ID | NAME | TYPE | STATE | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 9b92d49c-4480-487e-a307-786aea601af1 | pvc-0f903cb5bfe111e8 | Replicate | Started | tcp | 3 |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
-> Now verified the volume status; it shows the bricks as offline.
Volume : pvc-0f903cb5bfe111e8
+--------------------------------------+-------------+---------------------------------------------------------------------+--------+------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------+---------------------------------------------------------------------+--------+------+-----+
| 80f62a24-f830-46a0-84b5-865bf1304fe3 | 10.244.2.7 | /var/run/glusterd2/bricks/pvc-0f903cb5bfe111e8/subvol1/brick1/brick | false | 0 | 0 |
| a912e11f-6e61-490f-b7a9-227ec11299d3 | 10.244.1.13 | /var/run/glusterd2/bricks/pvc-0f903cb5bfe111e8/subvol1/brick2/brick | false | 0 | 0 |
| a3aefbae-a473-4ed3-afc5-473b23e74986 | 10.244.3.8 | /var/run/glusterd2/bricks/pvc-0f903cb5bfe111e8/subvol1/brick3/brick | false | 0 | 0 |
+--------------------------------------+-------------+---------------------------------------------------------------------+--------+------+-----+
Here I am observing two things:
deploy$ vagrant up
Bringing machine 'kube1' up with 'libvirt' provider...
Bringing machine 'kube2' up with 'libvirt' provider...
Bringing machine 'kube3' up with 'libvirt' provider...
==> kube3: An error occurred. The error will be shown after all tasks complete.
==> kube2: Box 'centos/atomic-host' could not be found. Attempting to find and install...
kube2: Box Provider: libvirt
kube2: Box Version: >= 0
==> kube1: Box 'centos/atomic-host' could not be found. Attempting to find and install...
kube1: Box Provider: libvirt
kube1: Box Version: >= 0
==> kube2: An error occurred. The error will be shown after all tasks complete.
==> kube1: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'kube1' machine. Please handle this error then try again:
The box 'centos/atomic-host' could not be found or could not be accessed in the remote catalog. If this is a private box on HashiCorp's Atlas, please verify you're logged in via vagrant login. Also, please double-check the name. The expanded URL and error message are shown below:
URL: ["https://atlas.hashicorp.com/centos/atomic-host"]
Error: The requested URL returned error: 404

An error occurred while executing the action on the 'kube2' machine. Please handle this error then try again:
The box 'centos/atomic-host' could not be found or could not be accessed in the remote catalog. If this is a private box on HashiCorp's Atlas, please verify you're logged in via vagrant login. Also, please double-check the name. The expanded URL and error message are shown below:
URL: ["https://atlas.hashicorp.com/centos/atomic-host"]
Error: The requested URL returned error: 404

An error occurred while executing the action on the 'kube3' machine. Please handle this error then try again:
There are errors in the configuration of this machine. Please fix the following errors and try again:
ansible remote provisioner:
- The following settings shouldn't exist: become
The current README talks about gluster-k8s having an installation script. Provide an Install section in the README, and also explain what the repo is about (or explain the project first and then give details about installation, etc.). I would prefer to have a table of contents as well, so people can choose to jump ahead if they already know what the project is.
If we don't comment out the line below in the Vagrantfile, vagrant up will fail.
Under gcs/deploy/Vagrantfile, if I don't comment out the line "config.vagrant.plugins = ["vagrant-libvirt"]", vagrant up fails.
File name: Vagrantfile
File path: gcs/deploy
After commenting out the line "config.vagrant.plugins = ["vagrant-libvirt"]", vagrant up succeeds.
steps to reproduce
Logs
time="2018-10-29 08:58:56.982656" level=warning msg="tracing: One or more Jaeger endpoints not specified" jaegerAgentEndpoint= jaegerEndpoint= source="[tracing.go:40:tracing.InitJaegerExporter]"
time="2018-10-29 08:58:56.985588" level=fatal msg="failed to create gd2-muxsrv listener" error="listen tcp: lookup gluster-kube1-0.glusterd2.gcs on 10.233.0.3:53: no such host" source="[server.go:24:muxsrv.newMuxSrv]"
Looks like a Kubernetes networking issue.
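A few commands that could help confirm that suspicion, as a sketch (the busybox test pod is just a convenient place to run nslookup; service and label names are the usual defaults for this deployment):
```
# Resolve the gd2 pod's headless-service name from inside the cluster.
kubectl run -it --rm dnstest --image=busybox --restart=Never -- \
  nslookup gluster-kube1-0.glusterd2.gcs
# The name is backed by the glusterd2 headless service and the cluster DNS pods.
kubectl -n gcs get svc glusterd2
kubectl -n kube-system get pods -l k8s-app=kube-dns
```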
output:
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false | 0 | 0 |
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true | 43251 | 65 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
Log:
time="2018-10-31 09:13:33.990278" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-10-31 09:13:33.990778" level=info msg="client disconnected" address="10.233.65.9:977" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 1d
csi-nodeplugin-glusterfsplugin-6lmb5 2/2 Running 0 1d
csi-nodeplugin-glusterfsplugin-sjgjb 2/2 Running 0 1d
csi-nodeplugin-glusterfsplugin-t6vhd 2/2 Running 0 1d
csi-provisioner-glusterfsplugin-0 2/2 Running 0 1d
etcd-hp5qsghwnk 0/1 Completed 0 1d
etcd-operator-989bf8569-d9fpt 1/1 Running 1 1d
etcd-rz76vs6p77 0/1 Completed 0 1d
glusterd2-cluster-px8qh 0/1 CrashLoopBackOff 219 1d
glusterd2-cluster-sq2q7 0/1 CrashLoopBackOff 219 1d
glusterd2-cluster-wqj4l 1/1 Running 218 1d
It seems like one of the etcd containers was deleted; I can see only two etcd containers, both in the Completed state, and one etcd-operator container in the Running state.
The GD2 containers keep restarting; I think this is because of etcd. I can see the errors below in glusterd2.log:
time="2018-10-05 14:03:20.249447" level=warning msg="could not read store config file, continuing with defaults" error="open /var/lib/glusterd2/store.toml: no such file or directory" source="[config.go:128:store.GetConfig]"
time="2018-10-05 14:03:25.250568" level=error msg="failed to start embedded store" error="context deadline exceeded" source="[embed.go:36:store.newEmbedStore]"
time="2018-10-05 14:03:25.250669" level=fatal msg="Failed to initialize store (etcd client)" error="context deadline exceeded" source="[main.go:101:main.main]"
I see an issue in the etcd-operator regarding support for the etcd 3.3.x version.
All logs generated by GCS components must go to the container stdout so they can be picked up by the cluster logging infrastructure.
New GCS components such as gd2 and csi drivers are already sending their logs appropriately, so this item is targeted mainly at older gluster components. Specifically:
The current proposal is to run a sidecar container that contains rsyslog and use that to collect the logs and present them to stdout, most likely with some format modifications such that individual log streams are discernible. The majority (all?) of the above items have the ability to send to rsyslog.
Once this is implemented, we should no longer be writing log files to storage (ephemeral or otherwise) within a GCS related pod. All logging should go through the cluster logging infrastructure by way of container output.
(As we create individual issues to track work, they can be added here)
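As an illustration of the shared-volume + sidecar pattern described above (a minimal sketch only: it tails a log file with busybox instead of running rsyslog, and all names and paths are illustrative, not the final design):
```
apiVersion: v1
kind: Pod
metadata:
  name: gluster-logging-sketch
spec:
  containers:
  - name: glusterfs
    image: docker.io/gluster/glusterd2-nightly    # writes logs under /var/log/glusterd2
    volumeMounts:
    - name: logdir
      mountPath: /var/log/glusterd2
  - name: log-forwarder
    image: busybox
    # Forward the shared log file to this container's stdout so the cluster
    # logging infrastructure can collect it.
    command: ["sh", "-c", "touch /logs/glusterd2.log && tail -n +1 -F /logs/glusterd2.log"]
    volumeMounts:
    - name: logdir
      mountPath: /logs
  volumes:
  - name: logdir
    emptyDir: {}
```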
A cold reset of the cluster leads to the etcd pods going into the Error state.
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 20h
csi-nodeplugin-glusterfsplugin-dhzbh 2/2 Running 0 20h
csi-nodeplugin-glusterfsplugin-l54d4 2/2 Running 0 20h
csi-nodeplugin-glusterfsplugin-ww55c 2/2 Running 0 20h
csi-provisioner-glusterfsplugin-0 3/3 Running 0 20h
etcd-64t8sjpxvw 1/1 Running 0 20h
etcd-bg9zcfvbl2 1/1 Running 0 20h
etcd-operator-7cb5bd459b-rlwqx 1/1 Running 0 20h
etcd-q9skwdlmmb 1/1 Running 0 20h
gluster-kube1-0 1/1 Running 1 20h
gluster-kube2-0 1/1 Running 1 20h
gluster-kube3-0 1/1 Running 1 20h
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 2 20h
csi-nodeplugin-glusterfsplugin-dhzbh 2/2 Running 2 20h
csi-nodeplugin-glusterfsplugin-l54d4 2/2 Running 2 20h
csi-nodeplugin-glusterfsplugin-ww55c 2/2 Running 2 20h
csi-provisioner-glusterfsplugin-0 3/3 Running 4 20h
etcd-64t8sjpxvw 0/1 Error 0 20h
etcd-bg9zcfvbl2 0/1 Error 0 20h
etcd-operator-7cb5bd459b-rlwqx 1/1 Running 1 20h
etcd-q9skwdlmmb 0/1 Error 0 20h
gluster-kube1-0 1/1 Running 2 20h
gluster-kube2-0 1/1 Running 2 20h
gluster-kube3-0 1/1 Running 2 20h
[vagrant@kube1 ~]$
I have created a GCS setup with the latest repo using Vagrant. I am seeing gd2 pod names like kube1-0, kube2-0, kube3-0. This creates a bit of confusion; it would be better to prefix the gd2 pod names with 'gluster'.
In the app-deployment section, the command is executed in gcs-venv.
This is not possible unless the kubeconfig is copied.
Update the doc so that we
either log in to the master and run the app + GCS-backed volume deployment,
or
log in to the master and carry out the app deployment.
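A minimal example of the kubeconfig-copy alternative from a Vagrant-based setup (the admin.conf path is the usual kubespray/kubeadm default and may differ):
```
# Copy the kubeconfig off the master so kubectl can be used from the deploy host.
vagrant ssh kube1 -c 'sudo cat /etc/kubernetes/admin.conf' > kubeconfig
export KUBECONFIG=$PWD/kubeconfig
kubectl get nodes
```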
Currently, we are using external etcd in GCS; if the Kubernetes nodes get rebooted, all the pods get restarted.
Due to the pod restarts, if the etcd pods in the cluster lose quorum, the etcd-operator will not be able to maintain the etcd cluster and the etcd pods won't come up automatically.
ETCD operator issue: coreos/etcd-operator#1972
Pods status before node restart
[vagrant@kube1 ~]$ kubectl get po -ngcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 5m
csi-nodeplugin-glusterfsplugin-6mfkz 2/2 Running 0 4m59s
csi-nodeplugin-glusterfsplugin-9894b 2/2 Running 0 4m59s
csi-nodeplugin-glusterfsplugin-kg47n 2/2 Running 0 4m59s
csi-provisioner-glusterfsplugin-0 2/2 Running 0 4m59s
etcd-operator-7cb5bd459b-cvzpj 1/1 Running 0 13m
etcd-pdrj27zbz6 1/1 Running 0 11m
etcd-qnfdg7m4vm 1/1 Running 0 12m
etcd-skblg6rl8m 1/1 Running 0 11m
gluster-kube1-0 1/1 Running 0 10m
gluster-kube2-0 1/1 Running 0 10m
gluster-kube3-0 1/1 Running 0 10m
Pods status after node reboot
[vagrant@kube1 ~]$ kubectl get po -ngcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 2 8m17s
csi-nodeplugin-glusterfsplugin-6mfkz 2/2 Running 2 8m16s
csi-nodeplugin-glusterfsplugin-9894b 2/2 Running 2 8m16s
csi-nodeplugin-glusterfsplugin-kg47n 2/2 Running 2 8m16s
csi-provisioner-glusterfsplugin-0 2/2 Running 2 8m16s
etcd-operator-7cb5bd459b-cvzpj 1/1 Running 1 16m
etcd-pdrj27zbz6 0/1 Error 0 14m
etcd-qnfdg7m4vm 0/1 Error 0 15m
etcd-skblg6rl8m 0/1 Error 0 15m
gluster-kube1-0 1/1 Running 1 14m
gluster-kube2-0 1/1 Running 1 14m
gluster-kube3-0 1/1 Running 1 14m
Logs from ETCD operator
time="2018-11-05T05:20:45Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:20:53Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:01Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:09Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:17Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:25Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:33Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:41Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:49Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:57Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:05Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:13Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:21Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:29Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:37Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:45Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:53Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:23:01Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:23:09Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
As a first step at getting an e2e suite for GCS, we should have an automated job that (nightly?):
[root@rhsqa-virt05 rajesh-76]# ./prepare.sh
vagrant and vagrant-libvirt were found.
For easier operation, ensure that libvirt has been configured to work without passwords.
Ref: https://developer.fedoraproject.org/tools/vagrant/vagrant-libvirt.html
Ensuring kubespray is present
Creating a python virtualenv gcs-venv
Installing requirements into gcs-venv
**Could not open requirements file: [Errno 2] No such file or directory: './kubespray/requirements.txt'**
Virtualenv gcs-venv has been created
The virtualenv needs to be activated before doing any operations. Operations may fail if virtualenv is not activated.
To activate the virutalenv, run:
$ source gcs-venv/bin/activate
(gcs-venv) $
To deactivate an activated virtualenv, run:
(gcs-venv) $ deactivate
$
Note: The virtualenv should be activated for each shell session individually.
Mostly I found that once the repo is cloned, we face this issue only the first time we copy data from the deploy directory to a new directory; from the second time onwards the issue does not occur.
We already have the ability to create a kubernetes cluster.
We should also have a simple way of bringing up & deploying GCS on an OpenShift Origin, aka OKD, cluster. This will help us hammer out any unexpected differences between the two.
Need SCC setup for the hostpath plugin; followed this doc to work around it.
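For reference, granting such an SCC usually looks like the command below (this assumes an SCC named hostpath already exists, as the openshift.io/scc=hostpath pod annotation later in this report suggests, and that the pods run under the default service account in the gcs namespace):
```
oc adm policy add-scc-to-user hostpath -z default -n gcs
```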
The OCP cluster is on AWS; the node names are of the form ip-172-31-10-185.us-west-2.compute.internal,
which causes a problem for the StatefulSet. Created bz1643191.
To work around this issue, use a simpler name for the StatefulSet:
# git diff deploy-gcs.yml tasks/create-gd2-manifests.yml templates/gcs-manifests/gcs-gd2.yml.j2
diff --git a/deploy/deploy-gcs.yml b/deploy/deploy-gcs.yml
index 5efd085..fc0bac1 100644
--- a/deploy/deploy-gcs.yml
+++ b/deploy/deploy-gcs.yml
@@ -42,8 +42,10 @@
- name: GCS Pre | Manifests | Create GD2 manifests
include_tasks: tasks/create-gd2-manifests.yml
loop: "{{ groups['gcs-node'] }}"
loop_control:
+ index_var: index
loop_var: gcs_node
post_tasks:
diff --git a/deploy/tasks/create-gd2-manifests.yml b/deploy/tasks/create-gd2-manifests.yml
index d9a2d2d..4c015ef 100644
--- a/deploy/tasks/create-gd2-manifests.yml
+++ b/deploy/tasks/create-gd2-manifests.yml
@@ -3,6 +3,7 @@
- name: GCS Pre | Manifests | Create GD2 manifests for {{ gcs_node }} | Set fact kube_hostname
set_fact:
kube_hostname: "{{ gcs_node }}"
+ gcs_node_index: "{{ index }}"
- name: GCS Pre | Manifests | Create GD2 manifests for {{ gcs_node }} | Create gcs-gd2-{{ gcs_node }}.yml
template:
diff --git a/deploy/templates/gcs-manifests/gcs-gd2.yml.j2 b/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
index fe48b35..3376b11 100644
--- a/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
+++ b/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
@@ -2,7 +2,7 @@
kind: StatefulSet
apiVersion: apps/v1
metadata:
- name: gluster-{{ kube_hostname }}
+ name: gluster-{{ gcs_node_index }}
namespace: {{ gcs_namespace }}
labels:
app.kubernetes.io/part-of: gcs
Wait for glusterd2-cluster to become ready
TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] **********************************************************
Thursday 25 October 2018 18:39:50 +0000 (0:00:00.083) 0:00:54.274 ******
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (50 retries left).
# oc get pod
NAME READY STATUS RESTARTS AGE
etcd-6jmbmv6sw7 1/1 Running 0 23m
etcd-mvwq6c2w6f 1/1 Running 0 23m
etcd-n92rtb9wfr 1/1 Running 0 23m
etcd-operator-54bbdfc55d-mdvd9 1/1 Running 0 24m
gluster-0-0 1/1 Running 7 23m
gluster-1-0 1/1 Running 7 23m
gluster-2-0 1/1 Running 7 23m
# oc describe pod gluster-1-0
Name: gluster-1-0
Namespace: gcs
Priority: 0
PriorityClassName: <none>
Node: ip-172-31-59-125.us-west-2.compute.internal/172.31.59.125
Start Time: Thu, 25 Oct 2018 18:39:49 +0000
Labels: app.kubernetes.io/component=glusterfs
app.kubernetes.io/name=glusterd2
app.kubernetes.io/part-of=gcs
controller-revision-hash=gluster-1-598d756667
statefulset.kubernetes.io/pod-name=gluster-1-0
Annotations: openshift.io/scc=hostpath
Status: Running
IP: 172.21.0.15
Controlled By: StatefulSet/gluster-1
Containers:
glusterd2:
Container ID: docker://0433446ecbd7a25d5aa9f51f0bd5c3226090850b18d0e63d58d07e47c6fdd039
Image: docker.io/gluster/glusterd2-nightly:20180920
Image ID: docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:7013c3de3ed2c8b9c380c58b7c331dfc70df39fe13faea653b25034545971072
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 25 Oct 2018 19:00:48 +0000
Finished: Thu, 25 Oct 2018 19:03:48 +0000
Ready: False
Restart Count: 7
Liveness: http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
Environment:
GD2_ETCDENDPOINTS: http://etcd-client.gcs:2379
GD2_CLUSTER_ID: dd68cd6b-b828-4c13-86a4-35c492b5d4c2
GD2_CLIENTADDRESS: gluster-ip-172-31-59-125.us-west-2.compute.internal-0.glusterd2.gcs:24007
GD2_PEERADDRESS: gluster-ip-172-31-59-125.us-west-2.compute.internal-0.glusterd2.gcs:24008
GD2_RESTAUTH: false
Mounts:
/dev from gluster-dev (rw)
/run/lvm from gluster-lvm (rw)
/sys/fs/cgroup from gluster-cgroup (ro)
/usr/lib/modules from gluster-kmods (ro)
/var/lib/glusterd2 from glusterd2-statedir (rw)
/var/log/glusterd2 from glusterd2-logdir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hvj7w (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
gluster-dev:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
gluster-cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
gluster-lvm:
Type: HostPath (bare host directory volume)
Path: /run/lvm
HostPathType:
gluster-kmods:
Type: HostPath (bare host directory volume)
Path: /usr/lib/modules
HostPathType:
glusterd2-statedir:
Type: HostPath (bare host directory volume)
Path: /var/lib/glusterd2
HostPathType: DirectoryOrCreate
glusterd2-logdir:
Type: HostPath (bare host directory volume)
Path: /var/log/glusterd2
HostPathType: DirectoryOrCreate
default-token-hvj7w:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hvj7w
Optional: false
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/compute=true
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned gcs/gluster-1-0 to ip-172-31-59-125.us-west-2.compute.internal
Normal Pulling 25m kubelet, ip-172-31-59-125.us-west-2.compute.internal pulling image "docker.io/gluster/glusterd2-nightly:20180920"
Normal Pulled 24m kubelet, ip-172-31-59-125.us-west-2.compute.internal Successfully pulled image "docker.io/gluster/glusterd2-nightly:20180920"
Normal Created 16m (x4 over 24m) kubelet, ip-172-31-59-125.us-west-2.compute.internal Created container
Normal Started 16m (x4 over 24m) kubelet, ip-172-31-59-125.us-west-2.compute.internal Started container
Normal Killing 16m (x3 over 22m) kubelet, ip-172-31-59-125.us-west-2.compute.internal Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 16m (x3 over 22m) kubelet, ip-172-31-59-125.us-west-2.compute.internal Container image "docker.io/gluster/glusterd2-nightly:20180920" already present on machine
Warning Unhealthy 4m (x21 over 24m) kubelet, ip-172-31-59-125.us-west-2.compute.internal Liveness probe failed: Get http://172.21.0.15:24007/ping: dial tcp 172.21.0.15:24007: connect: connection refused
PR #6 adds support for a kubespray-based k8s install. I would like to know what the major difference between minikube and this is, and what it would take to support minikube in the future.
I've been looking into building a helm chart to deploy gcs.
My hope is to break the dependency on Ansible so it's easier to get it running on clusters that aren't started with the kubespray templates in this repo. That should make it easier for people to try out on their personal clusters.
I will update this issue as things progress. If anyone else is interested, I'm happy to collaborate.
I am aware of many discussions where we considered using 'loopback' devices (losetup) as block devices, where a file on GlusterFS is simply used as the backing store.
This has both challenges and benefits. Happy to discuss this further as an experimental option for GCS.
If you have thoughts on this, please share your opinions, observations, and requirements, so we can collect them and see if we can come up with a design that everyone agrees on!
@JohnStrunk @obnoxxx @raghavendra-talur @vbellur @pkalever @pranithk @ShyamsundarR @jarrpa @phlogistonjohn @humblec @lxbsz @aravindavk @Madhu-1 @atinmu @poornimag
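For concreteness, a minimal sketch of the loopback idea, assuming a gluster volume is already mounted at /mnt/glustervol (paths and sizes are illustrative):
```
# Create a backing file on the gluster mount and expose it as a block device.
truncate -s 1G /mnt/glustervol/block0.img
losetup --find --show /mnt/glustervol/block0.img   # prints e.g. /dev/loop0
mkfs.xfs /dev/loop0                                # the loop device now behaves like a disk
```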
In my opinion, we should go with the replica 3 (1x3) volume type and the configuration below for GCS. We will start with bare-minimum volume options and then add features along the way.
For mvp0:
Brick Graph:
posix
access-control
locks
upcall
io-threads
selinux # do we need it?
index
io-stats
server
Client graph: (Goal, all applications should run, performance next).
client
replica
dht
io-stats
I will start another issue listing the options available in each of these translators, and we have to agree on the default options for each of them.
@gluster/gluster-all
steps to reproduce:-
[root@gluster-kube1-0 /]# glustercli volume status --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Volume : pvc-f603ac47dcdc11e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| ae28c880-6067-4a3f-adef-ecf20219066f | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick1/brick | true | 40314 | 114 |
| 3afb1083-9115-4ad5-b23d-2a08b32dd8cb | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick2/brick | true | 42432 | 300 |
| e41c0d0f-95d9-4139-866a-ec73aacabe3a | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick3/brick | true | 43123 | 114 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli volume heal info pvc-f603ac47dcdc11e8 --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Brick: gluster-kube3-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick1/brick
Status: Connected
entries: 0
Brick: gluster-kube1-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick2/brick
Status: Connected
entries: 0
Brick: gluster-kube2-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick3/brick
Status: Connected
entries: 0
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# kill -9 300
[root@gluster-kube1-0 /]# glustercli volume status --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Volume : pvc-f603ac47dcdc11e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3afb1083-9115-4ad5-b23d-2a08b32dd8cb | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick2/brick | false | 0 | 0 |
| e41c0d0f-95d9-4139-866a-ec73aacabe3a | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick3/brick | true | 43123 | 114 |
| ae28c880-6067-4a3f-adef-ecf20219066f | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick1/brick | true | 40314 | 114 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube1-0 /]# glustercli volume heal info pvc-f603ac47dcdc11e8 --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Failed to get heal info for volume pvc-f603ac47dcdc11e8
Response headers:
X-Gluster-Cluster-Id: 27056e19-500a-4e7a-b5a9-71f461679196
X-Gluster-Peer-Id: 6a4f1154-aaf0-41ed-b899-6a22f98bfef4
X-Request-Id: 4b6319ec-ac9c-4af0-9f93-5f1adf491d75
Response body:
strconv.ParseInt: parsing "-": invalid syntax
[root@kube1-0 /]# glustercli peer status
Failed to get Peers list
Failed to connect to glusterd. Please check if
● glusterd2.service - GlusterD2, the management service for GlusterFS (pre-release)
Loaded: loaded (/usr/lib/systemd/system/glusterd2.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/glusterd2.service.d
└─override.conf
Active: active (running) since Tue 2018-10-09 13:07:30 UTC; 31min ago
Main PID: 21 (glusterd2)
CGroup: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc2de0d59_cbc3_11e8_8128_525400f652d0.slice/docker-a1f24ec1df72f23c60ddce138de88f38fe484febdcf3a41f710ac7461a88936a.scope/system.slice/glusterd2.service
└─21 /usr/sbin/glusterd2 --config=/etc/glusterd2/glusterd2.toml
Oct 09 13:07:30 kube1-0 systemd[1]: Started GlusterD2, the management service for GlusterFS (pre-release).
Oct 09 13:07:30 kube1-0 systemd[1]: Starting GlusterD2, the management service for GlusterFS (pre-release)...
Vagrant up is failing with the error below; it is not reproducible every time. My observation is this: without running prepare.sh in the base deploy directory, copy the data from the deploy directory to a new directory, run prepare.sh in the new directory, and then run "vagrant up"; at the end, vagrant up fails with the error below.
==> kube3: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
Continuation of #24.
The output in the gd2 pod shows two volumes with the same name:
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube1-0.glusterd2.gcs:24007"
Volume : pvc-044c90d4cd3411e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+------+-----+
| 22752a52-4be3-4ec2-beec-ad53290dfd3e | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick1/brick | false | 0 | 0 |
| 81d97c58-4f76-4c10-bcb7-1d64c552e515 | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick2/brick | false | 0 | 0 |
| 981ca2e3-e282-4073-b540-82a1bd849a5c | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick3/brick | false | 0 | 0 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+------+-----+
Volume : pvc-b7ef1f27cd3611e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 318151a7-6b5d-41d4-ae77-78d7073bda4f | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-b7ef1f27cd3611e8/subvol1/brick2/brick | true | 49152 | 66 |
| 36c6fa77-1cb9-4140-bcea-06a5e0d50e72 | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-b7ef1f27cd3611e8/subvol1/brick3/brick | true | 49152 | 305 |
| 7c6a443b-dedf-4001-837d-10c482e12043 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-b7ef1f27cd3611e8/subvol1/brick1/brick | true | 49152 | 306 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
Whereas one of the volumes shows the PID as null/0.
[vagrant@kube1 ~]$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
gcs-pvc1 Bound pvc-b7ef1f27cd3611e8 2Gi RWX glusterfs-csi 11m
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
gcs-pvc0 Bound pvc-0908fe34-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 46s
gcs-pvc1 Bound pvc-096ac87f-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 46s
gcs-pvc10 Pending glusterfs-csi 39s
gcs-pvc11 Pending glusterfs-csi 37s
gcs-pvc12 Bound pvc-0ebfb47b-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 37s
gcs-pvc13 Bound pvc-0f561bc5-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 36s
gcs-pvc14 Pending glusterfs-csi 35s
gcs-pvc15 Pending glusterfs-csi 34s
gcs-pvc16 Pending glusterfs-csi 33s
gcs-pvc17 Bound pvc-11d38245-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 32s
gcs-pvc18 Pending glusterfs-csi 31s
gcs-pvc19 Pending glusterfs-csi 30s
gcs-pvc2 Pending glusterfs-csi 45s
gcs-pvc3 Pending glusterfs-csi 45s
gcs-pvc4 Bound pvc-0a6800d6-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 44s
gcs-pvc5 Bound pvc-0b09491f-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 43s
gcs-pvc6 Pending glusterfs-csi 42s
gcs-pvc7 Bound pvc-0c2f0674-f46c-11e8-be25-52540049d944 2Gi RWX glusterfs-csi 41s
gcs-pvc8 Pending glusterfs-csi 40s
gcs-pvc9 Pending glusterfs-csi 40s
Most of them are going into the Pending state.
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 2d18h
csi-nodeplugin-glusterfsplugin-2ctl7 2/2 Running 0 2d18h
csi-nodeplugin-glusterfsplugin-kh4jb 2/2 Running 0 2d18h
csi-nodeplugin-glusterfsplugin-wvrld 2/2 Running 0 2d18h
csi-provisioner-glusterfsplugin-0 3/3 Running 1 2d18h
etcd-5nh7mdst6h 1/1 Running 0 2d18h
etcd-8psmj7t2gk 1/1 Running 0 2d18h
etcd-hsgdxsrk7j 1/1 Running 0 5m7s
etcd-operator-7cb5bd459b-vzvxn 1/1 Running 1 2d18h
gluster-kube1-0 1/1 Running 1 3m26s
gluster-kube2-0 1/1 Running 1 2d18h
gluster-kube3-0 1/1 Running 1 2d18h
Here we can see that the etcd and gluster-kube1-0 pods both got killed and were respun.
[241052.757011] Out of memory: Kill process 23065 (glusterd2) score 1291 or sacrifice child
[241052.759723] Killed process 12991 (lvcreate) total-vm:62140kB, anon-rss:3372kB, file-rss:1448kB, shmem-rss:0kB
[241078.802364] kubelet invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=-999
[241078.804948] kubelet cpuset=/ mems_allowed=0
[241078.806282] CPU: 0 PID: 15957 Comm: kubelet Kdump: loaded Tainted: G ------------ T 3.10.0-862.11.6.el7.x86_64 #1
[241078.809840] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[241078.813709] Call Trace:
[241078.814496] [<ffffffffacd135d4>] dump_stack+0x19/0x1b
[241078.816075] [<ffffffffacd0e79f>] dump_header+0x90/0x229
[241078.817776] [<ffffffffac8dc63b>] ? cred_has_capability+0x6b/0x120
[241078.819822] [<ffffffffac79ac64>] oom_kill_process+0x254/0x3d0
[241078.822464] [<ffffffffac8dc71e>] ? selinux_capable+0x2e/0x40
[241078.824270] [<ffffffffac79b4a6>] out_of_memory+0x4b6/0x4f0
[241078.826074] [<ffffffffacd0f2a3>] __alloc_pages_slowpath+0x5d6/0x724
[241078.828047] [<ffffffffac7a17f5>] __alloc_pages_nodemask+0x405/0x420
[241078.830064] [<ffffffffac7ebf98>] alloc_pages_current+0x98/0x110
[241078.832282] [<ffffffffac797057>] __page_cache_alloc+0x97/0xb0
[241078.834345] [<ffffffffac799758>] filemap_fault+0x298/0x490
[241078.836144] [<ffffffffc053585f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
[241078.838163] [<ffffffffac7c352a>] __do_fault.isra.58+0x8a/0x100
[241078.840691] [<ffffffffac7c3adc>] do_read_fault.isra.60+0x4c/0x1b0
[241078.842624] [<ffffffffac7c8484>] handle_pte_fault+0x2f4/0xd10
[241078.844459] [<ffffffffac7cae3d>] handle_mm_fault+0x39d/0x9b0
[241078.846793] [<ffffffffac8376b9>] ? dput+0x29/0x160
[241078.848351] [<ffffffffacd20557>] __do_page_fault+0x197/0x4f0
[241078.850512] [<ffffffffacd20996>] trace_do_page_fault+0x56/0x150
[241078.852863] [<ffffffffacd1ff22>] do_async_page_fault+0x22/0xf0
[241078.854659] [<ffffffffacd1c788>] async_page_fault+0x28/0x30
[241078.856921] Mem-Info:
[241078.857677] active_anon:370326 inactive_anon:1187 isolated_anon:0
active_file:4 inactive_file:2231 isolated_file:2
unevictable:6692 dirty:2 writeback:0 unstable:0
slab_reclaimable:11037 slab_unreclaimable:18435
mapped:2359 shmem:2023 pagetables:3707 bounce:0
free:13061 free_pcp:30 free_cma:0
[241078.868409] Node 0 DMA free:7632kB min:380kB low:472kB high:568kB active_anon:4408kB inactive_anon:104kB active_file:0kB inactive_file:28kB unevictable:300kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:300kB dirty:0kB writeback:0kB mapped:160kB shmem:156kB slab_reclaimable:192kB slab_unreclaimable:2648kB kernel_stack:112kB pagetables:72kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[241078.882955] lowmem_reserve[]: 0 1819 1819 1819
[241078.884613] Node 0 DMA32 free:44832kB min:44672kB low:55840kB high:67008kB active_anon:1476896kB inactive_anon:4644kB active_file:100kB inactive_file:12748kB unevictable:26468kB isolated(anon):0kB isolated(file):0kB present:2080592kB managed:1866644kB mlocked:26468kB dirty:8kB writeback:0kB mapped:9276kB shmem:7936kB slab_reclaimable:43956kB slab_unreclaimable:71092kB kernel_stack:15600kB pagetables:14756kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2 all_unreclaimable? no
[241078.900887] lowmem_reserve[]: 0 0 0 0
[241078.902897] Node 0 DMA: 192*4kB (UEM) 128*8kB (EM) 47*16kB (UEM) 9*32kB (UEM) 5*64kB (UE) 1*128kB (U) 1*256kB (M) 2*512kB (EM) 1*1024kB (E) 1*2048kB (M) 0*4096kB = 7632kB
[241078.908969] Node 0 DMA32: 114*4kB (UE) 628*8kB (EM) 2159*16kB (UEM) 100*32kB (UEM) 19*64kB (UEM) 2*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44696kB
[241078.913942] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[241078.916643] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[241078.920055] 7398 total pagecache pages
[241078.921395] 0 pages in swap cache
[241078.922354] Swap cache stats: add 0, delete 0, find 0/0
[241078.923969] Free swap = 0kB
[241078.925044] Total swap = 0kB
[241078.926414] 524146 pages RAM
[241078.927710] 0 pages HighMem/MovableOnly
[241078.929004] 53508 pages reserved
[241078.930111] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[241078.933211] [ 603] 0 603 28314 122 59 0 0 systemd-journal
[241078.935937] [ 625] 0 625 68697 404 33 0 0 lvmetad
[241078.938434] [ 645] 0 645 12002 452 25 0 -1000 systemd-udevd
[241078.943137] [ 716] 32 716 17820 138 39 0 0 rpcbind
[241078.946513] [ 718] 81 718 17683 206 36 0 -900 dbus-daemon
[241078.949782] [ 723] 999 723 135645 1381 63 0 0 polkitd
[241078.952510] [ 724] 0 724 114406 1313 86 0 0 NetworkManager
[241078.955281] [ 727] 0 727 5414 69 16 0 0 irqbalance
[241078.957896] [ 729] 0 729 6627 105 20 0 0 systemd-logind
[241078.960463] [ 732] 994 732 29953 118 28 0 0 chronyd
[241078.963597] [ 734] 0 734 25794 118 35 0 0 gssproxy
[241078.966319] [ 780] 0 780 120631 3452 99 0 0 tuned
[241078.968793] [ 785] 0 785 28718 258 57 0 -1000 sshd
[241078.971211] [ 799] 0 799 5948 41 14 0 0 rhsmcertd
[241078.973742] [ 815] 0 815 26849 499 53 0 0 dhclient
[241078.976644] [ 823] 0 823 5601 40 11 0 0 agetty
[241078.980776] [ 825] 0 825 5601 38 12 0 0 agetty
[241078.983429] [ 841] 0 841 8594 159 16 0 0 crond
[241078.986419] [11893] 0 11893 224318 8246 100 0 -999 dockerd-current
[241078.989582] [14287] 0 14287 5317 46 12 0 0 etcd
[241078.992319] [14289] 0 14289 55754 1489 31 0 0 docker-current
[241078.995718] [14298] 0 14298 68416 327 22 0 -500 docker-containe
[241078.998633] [14313] 0 14313 2652055 19995 71 0 0 etcd
[241079.001550] [15854] 0 15854 192929 20087 134 0 -999 kubelet
[241079.004817] [15958] 0 15958 68416 150 22 0 -999 docker-containe
[241079.007971] [15973] 0 15973 253 1 4 0 -998 pause
[241079.012490] [15994] 0 15994 52032 150 21 0 -999 docker-containe
[241079.016116] [16009] 0 16009 132732 4594 98 0 -998 hyperkube
[241079.018681] [16262] 0 16262 84800 647 23 0 -999 docker-containe
[241079.021555] [16277] 0 16277 253 1 4 0 -998 pause
[241079.024743] [16393] 0 16393 52032 153 20 0 -999 docker-containe
[241079.027552] [16409] 0 16409 198298 83046 264 0 868 hyperkube
[241079.030453] [16710] 0 16710 52032 148 21 0 -999 docker-containe
[241079.034266] [16727] 0 16727 253 1 4 0 -998 pause
[241079.038057] [16909] 0 16909 52032 143 21 0 -999 docker-containe
[241079.042089] [16925] 0 16925 253 1 4 0 -998 pause
[241079.047506] [18070] 0 18070 52032 139 21 0 -999 docker-containe
[241079.050431] [18087] 0 18087 253 1 4 0 -998 pause
[241079.052965] [18115] 0 18115 85152 137 23 0 -999 docker-containe
[241079.056642] [18130] 0 18130 61401 2553 39 0 967 flanneld
[241079.059493] [18161] 0 18161 68416 145 22 0 -999 docker-containe
[241079.062421] [18177] 0 18177 303 14 4 0 999 install-cni.sh
[241079.065448] [18875] 0 18875 68416 150 22 0 -999 docker-containe
[241079.068516] [18893] 0 18893 253 1 4 0 -998 pause
[241079.071095] [19016] 0 19016 68416 141 22 0 -999 docker-containe
[241079.073920] [19032] 0 19032 7609 941 19 0 1000 local-provision
[241079.076680] [19749] 0 19749 52032 146 21 0 -999 docker-containe
[241079.079652] [19767] 0 19767 253 1 4 0 -998 pause
[241079.083555] [19889] 0 19889 85152 139 24 0 -999 docker-containe
[241079.087124] [19909] 0 19909 35298 1980 30 0 962 coredns
[241079.089768] [22272] 0 22272 68416 145 22 0 -999 docker-containe
[241079.092865] [22288] 0 22288 253 1 4 0 -998 pause
[241079.096662] [22536] 0 22536 84800 180 23 0 -999 docker-containe
[241079.102521] [22551] 0 22551 253 1 4 0 -998 pause
[241079.106050] [23008] 0 23008 68416 137 22 0 -999 docker-containe
[241079.109054] [23023] 0 23023 10778 322 26 0 1000 systemd
[241079.113538] [23053] 0 23053 8836 614 24 0 1000 systemd-journal
[241079.116915] [23055] 0 23055 58430 9217 39 0 1000 gluster-exporte
[241079.120732] [23063] 32 23063 17305 135 37 0 1000 rpcbind
[241079.123536] [23065] 0 23065 279994 141995 337 0 1000 glusterd2
[241079.126930] [23134] 81 23134 14516 117 31 0 -900 dbus-daemon
[241079.129983] [23283] 0 23283 84800 660 22 0 -999 docker-containe
[241079.133300] [23298] 0 23298 2662238 26772 82 0 1000 etcd
[241079.135990] [26046] 0 26046 84800 650 22 0 -999 docker-containe
[241079.139008] [26061] 0 26061 253 1 4 0 -998 pause
[241079.142517] [26298] 0 26298 52384 143 22 0 -999 docker-containe
[241079.146212] [26313] 0 26313 7516 814 19 0 1000 driver-registra
[241079.149486] [26606] 0 26606 52384 1182 22 0 -999 docker-containe
[241079.153324] [26622] 0 26622 8190 989 20 0 1000 glusterfs-csi-d
[241079.156803] [27575] 0 27575 68416 184 22 0 -999 docker-containe
[241079.159842] [27590] 1000 27590 253 1 4 0 -998 pause
[241079.162735] [27749] 0 27749 52384 148 22 0 -999 docker-containe
[241079.165745] [27767] 0 27767 253 1 4 0 -998 pause
[241079.168493] [28322] 0 28322 68416 155 22 0 -999 docker-containe
[241079.171408] [28338] 472 28338 187896 2911 54 0 946 grafana-server
[241079.174277] [28462] 0 28462 68416 186 22 0 -999 docker-containe
[241079.178141] [28477] 1000 28477 27457 347 13 0 973 prometheus-conf
[241079.182720] [28614] 0 28614 68416 184 22 0 -999 docker-containe
[241079.186738] [28629] 1000 28629 2600 784 10 0 995 configmap-reloa
[241079.190928] [28692] 0 28692 35648 139 20 0 -999 docker-containe
[241079.195195] [28708] 1000 28708 38267 9428 55 0 999 prometheus
[241079.200457] [ 3095] 0 3095 261940 6651 56 0 -1000 dmeventd
[241079.204810] [ 1278] 0 1278 66103 144 21 0 -999 docker-containe
[241079.208967] [ 1293] 0 1293 2955 109 12 0 1000 bash
[241079.212351] [24029] 0 24029 82487 145 22 0 -999 docker-containe
[241079.217271] [24044] 0 24044 2955 98 11 0 1000 bash
[241079.220016] [32074] 0 32074 39688 335 78 0 0 sshd
[241079.222883] [32077] 1000 32077 39767 401 78 0 0 sshd
[241079.225605] [32078] 1000 32078 6775 987 15 0 0 bash
[241079.228841] [ 3680] 0 3680 299 1 5 0 999 sleep
[241079.232823] [ 9877] 0 9877 87558 930 52 0 1000 glusterfs
[241079.235885] [11311] 0 11311 214441 661 84 0 1000 glusterfsd
[241079.238633] [11726] 0 11726 84800 175 23 0 -999 docker-containe
[241079.241660] [11727] 0 11727 68152 154 22 0 -999 docker-containe
[241079.245132] [11761] 0 11761 115028 4151 98 0 912 hyperkube
[241079.248021] [11765] 0 11765 113243 4191 97 0 949 hyperkube
[241079.251843] [13250] 0 13250 15425 463 35 0 1000 lvm
[241079.255549] [13267] 1000 13267 67987 1238 44 0 0 kubectl
[241079.258248] [13295] 0 13295 33876 805 23 0 -999 docker-containe
[241079.261093] [13307] 0 13307 4030 23 9 0 -999 du
[241079.264160] Out of memory: Kill process 23065 (glusterd2) score 1291 or sacrifice child
[241079.266900] Killed process 23065 (glusterd2) total-vm:1119976kB, anon-rss:567980kB, file-rss:0kB, shmem-rss:0kB
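In the dmesg output above, glusterd2 runs with oom_score_adj 1000, so it is among the first candidates for the OOM killer, and the gluster pods run in the BestEffort QoS class (see the pod description later in this report). One possible mitigation, purely as a sketch with assumed values rather than a tested change, is to give the glusterd2 container memory/cpu requests and a memory limit in the gluster StatefulSet so the scheduler reserves capacity for it:
# Hypothetical fragment of the glusterd2 container spec; the sizes below are guesses.
containers:
- name: glusterd2
  image: docker.io/gluster/glusterd2-nightly:20180920
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      memory: "1Gi"
With requests set, the pod moves out of the BestEffort QoS class, so the kubelet evicts or OOM-kills it less eagerly under node memory pressure.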
[root@rhsqa-virt05 deploy2]# vagrant destroy -f
==> kube3: Removing domain...
==> kube2: Removing domain...
==> kube1: Removing domain...
GCS setup got deleted successfully
[root@rhsqa-virt05 deploy2]# vagrant up
Bringing machine 'kube1' up with 'libvirt' provider...
Bringing machine 'kube2' up with 'libvirt' provider...
Bringing machine 'kube3' up with 'libvirt' provider...
==> kube3: Creating image (snapshot of base box volume).
==> kube2: Creating image (snapshot of base box volume).
==> kube1: Creating image (snapshot of base box volume).
==> kube3: Creating domain with the following settings...
==> kube1: Creating domain with the following settings...
==> kube2: Creating domain with the following settings...
==> kube3: -- Name: akarsha-1_kube3
==> kube2: -- Name: akarsha-1_kube2
==> kube3: -- Domain type: kvm
==> kube2: -- Domain type: kvm
==> kube1: -- Name: akarsha-1_kube1
==> kube3: -- Cpus: 2
==> kube2: -- Cpus: 2
==> kube1: -- Domain type: kvm
==> kube3: -- Feature: acpi
==> kube2: -- Feature: acpi
==> kube3: -- Feature: apic
==> kube3: -- Feature: pae
==> kube2: -- Feature: apic
==> kube1: -- Cpus: 2
==> kube1: -- Feature: acpi
==> kube1: -- Feature: apic
==> kube1: -- Feature: pae
==> kube3: -- Memory: 2048M
==> kube1: -- Memory: 2048M
==> kube2: -- Feature: pae
==> kube3: -- Management MAC:
==> kube1: -- Management MAC:
==> kube2: -- Memory: 2048M
==> kube3: -- Loader:
==> kube1: -- Loader:
==> kube3: -- Base box: centos/atomic-host
==> kube2: -- Management MAC:
==> kube1: -- Base box: centos/atomic-host
==> kube2: -- Loader:
==> kube3: -- Storage pool: default
==> kube2: -- Base box: centos/atomic-host
==> kube1: -- Storage pool: default
==> kube3: -- Image: /var/lib/libvirt/images/akarsha-1_kube3.img (11G)
==> kube1: -- Image: /var/lib/libvirt/images/akarsha-1_kube1.img (11G)
==> kube2: -- Storage pool: default
==> kube3: -- Volume Cache: default
==> kube1: -- Volume Cache: default
==> kube2: -- Image: /var/lib/libvirt/images/akarsha-1_kube2.img (11G)
==> kube1: -- Kernel:
==> kube3: -- Kernel:
==> kube2: -- Volume Cache: default
==> kube1: -- Initrd:
==> kube2: -- Kernel:
==> kube3: -- Initrd:
==> kube1: -- Graphics Type: vnc
==> kube3: -- Graphics Type: vnc
==> kube2: -- Initrd:
==> kube1: -- Graphics Port: 5900
==> kube2: -- Graphics Type: vnc
==> kube3: -- Graphics Port: 5900
==> kube1: -- Graphics IP: 127.0.0.1
==> kube2: -- Graphics Port: 5900
==> kube1: -- Graphics Password: Not defined
==> kube3: -- Graphics IP: 127.0.0.1
==> kube2: -- Graphics IP: 127.0.0.1
==> kube3: -- Graphics Password: Not defined
==> kube1: -- Video Type: cirrus
==> kube2: -- Graphics Password: Not defined
==> kube3: -- Video Type: cirrus
==> kube1: -- Video VRAM: 9216
==> kube2: -- Video Type: cirrus
==> kube3: -- Video VRAM: 9216
==> kube2: -- Video VRAM: 9216
==> kube1: -- Sound Type:
==> kube3: -- Sound Type:
==> kube2: -- Sound Type:
==> kube1: -- Keymap: en-us
==> kube3: -- Keymap: en-us
==> kube2: -- Keymap: en-us
==> kube1: -- TPM Path:
==> kube3: -- TPM Path:
==> kube2: -- TPM Path:
==> kube3: -- Disks: vdb(qcow2,1024G), vdc(qcow2,1024G), vdd(qcow2,1024G)
==> kube1: -- Disks: vdb(qcow2,1024G), vdc(qcow2,1024G), vdd(qcow2,1024G)
==> kube3: -- Disk(vdb): /var/lib/libvirt/images/akarsha-1_kube3-vdb.qcow2
==> kube1: -- Disk(vdb): /var/lib/libvirt/images/akarsha-1_kube1-vdb.qcow2
==> kube3: -- Disk(vdc): /var/lib/libvirt/images/akarsha-1_kube3-vdc.qcow2
==> kube2: -- Disks: vdb(qcow2,1024G), vdc(qcow2,1024G), vdd(qcow2,1024G)
==> kube3: -- Disk(vdd): /var/lib/libvirt/images/akarsha-1_kube3-vdd.qcow2
==> kube1: -- Disk(vdc): /var/lib/libvirt/images/akarsha-1_kube1-vdc.qcow2
==> kube3: -- INPUT: type=mouse, bus=ps2
==> kube2: -- Disk(vdb): /var/lib/libvirt/images/akarsha-1_kube2-vdb.qcow2
==> kube1: -- Disk(vdd): /var/lib/libvirt/images/akarsha-1_kube1-vdd.qcow2
==> kube2: -- Disk(vdc): /var/lib/libvirt/images/akarsha-1_kube2-vdc.qcow2
==> kube1: -- INPUT: type=mouse, bus=ps2
==> kube2: -- Disk(vdd): /var/lib/libvirt/images/akarsha-1_kube2-vdd.qcow2
==> kube2: -- INPUT: type=mouse, bus=ps2
==> kube2: Creating shared folders metadata...
==> kube2: Starting domain.
==> kube3: Creating shared folders metadata...
==> kube3: Starting domain.
==> kube2: Waiting for domain to get an IP address...
==> kube1: Creating shared folders metadata...
==> kube3: Waiting for domain to get an IP address...
==> kube1: Starting domain.
==> kube1: Waiting for domain to get an IP address...
==> kube1: Waiting for SSH to become available...
kube1:
kube1: Vagrant insecure key detected. Vagrant will automatically replace
kube1: this with a newly generated keypair for better security.
==> kube2: Waiting for SSH to become available...
kube1:
kube1: Inserting generated public key within guest...
==> kube3: Waiting for SSH to become available...
kube2:
kube2: Vagrant insecure key detected. Vagrant will automatically replace
kube2: this with a newly generated keypair for better security.
kube1: Removing insecure key from the guest if it's present...
kube3:
kube3: Vagrant insecure key detected. Vagrant will automatically replace
kube3: this with a newly generated keypair for better security.
kube2:
kube2: Inserting generated public key within guest...
kube1: Key inserted! Disconnecting and reconnecting using new SSH key...
kube3:
kube3: Inserting generated public key within guest...
kube2: Removing insecure key from the guest if it's present...
==> kube1: Setting hostname...
kube3: Removing insecure key from the guest if it's present...
kube2: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kube1: Configuring and enabling network interfaces...
==> kube2: Setting hostname...
kube3: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kube3: Setting hostname...
==> kube2: Configuring and enabling network interfaces...
kube1: SSH address: 192.168.121.100:22
==> kube3: Configuring and enabling network interfaces...
kube1: SSH username: vagrant
kube1: SSH auth method: private key
kube2: SSH address: 192.168.121.102:22
kube2: SSH username: vagrant
kube2: SSH auth method: private key
kube3: SSH address: 192.168.121.162:22
kube3: SSH username: vagrant
kube3: SSH auth method: private key
==> kube3: Running provisioner: ansible...
Vagrant has automatically selected the compatibility mode '2.0'
according to the Ansible version installed (2.6.4).
Alternatively, the compatibility mode can be specified in your Vagrantfile:
https://www.vagrantup.com/docs/provisioning/ansible_common.html#compatibility_mode
==> kube3: Vagrant has detected a host range pattern in the `groups` option.
==> kube3: Vagrant doesn't fully check the validity of these parameters!
==> kube3:
==> kube3: Please check https://docs.ansible.com/ansible/intro_inventory.html#hosts-and-groups
==> kube3: for more information.
kube3: Running ansible-playbook...
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
The error appears to have been in '/root/gcs-3/deploy2/kubespray/roles/vault/handlers/main.yml': line 44, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: unseal vault
^ here
==> kube3: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
atin@dhcp35-96:~/codebase/myrepo/gcs/deploy$ ./prepare.sh
vagrant and vagrant-libvirt were found.
For easier operation, ensure that libvirt has been configured to work without passwords.
Ref: https://developer.fedoraproject.org/tools/vagrant/vagrant-libvirt.html
Ensuring kubespray is present
Creating a python virtualenv gcs-venv
Installing requirements into gcs-venv
requests 2.20.1 has requirement idna<2.8,>=2.5, but you'll have idna 2.8 which is incompatible.
Virtualenv gcs-venv has been created
The virtualenv needs to be activated before doing any operations. Operations may fail if virtualenv is not activated.
To activate the virutalenv, run:
$ source gcs-venv/bin/activate
(gcs-venv) $
To deactivate an activated virtualenv, run:
(gcs-venv) $ deactivate
$
Note: The virtualenv should be activated for each shell session individually.
I'm not sure whether this error message is benign, since the environment does seem to get created?
After cloning with submodules, running prepare.sh, and activating the environment, I'm seeing the following errors when running vagrant up:
(gcs-venv) [lbailey@windslash deploy]$ vagrant up
Bringing machine 'kube1' up with 'libvirt' provider...
Bringing machine 'kube2' up with 'libvirt' provider...
Bringing machine 'kube3' up with 'libvirt' provider...
==> kube2: An error occurred. The error will be shown after all tasks complete.
==> kube3: An error occurred. The error will be shown after all tasks complete.
==> kube1: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'kube1'
machine. Please handle this error then try again:
There are errors in the configuration of this machine. Please fix
the following errors and try again:
vagrant:
* The following settings shouldn't exist: plugins
An error occurred while executing the action on the 'kube2'
machine. Please handle this error then try again:
There are errors in the configuration of this machine. Please fix
the following errors and try again:
vagrant:
* The following settings shouldn't exist: plugins
An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:
There are errors in the configuration of this machine. Please fix
the following errors and try again:
vagrant:
* The following settings shouldn't exist: plugins
Running on Fedora 28.
In gluster/glusterd2#1324, it looks like DNS isn't ready by the time gd2 tries to resolve its own hostname.
We should add an init container like etcd does to wait for DNS.
Example:
$ kubectl -n gcs get po/etcd-sv9sxbvm7j -oyaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
...
initContainers:
- command:
- /bin/sh
- -c
- "\n\t\t\t\t\twhile ( ! nslookup etcd-sv9sxbvm7j.etcd.gcs.svc )\n\t\t\t\t\tdo\n\t\t\t\t\t\tsleep
2\n\t\t\t\t\tdone"
image: busybox:1.28.0-glibc
imagePullPolicy: IfNotPresent
name: check-dns
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
...
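A similar init container could presumably be added to the gluster pod template so gd2 does not start until its own address resolves. This is a sketch only, reusing the busybox image from the etcd example above; the hostname gluster-kube1-0.glusterd2.gcs is taken from the peer listings elsewhere in this report, and a real manifest would have to derive it per pod:
initContainers:
- name: check-dns
  image: busybox:1.28.0-glibc
  imagePullPolicy: IfNotPresent
  command:
  - /bin/sh
  - -c
  - |
    # Block until this pod's own DNS record is resolvable, then let glusterd2 start.
    while ! nslookup gluster-kube1-0.glusterd2.gcs; do
      sleep 2
    done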
If we delete an etcd pod manually, a new etcd pod does not get created automatically. It looks like the etcd-operator pod is not taking care of the etcd pods.
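For context on why the pod should come back: the CoreOS etcd-operator reconciles etcd pods against an EtcdCluster custom resource, roughly like the sketch below (the object name, namespace, and version here are assumptions based on the etcd-client.gcs endpoint gd2 uses, not the actual GCS manifest). If deleted pods are not recreated, checking the EtcdCluster object (kubectl -n gcs get etcdclusters) and the etcd-operator logs should show whether the resource still asks for three members.
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: etcd          # assumed; gd2 points at http://etcd-client.gcs:2379
  namespace: gcs
spec:
  size: 3             # the operator should keep three etcd pods running
  version: "3.3.8"    # assumed etcd version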
Logs
[root@gluster-kube1-0 /]# glustercli peer list --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
| ID | NAME | CLIENT ADDRESSES | PEER ADDRESSES | ONLINE | PID |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
| 0ec39f93-e53d-4c39-b190-451647e8a2ad | gluster-kube2-0 | gluster-kube2-0.glusterd2.gcs:24007 | gluster-kube2-0.glusterd2.gcs:24008 | yes | 21 |
| 4ae03846-4985-4f72-9893-9e9ab16339fa | gluster-kube1-0 | gluster-kube1-0.glusterd2.gcs:24007 | gluster-kube1-0.glusterd2.gcs:24008 | yes | 21 |
| 8c180e06-e0ce-4c42-9775-474d60f718f6 | gluster-kube3-0 | gluster-kube3-0.glusterd2.gcs:24007 | gluster-kube3-0.glusterd2.gcs:24008 | no | |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
[root@gluster-kube1-0 /]#
[root@gluster-kube1-0 /]# glustercli volume stop pvc-9560daae-db4a-11e8-8bfc-525400b677a0 --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007
Volume stop failed
Response headers:
X-Gluster-Cluster-Id: 55d7f9c6-b040-4f03-805f-7a2f6fe0789c
X-Gluster-Peer-Id: 4ae03846-4985-4f72-9893-9e9ab16339fa
X-Request-Id: 0abf77ea-52f8-46f9-bd4a-a8b933b9ee85
Response body:
node 8c180e06-e0ce-4c42-9775-474d60f718f6 is probably down
[root@gluster-kube1-0 /]# glustercli peer list --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
| ID | NAME | CLIENT ADDRESSES | PEER ADDRESSES | ONLINE | PID |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
| 0ec39f93-e53d-4c39-b190-451647e8a2ad | gluster-kube2-0 | gluster-kube2-0.glusterd2.gcs:24007 | gluster-kube2-0.glusterd2.gcs:24008 | yes | 21 |
| 4ae03846-4985-4f72-9893-9e9ab16339fa | gluster-kube1-0 | gluster-kube1-0.glusterd2.gcs:24007 | gluster-kube1-0.glusterd2.gcs:24008 | yes | 21 |
| 8c180e06-e0ce-4c42-9775-474d60f718f6 | gluster-kube3-0 | gluster-kube3-0.glusterd2.gcs:24007 | gluster-kube3-0.glusterd2.gcs:24008 | no | |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
During volume stop it reports that node gluster-kube3-0 is down, but I am able to reach glusterd2 on gluster-kube3-0. glusterd2 is running in gluster-kube3-0, yet it is not marking itself as online in etcd, and I am not seeing any failures in the glusterd2 logs.
This is to track the work of creating the next several (cross-repo) milestones for GCS and ensuring issues are created and properly assigned to those milestones.
Repos involved:
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 0 1h
csi-nodeplugin-glusterfsplugin-4cvkd 2/2 Running 0 1h
csi-nodeplugin-glusterfsplugin-m9z9n 2/2 Running 0 1h
csi-nodeplugin-glusterfsplugin-wclwr 2/2 Running 0 1h
csi-provisioner-glusterfsplugin-0 2/2 Running 0 1h
etcd-chvm79wqr4 1/1 Running 0 1h
etcd-ndccs6pkq7 1/1 Running 0 1h
etcd-operator-54bbdfc55d-vkxh6 1/1 Running 0 1h
etcd-rrfgwq5xkd 1/1 Running 0 3m
kube1-0 1/1 Running 0 1h
kube2-0 1/1 Running 0 1h
kube3-0 1/1 Running 0 1h
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true | 49152 | 53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | true | 49152 | 53 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true | 49152 | 53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# kill -9 53
[root@kube1-0 /]#
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true | 49152 | 53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | false | 0 | 0 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true | 49152 | 53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]#
delete the app pods and the pvc
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl delete pod redis1
pod "redis1" deleted
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl -n gcs -it exec kube1-0 -- /bin/bash
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true | 49152 | 53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | false | 0 | 0 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true | 49152 | 53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]#
[root@kube1-0 /]# exit
[vagrant@kube1 ~]$ kubectl get pods
No resources found.
[vagrant@kube1 ~]$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
gcs-pvc1 Bound pvc-350277cfcd3111e8 2Gi RWX glusterfs-csi 19m
[vagrant@kube1 ~]$ kubectl delete pvc gcs-pvc1
persistentvolumeclaim "gcs-pvc1" deleted
[vagrant@kube1 ~]$ kubectl get pvc
No resources found.
[vagrant@kube1 ~]$ kubectl -n gcs -it exec kube1-0 -- /bin/bash
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
No volumes found
[root@kube1-0 /]#
delete the gd2 pod and wait for a new pod to spin
[vagrant@kube1 ~]$ kubectl delete -n gcs pods kube1-0 --grace-period=0
pod "kube1-0" deleted
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube3-0.glusterd2.gcs:24007"
Volume : pvc-044c90d4cd3411e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 981ca2e3-e282-4073-b540-82a1bd849a5c | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick3/brick | true | 49152 | 175 |
| 22752a52-4be3-4ec2-beec-ad53290dfd3e | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick1/brick | true | 49152 | 173 |
| 81d97c58-4f76-4c10-bcb7-1d64c552e515 | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick2/brick | false | 0 | 0 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]#
Currently, by default, the vagrant script gives us a 3-node GCS setup (gd2 container cluster). It is not possible to extend the peers with the current GCS setup.
In OCS we are able to extend the peers of the gluster container cluster using heketi commands.
We should be able to do the same on a GCS setup.
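As an illustration of what extending the peers could look like on GCS (a hypothetical sketch, not something the current deploy scripts support), a fourth node would need its own per-node StatefulSet that joins the same etcd and cluster ID as the existing gluster-kubeN pods. The peer label, the kube4 hostname, and the omitted hostPath volumes (/dev, /run/lvm, state and log directories, etc.) are all assumptions here:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gluster-kube4
  namespace: gcs
spec:
  serviceName: glusterd2
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: glusterd2
      gcs.gluster.org/peer: kube4          # assumed label, keeps this selector disjoint
  template:
    metadata:
      labels:
        app.kubernetes.io/name: glusterd2
        gcs.gluster.org/peer: kube4
    spec:
      nodeSelector:
        kubernetes.io/hostname: kube4      # assumed hostname of the newly added node
      containers:
      - name: glusterd2
        image: docker.io/gluster/glusterd2-nightly:20180920
        env:
        - name: GD2_ETCDENDPOINTS
          value: http://etcd-client.gcs:2379
        - name: GD2_CLUSTER_ID
          value: b8250aa0-14d7-4bcb-9d8f-c23b4b0d736b   # must match the existing cluster ID
        - name: GD2_CLIENTADDRESS
          value: gluster-kube4-0.glusterd2.gcs:24007
        - name: GD2_PEERADDRESS
          value: gluster-kube4-0.glusterd2.gcs:24008
        - name: GD2_RESTAUTH
          value: "false"
Once such a pod registers with the shared etcd, glustercli peer list should show it as a fourth peer.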
The cluster fails to deploy using vagrant:
TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] **********
Tuesday 27 November 2018 15:44:42 +0530 (0:00:00.190) 1:08:21.441 ******
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (50 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (49 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (48 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (47 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (46 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (45 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (44 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (43 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (42 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (41 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (40 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (39 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (38 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (37 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (36 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (35 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (34 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (33 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (32 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (31 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (30 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (29 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (28 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (27 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (26 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (25 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (24 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (23 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (22 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (21 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (20 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (19 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (18 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (17 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (16 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (15 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (14 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (13 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (12 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (11 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (10 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (9 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (8 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (7 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (6 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (5 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (4 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (3 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (2 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (1 retries left).
fatal: [kube1]: FAILED! => {"attempts": 50, "changed": false, "connection": "close", "content_length": "542", "content_type": "application/json; charset=UTF-8", "cookies": {}, "cookies_string": "", "date": "Tue, 27 Nov 2018 10:23:48 GMT", "json": [{"client-addresses": ["gluster-kube1-0.glusterd2.gcs:24007"], "id": "f71b18e2-badb-44f9-a3cb-ebaab3ea7911", "metadata": {"_zone": "f71b18e2-badb-44f9-a3cb-ebaab3ea7911"}, "name": "gluster-kube1-0", "online": true, "peer-addresses": ["gluster-kube1-0.glusterd2.gcs:24008"], "pid": 26}, {"client-addresses": ["gluster-kube3-0.glusterd2.gcs:24007"], "id": "ff7a5a54-ff8e-49c4-8a09-ac7ee9955787", "metadata": {"_zone": "ff7a5a54-ff8e-49c4-8a09-ac7ee9955787"}, "name": "gluster-kube3-0", "online": true, "peer-addresses": ["gluster-kube3-0.glusterd2.gcs:24008"], "pid": 25}], "msg": "OK (542 bytes)", "redirected": false, "status": 200, "url": "http://10.233.19.39:24007/v1/peers", "x_gluster_cluster_id": "348d6571-88a2-4231-9100-98ef06ad9a9c", "x_gluster_peer_id": "ff7a5a54-ff8e-49c4-8a09-ac7ee9955787", "x_request_id": "7c157d53-b8ed-4551-8a55-977731c39b7d"}
to retry, use: --limit @/root/gcs-latest/gcs-latest-karan/deploy/vagrant-playbook.retry
PLAY RECAP *********************************************************************
kube1 : ok=377 changed=114 unreachable=0 failed=1
kube2 : ok=4 changed=4 unreachable=0 failed=1
kube3 : ok=254 changed=71 unreachable=0 failed=0
Tuesday 27 November 2018 15:53:48 +0530 (0:09:05.631) 1:17:27.072 ******
===============================================================================
download : container_download | Download containers if pull is required or told to always pull (all nodes) 2455.31s
GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready -------- 545.63s
GCS | ETCD Operator | Wait for etcd-operator to be available ---------- 433.74s
Install packages ------------------------------------------------------ 153.73s
GCS | ETCD Cluster | Wait for etcd-cluster to become ready ------------ 143.99s
download : container_download | Download containers if pull is required or told to always pull (all nodes) - 127.84s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 63.79s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 53.10s
download : file_download | Download item ------------------------------- 50.40s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 49.67s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 37.31s
kubernetes/master : Master | wait for the apiserver to be running ------ 30.71s
etcd : Gen_certs | Write etcd master certs ----------------------------- 18.82s
Wait for host to be available ------------------------------------------ 16.46s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 15.67s
GCS Pre | Manifests | Sync GCS manifests ------------------------------- 12.62s
etcd : Configure | Check if etcd cluster is healthy -------------------- 11.26s
etcd : reload etcd ----------------------------------------------------- 10.70s
docker : Docker | pause while Docker restarts -------------------------- 10.23s
kubernetes/master : Master | wait for kube-controller-manager ----------- 9.94s
==> kube3: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.
An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:
Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
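For reference, the failing "Wait for glusterd2-cluster to become ready" step above is presumably an Ansible uri task that polls the /v1/peers endpoint until all three peers report in; in this run only gluster-kube1-0 and gluster-kube3-0 ever appear, so it exhausts its retries. A minimal sketch of that pattern (the variable name gd2_client_endpoint, the retry counts, and the exact until condition are assumptions, not the real playbook):
- name: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready
  uri:
    url: "{{ gd2_client_endpoint }}/v1/peers"
    return_content: yes
    status_code: 200
  register: peers
  until: peers.status == 200 and (peers.json | length) == 3   # wait for all three peers
  retries: 50
  delay: 10
This matches the recap above, where kube2 failed very early (ok=4), so the kube2 gluster pod likely never came up and the peer count never reached three.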
(gcs-venv) [root@rhsqa-virt05 deploy]#
(gcs-venv) [root@rhsqa-virt05 deploy]# vagrant ssh kube1
[vagrant@kube1 ~]$
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
etcd-ngw4zkd6g8 1/1 Running 0 15m
etcd-nv9n225zsd 1/1 Running 0 13m
etcd-operator-7cb5bd459b-c6zn8 1/1 Running 0 23m
etcd-r68w6jxjrh 1/1 Running 0 14m
gluster-kube1-0 1/1 Running 0 13m
gluster-kube2-0 0/1 Pending 0 13m
gluster-kube3-0 1/1 Running 0 13m
[vagrant@kube1 ~]$
The CSI pods aren't even deployed, and the gluster pods are going into Pending state.
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME READY STATUS RESTARTS AGE
csi-attacher-glusterfsplugin-0 2/2 Running 1 15m
csi-nodeplugin-glusterfsplugin-9kcc4 2/2 Running 0 15m
csi-nodeplugin-glusterfsplugin-b55mn 2/2 Running 0 15m
csi-nodeplugin-glusterfsplugin-vsv4n 2/2 Running 0 15m
csi-provisioner-glusterfsplugin-0 2/2 Running 0 15m
etcd-27pmwf7bh8 1/1 Running 0 20m
etcd-b9qhntzfp9 1/1 Running 0 21m
etcd-operator-7cb5bd459b-ls6rl 1/1 Running 0 23m
etcd-sk7sqgj6qw 1/1 Running 0 22m
gluster-kube1-0 1/1 Running 0 20m
gluster-kube2-0 1/1 Running 1 20m
gluster-kube3-0 1/1 Running 0 20m
[vagrant@kube1 ~]$ kubectl describe pod -n gcs gluster-kube2-0
Name: gluster-kube2-0
Namespace: gcs
Priority: 0
PriorityClassName: <none>
Node: kube2/192.168.121.98
Start Time: Tue, 23 Oct 2018 07:54:18 +0000
Labels: app.kubernetes.io/component=glusterfs
app.kubernetes.io/name=glusterd2
app.kubernetes.io/part-of=gcs
controller-revision-hash=gluster-kube2-5c4565d64b
statefulset.kubernetes.io/pod-name=gluster-kube2-0
Annotations: <none>
Status: Running
IP: 10.233.64.5
Controlled By: StatefulSet/gluster-kube2
Containers:
glusterd2:
Container ID: docker://8455de0d8753111237d8032dc6ae5821d5ba43e7ce833620c41278ebe3caa92f
Image: docker.io/gluster/glusterd2-nightly:20180920
Image ID: docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:7013c3de3ed2c8b9c380c58b7c331dfc70df39fe13faea653b25034545971072
Port: <none>
Host Port: <none>
State: Running
Started: Tue, 23 Oct 2018 07:58:21 +0000
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Tue, 23 Oct 2018 07:54:48 +0000
Finished: Tue, 23 Oct 2018 07:58:21 +0000
Ready: True
Restart Count: 1
Liveness: http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
Environment:
GD2_ETCDENDPOINTS: http://etcd-client.gcs:2379
GD2_CLUSTER_ID: b8250aa0-14d7-4bcb-9d8f-c23b4b0d736b
GD2_CLIENTADDRESS: gluster-kube2-0.glusterd2.gcs:24007
GD2_PEERADDRESS: gluster-kube2-0.glusterd2.gcs:24008
GD2_RESTAUTH: false
Mounts:
/dev from gluster-dev (rw)
/run/lvm from gluster-lvm (rw)
/sys/fs/cgroup from gluster-cgroup (ro)
/usr/lib/modules from gluster-kmods (ro)
/var/lib/glusterd2 from glusterd2-statedir (rw)
/var/log/glusterd2 from glusterd2-logdir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-jhxz7 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
gluster-dev:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType:
gluster-cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
gluster-lvm:
Type: HostPath (bare host directory volume)
Path: /run/lvm
HostPathType:
gluster-kmods:
Type: HostPath (bare host directory volume)
Path: /usr/lib/modules
HostPathType:
glusterd2-statedir:
Type: HostPath (bare host directory volume)
Path: /var/lib/glusterd2
HostPathType: DirectoryOrCreate
glusterd2-logdir:
Type: HostPath (bare host directory volume)
Path: /var/log/glusterd2
HostPathType: DirectoryOrCreate
default-token-jhxz7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-jhxz7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23m default-scheduler Successfully assigned gcs/gluster-kube2-0 to kube2
Normal Pulling 23m kubelet, kube2 pulling image "docker.io/gluster/glusterd2-nightly:20180920"
Normal Pulled 23m kubelet, kube2 Successfully pulled image "docker.io/gluster/glusterd2-nightly:20180920"
Warning Unhealthy 20m (x3 over 22m) kubelet, kube2 Liveness probe failed: Get http://10.233.64.5:24007/ping: dial tcp 10.233.64.5:24007: connect: connection refused
Normal Created 19m (x2 over 23m) kubelet, kube2 Created container
Normal Started 19m (x2 over 23m) kubelet, kube2 Started container
Normal Killing 19m kubelet, kube2 Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 19m kubelet, kube2 Container image "docker.io/gluster/glusterd2-nightly:20180920" already present on machine
[vagrant@kube1 ~]$
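From the events above, glusterd2 in gluster-kube2-0 was killed and restarted because the liveness probe (http-get /ping on 24007, delay=10s, timeout=1s) failed while the daemon was still coming up. One experiment, sketch only with assumed numbers, would be to relax the probe and add a readiness probe so a slow start does not translate into a restart:
livenessProbe:
  httpGet:
    path: /ping
    port: 24007
  initialDelaySeconds: 60   # current pod uses delay=10s
  periodSeconds: 60
  timeoutSeconds: 5         # current pod uses timeout=1s
  failureThreshold: 3
readinessProbe:             # assumed addition; keeps the pod unready until /ping answers
  httpGet:
    path: /ping
    port: 24007
  periodSeconds: 10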