gcs's People

Contributors

amarts, aravindavk, cloudbehl, johnstrunk, kotreshhr, kshlm, madhu-1, rishubhjain, saravanastoragenetwork, sidharthanup, thotz

gcs's Issues

failed to deploy gcs

Failed to Get peers in the cluster

Wednesday 17 October 2018  15:10:05 +0530 (0:00:00.864)       0:12:36.698 ***** 
changed: [kube1]

TASK [GCS | GD2 Cluster | Set gd2_client_endpoint] *****************************
Wednesday 17 October 2018  15:10:06 +0530 (0:00:00.533)       0:12:37.232 ***** 
ok: [kube1]

TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] **********
Wednesday 17 October 2018  15:10:06 +0530 (0:00:00.206)       0:12:37.439 ***** 
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (50 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (49 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (48 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (47 retries left).
ok: [kube1]

TASK [GCS | GD2 Cluster | Get peers in cluster] ********************************
Wednesday 17 October 2018  15:10:52 +0530 (0:00:45.728)       0:13:23.168 ***** 
fatal: [kube1]: FAILED! => {"changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://10.233.32.64:24007/v1/peers"}
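
A quick sanity check before re-running the playbook (a sketch; the gcs namespace and the glusterd2-client service name are assumed from the manifests used elsewhere in this tracker) is to confirm that the glusterd2 pods are up and that the client endpoint actually answers:

# run from a cluster node, since 10.233.32.64 is a ClusterIP
kubectl -n gcs get pods -o wide
kubectl -n gcs get svc glusterd2-client
curl http://10.233.32.64:24007/v1/peers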

One of the nodes is running low on storage, causing some etcd pods, CSI pods, and GD2 pods to go into the Evicted state

-> Created one PVC.
-> Mounted that PVC to 100 app pods.
-> App pod creation was taking a very long time.
-> I watched for around 15 minutes and most of the app pods were still in the Pending state.
-> I then logged in to the GCS setup after one day.
-> I observed the status below: all of the app pods placed on the kube1 node had gone into the Evicted state after one day. I suspect this is because of low resources on kube1; if that is the case, we have to increase the storage space for the kube nodes.

-> I will try to reproduce this on another setup.

NAME                                       READY   STATUS      RESTARTS   AGE
pod/csi-attacher-glusterfsplugin-0         2/2     Running     0          2d18h
pod/csi-nodeplugin-glusterfsplugin-4m9vj   0/2     Evicted     0          7m34s
pod/csi-nodeplugin-glusterfsplugin-clqcf   2/2     Running     0          2d18h
pod/csi-nodeplugin-glusterfsplugin-msn95   2/2     Running     0          2d18h
pod/csi-provisioner-glusterfsplugin-0      2/2     Running     0          2d18h
pod/etcd-2jb6scnxhr                        0/1     Evicted     0          20h
pod/etcd-896q4plsbx                        0/1     Error       0          20h
pod/etcd-jzw57nfncl                        0/1     Evicted     0          20h
pod/etcd-nbcpdpgx6k                        0/1     Evicted     0          20h
pod/etcd-operator-7cb5bd459b-cstkq         1/1     Running     0          2d18h
pod/etcd-xpqldh767q                        0/1     Completed   0          20h
pod/gluster-kube1-0                        0/1     Pending     0          20h
pod/gluster-kube2-0                        1/1     Running     224        2d18h
pod/gluster-kube3-0                        1/1     Running     0          2d18h

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)               AGE
service/etcd                        ClusterIP   None            <none>        2379/TCP,2380/TCP     2d18h
service/etcd-client                 ClusterIP   10.233.33.39    <none>        2379/TCP              2d18h
service/glusterd2                   ClusterIP   None            <none>        24007/TCP,24008/TCP   2d18h
service/glusterd2-client            ClusterIP   10.233.12.238   <none>        24007/TCP             2d18h
service/glusterd2-client-nodeport   NodePort    10.233.5.36     <none>        24007:31007/TCP       2d18h

NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-nodeplugin-glusterfsplugin   3         3         2       3            2           <none>          2d18h

NAME                            DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/etcd-operator   1         1         1            1           2d18h

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/etcd-operator-7cb5bd459b   1         1         1       2d18h

NAME                                               DESIRED   CURRENT   AGE
statefulset.apps/csi-attacher-glusterfsplugin      1         1         2d18h
statefulset.apps/csi-provisioner-glusterfsplugin   1         1         2d18h
statefulset.apps/gluster-kube1                     1         1         2d18h
statefulset.apps/gluster-kube2                     1         1         2d18h
statefulset.apps/gluster-kube3                     1         1         2d18h

[vagrant@kube1 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root  6.8G  6.3G  515M  93% /
devtmpfs                   1.9G     0  1.9G   0% /dev
tmpfs                      1.9G     0  1.9G   0% /dev/shm
tmpfs                      1.9G  4.0M  1.9G   1% /run
tmpfs                      1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                  297M   95M  202M  33% /boot
tmpfs                      379M     0  379M   0% /run/user/1000
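
With the root filesystem at 93%, the kubelet's disk-pressure eviction is the most likely cause. A quick confirmation (a sketch using stock kubectl; node and namespace names taken from the output above):

kubectl describe node kube1 | grep -A 10 'Conditions:'
kubectl get events -n gcs --sort-by=.lastTimestamp | grep -i evict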

Troubleshooting guide

Now that people are starting to try out GCS, it would be helpful to have a document that walks them through various troubleshooting steps in case they encounter problems.

Topics:

  • How to tell if all components are "up and running"
  • Diagnosing where in the stack problems could be (Is component X properly communicating with component Y?)
  • How to view logs for various problem classes:
    • If you can't provision a volume (PVC pending), check...
    • If you can't start a pod using the volume, check...
  • What info to provide when filing a bug, and how to get it (a few starter commands are sketched below).
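
As a starting point for the first two topics, a minimal health check could look something like this (a sketch; it assumes the default gcs namespace and the pod names used by the deploy playbooks, and the angle-bracket placeholders are hypothetical):

# Are all components up and running?
kubectl -n gcs get pods,svc,statefulsets,daemonsets

# PVC stuck in Pending: inspect the claim and the provisioner logs
kubectl describe pvc <pvc-name>
kubectl -n gcs logs csi-provisioner-glusterfsplugin-0 --all-containers

# Pod using the volume won't start: inspect the attacher and the node plugin on that node
kubectl -n gcs logs csi-attacher-glusterfsplugin-0 --all-containers
kubectl -n gcs logs <csi-nodeplugin-pod-on-that-node> --all-containers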

failed to bring up GCS

TASK [GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready] ***
Tuesday 13 November 2018  11:09:55 +0530 (0:00:01.169)       0:20:22.186 ****** 
FAILED - RETRYING: GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready (50 retries left).
FAILED - RETRYING: GCS | Prometheus Operator | Wait for the Prometheus Operator to become ready (49 retries left).
changed: [kube1]

TASK [GCS | Prometheus Objects | Deploy services, ServiceMonitor and Prometheus Object] ***
Tuesday 13 November 2018  11:10:17 +0530 (0:00:22.256)       0:20:44.443 ****** 
fatal: [kube1]: FAILED! => {"changed": false, "msg": "error running kubectl (/usr/local/bin/kubectl apply --force --filename=/tmp/gcs-manifestsIsZBxI/gcs-prometheus-bundle.yml) command (rc=1), out='serviceaccount/prometheus created\nclusterrole.rbac.authorization.k8s.io/prometheus created\nclusterrolebinding.rbac.authorization.k8s.io/prometheus created\nservice/prometheus created\n', err='unable to recognize \"/tmp/gcs-manifestsIsZBxI/gcs-prometheus-bundle.yml\": no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"\nunable to recognize \"/tmp/gcs-manifestsIsZBxI/gcs-prometheus-bundle.yml\": no matches for kind \"Prometheus\" in version \"monitoring.coreos.com/v1\"\n'"}
	to retry, use: --limit @/home/github.com/gluster/gcs/deploy/vagrant-playbook.retry

PLAY RECAP *********************************************************************
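
The "no matches for kind ServiceMonitor" error usually means the Prometheus Operator CRDs were not registered by the time the bundle was applied. A quick check before retrying with the suggested --limit (a sketch using stock kubectl):

kubectl get crd | grep monitoring.coreos.com
kubectl api-resources --api-group=monitoring.coreos.com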

Plan for deploying gluster-block in GCS

We need to document how we plan to integrate gluster-block w/ GCS in order to identify gaps that need to be closed for GCS-1.0.

Gluster pods are no longer DaemonSets, nor are they using host networking. Once we switch to block PVs, we can remove the nodeAffinity that locks the pods to a specific node. My understanding is that movable pods will not work for gluster-block currently.

GD2 pod automatically reboots.

Steps performed:-

  1. Created the GCS cluster:-
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   1          2d21h
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
  2. Deleted the gd2 pod:
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl delete pods -n gcs gluster-kube1-0 
pod "gluster-kube1-0" deleted

[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS              RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running             0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running             0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running             0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running             0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running             0          2d21h
etcd-4m7wv5fqk2                        1/1     Running             0          2d21h
etcd-6mf2nsl2p4                        1/1     Running             0          2d21h
etcd-lbmh9xjxm8                        1/1     Running             0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running             0          2d21h
gluster-kube1-0                        0/1     ContainerCreating   0          5s
gluster-kube2-0                        1/1     Running             0          2d21h
gluster-kube3-0                        1/1     Running             0          2d21h
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs 
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   0          43s
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
[vagrant@kube1 ~]$ 
  3. Executed commands on the gd2 pod, after logging in, using the same endpoint of kube1:-
command terminated with exit code 1
[vagrant@kube1 ~]$ kubectl get pods -n gcs -w
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   0          2m52s
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$  kubectl -n gcs -it exec gluster-kube1-0 -- /bin/bash
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli peer list --endpoints="http://gluster-kube1-0.glusterd2.gcs:24007"
Failed to get Peers list

Failed to connect to glusterd. Please check if
- Glusterd is running(http://gluster-kube1-0.glusterd2.gcs:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube1-0.glusterd2.gcs:24007"
Error getting volumes list

Failed to connect to glusterd. Please check if
- Glusterd is running(http://gluster-kube1-0.glusterd2.gcs:24007) and reachable from this node.
- Make sure Endpoints specified in the command is valid
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 

  4. Now executed commands using kube2 and kube3 as endpoints, logged in from the same kube1 pod.
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube2-0.glusterd2.gcs:24007"
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
|                  ID                  |         NAME         |   TYPE    |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 6cd58524-5172-4d9e-89ae-414bc338eba6 | pvc-f603ac47dcdc11e8 | Replicate | Started | tcp       | 3      |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume list --endpoints="http://gluster-kube3-0.glusterd2.gcs:24007"
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
|                  ID                  |         NAME         |   TYPE    |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 6cd58524-5172-4d9e-89ae-414bc338eba6 | pvc-f603ac47dcdc11e8 | Replicate | Started | tcp       | 3      |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
[root@gluster-kube1-0 /]# 

  5. While executing more commands, the pod automatically restarted:-
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume list -command terminated with exit code 137usterd2.gcs:24007"
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl get pods -n gcs -w
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-45snb   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-pgp2w   2/2     Running   0          2d21h
csi-nodeplugin-glusterfsplugin-s8g76   2/2     Running   0          2d21h
csi-provisioner-glusterfsplugin-0      2/2     Running   0          2d21h
etcd-4m7wv5fqk2                        1/1     Running   0          2d21h
etcd-6mf2nsl2p4                        1/1     Running   0          2d21h
etcd-lbmh9xjxm8                        1/1     Running   0          2d21h
etcd-operator-7cb5bd459b-tddxt         1/1     Running   0          2d21h
gluster-kube1-0                        1/1     Running   1          4m
gluster-kube2-0                        1/1     Running   0          2d21h
gluster-kube3-0                        1/1     Running   0          2d21h
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ 
[vagrant@kube1 ~]$ kubectl describe pods -n gcs gluster-kube1-0
Name:               gluster-kube1-0
Namespace:          gcs
Priority:           0
PriorityClassName:  <none>
Node:               kube1/192.168.121.7
Start Time:         Fri, 02 Nov 2018 05:39:18 +0000
Labels:             app.kubernetes.io/component=glusterfs
                    app.kubernetes.io/name=glusterd2
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=gluster-kube1-55bc79f94
                    statefulset.kubernetes.io/pod-name=gluster-kube1-0
Annotations:        <none>
Status:             Running
IP:                 10.233.64.7
Controlled By:      StatefulSet/gluster-kube1
Containers:
  glusterd2:
    Container ID:   docker://a261c3bcb84f993948b0691e199396109985d1bd9d547250476168cfd01a9520
    Image:          docker.io/gluster/glusterd2-nightly
    Image ID:       docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:06e42f3354bff80a724007dbc5442349c3a53d31eceb935fd6b3776d6cdcb0fa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 02 Nov 2018 05:43:08 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Fri, 02 Nov 2018 05:39:48 +0000
      Finished:     Fri, 02 Nov 2018 05:43:04 +0000
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
    Environment:
      GD2_ETCDENDPOINTS:  http://etcd-client.gcs:2379
      GD2_CLUSTER_ID:     27056e19-500a-4e7a-b5a9-71f461679196
      GD2_CLIENTADDRESS:  gluster-kube1-0.glusterd2.gcs:24007
      GD2_ENDPOINTS:      http://gluster-kube1-0.glusterd2.gcs:24007
      GD2_PEERADDRESS:    gluster-kube1-0.glusterd2.gcs:24008
      GD2_RESTAUTH:       false
    Mounts:
      /dev from gluster-dev (rw)
      /run/lvm from gluster-lvm (rw)
      /sys/fs/cgroup from gluster-cgroup (ro)
      /usr/lib/modules from gluster-kmods (ro)
      /var/lib/glusterd2 from glusterd2-statedir (rw)
      /var/log/glusterd2 from glusterd2-logdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8s2lg (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:  
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:  
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-8s2lg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8s2lg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  30m                default-scheduler  Successfully assigned gcs/gluster-kube1-0 to kube1
  Warning  Unhealthy  27m (x3 over 29m)  kubelet, kube1     Liveness probe failed: Get http://10.233.64.7:24007/ping: dial tcp 10.233.64.7:24007: connect: connection refused
  Normal   Pulling    26m (x2 over 30m)  kubelet, kube1     pulling image "docker.io/gluster/glusterd2-nightly"
  Normal   Killing    26m                kubelet, kube1     Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     26m (x2 over 30m)  kubelet, kube1     Successfully pulled image "docker.io/gluster/glusterd2-nightly"
  Normal   Created    26m (x2 over 30m)  kubelet, kube1     Created container
  Normal   Started    26m (x2 over 30m)  kubelet, kube1     Started container
[vagrant@kube1 ~]$ 
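
Since the container was killed by its liveness probe (exit code 137), the previous container's logs and the glusterd2 log directory are the most useful places to look (a sketch; the /var/log/glusterd2 path comes from the pod's mounts above, and the glusterd2.log file name is an assumption):

kubectl -n gcs logs gluster-kube1-0 --previous
kubectl -n gcs exec gluster-kube1-0 -- ls /var/log/glusterd2
kubectl -n gcs exec gluster-kube1-0 -- tail -n 50 /var/log/glusterd2/glusterd2.log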

Deleted gd2 container still shows as offline in gluster peer status

Description:
After deleting a gd2 container from the GCS setup, the deleted gd2 container's status still shows as offline in gluster peer status.

How to repro:
-> Once the GCS setup is ready, all peers are in the online state.
-> Delete any one gd2 container from the GCS setup, or reboot any worker node.
-> The GD2 container is deleted and a new GD2 container gets created automatically.
-> Then verify gluster peer status; the newly created GD2 container shows its status as online.
-> But the deleted GD2 container's status still shows in gluster peer status as offline, and when I try to remove that peer, I am not able to remove it because it is in the offline state.

output:

+--------------------------------------+-------------------------+-------------------+-------------------+--------+-----+
|                  ID                  |          NAME           | CLIENT ADDRESSES  |  PEER ADDRESSES   | ONLINE | PID |
+--------------------------------------+-------------------------+-------------------+-------------------+--------+-----+
| 00b3b28e-f945-4f54-8b28-5d3af879716c | glusterd2-cluster-sqwrd | 127.0.0.1:24007   | 10.244.2.7:24008  | no     |     |
|                                      |                         | 10.244.2.7:24007  |                   |        |     |
| 125d50df-10fc-4b1d-97fb-ff0da39f5370 | glusterd2-cluster-59gjl | 127.0.0.1:24007   | 10.244.1.13:24008 | no     |     |
|                                      |                         | 10.244.1.13:24007 |                   |        |     |
| 1b099128-3374-4582-a692-1a94f69e874a | glusterd2-cluster-cbqbx | 127.0.0.1:24007   | 10.244.3.5:24008  | no     |     |
|                                      |                         | 10.244.3.5:24007  |                   |        |     |
| 2b6d33c7-64f7-4a65-a3b8-90dec34f39a2 | glusterd2-cluster-2qcgw | 127.0.0.1:24007   | 10.244.3.8:24008  | no     |     |
|                                      |                         | 10.244.3.8:24007  |                   |        |     |
| 56205100-47d2-4762-9706-c1094c2bff34 | glusterd2-cluster-4wkgd | 127.0.0.1:24007   | 10.244.1.12:24008 | no     |     |
|                                      |                         | 10.244.1.12:24007 |                   |        |     |
| 5e0eebab-3978-4a1d-b9e9-08bc7917a329 | glusterd2-cluster-sqwrd | 127.0.0.1:24007   | 10.244.2.13:24008 | yes    |  23 |
|                                      |                         | 10.244.2.13:24007 |                   |        |     |
| 7c217032-6c70-4964-8ec1-b25b006b4530 | glusterd2-cluster-59gjl | 127.0.0.1:24007   | 10.244.1.13:24008 | yes    |  23 |
|                                      |                         | 10.244.1.13:24007 |                   |        |     |
| 8dc69b03-d4f6-4b17-abea-4588da0ba844 | glusterd2-cluster-2qcgw | 127.0.0.1:24007   | 10.244.3.8:24008  | yes    |  24 |
|                                      |                         | 10.244.3.8:24007  |                   |        |     |
| 8f1df06d-131c-4827-9e01-0dc70e051901 | glusterd2-cluster-28tw6 | 127.0.0.1:24007   | 10.244.1.5:24008  | no     |     |
|                                      |                         | 10.244.1.5:24007  |                   |        |     |
| e6dffc73-4aa3-4f77-b999-341c10d5b126 | glusterd2-cluster-8gxnz | 127.0.0.1:24007   | 10.244.2.4:24008  | no     |     |
|                                      |                         | 10.244.2.4:24007  |                   |        |     |
+--------------------------------------+-------------------------+-------------------+-------------------+--------+-----+

use gluster-kubernetes to gcs deployment

Currently, we are using kubespray for the deployment of Kubernetes.
The cons of using this are:

  • kubespray is maintained by a different team (if we face any issue, we need to wait for a new release)
  • kubespray currently does not support the latest released Kubernetes version (1.13.0); we have to wait for them to release support for the latest Kubernetes version

If we make use of the current deployment scripts used for heketi-gd1 (https://github.com/gluster/gluster-kubernetes), the pros are:

  • we are not dependent on a third-party module (kubespray) to deploy GCS
  • we already have some test cases in the gluster-kubernetes repo (we can make use of them)

@atinmu @kshlm @JohnStrunk @obnoxxx want to know your thoughts on this.

Unexpected volume behaviors

-> Created two PVCs (PVC1, PVC2).
-> Mounted the two PVCs to one app pod and ran I/O on the mount points.
-> Then mounted the same two PVCs to 3 replication-controller app pods and ran I/O on both mount points.
-> Deleted one replication-controller app pod; the rc app pod came back up automatically with the same mount points and no data loss was found.
-> Then tried to delete PVC1, which was in the mounted state; PVC1 went into the Terminating state.
-> Then deleted all the app pods; after that, PVC1 was deleted successfully.
-> After that, one of my worker nodes went into a bad state (I don't know the reason), and all the pods placed on that worker node went into the state below.

NAME                                   READY     STATUS     RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2       Running    0          3d
csi-nodeplugin-glusterfsplugin-btllk   2/2       Running    0          3d
csi-nodeplugin-glusterfsplugin-j6w8j   2/2       NodeLost   0          3d
csi-nodeplugin-glusterfsplugin-mvthq   2/2       Running    0          3d
csi-provisioner-glusterfsplugin-0      2/2       Unknown    0          3d
etcd-47vhqc75rl                        1/1       Running    0          1h
etcd-bvvmmb7kzn                        1/1       Unknown    0          4d
etcd-m4lskms5fb                        1/1       Running    0          4d
etcd-njxn5qsr7h                        1/1       Running    0          4d
etcd-operator-989bf8569-kctcd          1/1       Running    0          4d
glusterd2-cluster-2qcgw                1/1       Running    1          3d
glusterd2-cluster-59gjl                1/1       Running    1          3d
glusterd2-cluster-sqwrd                1/1       NodeLost   0          3d

-> I then rebooted the worker node. The node is now up and in proper condition, and all the pods placed on this worker node came back to the Running state.

NAME                                   READY     STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2       Running   0          3d
csi-nodeplugin-glusterfsplugin-btllk   2/2       Running   0          3d
csi-nodeplugin-glusterfsplugin-j6w8j   2/2       Running   2          3d
csi-nodeplugin-glusterfsplugin-mvthq   2/2       Running   0          3d
csi-provisioner-glusterfsplugin-0      2/2       Running   0          21m
etcd-47vhqc75rl                        1/1       Running   0          2h
etcd-m4lskms5fb                        1/1       Running   0          4d
etcd-njxn5qsr7h                        1/1       Running   0          4d
etcd-operator-989bf8569-kctcd          1/1       Running   0          4d
glusterd2-cluster-2qcgw                1/1       Running   1          3d
glusterd2-cluster-59gjl                1/1       Running   1          3d
glusterd2-cluster-sqwrd                1/1       Running   6          3d

-> Logged in to one of the gd2 pods and verified the existing volume status; all bricks are in the offline state.
-> Then verified the PVC status:

NAME                STATUS    VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS    AGE
glusterfs-csi-pv2   Bound     pvc-0f903cb5bfe111e8   2Gi        RWX            glusterfs-csi   1d

-> Then deleted the PVC successfully.

persistentvolumeclaim "glusterfs-csi-pv2" deleted
No resources found.

-> Verified the PV status; the PV was not deleted.

NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                       STORAGECLASS    REASON    AGE
pvc-0f903cb5bfe111e8   2Gi        RWX            Delete           Released   default/glusterfs-csi-pv2   glusterfs-csi             1d

-> Logged in to the gd2 container again to verify whether the volume still exists.
-> The volume is listed and its state is Started, as shown below:

+--------------------------------------+----------------------+-----------+---------+-----------+--------+
|                  ID                  |         NAME         |   TYPE    |  STATE  | TRANSPORT | BRICKS |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+
| 9b92d49c-4480-487e-a307-786aea601af1 | pvc-0f903cb5bfe111e8 | Replicate | Started | tcp       | 3      |
+--------------------------------------+----------------------+-----------+---------+-----------+--------+

-> Then verified the volume status; it shows all the bricks offline.

Volume : pvc-0f903cb5bfe111e8
+--------------------------------------+-------------+---------------------------------------------------------------------+--------+------+-----+
|               BRICK ID               |    HOST     |                                PATH                                 | ONLINE | PORT | PID |
+--------------------------------------+-------------+---------------------------------------------------------------------+--------+------+-----+
| 80f62a24-f830-46a0-84b5-865bf1304fe3 | 10.244.2.7  | /var/run/glusterd2/bricks/pvc-0f903cb5bfe111e8/subvol1/brick1/brick | false  |    0 |   0 |
| a912e11f-6e61-490f-b7a9-227ec11299d3 | 10.244.1.13 | /var/run/glusterd2/bricks/pvc-0f903cb5bfe111e8/subvol1/brick2/brick | false  |    0 |   0 |
| a3aefbae-a473-4ed3-afc5-473b23e74986 | 10.244.3.8  | /var/run/glusterd2/bricks/pvc-0f903cb5bfe111e8/subvol1/brick3/brick | false  |    0 |   0 |
+--------------------------------------+-------------+---------------------------------------------------------------------+--------+------+-----+

Here I am observing two things:

  1. The PVC is deleted, but the PV is not deleted from Kubernetes (see the sketch below).
  2. The volume list shows the volume in the Started state, but the volume status from the gd2 container shows all bricks in the offline state.
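
For the first observation, checking the PV's events and finalizers can show why the Delete reclaim policy did not complete, since a PV stuck in Released usually means the CSI delete call failed (a sketch using stock kubectl; add the right namespace for the CSI pods in your setup):

kubectl describe pv pvc-0f903cb5bfe111e8
kubectl get pv pvc-0f903cb5bfe111e8 -o jsonpath='{.metadata.finalizers}'
kubectl logs csi-provisioner-glusterfsplugin-0 --all-containers | grep -i delete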

vagrant up throws error - centos/atomic-host could not be found

deploy$ vagrant up

Bringing machine 'kube1' up with 'libvirt' provider...
Bringing machine 'kube2' up with 'libvirt' provider...
Bringing machine 'kube3' up with 'libvirt' provider...
==> kube3: An error occurred. The error will be shown after all tasks complete.
==> kube2: Box 'centos/atomic-host' could not be found. Attempting to find and install...
kube2: Box Provider: libvirt
kube2: Box Version: >= 0
==> kube1: Box 'centos/atomic-host' could not be found. Attempting to find and install...
kube1: Box Provider: libvirt
kube1: Box Version: >= 0
==> kube2: An error occurred. The error will be shown after all tasks complete.
==> kube1: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'kube1'
machine. Please handle this error then try again:

The box 'centos/atomic-host' could not be found or
could not be accessed in the remote catalog. If this is a private
box on HashiCorp's Atlas, please verify you're logged in via
vagrant login. Also, please double-check the name. The expanded
URL and error message are shown below:

URL: ["https://atlas.hashicorp.com/centos/atomic-host"]
Error: The requested URL returned error: 404

An error occurred while executing the action on the 'kube2'
machine. Please handle this error then try again:

The box 'centos/atomic-host' could not be found or
could not be accessed in the remote catalog. If this is a private
box on HashiCorp's Atlas, please verify you're logged in via
vagrant login. Also, please double-check the name. The expanded
URL and error message are shown below:

URL: ["https://atlas.hashicorp.com/centos/atomic-host"]
Error: The requested URL returned error: 404

An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

ansible remote provisioner:

  • The following settings shouldn't exist: become
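
For the centos/atomic-host 404, one likely cause is an older Vagrant release that still queries the retired Atlas catalog (atlas.hashicorp.com) instead of Vagrant Cloud. A hedged first step is to check the Vagrant version and try fetching the box manually before running vagrant up again:

vagrant version
vagrant box add centos/atomic-host --provider libvirt
vagrant up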

Update README with proper installation steps

The current README talks about gluster-k8s having an installation script. Provide an Install section in the README, and then also detail what the repo is about (or explain first, and then give details about installation, etc.). I would prefer to have a table of contents, so people can choose to jump ahead if they already know what the project is.

Vagrant up failure

If we don't comment out the line below in the Vagrantfile, vagrant up will fail.

Under gcs/deploy/Vagrantfile, if I don't comment out this particular line, "config.vagrant.plugins = ["vagrant-libvirt"]", vagrant up fails.

File name: Vagrantfile
File path: gcs/deploy

After commenting out this particular line, "config.vagrant.plugins = ["vagrant-libvirt"]", vagrant up succeeds.
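
A possible alternative to commenting the line out (a sketch, assuming the failure is caused by the vagrant-libvirt plugin not being installed, or by a Vagrant version too old to understand config.vagrant.plugins) is to install the plugin manually and retry:

vagrant plugin install vagrant-libvirt
vagrant plugin list
vagrant up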

Failed to access glusterd2 pod after delete gd2 pod

Steps to reproduce:

  • delete one of the glusterd2 pods
  • log in to the newly created pod
  • run glustercli commands

Logs

time="2018-10-29 08:58:56.982656" level=warning msg="tracing: One or more Jaeger endpoints not specified" jaegerAgentEndpoint= jaegerEndpoint= source="[tracing.go:40:tracing.InitJaegerExporter]"
time="2018-10-29 08:58:56.985588" level=fatal msg="failed to create gd2-muxsrv listener" error="listen tcp: lookup gluster-kube1-0.glusterd2.gcs on 10.233.0.3:53: no such host" source="[server.go:24:muxsrv.newMuxSrv]"

This looks like a Kubernetes networking issue.
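
To narrow down the "no such host" error, it is worth checking whether the headless-service record resolves from inside the cluster and whether the service has endpoints (a sketch; it assumes getent is available in the gd2 image, which is typical for glibc-based images):

kubectl -n gcs exec gluster-kube1-0 -- getent hosts gluster-kube1-0.glusterd2.gcs
kubectl -n gcs get svc glusterd2
kubectl -n gcs get endpoints glusterd2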

After a reboot of the GD2 pod, the rebooted pod's bricks went offline

  1. Create a PVC.
  2. Log in to a GD2 pod which has bricks.
  3. Perform the "reboot" command.
  4. After the reboot, the rebooted gd2 pod's bricks went offline, and those bricks are not coming back online.

output:

+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3d1c57ee-8072-44a6-912e-8df32bc79ac2 | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick | false  |     0 |   0 |
| 49b4d49e-1c4d-4f26-9977-b1c181b89f55 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick2/brick | true   | 43326 | 511 |
| 1ce6115b-f4c3-4d49-94f0-edb1edc13d58 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick3/brick | true   | 43251 |  65 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+

Log:

time="2018-10-31 09:13:33.990278" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-d3006e55dce511e8/subvol1/brick1/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"
time="2018-10-31 09:13:33.990778" level=info msg="client disconnected" address="10.233.65.9:977" server=sunrpc source="[server.go:109:sunrpc.(*SunRPC).pruneConn]"

One of the etcd containers got deleted automatically, and two of the etcd containers went to the Completed state

  1. Created two PVCs and mounted one PVC to an app pod.
  2. Ran I/O on the app pod.
  3. Then deleted the volume from the GD2 container; the volume was deleted successfully.
  4. Then tried to delete the PVC from the Kubernetes master node.
  5. The PVC went into the Terminating state.
  6. Then deleted the app pod where the PVC was mounted.
  7. Afterwards, the PVC was deleted successfully.
  8. The next day, when I logged in to the setup, it was in the state below.
NAME                                   READY     STATUS             RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2       Running            0          1d
csi-nodeplugin-glusterfsplugin-6lmb5   2/2       Running            0          1d
csi-nodeplugin-glusterfsplugin-sjgjb   2/2       Running            0          1d
csi-nodeplugin-glusterfsplugin-t6vhd   2/2       Running            0          1d
csi-provisioner-glusterfsplugin-0      2/2       Running            0          1d
etcd-hp5qsghwnk                        0/1       Completed          0          1d
etcd-operator-989bf8569-d9fpt          1/1       Running            1          1d
etcd-rz76vs6p77                        0/1       Completed          0          1d
glusterd2-cluster-px8qh                0/1       CrashLoopBackOff   219        1d
glusterd2-cluster-sq2q7                0/1       CrashLoopBackOff   219        1d
glusterd2-cluster-wqj4l                1/1       Running            218        1d
  9. It seems one of the etcd containers was deleted; I am able to see only two etcd containers, which are in the Completed state, and one etcd-operator container which is in the Running state.

  10. The GD2 containers keep restarting; I think this is because of etcd. I can see the errors below in glusterd2.log (a connectivity check is sketched after the log):

time="2018-10-05 14:03:20.249447" level=warning msg="could not read store config file, continuing with defaults" error="open /var/lib/glusterd2/store.toml: no such file or directory" source="[config.go:128:store.GetConfig]"
time="2018-10-05 14:03:25.250568" level=error msg="failed to start embedded store" error="context deadline exceeded" source="[embed.go:36:store.newEmbedStore]"
time="2018-10-05 14:03:25.250669" level=fatal msg="Failed to initialize store (etcd client)" error="context deadline exceeded" source="[main.go:101:main.main]"
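
Since GD2 points at the etcd-client service (GD2_ETCDENDPOINTS=http://etcd-client.gcs:2379 in the pod spec shown elsewhere in this tracker), a hedged way to confirm that "context deadline exceeded" is an etcd reachability problem is to check the service and probe etcd's health endpoint (assuming curl is available in the gd2 image and that this setup uses the same gcs namespace):

kubectl -n gcs get svc etcd-client
kubectl -n gcs get endpoints etcd-client
kubectl -n gcs exec glusterd2-cluster-wqj4l -- curl -s http://etcd-client.gcs:2379/health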

Tracker: All logs to container stdout

Description

All logs generated by GCS components must go to the container stdout so they can be picked up by the cluster logging infrastructure.

New GCS components such as gd2 and csi drivers are already sending their logs appropriately, so this item is targeted mainly at older gluster components. Specifically:

  • FUSE mount logs
  • Brick logs
  • self-heal daemon
  • (any other process that normally writes into /var/log/gluster)

The current proposal is to run a sidecar container that contains rsyslog and use that to collect the logs and present them to stdout, most likely with some format modifications such that individual log streams are discernible. The majority (all?) of the above items have the ability to send to rsyslog.

Once this is implemented, we should no longer be writing log files to storage (ephemeral or otherwise) within a GCS related pod. All logging should go through the cluster logging infrastructure by way of container output.

Related items

(As we create individual issues to track work, they can be added here)

After the cluster is restarted, the etcd pods go into the Error state

A cold reset of the cluster leads to the etcd pods going into the Error state.

  1. create a GCS cluster
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-dhzbh   2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-l54d4   2/2     Running   0          20h
csi-nodeplugin-glusterfsplugin-ww55c   2/2     Running   0          20h
csi-provisioner-glusterfsplugin-0      3/3     Running   0          20h
etcd-64t8sjpxvw                        1/1     Running   0          20h
etcd-bg9zcfvbl2                        1/1     Running   0          20h
etcd-operator-7cb5bd459b-rlwqx         1/1     Running   0          20h
etcd-q9skwdlmmb                        1/1     Running   0          20h
gluster-kube1-0                        1/1     Running   1          20h
gluster-kube2-0                        1/1     Running   1          20h
gluster-kube3-0                        1/1     Running   1          20h
[vagrant@kube1 ~]$ 
  2. Cold reset the cluster:
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-dhzbh   2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-l54d4   2/2     Running   2          20h
csi-nodeplugin-glusterfsplugin-ww55c   2/2     Running   2          20h
csi-provisioner-glusterfsplugin-0      3/3     Running   4          20h
etcd-64t8sjpxvw                        0/1     Error     0          20h
etcd-bg9zcfvbl2                        0/1     Error     0          20h
etcd-operator-7cb5bd459b-rlwqx         1/1     Running   1          20h
etcd-q9skwdlmmb                        0/1     Error     0          20h
gluster-kube1-0                        1/1     Running   2          20h
gluster-kube2-0                        1/1     Running   2          20h
gluster-kube3-0                        1/1     Running   2          20h
[vagrant@kube1 ~]$ 
  3. The other pods in the GCS namespace come back to the Running state, but the etcd pods go into the Error state and are unable to recover.

Gd2 pod names are kube1-0, kube2-0, kube3-0

I have created a GCS setup with the latest repo using Vagrant. I am seeing gd2 pod names like kube1-0, kube2-0, kube3-0. This creates a little bit of confusion; it would be better to start the gd2 pod names with 'gluster'.

app deployment from outside the cluster

In the app-deployment section, the command is executed in gcs-venv.
This is not possible unless the kubeconfig is copied.

Update the doc such that you
either log in to the master and run the app + GCS-backed volume deployment
or
log in to the master and carry out the app deployment.

Consider using embedded etcd in glusterd2

Currently, we are using external etcd in GCS. If the Kubernetes nodes get rebooted, all the pods get restarted.
Due to the pod restarts, if the etcd pods in the cluster lose quorum, the etcd operator will not be able to maintain the etcd cluster and the etcd pods won't come up automatically.

ETCD operator issue: coreos/etcd-operator#1972

Pod status before node restart

[vagrant@kube1 ~]$ kubectl get po -ngcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          5m
csi-nodeplugin-glusterfsplugin-6mfkz   2/2     Running   0          4m59s
csi-nodeplugin-glusterfsplugin-9894b   2/2     Running   0          4m59s
csi-nodeplugin-glusterfsplugin-kg47n   2/2     Running   0          4m59s
csi-provisioner-glusterfsplugin-0      2/2     Running   0          4m59s
etcd-operator-7cb5bd459b-cvzpj         1/1     Running   0          13m
etcd-pdrj27zbz6                        1/1     Running   0          11m
etcd-qnfdg7m4vm                        1/1     Running   0          12m
etcd-skblg6rl8m                        1/1     Running   0          11m
gluster-kube1-0                        1/1     Running   0          10m
gluster-kube2-0                        1/1     Running   0          10m
gluster-kube3-0                        1/1     Running   0          10m

Pod status after node reboot

[vagrant@kube1 ~]$ kubectl get po -ngcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   2          8m17s
csi-nodeplugin-glusterfsplugin-6mfkz   2/2     Running   2          8m16s
csi-nodeplugin-glusterfsplugin-9894b   2/2     Running   2          8m16s
csi-nodeplugin-glusterfsplugin-kg47n   2/2     Running   2          8m16s
csi-provisioner-glusterfsplugin-0      2/2     Running   2          8m16s
etcd-operator-7cb5bd459b-cvzpj         1/1     Running   1          16m
etcd-pdrj27zbz6                        0/1     Error     0          14m
etcd-qnfdg7m4vm                        0/1     Error     0          15m
etcd-skblg6rl8m                        0/1     Error     0          15m
gluster-kube1-0                        1/1     Running   1          14m
gluster-kube2-0                        1/1     Running   1          14m
gluster-kube3-0                        1/1     Running   1          14m

Logs from ETCD operator

time="2018-11-05T05:20:45Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:20:53Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:01Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:09Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:17Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:25Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:33Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:41Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:49Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:21:57Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:05Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:13Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:21Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:29Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:37Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:45Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:22:53Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:23:01Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
time="2018-11-05T05:23:09Z" level=warning msg="all etcd pods are dead." cluster-name=etcd pkg=cluster
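
When the operator logs "all etcd pods are dead", it has lost quorum and has no live member to reseed from, which is exactly the limitation tracked in the coreos/etcd-operator issue above. A hedged way to inspect the state before recreating the cluster is to look at the EtcdCluster custom resource and the operator itself (the resource name is assumed from the upstream etcd-operator CRD):

kubectl -n gcs get etcdclusters.etcd.database.coreos.com
kubectl -n gcs describe etcdcluster etcd
kubectl -n gcs logs deployment/etcd-operator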

Skeleton E2E

As a first step toward getting an e2e suite for GCS, we should have an automated job that (nightly?):

  • Brings up a kube cluster using this repo
  • Deploys GCS on that cluster
  • Verifies that the expected pods started (a minimal check is sketched after this list)
    • gluster (x3)
    • csi for gluster file (attacher, provisioner, node x3)
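
A minimal version of the verification step could be a kubectl wait on the GCS pods (a sketch; it assumes all GCS pods carry the app.kubernetes.io/part-of=gcs label seen in the manifests, and the timeout value is arbitrary):

kubectl -n gcs wait --for=condition=Ready pod -l app.kubernetes.io/part-of=gcs --timeout=600s
kubectl -n gcs get pods -o wide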

Sometimes prepare.sh script is failing

  1. git clone https://github.com/gluster/gcs gcs-8
  2. Under gcs-8 directory, create one more directory 'dir1'
  3. Then copy data from deploy directory, cp -r deploy/* dir1/
  4. Then run ./prepare.sh
[root@rhsqa-virt05 rajesh-76]# ./prepare.sh 
vagrant and vagrant-libvirt were found.
For easier operation, ensure that libvirt has been configured to work without passwords.
Ref: https://developer.fedoraproject.org/tools/vagrant/vagrant-libvirt.html

Ensuring kubespray is present
Creating a python virtualenv gcs-venv
Installing requirements into gcs-venv
 **Could not open requirements file: [Errno 2] No such file or directory: './kubespray/requirements.txt'**

Virtualenv gcs-venv has been created
The virtualenv needs to be activated before doing any operations. Operations may fail if virtualenv is not activated.

To activate the virutalenv, run:
$ source gcs-venv/bin/activate
(gcs-venv) $

To deactivate an activated virtualenv, run:
(gcs-venv) $ deactivate
$ 

Note: The virtualenv should be activated for each shell session individually.

Mostly, I found that we face this issue only the first time we copy data from the deploy directory to a new directory after the repo is cloned; from the second time onwards, we do not face this issue.

Vagrant-based origin/OKD cluster

We already have the ability to create a kubernetes cluster.
We should also have a simple way of bringing up & deploying GCS on an OpenShift Origin, aka OKD, cluster. This will help us hammer out any unexpected differences between the two.

Failed to deploy on OCP 3.11

  1. Need SCC setup for the hostpath plugin: followed this doc to work around it.

  2. The OCP cluster is on AWS: the node name is like ip-172-31-10-185.us-west-2.compute.internal, which causes a problem for the StatefulSet name. Created bz1643191.

To work around this issue, use a simpler name for the StatefulSet:

# git diff deploy-gcs.yml tasks/create-gd2-manifests.yml templates/gcs-manifests/gcs-gd2.yml.j2
diff --git a/deploy/deploy-gcs.yml b/deploy/deploy-gcs.yml
index 5efd085..fc0bac1 100644
--- a/deploy/deploy-gcs.yml
+++ b/deploy/deploy-gcs.yml
@@ -42,8 +42,10 @@
 
         - name: GCS Pre | Manifests | Create GD2 manifests
           include_tasks: tasks/create-gd2-manifests.yml
           loop: "{{ groups['gcs-node'] }}"
           loop_control:
+            index_var: index
             loop_var: gcs_node
 
   post_tasks:
diff --git a/deploy/tasks/create-gd2-manifests.yml b/deploy/tasks/create-gd2-manifests.yml
index d9a2d2d..4c015ef 100644
--- a/deploy/tasks/create-gd2-manifests.yml
+++ b/deploy/tasks/create-gd2-manifests.yml
@@ -3,6 +3,7 @@
 - name: GCS Pre | Manifests | Create GD2 manifests for {{ gcs_node }} | Set fact kube_hostname
   set_fact:
     kube_hostname: "{{ gcs_node }}"
+    gcs_node_index: "{{ index }}"
 
 - name: GCS Pre | Manifests | Create GD2 manifests for {{ gcs_node }} | Create gcs-gd2-{{ gcs_node }}.yml
   template:
diff --git a/deploy/templates/gcs-manifests/gcs-gd2.yml.j2 b/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
index fe48b35..3376b11 100644
--- a/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
+++ b/deploy/templates/gcs-manifests/gcs-gd2.yml.j2
@@ -2,7 +2,7 @@
 kind: StatefulSet
 apiVersion: apps/v1
 metadata:
-  name: gluster-{{ kube_hostname }}
+  name: gluster-{{ gcs_node_index }}
   namespace: {{ gcs_namespace }}
   labels:
     app.kubernetes.io/part-of: gcs

  3. Then it failed on "Wait for glusterd2-cluster to become ready":
TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] **********************************************************
Thursday 25 October 2018  18:39:50 +0000 (0:00:00.083)       0:00:54.274 ****** 
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (50 retries left).

# oc get pod
NAME                             READY     STATUS    RESTARTS   AGE
etcd-6jmbmv6sw7                  1/1       Running   0          23m
etcd-mvwq6c2w6f                  1/1       Running   0          23m
etcd-n92rtb9wfr                  1/1       Running   0          23m
etcd-operator-54bbdfc55d-mdvd9   1/1       Running   0          24m
gluster-0-0                      1/1       Running   7          23m
gluster-1-0                      1/1       Running   7          23m
gluster-2-0                      1/1       Running   7          23m

# oc describe pod  gluster-1-0
Name:               gluster-1-0
Namespace:          gcs
Priority:           0
PriorityClassName:  <none>
Node:               ip-172-31-59-125.us-west-2.compute.internal/172.31.59.125
Start Time:         Thu, 25 Oct 2018 18:39:49 +0000
Labels:             app.kubernetes.io/component=glusterfs
                    app.kubernetes.io/name=glusterd2
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=gluster-1-598d756667
                    statefulset.kubernetes.io/pod-name=gluster-1-0
Annotations:        openshift.io/scc=hostpath
Status:             Running
IP:                 172.21.0.15
Controlled By:      StatefulSet/gluster-1
Containers:
  glusterd2:
    Container ID:   docker://0433446ecbd7a25d5aa9f51f0bd5c3226090850b18d0e63d58d07e47c6fdd039
    Image:          docker.io/gluster/glusterd2-nightly:20180920
    Image ID:       docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:7013c3de3ed2c8b9c380c58b7c331dfc70df39fe13faea653b25034545971072
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Thu, 25 Oct 2018 19:00:48 +0000
      Finished:     Thu, 25 Oct 2018 19:03:48 +0000
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
    Environment:
      GD2_ETCDENDPOINTS:  http://etcd-client.gcs:2379
      GD2_CLUSTER_ID:     dd68cd6b-b828-4c13-86a4-35c492b5d4c2
      GD2_CLIENTADDRESS:  gluster-ip-172-31-59-125.us-west-2.compute.internal-0.glusterd2.gcs:24007
      GD2_PEERADDRESS:    gluster-ip-172-31-59-125.us-west-2.compute.internal-0.glusterd2.gcs:24008
      GD2_RESTAUTH:       false
    Mounts:
      /dev from gluster-dev (rw)
      /run/lvm from gluster-lvm (rw)
      /sys/fs/cgroup from gluster-cgroup (ro)
      /usr/lib/modules from gluster-kmods (ro)
      /var/lib/glusterd2 from glusterd2-statedir (rw)
      /var/log/glusterd2 from glusterd2-logdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hvj7w (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:  
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:  
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-hvj7w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hvj7w
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason     Age                From                                                  Message
  ----     ------     ----               ----                                                  -------
  Normal   Scheduled  25m                default-scheduler                                     Successfully assigned gcs/gluster-1-0 to ip-172-31-59-125.us-west-2.compute.internal
  Normal   Pulling    25m                kubelet, ip-172-31-59-125.us-west-2.compute.internal  pulling image "docker.io/gluster/glusterd2-nightly:20180920"
  Normal   Pulled     24m                kubelet, ip-172-31-59-125.us-west-2.compute.internal  Successfully pulled image "docker.io/gluster/glusterd2-nightly:20180920"
  Normal   Created    16m (x4 over 24m)  kubelet, ip-172-31-59-125.us-west-2.compute.internal  Created container
  Normal   Started    16m (x4 over 24m)  kubelet, ip-172-31-59-125.us-west-2.compute.internal  Started container
  Normal   Killing    16m (x3 over 22m)  kubelet, ip-172-31-59-125.us-west-2.compute.internal  Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     16m (x3 over 22m)  kubelet, ip-172-31-59-125.us-west-2.compute.internal  Container image "docker.io/gluster/glusterd2-nightly:20180920" already present on machine
  Warning  Unhealthy  4m (x21 over 24m)  kubelet, ip-172-31-59-125.us-west-2.compute.internal  Liveness probe failed: Get http://172.21.0.15:24007/ping: dial tcp 172.21.0.15:24007: connect: connection refused
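For anyone hitting the same loop: the liveness probe is simply failing to reach glusterd2 on 24007, so the next step is to look at the glusterd2 logs, both from the previous container instance and from the hostPath log directory mounted above (pod and node names are taken from the describe output; the exact log file name under /var/log/glusterd2 is an assumption):

kubectl -n gcs logs gluster-1-0 --previous
# /var/log/glusterd2 is a hostPath mount, so the log survives restarts and can also be read on the node itself:
tail -n 100 /var/log/glusterd2/glusterd2.log    # run on ip-172-31-59-125.us-west-2.compute.internal, however you normally reach the node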


Options to consume minikube

PR #6 adds support for a kubespray-based k8s install. I would like to know what the major differences between minikube and that approach are, and what it would take to support minikube in the future.

Helm chart to deploy gcs

I've been looking into building a helm chart to deploy gcs.

My hope is to break the dependency on Ansible so that it is easier to get GCS running on clusters that weren't started with the kubespray templates in this repo. That should make it easier for people to try it out on their personal clusters.

I will update this issue as things progress. If anyone else is interested, I'm happy to collaborate.
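As a rough sketch of the end goal, usage could look something like this (the chart name, repo URL, and values are all hypothetical at this point; nothing is published yet):

helm repo add gcs https://example.com/gcs-charts      # hypothetical chart repository
helm install --name gcs --namespace gcs gcs/gcs \
    --set gluster.nodeCount=3                         # hypothetical value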

loopback 'block' CSI for GCS users

I am aware of many discussions where we considered using 'loopback' devices (losetup) as block devices, with a file on a GlusterFS volume used as the backing store.

This has both challenges and benefits. Happy to discuss this further as an experimental option for GCS.
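To make the idea concrete, a very rough sketch of what the loopback path could look like under the hood (the mount point, file name, and size are made up for illustration):

# a file on an existing glusterfs mount becomes the backing store for a block device
truncate -s 1G /mnt/gluster-pvc/block-vol-001.img
loopdev=$(losetup --find --show /mnt/gluster-pvc/block-vol-001.img)
mkfs.xfs "$loopdev"     # filesystem mode; for raw block mode the device would be handed to the pod as-is
# ... attach $loopdev for the consuming pod ...
losetup -d "$loopdev"   # detach when the volume is unstaged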

People with some thoughts on this, please share your opinions, observations, and requirements, so we can collect them and see if we can come up with a design that everyone agrees on!

@JohnStrunk @obnoxxx @raghavendra-talur @vbellur @pkalever @pranithk @ShyamsundarR @jarrpa @phlogistonjohn @humblec @lxbsz @aravindavk @Madhu-1 @atinmu @poornimag

RFE: Initial volume of Gluster in GCS

In my opinion we should go with a replica 3 (1x3) volume type and the configuration below for GCS. We will start with bare-minimum volume options, and then add features along the way.

For mvp0:

Brick Graph:

posix
access-control
locks
upcall
io-threads
selinux # do we need it?
index
io-stats
server

Client graph (goal: all applications should run first, performance next):

client
replica
dht
io-stats

I will start another issue listing the options available in each of these translators, and we have to agree on the default options for each of them.
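For reference, a hedged sketch of what creating and starting such a 1x3 volume looks like with glustercli (hostnames and brick paths follow the examples used elsewhere in this repo; the exact flag names may differ between gd2 builds):

glustercli volume create testvol \
    gluster-kube1-0.glusterd2.gcs:/bricks/testvol/brick1 \
    gluster-kube2-0.glusterd2.gcs:/bricks/testvol/brick2 \
    gluster-kube3-0.glusterd2.gcs:/bricks/testvol/brick3 \
    --replica 3 --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007
glustercli volume start testvol --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007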

@gluster/gluster-all

heal info fails when one brick is killed in a volume

Steps to reproduce:

  1. Create a PVC
  2. Check the volume status & heal info when all bricks are up
[root@gluster-kube1-0 /]# glustercli volume status --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Volume : pvc-f603ac47dcdc11e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| ae28c880-6067-4a3f-adef-ecf20219066f | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick1/brick | true   | 40314 | 114 |
| 3afb1083-9115-4ad5-b23d-2a08b32dd8cb | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick2/brick | true   | 42432 | 300 |
| e41c0d0f-95d9-4139-866a-ec73aacabe3a | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick3/brick | true   | 43123 | 114 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume heal info pvc-f603ac47dcdc11e8  --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Brick: gluster-kube3-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick1/brick
Status: Connected
entries: 0

Brick: gluster-kube1-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick2/brick
Status: Connected
entries: 0

Brick: gluster-kube2-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick3/brick
Status: Connected
entries: 0

[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# 

  3. Kill one brick from the volume, using the PID shown in the status output above (a hedged helper for pulling out that PID is sketched below)
[root@gluster-kube1-0 /]# kill -9 300    
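A hedged one-liner to pull out the local brick's PID from the status output (matching on the hostname; the field number assumes the table layout shown above):

glustercli volume status pvc-f603ac47dcdc11e8 --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007 \
    | awk -F'|' '/gluster-kube1-0/ {gsub(/ /, "", $7); print $7}'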
  4. Check the volume status and heal info
[root@gluster-kube1-0 /]# glustercli volume status --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Volume : pvc-f603ac47dcdc11e8
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |             HOST              |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+
| 3afb1083-9115-4ad5-b23d-2a08b32dd8cb | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick2/brick | false  |     0 |   0 |
| e41c0d0f-95d9-4139-866a-ec73aacabe3a | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick3/brick | true   | 43123 | 114 |
| ae28c880-6067-4a3f-adef-ecf20219066f | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-f603ac47dcdc11e8/subvol1/brick1/brick | true   | 40314 | 114 |
+--------------------------------------+-------------------------------+---------------------------------------------------------------------+--------+-------+-----+

[root@gluster-kube1-0 /]# glustercli volume heal info pvc-f603ac47dcdc11e8  --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
Failed to get heal info for volume pvc-f603ac47dcdc11e8


Response headers:
X-Gluster-Cluster-Id: 27056e19-500a-4e7a-b5a9-71f461679196
X-Gluster-Peer-Id: 6a4f1154-aaf0-41ed-b899-6a22f98bfef4
X-Request-Id: 4b6319ec-ac9c-4af0-9f93-5f1adf491d75

Response body:
strconv.ParseInt: parsing "-": invalid syntax

glustercli commands are failing on gd2 containers

  1. Created a GCS setup with the latest repo, using vagrant.
  2. Once the setup is up, all pods are running successfully.
  3. Then I logged in to one of the gd2 containers.
  4. If I try to run any glustercli commands, they fail with the errors below.

[root@kube1-0 /]# glustercli peer status
Failed to get Peers list

Failed to connect to glusterd. Please check if

  • Glusterd is running(http://127.0.0.1:24007) and reachable from this node.
  • Make sure Endpoints specified in the command is valid
  5. The glusterd2 service is actually running in the gd2 container:
โ— glusterd2.service - GlusterD2, the management service for GlusterFS (pre-release)
   Loaded: loaded (/usr/lib/systemd/system/glusterd2.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd2.service.d
           โ””โ”€override.conf
   Active: active (running) since Tue 2018-10-09 13:07:30 UTC; 31min ago
 Main PID: 21 (glusterd2)
   CGroup: /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podc2de0d59_cbc3_11e8_8128_525400f652d0.slice/docker-a1f24ec1df72f23c60ddce138de88f38fe484febdcf3a41f710ac7461a88936a.scope/system.slice/glusterd2.service
           โ””โ”€21 /usr/sbin/glusterd2 --config=/etc/glusterd2/glusterd2.toml

Oct 09 13:07:30 kube1-0 systemd[1]: Started GlusterD2, the management service for GlusterFS (pre-release).
Oct 09 13:07:30 kube1-0 systemd[1]: Starting GlusterD2, the management service for GlusterFS (pre-release)...
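Worth noting: glustercli defaults to http://127.0.0.1:24007, while in these containers glusterd2 is configured with GD2_CLIENTADDRESS set to the pod's service hostname, so it is presumably not listening on localhost. Passing the endpoint explicitly, as other issues in this repo do, works as a workaround (whether the in-container default should be fixed is a separate question):

glustercli peer status --endpoints=http://kube1-0.glusterd2.gcs:24007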

Some times vagrant up is failing

Vagrant up is failing with the error below; it is not reproducible every time. My observation: without running prepare.sh in the base deploy directory, if you copy the data from the deploy directory to a new directory, run prepare.sh in the new directory, and then run "vagrant up", it fails at the end with the error below.

==> kube3: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

GD2 volume not deleted gracefully

Continuation of #24.
The output in the gd2 pod shows two volumes, even though only one PVC exists:

[root@kube1-0 /]# glustercli volume status --endpoints="http://kube1-0.glusterd2.gcs:24007"
Volume : pvc-044c90d4cd3411e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+------+-----+
| 22752a52-4be3-4ec2-beec-ad53290dfd3e | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick1/brick | false  |    0 |   0 |
| 81d97c58-4f76-4c10-bcb7-1d64c552e515 | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick2/brick | false  |    0 |   0 |
| 981ca2e3-e282-4073-b540-82a1bd849a5c | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick3/brick | false  |    0 |   0 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+------+-----+
Volume : pvc-b7ef1f27cd3611e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 318151a7-6b5d-41d4-ae77-78d7073bda4f | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-b7ef1f27cd3611e8/subvol1/brick2/brick | true   | 49152 |  66 |
| 36c6fa77-1cb9-4140-bcea-06a5e0d50e72 | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-b7ef1f27cd3611e8/subvol1/brick3/brick | true   | 49152 | 305 |
| 7c6a443b-dedf-4001-837d-10c482e12043 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-b7ef1f27cd3611e8/subvol1/brick1/brick | true   | 49152 | 306 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+

Whereas one of the volumes shows its brick PIDs as null/0, with all bricks offline.

  1. kubectl shows only one PVC:
[vagrant@kube1 ~]$ kubectl get pvc
NAME       STATUS    VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS    AGE
gcs-pvc1   Bound     pvc-b7ef1f27cd3611e8   2Gi        RWX            glusterfs-csi   11m
[vagrant@kube1 ~]$ 

issue2425.log
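Until the root cause is fixed, a hedged way to clean up the leftover volume by hand (volume name taken from the status output above; the delete is only expected to succeed once the volume is stopped):

glustercli volume stop pvc-044c90d4cd3411e8 --endpoints="http://kube1-0.glusterd2.gcs:24007"
glustercli volume delete pvc-044c90d4cd3411e8 --endpoints="http://kube1-0.glusterd2.gcs:24007"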

Glusterd2 & etcd pod crash with OOM, with multiple pvc in creation

  1. Created 20 PVCs of 2 GB each using a script (a sketch of the loop is included after the PVC listing below).
[vagrant@kube1 ~]$ kubectl get pvc
NAME        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
gcs-pvc0    Bound     pvc-0908fe34-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   46s
gcs-pvc1    Bound     pvc-096ac87f-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   46s
gcs-pvc10   Pending                                                                        glusterfs-csi   39s
gcs-pvc11   Pending                                                                        glusterfs-csi   37s
gcs-pvc12   Bound     pvc-0ebfb47b-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   37s
gcs-pvc13   Bound     pvc-0f561bc5-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   36s
gcs-pvc14   Pending                                                                        glusterfs-csi   35s
gcs-pvc15   Pending                                                                        glusterfs-csi   34s
gcs-pvc16   Pending                                                                        glusterfs-csi   33s
gcs-pvc17   Bound     pvc-11d38245-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   32s
gcs-pvc18   Pending                                                                        glusterfs-csi   31s
gcs-pvc19   Pending                                                                        glusterfs-csi   30s
gcs-pvc2    Pending                                                                        glusterfs-csi   45s
gcs-pvc3    Pending                                                                        glusterfs-csi   45s
gcs-pvc4    Bound     pvc-0a6800d6-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   44s
gcs-pvc5    Bound     pvc-0b09491f-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   43s
gcs-pvc6    Pending                                                                        glusterfs-csi   42s
gcs-pvc7    Bound     pvc-0c2f0674-f46c-11e8-be25-52540049d944   2Gi        RWX            glusterfs-csi   41s
gcs-pvc8    Pending                                                                        glusterfs-csi   40s
gcs-pvc9    Pending                                                                        glusterfs-csi   40s

Most of them are stuck in the Pending state.
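For reference, a minimal sketch of the kind of loop used to create the PVCs in step 1 (the exact script was not attached; the PVC spec below just mirrors the storage class and access mode visible in the output above):

for i in $(seq 0 19); do
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-pvc${i}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: glusterfs-csi
EOF
done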

  2. Check the pod status:
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   0          2d18h
csi-nodeplugin-glusterfsplugin-2ctl7   2/2     Running   0          2d18h
csi-nodeplugin-glusterfsplugin-kh4jb   2/2     Running   0          2d18h
csi-nodeplugin-glusterfsplugin-wvrld   2/2     Running   0          2d18h
csi-provisioner-glusterfsplugin-0      3/3     Running   1          2d18h
etcd-5nh7mdst6h                        1/1     Running   0          2d18h
etcd-8psmj7t2gk                        1/1     Running   0          2d18h
etcd-hsgdxsrk7j                        1/1     Running   0          5m7s
etcd-operator-7cb5bd459b-vzvxn         1/1     Running   1          2d18h
gluster-kube1-0                        1/1     Running   1          3m26s
gluster-kube2-0                        1/1     Running   1          2d18h
gluster-kube3-0                        1/1     Running   1          2d18h

Here we can see that an etcd pod and the gluster-kube1-0 pod were both killed and respun.

  3. Check the dmesg logs:
[241052.757011] Out of memory: Kill process 23065 (glusterd2) score 1291 or sacrifice child
[241052.759723] Killed process 12991 (lvcreate) total-vm:62140kB, anon-rss:3372kB, file-rss:1448kB, shmem-rss:0kB
[241078.802364] kubelet invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=-999
[241078.804948] kubelet cpuset=/ mems_allowed=0
[241078.806282] CPU: 0 PID: 15957 Comm: kubelet Kdump: loaded Tainted: G               ------------ T 3.10.0-862.11.6.el7.x86_64 #1
[241078.809840] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[241078.813709] Call Trace:
[241078.814496]  [<ffffffffacd135d4>] dump_stack+0x19/0x1b
[241078.816075]  [<ffffffffacd0e79f>] dump_header+0x90/0x229
[241078.817776]  [<ffffffffac8dc63b>] ? cred_has_capability+0x6b/0x120
[241078.819822]  [<ffffffffac79ac64>] oom_kill_process+0x254/0x3d0
[241078.822464]  [<ffffffffac8dc71e>] ? selinux_capable+0x2e/0x40
[241078.824270]  [<ffffffffac79b4a6>] out_of_memory+0x4b6/0x4f0
[241078.826074]  [<ffffffffacd0f2a3>] __alloc_pages_slowpath+0x5d6/0x724
[241078.828047]  [<ffffffffac7a17f5>] __alloc_pages_nodemask+0x405/0x420
[241078.830064]  [<ffffffffac7ebf98>] alloc_pages_current+0x98/0x110
[241078.832282]  [<ffffffffac797057>] __page_cache_alloc+0x97/0xb0
[241078.834345]  [<ffffffffac799758>] filemap_fault+0x298/0x490
[241078.836144]  [<ffffffffc053585f>] xfs_filemap_fault+0x5f/0xe0 [xfs]
[241078.838163]  [<ffffffffac7c352a>] __do_fault.isra.58+0x8a/0x100
[241078.840691]  [<ffffffffac7c3adc>] do_read_fault.isra.60+0x4c/0x1b0
[241078.842624]  [<ffffffffac7c8484>] handle_pte_fault+0x2f4/0xd10
[241078.844459]  [<ffffffffac7cae3d>] handle_mm_fault+0x39d/0x9b0
[241078.846793]  [<ffffffffac8376b9>] ? dput+0x29/0x160
[241078.848351]  [<ffffffffacd20557>] __do_page_fault+0x197/0x4f0
[241078.850512]  [<ffffffffacd20996>] trace_do_page_fault+0x56/0x150
[241078.852863]  [<ffffffffacd1ff22>] do_async_page_fault+0x22/0xf0
[241078.854659]  [<ffffffffacd1c788>] async_page_fault+0x28/0x30
[241078.856921] Mem-Info:
[241078.857677] active_anon:370326 inactive_anon:1187 isolated_anon:0
 active_file:4 inactive_file:2231 isolated_file:2
 unevictable:6692 dirty:2 writeback:0 unstable:0
 slab_reclaimable:11037 slab_unreclaimable:18435
 mapped:2359 shmem:2023 pagetables:3707 bounce:0
 free:13061 free_pcp:30 free_cma:0
[241078.868409] Node 0 DMA free:7632kB min:380kB low:472kB high:568kB active_anon:4408kB inactive_anon:104kB active_file:0kB inactive_file:28kB unevictable:300kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:300kB dirty:0kB writeback:0kB mapped:160kB shmem:156kB slab_reclaimable:192kB slab_unreclaimable:2648kB kernel_stack:112kB pagetables:72kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[241078.882955] lowmem_reserve[]: 0 1819 1819 1819
[241078.884613] Node 0 DMA32 free:44832kB min:44672kB low:55840kB high:67008kB active_anon:1476896kB inactive_anon:4644kB active_file:100kB inactive_file:12748kB unevictable:26468kB isolated(anon):0kB isolated(file):0kB present:2080592kB managed:1866644kB mlocked:26468kB dirty:8kB writeback:0kB mapped:9276kB shmem:7936kB slab_reclaimable:43956kB slab_unreclaimable:71092kB kernel_stack:15600kB pagetables:14756kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2 all_unreclaimable? no
[241078.900887] lowmem_reserve[]: 0 0 0 0
[241078.902897] Node 0 DMA: 192*4kB (UEM) 128*8kB (EM) 47*16kB (UEM) 9*32kB (UEM) 5*64kB (UE) 1*128kB (U) 1*256kB (M) 2*512kB (EM) 1*1024kB (E) 1*2048kB (M) 0*4096kB = 7632kB
[241078.908969] Node 0 DMA32: 114*4kB (UE) 628*8kB (EM) 2159*16kB (UEM) 100*32kB (UEM) 19*64kB (UEM) 2*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44696kB
[241078.913942] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[241078.916643] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[241078.920055] 7398 total pagecache pages
[241078.921395] 0 pages in swap cache
[241078.922354] Swap cache stats: add 0, delete 0, find 0/0
[241078.923969] Free swap  = 0kB
[241078.925044] Total swap = 0kB
[241078.926414] 524146 pages RAM
[241078.927710] 0 pages HighMem/MovableOnly
[241078.929004] 53508 pages reserved
[241078.930111] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[241078.933211] [  603]     0   603    28314      122      59        0             0 systemd-journal
[241078.935937] [  625]     0   625    68697      404      33        0             0 lvmetad
[241078.938434] [  645]     0   645    12002      452      25        0         -1000 systemd-udevd
[241078.943137] [  716]    32   716    17820      138      39        0             0 rpcbind
[241078.946513] [  718]    81   718    17683      206      36        0          -900 dbus-daemon
[241078.949782] [  723]   999   723   135645     1381      63        0             0 polkitd
[241078.952510] [  724]     0   724   114406     1313      86        0             0 NetworkManager
[241078.955281] [  727]     0   727     5414       69      16        0             0 irqbalance
[241078.957896] [  729]     0   729     6627      105      20        0             0 systemd-logind
[241078.960463] [  732]   994   732    29953      118      28        0             0 chronyd
[241078.963597] [  734]     0   734    25794      118      35        0             0 gssproxy
[241078.966319] [  780]     0   780   120631     3452      99        0             0 tuned
[241078.968793] [  785]     0   785    28718      258      57        0         -1000 sshd
[241078.971211] [  799]     0   799     5948       41      14        0             0 rhsmcertd
[241078.973742] [  815]     0   815    26849      499      53        0             0 dhclient
[241078.976644] [  823]     0   823     5601       40      11        0             0 agetty
[241078.980776] [  825]     0   825     5601       38      12        0             0 agetty
[241078.983429] [  841]     0   841     8594      159      16        0             0 crond
[241078.986419] [11893]     0 11893   224318     8246     100        0          -999 dockerd-current
[241078.989582] [14287]     0 14287     5317       46      12        0             0 etcd
[241078.992319] [14289]     0 14289    55754     1489      31        0             0 docker-current
[241078.995718] [14298]     0 14298    68416      327      22        0          -500 docker-containe
[241078.998633] [14313]     0 14313  2652055    19995      71        0             0 etcd
[241079.001550] [15854]     0 15854   192929    20087     134        0          -999 kubelet
[241079.004817] [15958]     0 15958    68416      150      22        0          -999 docker-containe
[241079.007971] [15973]     0 15973      253        1       4        0          -998 pause
[241079.012490] [15994]     0 15994    52032      150      21        0          -999 docker-containe
[241079.016116] [16009]     0 16009   132732     4594      98        0          -998 hyperkube
[241079.018681] [16262]     0 16262    84800      647      23        0          -999 docker-containe
[241079.021555] [16277]     0 16277      253        1       4        0          -998 pause
[241079.024743] [16393]     0 16393    52032      153      20        0          -999 docker-containe
[241079.027552] [16409]     0 16409   198298    83046     264        0           868 hyperkube
[241079.030453] [16710]     0 16710    52032      148      21        0          -999 docker-containe
[241079.034266] [16727]     0 16727      253        1       4        0          -998 pause
[241079.038057] [16909]     0 16909    52032      143      21        0          -999 docker-containe
[241079.042089] [16925]     0 16925      253        1       4        0          -998 pause
[241079.047506] [18070]     0 18070    52032      139      21        0          -999 docker-containe
[241079.050431] [18087]     0 18087      253        1       4        0          -998 pause
[241079.052965] [18115]     0 18115    85152      137      23        0          -999 docker-containe
[241079.056642] [18130]     0 18130    61401     2553      39        0           967 flanneld
[241079.059493] [18161]     0 18161    68416      145      22        0          -999 docker-containe
[241079.062421] [18177]     0 18177      303       14       4        0           999 install-cni.sh
[241079.065448] [18875]     0 18875    68416      150      22        0          -999 docker-containe
[241079.068516] [18893]     0 18893      253        1       4        0          -998 pause
[241079.071095] [19016]     0 19016    68416      141      22        0          -999 docker-containe
[241079.073920] [19032]     0 19032     7609      941      19        0          1000 local-provision
[241079.076680] [19749]     0 19749    52032      146      21        0          -999 docker-containe
[241079.079652] [19767]     0 19767      253        1       4        0          -998 pause
[241079.083555] [19889]     0 19889    85152      139      24        0          -999 docker-containe
[241079.087124] [19909]     0 19909    35298     1980      30        0           962 coredns
[241079.089768] [22272]     0 22272    68416      145      22        0          -999 docker-containe
[241079.092865] [22288]     0 22288      253        1       4        0          -998 pause
[241079.096662] [22536]     0 22536    84800      180      23        0          -999 docker-containe
[241079.102521] [22551]     0 22551      253        1       4        0          -998 pause
[241079.106050] [23008]     0 23008    68416      137      22        0          -999 docker-containe
[241079.109054] [23023]     0 23023    10778      322      26        0          1000 systemd
[241079.113538] [23053]     0 23053     8836      614      24        0          1000 systemd-journal
[241079.116915] [23055]     0 23055    58430     9217      39        0          1000 gluster-exporte
[241079.120732] [23063]    32 23063    17305      135      37        0          1000 rpcbind
[241079.123536] [23065]     0 23065   279994   141995     337        0          1000 glusterd2
[241079.126930] [23134]    81 23134    14516      117      31        0          -900 dbus-daemon
[241079.129983] [23283]     0 23283    84800      660      22        0          -999 docker-containe
[241079.133300] [23298]     0 23298  2662238    26772      82        0          1000 etcd
[241079.135990] [26046]     0 26046    84800      650      22        0          -999 docker-containe
[241079.139008] [26061]     0 26061      253        1       4        0          -998 pause
[241079.142517] [26298]     0 26298    52384      143      22        0          -999 docker-containe
[241079.146212] [26313]     0 26313     7516      814      19        0          1000 driver-registra
[241079.149486] [26606]     0 26606    52384     1182      22        0          -999 docker-containe
[241079.153324] [26622]     0 26622     8190      989      20        0          1000 glusterfs-csi-d
[241079.156803] [27575]     0 27575    68416      184      22        0          -999 docker-containe
[241079.159842] [27590]  1000 27590      253        1       4        0          -998 pause
[241079.162735] [27749]     0 27749    52384      148      22        0          -999 docker-containe
[241079.165745] [27767]     0 27767      253        1       4        0          -998 pause
[241079.168493] [28322]     0 28322    68416      155      22        0          -999 docker-containe
[241079.171408] [28338]   472 28338   187896     2911      54        0           946 grafana-server
[241079.174277] [28462]     0 28462    68416      186      22        0          -999 docker-containe
[241079.178141] [28477]  1000 28477    27457      347      13        0           973 prometheus-conf
[241079.182720] [28614]     0 28614    68416      184      22        0          -999 docker-containe
[241079.186738] [28629]  1000 28629     2600      784      10        0           995 configmap-reloa
[241079.190928] [28692]     0 28692    35648      139      20        0          -999 docker-containe
[241079.195195] [28708]  1000 28708    38267     9428      55        0           999 prometheus
[241079.200457] [ 3095]     0  3095   261940     6651      56        0         -1000 dmeventd
[241079.204810] [ 1278]     0  1278    66103      144      21        0          -999 docker-containe
[241079.208967] [ 1293]     0  1293     2955      109      12        0          1000 bash
[241079.212351] [24029]     0 24029    82487      145      22        0          -999 docker-containe
[241079.217271] [24044]     0 24044     2955       98      11        0          1000 bash
[241079.220016] [32074]     0 32074    39688      335      78        0             0 sshd
[241079.222883] [32077]  1000 32077    39767      401      78        0             0 sshd
[241079.225605] [32078]  1000 32078     6775      987      15        0             0 bash
[241079.228841] [ 3680]     0  3680      299        1       5        0           999 sleep
[241079.232823] [ 9877]     0  9877    87558      930      52        0          1000 glusterfs
[241079.235885] [11311]     0 11311   214441      661      84        0          1000 glusterfsd
[241079.238633] [11726]     0 11726    84800      175      23        0          -999 docker-containe
[241079.241660] [11727]     0 11727    68152      154      22        0          -999 docker-containe
[241079.245132] [11761]     0 11761   115028     4151      98        0           912 hyperkube
[241079.248021] [11765]     0 11765   113243     4191      97        0           949 hyperkube
[241079.251843] [13250]     0 13250    15425      463      35        0          1000 lvm
[241079.255549] [13267]  1000 13267    67987     1238      44        0             0 kubectl
[241079.258248] [13295]     0 13295    33876      805      23        0          -999 docker-containe
[241079.261093] [13307]     0 13307     4030       23       9        0          -999 du
[241079.264160] Out of memory: Kill process 23065 (glusterd2) score 1291 or sacrifice child
[241079.266900] Killed process 23065 (glusterd2) total-vm:1119976kB, anon-rss:567980kB, file-rss:0kB, shmem-rss:0kB

Vagrant up is failing on same deploy directory after destroying gcs setup

Steps

  1. GCS setup was running
  2. Deleted GCS setup by running "vagrant destroy -f" from the directory where GCS setup was deployed
[root@rhsqa-virt05 deploy2]# vagrant destroy -f
==> kube3: Removing domain...
==> kube2: Removing domain...
==> kube1: Removing domain...

GCS setup got deleted successfully

  3. Removed the hidden directory: "rm -rf .vagrant/"
  4. From the same directory, tried running "vagrant up", and it fails with the error below.
[root@rhsqa-virt05 deploy2]# vagrant up
Bringing machine 'kube1' up with 'libvirt' provider...
Bringing machine 'kube2' up with 'libvirt' provider...
Bringing machine 'kube3' up with 'libvirt' provider...
==> kube3: Creating image (snapshot of base box volume).
==> kube2: Creating image (snapshot of base box volume).
==> kube1: Creating image (snapshot of base box volume).
==> kube3: Creating domain with the following settings...
==> kube1: Creating domain with the following settings...
==> kube2: Creating domain with the following settings...
==> kube3:  -- Name:              akarsha-1_kube3
==> kube2:  -- Name:              akarsha-1_kube2
==> kube3:  -- Domain type:       kvm
==> kube2:  -- Domain type:       kvm
==> kube1:  -- Name:              akarsha-1_kube1
==> kube3:  -- Cpus:              2
==> kube2:  -- Cpus:              2
==> kube1:  -- Domain type:       kvm
==> kube3:  -- Feature:           acpi
==> kube2:  -- Feature:           acpi
==> kube3:  -- Feature:           apic
==> kube3:  -- Feature:           pae
==> kube2:  -- Feature:           apic
==> kube1:  -- Cpus:              2
==> kube1:  -- Feature:           acpi
==> kube1:  -- Feature:           apic
==> kube1:  -- Feature:           pae
==> kube3:  -- Memory:            2048M
==> kube1:  -- Memory:            2048M
==> kube2:  -- Feature:           pae
==> kube3:  -- Management MAC:    
==> kube1:  -- Management MAC:    
==> kube2:  -- Memory:            2048M
==> kube3:  -- Loader:            
==> kube1:  -- Loader:            
==> kube3:  -- Base box:          centos/atomic-host
==> kube2:  -- Management MAC:    
==> kube1:  -- Base box:          centos/atomic-host
==> kube2:  -- Loader:            
==> kube3:  -- Storage pool:      default
==> kube2:  -- Base box:          centos/atomic-host
==> kube1:  -- Storage pool:      default
==> kube3:  -- Image:             /var/lib/libvirt/images/akarsha-1_kube3.img (11G)
==> kube1:  -- Image:             /var/lib/libvirt/images/akarsha-1_kube1.img (11G)
==> kube2:  -- Storage pool:      default
==> kube3:  -- Volume Cache:      default
==> kube1:  -- Volume Cache:      default
==> kube2:  -- Image:             /var/lib/libvirt/images/akarsha-1_kube2.img (11G)
==> kube1:  -- Kernel:            
==> kube3:  -- Kernel:            
==> kube2:  -- Volume Cache:      default
==> kube1:  -- Initrd:            
==> kube2:  -- Kernel:            
==> kube3:  -- Initrd:            
==> kube1:  -- Graphics Type:     vnc
==> kube3:  -- Graphics Type:     vnc
==> kube2:  -- Initrd:            
==> kube1:  -- Graphics Port:     5900
==> kube2:  -- Graphics Type:     vnc
==> kube3:  -- Graphics Port:     5900
==> kube1:  -- Graphics IP:       127.0.0.1
==> kube2:  -- Graphics Port:     5900
==> kube1:  -- Graphics Password: Not defined
==> kube3:  -- Graphics IP:       127.0.0.1
==> kube2:  -- Graphics IP:       127.0.0.1
==> kube3:  -- Graphics Password: Not defined
==> kube1:  -- Video Type:        cirrus
==> kube2:  -- Graphics Password: Not defined
==> kube3:  -- Video Type:        cirrus
==> kube1:  -- Video VRAM:        9216
==> kube2:  -- Video Type:        cirrus
==> kube3:  -- Video VRAM:        9216
==> kube2:  -- Video VRAM:        9216
==> kube1:  -- Sound Type:	
==> kube3:  -- Sound Type:	
==> kube2:  -- Sound Type:	
==> kube1:  -- Keymap:            en-us
==> kube3:  -- Keymap:            en-us
==> kube2:  -- Keymap:            en-us
==> kube1:  -- TPM Path:          
==> kube3:  -- TPM Path:          
==> kube2:  -- TPM Path:          
==> kube3:  -- Disks:         vdb(qcow2,1024G), vdc(qcow2,1024G), vdd(qcow2,1024G)
==> kube1:  -- Disks:         vdb(qcow2,1024G), vdc(qcow2,1024G), vdd(qcow2,1024G)
==> kube3:  -- Disk(vdb):     /var/lib/libvirt/images/akarsha-1_kube3-vdb.qcow2
==> kube1:  -- Disk(vdb):     /var/lib/libvirt/images/akarsha-1_kube1-vdb.qcow2
==> kube3:  -- Disk(vdc):     /var/lib/libvirt/images/akarsha-1_kube3-vdc.qcow2
==> kube2:  -- Disks:         vdb(qcow2,1024G), vdc(qcow2,1024G), vdd(qcow2,1024G)
==> kube3:  -- Disk(vdd):     /var/lib/libvirt/images/akarsha-1_kube3-vdd.qcow2
==> kube1:  -- Disk(vdc):     /var/lib/libvirt/images/akarsha-1_kube1-vdc.qcow2
==> kube3:  -- INPUT:             type=mouse, bus=ps2
==> kube2:  -- Disk(vdb):     /var/lib/libvirt/images/akarsha-1_kube2-vdb.qcow2
==> kube1:  -- Disk(vdd):     /var/lib/libvirt/images/akarsha-1_kube1-vdd.qcow2
==> kube2:  -- Disk(vdc):     /var/lib/libvirt/images/akarsha-1_kube2-vdc.qcow2
==> kube1:  -- INPUT:             type=mouse, bus=ps2
==> kube2:  -- Disk(vdd):     /var/lib/libvirt/images/akarsha-1_kube2-vdd.qcow2
==> kube2:  -- INPUT:             type=mouse, bus=ps2
==> kube2: Creating shared folders metadata...
==> kube2: Starting domain.
==> kube3: Creating shared folders metadata...
==> kube3: Starting domain.
==> kube2: Waiting for domain to get an IP address...
==> kube1: Creating shared folders metadata...
==> kube3: Waiting for domain to get an IP address...
==> kube1: Starting domain.
==> kube1: Waiting for domain to get an IP address...
==> kube1: Waiting for SSH to become available...
    kube1: 
    kube1: Vagrant insecure key detected. Vagrant will automatically replace
    kube1: this with a newly generated keypair for better security.
==> kube2: Waiting for SSH to become available...
    kube1: 
    kube1: Inserting generated public key within guest...
==> kube3: Waiting for SSH to become available...
    kube2: 
    kube2: Vagrant insecure key detected. Vagrant will automatically replace
    kube2: this with a newly generated keypair for better security.
    kube1: Removing insecure key from the guest if it's present...
    kube3: 
    kube3: Vagrant insecure key detected. Vagrant will automatically replace
    kube3: this with a newly generated keypair for better security.
    kube2: 
    kube2: Inserting generated public key within guest...
    kube1: Key inserted! Disconnecting and reconnecting using new SSH key...
    kube3: 
    kube3: Inserting generated public key within guest...
    kube2: Removing insecure key from the guest if it's present...
==> kube1: Setting hostname...
    kube3: Removing insecure key from the guest if it's present...
    kube2: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kube1: Configuring and enabling network interfaces...
==> kube2: Setting hostname...
    kube3: Key inserted! Disconnecting and reconnecting using new SSH key...
==> kube3: Setting hostname...
==> kube2: Configuring and enabling network interfaces...
    kube1: SSH address: 192.168.121.100:22
==> kube3: Configuring and enabling network interfaces...
    kube1: SSH username: vagrant
    kube1: SSH auth method: private key
    kube2: SSH address: 192.168.121.102:22
    kube2: SSH username: vagrant
    kube2: SSH auth method: private key
    kube3: SSH address: 192.168.121.162:22
    kube3: SSH username: vagrant
    kube3: SSH auth method: private key
==> kube3: Running provisioner: ansible...
Vagrant has automatically selected the compatibility mode '2.0'
according to the Ansible version installed (2.6.4).

Alternatively, the compatibility mode can be specified in your Vagrantfile:
https://www.vagrantup.com/docs/provisioning/ansible_common.html#compatibility_mode

==> kube3: Vagrant has detected a host range pattern in the `groups` option.
==> kube3: Vagrant doesn't fully check the validity of these parameters!
==> kube3: 
==> kube3: Please check https://docs.ansible.com/ansible/intro_inventory.html#hosts-and-groups
==> kube3: for more information.
    kube3: Running ansible-playbook...
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.

The error appears to have been in '/root/gcs-3/deploy2/kubespray/roles/vault/handlers/main.yml': line 44, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: unseal vault
  ^ here

==> kube3: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
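One thing worth trying, on the assumption that the failure comes from a stale kubespray checkout or virtualenv in the copied directory rather than from vagrant itself: re-run prepare.sh in that directory so kubespray and the pinned Ansible get set up there, re-activate the virtualenv, and only then run vagrant up.

rm -rf gcs-venv
./prepare.sh
source gcs-venv/bin/activate
vagrant up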

idna version incompatible issue while running prepare.sh

atin@dhcp35-96:~/codebase/myrepo/gcs/deploy$ ./prepare.sh
vagrant and vagrant-libvirt were found.
For easier operation, ensure that libvirt has been configured to work without passwords.
Ref: https://developer.fedoraproject.org/tools/vagrant/vagrant-libvirt.html

Ensuring kubespray is present
Creating a python virtualenv gcs-venv
Installing requirements into gcs-venv
requests 2.20.1 has requirement idna<2.8,>=2.5, but you'll have idna 2.8 which is incompatible.

Virtualenv gcs-venv has been created
The virtualenv needs to be activated before doing any operations. Operations may fail if virtualenv is not activated.

To activate the virutalenv, run:
$ source gcs-venv/bin/activate
(gcs-venv) $

To deactivate an activated virtualenv, run:
(gcs-venv) $ deactivate
$

Note: The virtualenv should be activated for each shell session individually.

I'm not sure whether this error message is benign, as the environment seems to have been created anyway.
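In case it is not benign, a hedged way to satisfy the constraint inside the virtualenv (the version bounds are taken straight from the pip message above):

source gcs-venv/bin/activate
pip install 'idna>=2.5,<2.8'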

Error on vagrant up: The following settings shouldn't exist: plugins

After cloning with submodules, running prepare.sh, and activating the environment, I'm seeing the following errors after running vagrant up:

(gcs-venv) [lbailey@windslash deploy]$ vagrant up
Bringing machine 'kube1' up with 'libvirt' provider...
Bringing machine 'kube2' up with 'libvirt' provider...
Bringing machine 'kube3' up with 'libvirt' provider...
==> kube2: An error occurred. The error will be shown after all tasks complete.
==> kube3: An error occurred. The error will be shown after all tasks complete.
==> kube1: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'kube1'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

vagrant:
* The following settings shouldn't exist: plugins

An error occurred while executing the action on the 'kube2'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

vagrant:
* The following settings shouldn't exist: plugins

An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

vagrant:
* The following settings shouldn't exist: plugins

Running on Fedora 28.
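In case it helps triage: the "plugins" setting in the Vagrantfile is only understood by newer Vagrant releases (2.2+, if I remember correctly), so checking the installed version and, if it is older, either upgrading Vagrant or installing the libvirt plugin manually seems like a reasonable first step:

vagrant --version
vagrant plugin install vagrant-libvirt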

Add init container to glusterd2 pod to wait for DNS

In gluster/glusterd2#1324, it looks like DNS isn't ready by the time gd2 tries to resolve its own hostname.

We should add an init container like etcd does to wait for DNS.

Example:

$ kubectl -n gcs get po/etcd-sv9sxbvm7j -oyaml
apiVersion: v1
kind: Pod
metadata:
...
spec:
...
  initContainers:
  - command:
    - /bin/sh
    - -c
    - "\n\t\t\t\t\twhile ( ! nslookup etcd-sv9sxbvm7j.etcd.gcs.svc )\n\t\t\t\t\tdo\n\t\t\t\t\t\tsleep
      2\n\t\t\t\t\tdone"
    image: busybox:1.28.0-glibc
    imagePullPolicy: IfNotPresent
    name: check-dns
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
...
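A minimal sketch of the equivalent wait loop for a glusterd2 pod, waiting on its own headless-service record before gd2 starts (the hostname below just follows the naming pattern used elsewhere in this deploy):

while ! nslookup gluster-kube1-0.glusterd2.gcs.svc
do
    sleep 2
done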

Failed to delete volume from glusterd2 pod

Logs

[root@gluster-kube1-0 /]# glustercli peer list --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
|                  ID                  |      NAME       |          CLIENT ADDRESSES           |           PEER ADDRESSES            | ONLINE | PID |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
| 0ec39f93-e53d-4c39-b190-451647e8a2ad | gluster-kube2-0 | gluster-kube2-0.glusterd2.gcs:24007 | gluster-kube2-0.glusterd2.gcs:24008 | yes    |  21 |
| 4ae03846-4985-4f72-9893-9e9ab16339fa | gluster-kube1-0 | gluster-kube1-0.glusterd2.gcs:24007 | gluster-kube1-0.glusterd2.gcs:24008 | yes    |  21 |
| 8c180e06-e0ce-4c42-9775-474d60f718f6 | gluster-kube3-0 | gluster-kube3-0.glusterd2.gcs:24007 | gluster-kube3-0.glusterd2.gcs:24008 | no     |     |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
[root@gluster-kube1-0 /]# 
[root@gluster-kube1-0 /]# glustercli volume stop pvc-9560daae-db4a-11e8-8bfc-525400b677a0 --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007
Volume stop failed

Response headers:
X-Gluster-Cluster-Id: 55d7f9c6-b040-4f03-805f-7a2f6fe0789c
X-Gluster-Peer-Id: 4ae03846-4985-4f72-9893-9e9ab16339fa
X-Request-Id: 0abf77ea-52f8-46f9-bd4a-a8b933b9ee85

Response body:
node 8c180e06-e0ce-4c42-9775-474d60f718f6 is probably down
[root@gluster-kube1-0 /]# glustercli peer list --endpoints=http://gluster-kube3-0.glusterd2.gcs:24007
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
|                  ID                  |      NAME       |          CLIENT ADDRESSES           |           PEER ADDRESSES            | ONLINE | PID |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+
| 0ec39f93-e53d-4c39-b190-451647e8a2ad | gluster-kube2-0 | gluster-kube2-0.glusterd2.gcs:24007 | gluster-kube2-0.glusterd2.gcs:24008 | yes    |  21 |
| 4ae03846-4985-4f72-9893-9e9ab16339fa | gluster-kube1-0 | gluster-kube1-0.glusterd2.gcs:24007 | gluster-kube1-0.glusterd2.gcs:24008 | yes    |  21 |
| 8c180e06-e0ce-4c42-9775-474d60f718f6 | gluster-kube3-0 | gluster-kube3-0.glusterd2.gcs:24007 | gluster-kube3-0.glusterd2.gcs:24008 | no     |     |
+--------------------------------------+-----------------+-------------------------------------+-------------------------------------+--------+-----+

During volume stop it reports that node gluster-kube3-0 is down, but I am able to access the glusterd2 node gluster-kube3-0.

glusterd2 on gluster-kube3-0 is running, but it is not marking itself as online in etcd, and I am not seeing any failures in the glusterd2 logs.
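One thing that may be worth trying before giving up on the delete (assuming the gd2 pods are managed by StatefulSets, as in this deploy): delete the kube3 pod so it gets respun and re-registers itself in etcd, then retry the stop.

kubectl -n gcs delete pod gluster-kube3-0 --grace-period=0
kubectl -n gcs get pods -w
glustercli volume stop pvc-9560daae-db4a-11e8-8bfc-525400b677a0 --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007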

Brick goes offline when a new PVC is created

  1. Created a GCS cluster
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY     STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-4cvkd   2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-m9z9n   2/2       Running   0          1h
csi-nodeplugin-glusterfsplugin-wclwr   2/2       Running   0          1h
csi-provisioner-glusterfsplugin-0      2/2       Running   0          1h
etcd-chvm79wqr4                        1/1       Running   0          1h
etcd-ndccs6pkq7                        1/1       Running   0          1h
etcd-operator-54bbdfc55d-vkxh6         1/1       Running   0          1h
etcd-rrfgwq5xkd                        1/1       Running   0          3m
kube1-0                                1/1       Running   0          1h
kube2-0                                1/1       Running   0          1h
kube3-0                                1/1       Running   0          1h

  2. Create a PVC and mount it on an app pod.
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true   | 49152 |  53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | true   | 49152 |  53 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true   | 49152 |  53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
  3. Kill a brick from the gd2 pod
[root@kube1-0 /]# kill -9 53
[root@kube1-0 /]# 
[root@kube1-0 /]#  glustercli volume status --endpoints="http://kube2-0.glusterd2.
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true   | 49152 |  53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | false  |     0 |   0 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true   | 49152 |  53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# 

  4. Delete the app pods and the PVC:
[vagrant@kube1 ~]$ kubectl delete pod redis1
pod "redis1" deleted
[vagrant@kube1 ~]$ kubectl -n gcs -it exec kube1-0 -- /bin/bash
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
Volume : pvc-350277cfcd3111e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 1e9711eb-6ab9-4381-8f37-4fe929ae8e36 | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick1/brick | true   | 49152 |  53 |
| 044ae8e2-4dcb-45f1-9e17-7fa0b1b8084b | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick2/brick | false  |     0 |   0 |
| ddc9610e-4216-4d96-a4e1-5558703d2f1a | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-350277cfcd3111e8/subvol1/brick3/brick | true   | 49152 |  53 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# exit
[vagrant@kube1 ~]$ kubectl get pods
No resources found.
[vagrant@kube1 ~]$ kubectl get pvc
NAME       STATUS    VOLUME                 CAPACITY   ACCESS MODES   STORAGECLASS    AGE
gcs-pvc1   Bound     pvc-350277cfcd3111e8   2Gi        RWX            glusterfs-csi   19m
[vagrant@kube1 ~]$ kubectl delete pvc gcs-pvc1
persistentvolumeclaim "gcs-pvc1" deleted
[vagrant@kube1 ~]$ kubectl get pvc
No resources found.
[vagrant@kube1 ~]$ kubectl -n gcs -it exec kube1-0 -- /bin/bash
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube2-0.glusterd2.gcs:24007"
No volumes found
[root@kube1-0 /]#

  5. Delete the gd2 pod and wait for a new pod to spin up

[vagrant@kube1 ~]$ kubectl delete -n gcs pods kube1-0 --grace-period=0
pod "kube1-0" deleted

  6. Create a new PVC and check the volume status in the gd2 pod
[root@kube1-0 /]# glustercli volume status --endpoints="http://kube3-0.glusterd2.gcs:24007"
Volume : pvc-044c90d4cd3411e8
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
|               BRICK ID               |         HOST          |                                PATH                                 | ONLINE | PORT  | PID |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
| 981ca2e3-e282-4073-b540-82a1bd849a5c | kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick3/brick | true   | 49152 | 175 |
| 22752a52-4be3-4ec2-beec-ad53290dfd3e | kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick1/brick | true   | 49152 | 173 |
| 81d97c58-4f76-4c10-bcb7-1d64c552e515 | kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-044c90d4cd3411e8/subvol1/brick2/brick | false  |     0 |   0 |
+--------------------------------------+-----------------------+---------------------------------------------------------------------+--------+-------+-----+
[root@kube1-0 /]# 

Not able to extend the gcs cluster. Raising this issue as an enhancement

Currently, by default, the vagrant script gives us a 3-node GCS setup (a gd2 container cluster). It is not possible to extend the peers with the current GCS setup.

But in OCS we are able to extend the peers of the gluster container cluster using heketi commands.

We should be able to do the same on a GCS setup.
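For comparison, gd2 itself has a peer-add operation; a hedged sketch of what extending the cluster could look like once a gd2 pod is running on a new node (the kube4 hostname is hypothetical, and the surrounding node/pod provisioning is exactly the automation that is missing today):

glustercli peer add gluster-kube4-0.glusterd2.gcs:24008 \
    --endpoints=http://gluster-kube1-0.glusterd2.gcs:24007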

failed to deploy gcs cluster using vagrant.

The cluster fails to deploy using vagrant:

TASK [GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready] **********
Tuesday 27 November 2018  15:44:42 +0530 (0:00:00.190)       1:08:21.441 ****** 
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (50 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (49 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (48 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (47 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (46 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (45 retries left).




FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (44 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (43 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (42 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (41 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (40 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (39 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (38 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (37 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (36 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (35 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (34 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (33 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (32 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (31 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (30 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (29 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (28 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (27 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (26 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (25 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (24 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (23 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (22 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (21 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (20 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (19 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (18 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (17 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (16 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (15 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (14 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (13 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (12 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (11 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (10 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (9 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (8 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (7 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (6 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (5 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (4 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (3 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (2 retries left).
FAILED - RETRYING: GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready (1 retries left).
fatal: [kube1]: FAILED! => {"attempts": 50, "changed": false, "connection": "close", "content_length": "542", "content_type": "application/json; charset=UTF-8", "cookies": {}, "cookies_string": "", "date": "Tue, 27 Nov 2018 10:23:48 GMT", "json": [{"client-addresses": ["gluster-kube1-0.glusterd2.gcs:24007"], "id": "f71b18e2-badb-44f9-a3cb-ebaab3ea7911", "metadata": {"_zone": "f71b18e2-badb-44f9-a3cb-ebaab3ea7911"}, "name": "gluster-kube1-0", "online": true, "peer-addresses": ["gluster-kube1-0.glusterd2.gcs:24008"], "pid": 26}, {"client-addresses": ["gluster-kube3-0.glusterd2.gcs:24007"], "id": "ff7a5a54-ff8e-49c4-8a09-ac7ee9955787", "metadata": {"_zone": "ff7a5a54-ff8e-49c4-8a09-ac7ee9955787"}, "name": "gluster-kube3-0", "online": true, "peer-addresses": ["gluster-kube3-0.glusterd2.gcs:24008"], "pid": 25}], "msg": "OK (542 bytes)", "redirected": false, "status": 200, "url": "http://10.233.19.39:24007/v1/peers", "x_gluster_cluster_id": "348d6571-88a2-4231-9100-98ef06ad9a9c", "x_gluster_peer_id": "ff7a5a54-ff8e-49c4-8a09-ac7ee9955787", "x_request_id": "7c157d53-b8ed-4551-8a55-977731c39b7d"}
	to retry, use: --limit @/root/gcs-latest/gcs-latest-karan/deploy/vagrant-playbook.retry

PLAY RECAP *********************************************************************
kube1                      : ok=377  changed=114  unreachable=0    failed=1   
kube2                      : ok=4    changed=4    unreachable=0    failed=1   
kube3                      : ok=254  changed=71   unreachable=0    failed=0   

Tuesday 27 November 2018  15:53:48 +0530 (0:09:05.631)       1:17:27.072 ****** 
=============================================================================== 
download : container_download | Download containers if pull is required or told to always pull (all nodes)  2455.31s
GCS | GD2 Cluster | Wait for glusterd2-cluster to become ready -------- 545.63s
GCS | ETCD Operator | Wait for etcd-operator to be available ---------- 433.74s
Install packages ------------------------------------------------------ 153.73s
GCS | ETCD Cluster | Wait for etcd-cluster to become ready ------------ 143.99s
download : container_download | Download containers if pull is required or told to always pull (all nodes) - 127.84s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 63.79s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 53.10s
download : file_download | Download item ------------------------------- 50.40s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 49.67s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 37.31s
kubernetes/master : Master | wait for the apiserver to be running ------ 30.71s
etcd : Gen_certs | Write etcd master certs ----------------------------- 18.82s
Wait for host to be available ------------------------------------------ 16.46s
download : container_download | Download containers if pull is required or told to always pull (all nodes) -- 15.67s
GCS Pre | Manifests | Sync GCS manifests ------------------------------- 12.62s
etcd : Configure | Check if etcd cluster is healthy -------------------- 11.26s
etcd : reload etcd ----------------------------------------------------- 10.70s
docker : Docker | pause while Docker restarts -------------------------- 10.23s
kubernetes/master : Master | wait for kube-controller-manager ----------- 9.94s
==> kube3: An error occurred. The error will be shown after all tasks complete.
An error occurred while executing multiple actions in parallel.
Any errors that occurred are shown below.

An error occurred while executing the action on the 'kube3'
machine. Please handle this error then try again:

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.
(gcs-venv) [root@rhsqa-virt05 deploy]# 
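The failed task above is the wait for the glusterd2 cluster to become ready: the /v1/peers response returns HTTP 200 but lists only gluster-kube1-0 and gluster-kube3-0, so the playbook presumably keeps retrying until all three peers have joined. A minimal sketch for checking the peer list by hand, assuming the ClusterIP reported in the failure (it will differ per deployment) and that curl is available inside the gluster image:

# From a kube node: poll the same glusterd2 REST endpoint the playbook uses
curl -s http://10.233.19.39:24007/v1/peers

# The same check from inside a running gluster pod, via the in-cluster DNS name
kubectl exec -n gcs gluster-kube1-0 -- curl -s http://gluster-kube1-0.glusterd2.gcs:24007/v1/peers

If only two peers show up, the missing one matches the gluster pod that never got scheduled, as the kubectl output below confirms.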
(gcs-venv) [root@rhsqa-virt05 deploy]# vagrant ssh kube1
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                             READY   STATUS    RESTARTS   AGE
etcd-ngw4zkd6g8                  1/1     Running   0          15m
etcd-nv9n225zsd                  1/1     Running   0          13m
etcd-operator-7cb5bd459b-c6zn8   1/1     Running   0          23m
etcd-r68w6jxjrh                  1/1     Running   0          14m
gluster-kube1-0                  1/1     Running   0          13m
gluster-kube2-0                  0/1     Pending   0          13m
gluster-kube3-0                  1/1     Running   0          13m
[vagrant@kube1 ~]$ 

The CSI pods aren't even deployed, and the gluster pod on kube2 is stuck in the Pending state.
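A quick way to see why gluster-kube2-0 never leaves Pending is to look at its scheduling events and at the node it is meant to run on; a minimal sketch, using the pod and node names from the output above:

# Scheduler events for the stuck pod (look for FailedScheduling messages)
kubectl describe pod -n gcs gluster-kube2-0

# Check the target node for taints or disk/memory pressure that block scheduling
kubectl describe node kube2

# Recent events in the gcs namespace, newest last
kubectl get events -n gcs --sort-by=.lastTimestamp

Since the CSI manifests are applied only after the glusterd2 cluster wait succeeds, the missing CSI pods are likely a consequence of this scheduling failure rather than a separate problem.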

Initial liveness probing fails on a gd2 pod after bringing up a fresh setup

  1. Create a new setup using Vagrant.
  2. Run the command "kubectl get pods -n gcs":
[vagrant@kube1 ~]$ kubectl get pods -n gcs
NAME                                   READY   STATUS    RESTARTS   AGE
csi-attacher-glusterfsplugin-0         2/2     Running   1          15m
csi-nodeplugin-glusterfsplugin-9kcc4   2/2     Running   0          15m
csi-nodeplugin-glusterfsplugin-b55mn   2/2     Running   0          15m
csi-nodeplugin-glusterfsplugin-vsv4n   2/2     Running   0          15m
csi-provisioner-glusterfsplugin-0      2/2     Running   0          15m
etcd-27pmwf7bh8                        1/1     Running   0          20m
etcd-b9qhntzfp9                        1/1     Running   0          21m
etcd-operator-7cb5bd459b-ls6rl         1/1     Running   0          23m
etcd-sk7sqgj6qw                        1/1     Running   0          22m
gluster-kube1-0                        1/1     Running   0          20m
gluster-kube2-0                        1/1     Running   1          20m
gluster-kube3-0                        1/1     Running   0          20m

  3. One of the three gd2 pods always restarts in the setup without any operations being run manually.
[vagrant@kube1 ~]$ kubectl describe pod -n gcs gluster-kube2-0
Name:               gluster-kube2-0
Namespace:          gcs
Priority:           0
PriorityClassName:  <none>
Node:               kube2/192.168.121.98
Start Time:         Tue, 23 Oct 2018 07:54:18 +0000
Labels:             app.kubernetes.io/component=glusterfs
                    app.kubernetes.io/name=glusterd2
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=gluster-kube2-5c4565d64b
                    statefulset.kubernetes.io/pod-name=gluster-kube2-0
Annotations:        <none>
Status:             Running
IP:                 10.233.64.5
Controlled By:      StatefulSet/gluster-kube2
Containers:
  glusterd2:
    Container ID:   docker://8455de0d8753111237d8032dc6ae5821d5ba43e7ce833620c41278ebe3caa92f
    Image:          docker.io/gluster/glusterd2-nightly:20180920
    Image ID:       docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:7013c3de3ed2c8b9c380c58b7c331dfc70df39fe13faea653b25034545971072
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 23 Oct 2018 07:58:21 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Tue, 23 Oct 2018 07:54:48 +0000
      Finished:     Tue, 23 Oct 2018 07:58:21 +0000
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
    Environment:
      GD2_ETCDENDPOINTS:  http://etcd-client.gcs:2379
      GD2_CLUSTER_ID:     b8250aa0-14d7-4bcb-9d8f-c23b4b0d736b
      GD2_CLIENTADDRESS:  gluster-kube2-0.glusterd2.gcs:24007
      GD2_PEERADDRESS:    gluster-kube2-0.glusterd2.gcs:24008
      GD2_RESTAUTH:       false
    Mounts:
      /dev from gluster-dev (rw)
      /run/lvm from gluster-lvm (rw)
      /sys/fs/cgroup from gluster-cgroup (ro)
      /usr/lib/modules from gluster-kmods (ro)
      /var/lib/glusterd2 from glusterd2-statedir (rw)
      /var/log/glusterd2 from glusterd2-logdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jhxz7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:  
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:  
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-jhxz7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-jhxz7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  23m                default-scheduler  Successfully assigned gcs/gluster-kube2-0 to kube2
  Normal   Pulling    23m                kubelet, kube2     pulling image "docker.io/gluster/glusterd2-nightly:20180920"
  Normal   Pulled     23m                kubelet, kube2     Successfully pulled image "docker.io/gluster/glusterd2-nightly:20180920"
  Warning  Unhealthy  20m (x3 over 22m)  kubelet, kube2     Liveness probe failed: Get http://10.233.64.5:24007/ping: dial tcp 10.233.64.5:24007: connect: connection refused
  Normal   Created    19m (x2 over 23m)  kubelet, kube2     Created container
  Normal   Started    19m (x2 over 23m)  kubelet, kube2     Started container
  Normal   Killing    19m                kubelet, kube2     Killing container with id docker://glusterd2:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled     19m                kubelet, kube2     Container image "docker.io/gluster/glusterd2-nightly:20180920" already present on machine
[vagrant@kube1 ~]$ 
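Since the events show the container being killed after repeated liveness-probe failures on :24007/ping, the next step is usually to look at what the killed container logged and to hit the ping endpoint by hand; a minimal sketch, assuming the default glusterd2.log file name under the log directory mounted above and that curl exists in the image:

# Logs of the container instance that the kubelet killed
kubectl logs -n gcs gluster-kube2-0 -c glusterd2 --previous

# Glusterd2's own log file, via the hostPath mount listed in the pod spec above
kubectl exec -n gcs gluster-kube2-0 -c glusterd2 -- cat /var/log/glusterd2/glusterd2.log

# Probe the liveness endpoint manually from inside the pod
kubectl exec -n gcs gluster-kube2-0 -c glusterd2 -- curl -s -o /dev/null -w '%{http_code}\n' http://localhost:24007/ping

If glusterd2 simply takes longer than the probe allows to come up (for example while waiting on etcd), raising the probe's initial delay or timeout in the glusterd2 statefulset manifest is one way to avoid the extra restart.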
