
A lightweight persistent storage solution for Kubernetes / OpenShift / Nomad using GlusterFS in the background. More information at https://kadalu.tech

Home Page: https://docs.kadalu.tech/k8s-storage/devel/quick-start/

License: Other

Makefile 2.13% Shell 10.19% Dockerfile 2.76% Python 70.81% Ruby 0.43% Jinja 3.03% HCL 2.19% Perl 8.25% Mustache 0.24%
gluster kubernetes csi-driver k8s k8s-sig-storage storage csi operator glusterfs openshift-storage

kadalu's Introduction

kaDalu


What is Kadalu?

Kadalu is a project that provides persistent storage in container ecosystems (Kubernetes, OpenShift, RKE, etc.). The Kadalu operator deploys CSI pods and Gluster storage pods according to the config, and your PVs are served through the APIs implemented in CSI.

Get Started

Getting started is easy: copy and paste the commands below.

curl -fsSL https://github.com/kadalu/kadalu/releases/latest/download/install.sh | sudo bash -x
kubectl-kadalu version
kubectl kadalu install --type=$K8S_DIST

where K8S_DIST can be one of the values below (kubernetes is the default):

  • kubernetes
  • openshift
  • rke
  • microk8s

The above deploys the latest version of the Kadalu operator and the CSI pods. Once done, you can provide storage for the Kadalu operator to manage.

$ kubectl kadalu storage-add storage-pool-1 --device kube1:/dev/sdc

Note that, in the above command, kube1 is the node providing /dev/sdc as storage to Kadalu. In your setup, this may be different.

If you made some errors during setup and want to start fresh, check this cleanup script and run it to remove the kadalu namespace completely.

curl -s https://raw.githubusercontent.com/kadalu/kadalu/devel/extras/scripts/cleanup | bash

Reach out

  1. The best option is to open an issue on GitHub.
  2. Reach us on Slack (note: there is no message history) - https://kadalu.slack.com

Contributing

We would like your contributions in the form of feedback, testing, development, etc. See CONTRIBUTING for more details.

If you are interested in making a financial donation to the project, or to the developers, you can do so on our Open Collective page. (We like GitHub Sponsors too, but it is still on the waiting list for an org in India.)

Helm support

helm install kadalu --namespace kadalu --create-namespace https://github.com/kadalu/kadalu/releases/latest/download/kadalu-helm-chart.tgz --set-string kubernetesDistro=$K8S_DIST

where K8S_DIST can be one of the values below:

  • kubernetes
  • openshift
  • rke
  • microk8s

If --set-string isn't supplied, kubernetes is used as the default.

For Kadalu versions >= 0.8.5 the Helm chart is split into subcharts; refer to the steps below for installation:

# Takes values as mentioned above
K8S_DIST=kubernetes
curl -sL https://github.com/kadalu/kadalu/releases/latest/download/kadalu-helm-chart.tgz -o /tmp/kadalu-helm-chart.tgz

# First install operator
helm install operator --namespace kadalu --create-namespace /tmp/kadalu-helm-chart.tgz --set operator.enabled=true --set global.kubernetesDistro=$K8S_DIST

# In case of a Kadalu upgrade, verify pod eviction and that no Kadalu PVCs are in use; for a fresh installation, just proceed after the operator deployment
helm install csi-nodeplugin --namespace kadalu /tmp/kadalu-helm-chart.tgz --set csi-nodeplugin.enabled=true --set global.kubernetesDistro=$K8S_DIST

NOTE: The Helm chart based deployment is still evolving, and we are happy to receive contributions to it.

Platform support

We support x86_64 (amd64) by default (all releases, devel and latest tags); from release 0.8.3 onwards, arm64 and arm/v7 are supported as well.

For any other platform, we need users to confirm it works by building the images locally. Once it works, we can include it in our automated scripts. You can confirm the build by running make release after checking out the repository on the respective platform.

How to pronounce kadalu?

One is free to pronounce 'kaDalu' as they wish. Below is a sample of how we pronounce it!

Request: If you like the project, give it a GitHub star :-)

kadalu's People

Contributors

amarts, aravindavk, dependabot[bot], dereulenspiegel, fastlorenzo, gnossen, hunter86bg, jkonecny75, joibel, leelavg, liyuntao, madoe, magicaljellybeans, manav-7, matteomanzoni, ook, papanito, rafis, sac, sankarshanmukhopadhyay, santobert, schmitch, skrobul, sputnik13, sunnyku, tili41, timoses, vatsa287, vherrlein, yugyama


kadalu's Issues

Issue with reusing same disk image with newer version

I used a device that had been used with a previous Kadalu install (both used for testing), and when I try to use it again I get the error below:

[2020-01-13 09:36:06.149709] E [MSGID: 113063] [posix-common.c:663:posix_init] 0-storage-pool-1-posix: mismatching volume-id (1e67ddba-35e8-11ea-945f-0242ac110004) received. already is a part of volume 06be7d4a-31ec-11ea-9ed3-0242ac110004 
[2020-01-13 09:36:06.149789] E [MSGID: 101019] [xlator.c:626:xlator_init] 0-storage-pool-1-posix: Initialization of volume 'storage-pool-1-posix' failed, review your volfile again
[2020-01-13 09:36:06.149801] E [MSGID: 101066] [graph.c:361:glusterfs_graph_init] 0-storage-pool-1-posix: initializing translator failed

We should document how to handle data that previously belonged to a different GlusterFS volume. This may be expected behavior, but it took me some time to resolve. At the very least, it would be great to add a check like this in the kubectl_kadalu scripts (see the sketch below) so that users get the best experience.
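A minimal sketch of the kind of pre-flight check that could be added to the kubectl_kadalu scripts, assuming a plain blkid call is acceptable. The function name and warning text are hypothetical, not existing Kadalu code.

import subprocess

def warn_if_device_in_use(device):
    """Warn if the device already carries a filesystem signature.

    A device reused from an earlier Kadalu/GlusterFS setup still has its old
    filesystem (and volume-id on the brick root), which leads to the
    "mismatching volume-id" error shown above.
    """
    try:
        fstype = subprocess.check_output(
            ["blkid", "-o", "value", "-s", "TYPE", device],
            stderr=subprocess.DEVNULL,
        ).decode().strip()
    except subprocess.CalledProcessError:
        return  # blkid exits non-zero when no signature is found
    except FileNotFoundError:
        return  # blkid not available; skip the check

    if fstype:
        print("WARNING: %s already contains a '%s' filesystem. If it was used "
              "by a previous Kadalu install, wipe it (for example with "
              "'wipefs -a') before reusing it." % (device, fstype))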

Send constant CID (client-id) per deployment

Right now, the analytics we get are skewed because we send the cid (i.e., client-id) as a timestamp. Ideally it should be a random value (a UUID?) that stays constant per deployment.

This can be achieved by generating a UUID if one doesn't exist in the ConfigMap, and using it whenever analytics are sent.
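A rough sketch of that idea using the official kubernetes Python client (an assumption; the operator may manage its ConfigMap differently). The ConfigMap name and key are hypothetical.

import uuid

from kubernetes import client

NAMESPACE = "kadalu"        # assumption: operator namespace
CONFIGMAP = "kadalu-info"   # hypothetical ConfigMap name
CID_KEY = "cid"

def get_or_create_cid(core_v1: client.CoreV1Api) -> str:
    """Return a deployment-constant client-id, generating it exactly once."""
    cmap = core_v1.read_namespaced_config_map(CONFIGMAP, NAMESPACE)
    data = cmap.data or {}
    if CID_KEY not in data:
        data[CID_KEY] = uuid.uuid4().hex
        cmap.data = data
        core_v1.patch_namespaced_config_map(CONFIGMAP, NAMESPACE, cmap)
    return data[CID_KEY]

The returned value would then replace the timestamp in the analytics payload.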

Install project dependencies

This project has dependencies; it would be good to add them to a requirements.txt file and update the README with instructions to install them when setting up a development environment.

Scale Testing

Understand how many PVs can be created on a node with 16 GB RAM, a 4 TB disk, etc.

  1. Measure the PV create rate (a rough sketch follows this list).
  2. Also measure whether I/O can scale from all the pods.
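A rough sketch of how the create rate could be measured with the kubernetes Python client; the PVC names, size, and storage class are placeholders, and cleanup of the created PVCs is omitted.

import time

from kubernetes import client, config

def measure_pvc_create_rate(count=50, storage_class="kadalu.replica1",
                            namespace="default", size="1Gi"):
    """Create `count` PVCs and report how long they take to reach Bound."""
    config.load_kube_config()
    core = client.CoreV1Api()

    start = time.time()
    names = ["scale-test-pvc-%d" % i for i in range(count)]
    for name in names:
        pvc = client.V1PersistentVolumeClaim(
            metadata=client.V1ObjectMeta(name=name),
            spec=client.V1PersistentVolumeClaimSpec(
                access_modes=["ReadWriteMany"],
                storage_class_name=storage_class,
                resources=client.V1ResourceRequirements(
                    requests={"storage": size}),
            ),
        )
        core.create_namespaced_persistent_volume_claim(namespace, pvc)

    # Poll until every PVC is Bound.
    pending = set(names)
    while pending:
        for name in list(pending):
            phase = core.read_namespaced_persistent_volume_claim(
                name, namespace).status.phase
            if phase == "Bound":
                pending.discard(name)
        time.sleep(1)

    elapsed = time.time() - start
    print("%d PVCs bound in %.1fs (%.2f PVCs/second)"
          % (count, elapsed, count / elapsed))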

Failed to provision volume with StorageClass "kadalu.replica3"

Hi,
I am trying to set up Kadalu on an AKS cluster, but the csi-provisioner raises an error like:

Warning  ProvisioningFailed    3s                 kadalu_csi-provisioner-0_27e261cc-0c6d-11ea-a863-7ec92e551436  failed to provision volume with StorageClass "kadalu.replica3": rpc error: code = Unknown desc = Exception calling application: [Errno 17] File exists: '/mnt/kadalu-storage-replica-pool-3'

Details:

I1121 14:50:40.022459       1 controller.go:926] provision "default/kadalu-pvc" class "kadalu.replica3": started
I1121 14:50:40.028583       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"kadalu-pvc", UID:"40c1247f-75dd-43da-b304-ed200e5874dd", APIVersion:"v1", ResourceVersion:"238587", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/kadalu-pvc"
W1121 14:50:40.033254       1 controller.go:685] Retrying syncing claim "default/kadalu-pvc" because failures 5 < threshold 15
E1121 14:50:40.033333       1 controller.go:700] error syncing claim "default/kadalu-pvc": failed to provision volume with StorageClass "kadalu.replica3": rpc error: code = Unknown desc = Exception calling application: [Errno 17] File exists: '/mnt/kadalu-storage-replica-pool-3'
I1121 14:50:40.033410       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"kadalu-pvc", UID:"40c1247f-75dd-43da-b304-ed200e5874dd", APIVersion:"v1", ResourceVersion:"238587", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "kadalu.replica3": rpc error: code = Unknown desc = Exception calling application: [Errno 17] File exists: '/mnt/kadalu-storage-replica-pool-3'


[2019-11-21 14:58:40,050] DEBUG [controllerserver - 42:CreateVolume] - Filters applied to choose storage   hostvol_type=Replica3
[2019-11-21 14:58:40,051] DEBUG [controllerserver - 47:CreateVolume] - Got list of hosting Volumes   volumes=kadalu-storage-replica-pool-3
[2019-11-21 14:58:40,051] ERROR [_server - 445:_call_behavior] - Exception calling application: [Errno 17] File exists: '/mnt/kadalu-storage-replica-pool-3'
Traceback (most recent call last):
  File "/usr/local/lib64/python3.7/site-packages/grpc/_server.py", line 435, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/kadalu/controllerserver.py", line 50, in CreateVolume
    hostvol = mount_and_select_hosting_volume(host_volumes, pvsize)
  File "/kadalu/volumeutils.py", line 122, in mount_and_select_hosting_volume
    mount_glusterfs(hvol, mntdir)
  File "/kadalu/volumeutils.py", line 420, in mount_glusterfs
    os.makedirs(target_path, exist_ok=True)
  File "/usr/lib64/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/mnt/kadalu-storage-replica-pool-3'

My test configuration:

---
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  # This will be used as name of PV Hosting Volume
  name: kadalu-storage-replica-pool-3
spec:
  type: Replica3  # Notice that this field tells kadalu operator to use replicate module.
  storage:
    - node: aks-aks-35064155-0
      path: /mnt
    - node: aks-aks-35064155-1
      path: /mnt
    - node: aks-aks-35064155-2
      path: /mnt
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: kadalu-pvc
spec:
  storageClassName: kadalu.replica3
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-kadalu-access-pod
spec:
  volumes:
  - name: kadalu-pvc-volume
    persistentVolumeClaim:
      claimName: kadalu-pvc
  containers:
  - name: alpine
    image: ubuntu
    command: ['sh', '-c', 'while true; do sleep 3;done']
    volumeMounts:
    # Configuration volumes:
    - name: kadalu-pvc-volume
      mountPath: /mnt/kadalu-pvc-volume

I logged into each of my AKS nodes and there is no /mnt/kadalu-storage-replica-pool-3 directory.
I also changed the paths in KadaluStorage from /mnt to /mnt/aaa, /mnt/bbb, /mnt/ccc and started from a clean environment, but the error is still the same.

Add CI/CD for replica3 volume

It is not possible to manually make sure 100% of the use cases are covered for each PR. If a case is not covered in CI/CD, that use case can't be supported.

We need to make sure we also test Replica3 in CI/CD. Basic things shouldn't reach users as bugs, like #65.

mkdir /var/lib/kubelet/pods/xxxxx/volumes/kubernetes.io~csi/pvc-xxx/mount: file exists

I am trying to install Nextcloud with MariaDB via Helm, using kadalu.replica1 as persistent storage. This fails with a storage-related error:

pod/nextcloud-mariadb-master-0 
Error: failed to start container "mariadb": Error response from daemon: error while creating mount source path '/var/lib/kubelet/pods/733aabd1-ed37-42a6-beb2-3e29ee4c24d5/volumes/kubernetes.io~csi/pvc-e280b38a-edd4-495e-8bd5-09389bf58238/mount': 
mkdir /var/lib/kubelet/pods/733aabd1-ed37-42a6-beb2-3e29ee4c24d5/volumes/kubernetes.io~csi/pvc-e280b38a-edd4-495e-8bd5-09389bf58238/mount: file exists

Steps to reproduce

Install nextcloud using persistent volume claim when installing via helm chart

  1. Create values.yaml defining persistence using ReadWriteMany (only the relevant part is shown)

     nextcloud:
       host: nc.xxxx.dev
       username: xxxx
       password: REDACTED
       update: 0
       datadir: /var/www/html/data
       tableprefix:
    
     internalDatabase:
       enabled: false
       name: nextcloud
    
     externalDatabase:
       enabled: false
    
     mariadb:
       enabled: true
    
       securityContext:
         fsGroup: 82
         runAsUser: 33
    
       db:
         name: nextcloud
         user: nextcloud
         password: xxx
    
       persistence:
         enabled: false
         accessMode: ReadWriteMany
         size: 8Gi
    
       master:
         persistence:
           accessModes: 
             - ReadWriteMany
       slave:
         persistence:
           accessModes:
             - ReadWriteMany
  2. Install helm chart

    helm install nextcloud stable/nextcloud -f nextcloud.values.yml -n nextcloud

Expected Result

No errors

Actual Result

0s          Warning   Failed                   pod/nextcloud-mariadb-master-0                             Error: failed to start container "mariadb": Error response from daemon: error while creating mount source path '/var/lib/kubelet/pods/733aabd1-ed37-42a6-beb2-3e29ee4c24d5/volumes/kubernetes.io~csi/pvc-e280b38a-edd4-495e-8bd5-09389bf58238/mount': mkdir /var/lib/kubelet/pods/733aabd1-ed37-42a6-beb2-3e29ee4c24d5/volumes/kubernetes.io~csi/pvc-e280b38a-edd4-495e-8bd5-09389bf58238/mount: file exists

Additional info

# kubectl -n nextcloud get pvc  
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-nextcloud-mariadb-master-0   Bound    pvc-e280b38a-edd4-495e-8bd5-09389bf58238   8Gi        RWX            kadalu.replica1   11m
data-nextcloud-mariadb-slave-0    Bound    pvc-4c97e858-ab2d-44df-92e1-29d22c9c9fe6   8Gi        RWX            kadalu.replica1   11m
nextcloud-nextcloud               Bound    pvc-f999ed74-c682-41e1-b9bf-d067b84cee4d   500Gi      RWX            kadalu.replica1   11m
# kubectl -n nextcloud get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                       STORAGECLASS      REASON   AGE
pvc-4c97e858-ab2d-44df-92e1-29d22c9c9fe6   8Gi        RWX            Delete           Bound         nextcloud/data-nextcloud-mariadb-slave-0    kadalu.replica1            11m
pvc-e280b38a-edd4-495e-8bd5-09389bf58238   8Gi        RWX            Delete           Bound         nextcloud/data-nextcloud-mariadb-master-0   kadalu.replica1            11m
pvc-f999ed74-c682-41e1-b9bf-d067b84cee4d   500Gi      RWX            Delete           Bound         nextcloud/nextcloud-nextcloud               kadalu.replica1            11m

Failed to generate grpc python code

/usr/bin/python3: Error while finding module specification for 'grpc_tools.protoc' (ModuleNotFoundError: No module named 'grpc_tools')

Add a developer README that helps newcomers set up the environment.

Reduce Quotad logs

When a PV is provisioned, quotad crawls and sets the quota on each volume. It does this in a constant loop at a 1-second interval.

Since a log statement sits in that loop, it floods the log files. It looks like we should do this only when the quota is set for the first time, not every time (a hedged sketch follows the log sample below).

[2019-12-17 08:32:28,322] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:29,334] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:30,345] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:31,357] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:32,370] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:33,382] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:34,395] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:35,407] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:36,420] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:37,453] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:38,465] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:39,503] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:40,512] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:41,522] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:42,533] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:43,541] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:44,570] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:45,581] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:46,590] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:47,600] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:48,618] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:49,628] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:50,645] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:51,658] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:52,672] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:53,686] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:54,718] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:55,731] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:56,742] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:57,756] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:58,768] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:32:59,785] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:00,800] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:01,812] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:02,823] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:03,836] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:04,858] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:05,869] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:06,881] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
[2019-12-17 08:33:07,893] INFO [quotad - 76:handle_quota] - Quota Set	 path=/subvol/e4/1f/pvc-36de976c-261f-4239-b96e-924db2776ad9 size=629145600
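A minimal sketch of the deduplication idea, assuming quotad keeps its current loop: remember the size already applied per path and only set (and log) a quota when it changes. The function and variable names are hypothetical.

applied_quotas = {}

def maybe_set_quota(path, size, set_quota, logger):
    """set_quota(path, size) stands in for whatever quotad calls today."""
    if applied_quotas.get(path) == size:
        return  # already applied with the same size: no work, no log line
    set_quota(path, size)
    applied_quotas[path] = size
    logger.info("Quota Set path=%s size=%d", path, size)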

issue to start gluster server pods

[amar@localhost kadalu]$ kubectl logs server-storage-pool-1-0-minikube-0 -nkadalu --all-containers
Traceback (most recent call last):
  File "/kadalu/server.py", line 33, in <module>
    start_server_process()
  File "/kadalu/server.py", line 20, in start_server_process
    glusterfsd.start()
  File "/kadalu/glusterfsd.py", line 128, in start
    create_and_mount_brick(brick_device, brick_path, brickfs)
  File "/kadalu/glusterfsd.py", line 117, in create_and_mount_brick
    execute("mount", "-oprjquota", brick_device, mountdir)
  File "/kadalu/kadalulib.py", line 60, in execute
    raise CommandException(proc.returncode, out.strip(), err.strip())
kadalulib.CommandException: [1] b'' b'mount: /bricks/storage-pool-1/data: /brickdev/disk1 is already mounted.'
[amar@localhost kadalu]$ 

This happened when I exported a 'file' as device.

    - node: minikube             # node name as shown in `kubectl get nodes`
      device: /home/export/disk1 # Device to provide storage to all PVs

storage export pods not created

I followed the guidelines, but somehow the "storage export pods" are not created. This is my storage config, which uses /dev/md3.

apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
 # This will be used as name of PV Hosting Volume
  name: storage-pool-1
spec:
  type: Replica1
  storage:
    - node: node1
      device: /dev/md3
    #- node: node2
    #  device: /dev/md3

But no server/storage pods appear:

# kubectl get pods -nkadalu  
NAME                        READY   STATUS    RESTARTS   AGE
csi-nodeplugin-t4drf        3/3     Running   0          14h
csi-nodeplugin-xbvdq        3/3     Running   0          14h
csi-provisioner-0           4/4     Running   0          14h
operator-68649f4bb6-wz5ph   1/1     Running   0          14h

The PVC thus stays in the Pending state:

# kubectl apply -f sample-pvc.yml 
persistentvolumeclaim/pv1 created
# kubectl get pvc
NAME   STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS      AGE
pv1    Pending                                      kadalu.replica1   12m

Operator logs

I first applied a wrong config (2 nodes configured instead of 1). However, I corrected it and re-applied. There is nothing else logged after that.

[2020-01-04 20:22:53,218] INFO [main - 405:deploy_csi_pods] - Deployed CSI Pods	 manifest=/kadalu/templates/csi.yaml
[2020-01-04 20:22:53,412] DEBUG [rest - 219:request] - response body: {"kind":"StorageClassList","apiVersion":"storage.k8s.io/v1","metadata":{"selfLink":"/apis/storage.k8s.io/v1/storageclasses","resourceVersion":"436497"},"items":[]}
[2020-01-04 20:22:54,538] INFO [main - 445:deploy_storage_class] - Deployed StorageClass	 manifest=/kadalu/templates/storageclass.yaml
[2020-01-05 10:11:14,330] DEBUG [main - 368:crd_watch] - Event	 operation=ADDED object={'apiVersion': 'kadalu-operator.storage/v1alpha1', 'kind': 'KadaluStorage', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"kadalu-operator.storage/v1alpha1","kind":"KadaluStorage","metadata":{"annotations":{},"name":"storage-pool-1","namespace":"default"},"spec":{"storage":[{"device":"/dev/md3","node":"node1"},{"device":"/dev/md3","node":"5.9.87.14"}],"type":"Replica1"}}\n'}, 'creationTimestamp': '2020-01-05T10:11:14Z', 'generation': 1, 'name': 'stor5.9.87.14age-pool-1', 'namespace': 'default', 'resourceVersion': '560337', 'selfLink': '/apis/kadalu-operator.storage/v1alpha1/namespaces/default/kadalustorages/storage-pool-1', 'uid': '78ccca1e-a4a4-4b2c-a553-15399b7cbdec'}, 'spec': {'storage': [{'device': '/dev/md3', 'node': 'node1'}, {'device': '/dev/md3', 'node': 'node2'}], 'type': 'Replica1'}}
[2020-01-05 10:11:14,330] ERROR [main - 109:validate_volume_request] - Invalid number of storage directories/devices specified
[2020-01-05 10:11:14,331] DEBUG [main - 275:handle_added] - validation of volume request failed	 yaml={'apiVersion': 'kadalu-operator.storage/v1alpha1', 'kind': 'KadaluStorage', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"kadalu-operator.storage/v1alpha1","kind":"KadaluStorage","metadata":{"annotations":{},"name":"storage-pool-1","namespace":"default"},"spec":{"storage":[{"device":"/dev/md3","node":"node1"},{"device":"/dev/md3","node":"node2"}],"type":"Replica1"}}\n'}, 'creationTimestamp': '2020-01-05T10:11:14Z', 'generation': 1, 'name': 'storage-pool-1', 'namespace': 'default', 'resourceVersion': '560337', 'selfLink': '/apis/kadalu-operator.storage/v1alpha1/namespaces/default/kadalustorages/storage-pool-1', 'uid': '78ccca1e-a4a4-4b2c-a553-15399b7cbdec'}, 'spec': {'storage': [{'device': '/dev/md3', 'node': 'node1'}, {'device': '/dev/md3', 'node': 'node2'}], 'type': 'Replica1'}}
[2020-01-05 10:12:53,170] DEBUG [main - 368:crd_watch] - Event	 operation=MODIFIED object={'apiVersion': 'kadalu-operator.storage/v1alpha1', 'kind': 'KadaluStorage', 'metadata': {'annotations': {'kubectl.kubernetes.io/last-applied-configuration': '{"apiVersion":"kadalu-operator.storage/v1alpha1","kind":"KadaluStorage","metadata":{"annotations":{},"name":"storage-pool-1","namespace":"default"},"spec":{"storage":[{"device":"/dev/md3","node":"node1"}],"type":"Replica1"}}\n'}, 'creationTimestamp': '2020-01-05T10:11:14Z', 'generation': 2, 'name': 'storage-pool-1', 'namespace': 'default', 'resourceVersion': '560583', 'selfLink': '/apis/kadalu-operator.storage/v1alpha1/namespaces/default/kadalustorages/storage-pool-1', 'uid': '78ccca1e-a4a4-4b2c-a553-15399b7cbdec'}, 'spec': {'storage': [{'device': '/dev/md3', 'node': 'node1'}], 'type': 'Replica1'}}
[2020-01-05 10:12:53,170] WARNING [main - 332:handle_modified] - MODIFIED handle called, but not implemented

my disk configuration

# sudo fdisk -l
Disk /dev/sda: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: ST33000650NS    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7DC8C4C7-714F-4628-89AB-A0F2995A369F

Device          Start        End    Sectors  Size Type
/dev/sda1        4096   33558527   33554432   16G Linux RAID
/dev/sda2    33558528   34607103    1048576  512M Linux RAID
/dev/sda3    34607104 2182090751 2147483648    1T Linux RAID
/dev/sda4  2182090752 5860533134 3678442383  1.7T Linux RAID
/dev/sda5        2048       4095       2048    1M BIOS boot

Partition table entries are not in disk order.


Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: ST33000650NS    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 3EAD1041-D4F8-4465-9184-B452F51E8531

Device          Start        End    Sectors  Size Type
/dev/sdb1        4096   33558527   33554432   16G Linux RAID
/dev/sdb2    33558528   34607103    1048576  512M Linux RAID
/dev/sdb3    34607104 2182090751 2147483648    1T Linux RAID
/dev/sdb4  2182090752 5860533134 3678442383  1.7T Linux RAID
/dev/sdb5        2048       4095       2048    1M BIOS boot

Partition table entries are not in disk order.


Disk /dev/md0: 16 GiB, 17162043392 bytes, 33519616 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md1: 511 MiB, 535822336 bytes, 1046528 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md3: 1.7 TiB, 1883227226112 bytes, 3678178176 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md2: 1023.9 GiB, 1099376361472 bytes, 2147219456 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Use readinessProbe and livelinessProbe for Kadalu Pods

A readiness probe helps Kubernetes know that these Pods are ready. This can be done by creating a file after the process starts in startup.sh (https://github.com/kadalu/kadalu/pull/97/files#diff-ad46fbb7f3f2af7be618f73637c5389aR23).

readinessProbe:
  exec:
    command:
      - cat
      - /kadalu/started
  initialDelaySeconds: 5
  periodSeconds: 1
  failureThreshold: 300

A liveness probe is for restarting the Kadalu Pod/container based on its health (liveness probe requirements TBD; a hedged sketch of such a health check follows the list below):

  • If glusterfsd is hung
  • Disk failure
  • Brick pods are not reachable from csi pods?
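A hedged sketch of a health-check script that a livenessProbe exec command could call in the server pod; the brick path and process name are assumptions, not the actual Kadalu layout.

#!/usr/bin/env python3
"""Hypothetical liveness check for a Kadalu server pod.

Exits 0 when healthy and non-zero otherwise, so it can be wired into a
livenessProbe `exec` command.
"""
import os
import subprocess
import sys

BRICK_PATH = os.environ.get("BRICK_PATH", "/bricks/storage-pool-1/data/brick")

def glusterfsd_running():
    # pgrep returns non-zero when no matching process is found.
    return subprocess.call(["pgrep", "-x", "glusterfsd"],
                           stdout=subprocess.DEVNULL) == 0

def brick_reachable():
    try:
        os.statvfs(BRICK_PATH)  # fails if the brick mount went away
        return True
    except OSError:
        return False

if __name__ == "__main__":
    sys.exit(0 if glusterfsd_running() and brick_reachable() else 1)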

Reduce the size of the images

Currently some image sizes are 200 MB, 400 MB and 600 MB... it looks like we can reduce the size by having an opinionated stack.

release v0.2 with glusterfs-7.0

GlusterFS 7.0 is in the RC state now. Once that release is out, it would be good to make a Kadalu release.

It would be great to have a release with glusterfs-7.0 available to Kadalu users.

Analytics of usage

It would be great to add analytics on usage (especially just the downloads) and details on which version is used, etc. With this information it is easier to support users, and also to provide features in a particular version.

Enhancement: backup options for PVs

Right now, no specific designs in place. We can consider multiple options:

  1. Use Gluster's geo-replication.
  2. Considering we use backend differently, use reflink, and transfer from the clone.
  3. Use tools like https://backube.github.io/snapscheduler/ and backup natively in k8s way.
  4. Shift to using ZFS / btrfs, and use their snapshots, and transfer data.
  5. Validate the possibility of using xfsdump to get incremental backups.

Keeping the issue open to capture the attention of people passing through, and of those who have suggestions. This is not on the immediate roadmap, but it is surely something we want to have if the community/users are interested. Express what you think, and we can collaboratively get to a great solution 👍

FAQ and Troubleshooting guide

Prepare a Troubleshooting guide for Kadalu.

  • When a storage pod is down, how will mounts behave?
  • One storage pod/node is down in Replica3 volume
  • PV is created, but my application pod is not coming up
  • PVC status is still showing "Pending"
  • Understanding all Log files
  • I added storage, but not showing when I run kubectl get pods -n kadalu
  • How to remove the storage?

More examples needed.

Hi,
Could you provide more examples of what to do when we have more nodes, and how to configure Kadalu so that there is at least one mirror of the data in case of a crash?

In general, some description of a data recovery plan, or of redeploying Kadalu, would be useful.

Best regards.

Kadalu csi driver performance in context of kernel < 4.17

Hello.
While weighing a Rook vs. Kadalu setup, I noticed that the CSI driver in Rook is affected by a very low-performance bug described on the Rook GitHub.

To solve that, they require switching to a kernel >= 4.17, which supports quotas.

Unfortunately, Azure only has plans to upgrade their kernels, as we can read here.

So is kernel < 4.17 something to worry about with Kadalu as well?

Creation of `ReadWriteOnce` fails with `mounter.SetupAt failed: rpc error: code = Unknown desc = Exception calling application: [32] b`

Claiming volumes with ReadWriteOnce fails with:

MountVolume.SetUp failed for volume "pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Exception calling application: [32] b'' b'mount: /var/lib/kubelet/pods/2611b8bf-ce6d-451a-adf8-f69d00cf0f1c/volumes/kubernetes.io~csi/pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd/mount: mount(2) system call failed: Transport endpoint is not connected.'

Steps to reproduce

Install prometheus using persistent volume claim when installing via helm chart

  1. Create values.yaml defining persistence using ReadWriteOnce

    ...
      persistentVolume:
        enabled: true
        accessModes:
          - ReadWriteOnce
    ...
  2. Install helm chart

    helm install prometheus stable/prometheus -f prometheus.values.yml --namespace prometheus

Expected Result

No errors

Actual Result

MountVolume.SetUp failed for volume "pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Exception calling application: [1] b'' b''
Unable to attach or mount volumes: unmounted volumes=[storage-volume], unattached volumes=[config-volume storage-volume prometheus-alertmanager-token-dhs7w]: timed out waiting for the condition
MountVolume.SetUp failed for volume "pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Exception calling application: [32] b'' b'mount: /var/lib/kubelet/pods/2611b8bf-ce6d-451a-adf8-f69d00cf0f1c/volumes/kubernetes.io~csi/pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd/mount: mount(2) system call failed: Transport endpoint is not connected.'
MountVolume.SetUp failed for volume "pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Exception calling application: [32] b'' b"mount: /var/lib/kubelet/pods/2611b8bf-ce6d-451a-adf8-f69d00cf0f1c/volumes/kubernetes.io~csi/pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd/mount: can't read superblock on /dev/loop0."
Unable to attach or mount volumes: unmounted volumes=[storage-volume], unattached volumes=[prometheus-alertmanager-token-dhs7w config-volume storage-volume]: timed out waiting for the condition
MountVolume.SetUp failed for volume "pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = Exception calling application: [32] b'' b"mount: /var/lib/kubelet/pods/2611b8bf-ce6d-451a-adf8-f69d00cf0f1c/volumes/kubernetes.io~csi/pvc-81e1f518-0249-41e8-b2d3-d9027e7c00bd/mount: can't read superblock on /dev/loop1."
Unable to attach or mount volumes: unmounted volumes=[storage-volume], unattached volumes=[storage-volume prometheus-alertmanager-token-dhs7w config-volume]: timed out waiting for the condition

Additional info

Changing the access mode to ReadWriteMany works fine

# kubectl get pvc -n prometheus
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
prometheus-alertmanager   Bound    pvc-4309b9c4-22d7-434b-9f6b-8c4a9bace025   2Gi        RWX            kadalu.replica1   101m
prometheus-server         Bound    pvc-042277ec-abee-4e6b-a5e7-5d13f7959f07   8Gi        RWX            kadalu.replica1   101m
# kubectl get events -n prometheus
LAST SEEN   TYPE      REASON                   OBJECT                                   MESSAGE
<unknown>   Normal    Scheduled                pod/prometheus-server-76b7cf695-5qls6    Successfully assigned prometheus/prometheus-server-76b7cf695-5qls6 to node001
21m         Normal    SuccessfulAttachVolume   pod/prometheus-server-76b7cf695-5qls6    AttachVolume.Attach succeeded for volume "pvc-042277ec-abee-4e6b-a5e7-5d13f7959f07"
21m         Normal    Pulling                  pod/prometheus-server-76b7cf695-5qls6    Pulling image "jimmidyson/configmap-reload:v0.2.2"
21m         Normal    Pulled                   pod/prometheus-server-76b7cf695-5qls6    Successfully pulled image "jimmidyson/configmap-reload:v0.2.2"
21m         Normal    Created                  pod/prometheus-server-76b7cf695-5qls6    Created container prometheus-server-configmap-reload
21m         Normal    Started                  pod/prometheus-server-76b7cf695-5qls6    Started container prometheus-server-configmap-reload
21m         Normal    Pulling                  pod/prometheus-server-76b7cf695-5qls6    Pulling image "prom/prometheus:v2.13.1"
21m         Normal    Pulled                   pod/prometheus-server-76b7cf695-5qls6    Successfully pulled image "prom/prometheus:v2.13.1"
20m         Normal    Created                  pod/prometheus-server-76b7cf695-5qls6    Created container prometheus-server
20m         Normal    Started                  pod/prometheus-server-76b7cf695-5qls6    Started container prometheus-server

DNS-1123 label: spec.hostname: Invalid value

When following tutorials, the operator goes into CrashLoopBackOff

Logs from operator:

[2019-11-06 06:25:23,240] INFO [main - 93:get_brick_device_dir] - {'device': '/dev/sdb', 'node': 'kubenode1.dev.lan'}
[2019-11-06 06:25:23,240] INFO [main - 95:get_brick_device_dir] - /dev/sdb
Traceback (most recent call last):
  File "/kadalu/main.py", line 344, in <module>
    main()
  File "/kadalu/main.py", line 339, in main
    crd_watch(core_v1_client, k8s_client)
  File "/kadalu/main.py", line 253, in crd_watch
    handle_added(core_v1_client, obj)
  File "/kadalu/main.py", line 201, in handle_added
    deploy_server_pods(obj)
  File "/kadalu/main.py", line 169, in deploy_server_pods
    execute(KUBECTL_CMD, "create", "-f", filename)
  File "/kadalu/kadalulib.py", line 51, in execute
    raise CommandException(proc.returncode, out.strip(), err.strip())
kadalulib.CommandException: [1] b'' b'Error from server (AlreadyExists): error when creating "/kadalu/templates/server.yaml": statefulsets.apps "server-storage-pool-1-kubenode1.dev.lan" already exists'

when describing the operator-initiated statefulset:

<clip>
 Warning  FailedCreate  8m25s (x18 over 19m)  statefulset-controller  create Pod server-storage-pool-1-kubenode1.dev.lan-0 in StatefulSet server-storage-pool-1-kubenode1.dev.lan failed error: Pod "server-storage-pool-1-kubenode1.dev.lan-0" is invalid: spec.hostname: Invalid value: "server-storage-pool-1-kubenode1.dev.lan-0": a DNS-1123 label must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name',  or '123-abc', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?')

make replica3 volumes use 'read-hash-mode 3' option.

For the Kadalu use case, considering it may run mostly in cloud setups, using the 'read-hash-mode 3' option for the replicate xlator would save a lot of money (in bandwidth). Since the option will be available in glusterfs-7.0, it would be good to enable it once the CSI driver is updated to use the latest version.

Check the details here about this feature/option.

Move repo to its own separate org

Considering this project has a lot of potential for growth, and more user participation in the future, it would be great to move the repo to its own org so that people can collaborate better.

If it is OK for you, please consider moving the repo to https://github.com/kadalu

That way, the project can have its own docs, webpage, and multiple repos.

RFE: Can kadalu operator take external gluster server and manage only CSI?

There are many possible setups where the Gluster servers are outside the k8s cluster, and there may be a need to provide just the CSI driver inside k8s. Can the operator handle this?

I am thinking of a config file that says something like:

# File: storage-config.yaml
---
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  # This will be used as name of PV Hosting Volume
  name: storage-pool-1
spec:
  type: External   #Here, volume type can be anything
  storage:
    - node: node1
      volume: volume_name # Volume name
      options:
        - backup_volfile_server: node2, node3
          log_level: DEBUG

We can recommend that such users run the quota-setting script as a hook script on their servers, since we cannot control setting quotas there.

Use sidecar container for Logging

The CSI Pods mount Gluster volumes, and those logs do not appear on the CSI container's stderr because the mounts are child processes inside the CSI pods.

Use another sidecar container that watches the log files created by these mounts and logs them to stderr.
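A minimal sketch of such a sidecar, assuming the mount logs land in a directory shared with the CSI container; the log path and poll interval are placeholders.

#!/usr/bin/env python3
"""Hypothetical logging sidecar: follow Gluster mount log files and copy new
lines to stdout so `kubectl logs` can see them."""
import glob
import sys
import time

LOG_GLOB = "/var/log/gluster/*.log"  # assumption: where the mount logs land

def drain(paths, positions):
    for path in paths:
        try:
            with open(path, "r", errors="replace") as logfile:
                logfile.seek(positions.get(path, 0))
                for line in logfile:
                    sys.stdout.write("[%s] %s" % (path, line))
                positions[path] = logfile.tell()
        except OSError:
            continue
    sys.stdout.flush()

if __name__ == "__main__":
    positions = {}
    while True:
        drain(glob.glob(LOG_GLOB), positions)
        time.sleep(2)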

Fix pylint issues

The current codebase has lint issues; we need to check all modules for pylint issues in Travis CI (an illustrative fix follows the output below).

************* Module utils
csi/utils.py:1:0: C0111: Missing module docstring (missing-docstring)
csi/utils.py:11:0: C0103: Constant name "glusterfs_cmd" doesn't conform to UPPER_CASE naming style (invalid-name)
csi/utils.py:12:0: C0103: Constant name "info_dir" doesn't conform to UPPER_CASE naming style (invalid-name)
csi/utils.py:13:0: C0103: Constant name "volfiles_dir" doesn't conform to UPPER_CASE naming style (invalid-name)
csi/utils.py:14:0: C0103: Constant name "templates_dir" doesn't conform to UPPER_CASE naming style (invalid-name)
csi/utils.py:15:0: C0103: Constant name "reserved_size_percentage" doesn't conform to UPPER_CASE naming style (invalid-name)
csi/utils.py:20:0: C0111: Missing class docstring (missing-docstring)
csi/utils.py:24:0: C0111: Missing function docstring (missing-docstring)
csi/utils.py:28:0: C0111: Missing function docstring (missing-docstring)
csi/utils.py:29:4: C0103: Variable name "p" doesn't conform to snake_case naming style (invalid-name)
csi/utils.py:37:0: C0111: Missing function docstring (missing-docstring)
csi/utils.py:40:28: C0103: Variable name "f" doesn't conform to snake_case naming style (invalid-name)
csi/utils.py:52:32: C0103: Variable name "f" doesn't conform to snake_case naming style (invalid-name)
csi/utils.py:58:0: C0111: Missing function docstring (missing-docstring)
csi/utils.py:75:0: C0111: Missing function docstring (missing-docstring)
csi/utils.py:84:7: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
csi/utils.py:91:0: C0111: Missing function docstring (missing-docstring)
csi/utils.py:92:4: C0103: Variable name "st" doesn't conform to snake_case naming style (invalid-name)

-----------------------------------
Your code has been rated at 6.90/10
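For illustration, the kind of change these warnings call for; the constant values and names below are made up, not the real csi/utils.py contents.

"""Utilities used by the CSI driver (a module docstring fixes C0111)."""

# C0103: module-level constants should be UPPER_CASE.
GLUSTERFS_CMD = "/usr/sbin/glusterfs"
RESERVED_SIZE_PERCENTAGE = 10

def is_empty(sequence):
    """C1801: prefer truthiness over len() to test for emptiness."""
    return not sequence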

Documentation for gluster part missing

I read through your documents, but it is still not clear to me what is required or has to be done in terms of Gluster.
Does glusterd need to be running on the host? What are the prerequisites?

In your blog entry you say

expose the storage in one of the nodes of k8s, and provide storage config file to k8s.

Maybe you can elaborate on this a bit more.

Thanks

Centralized logging

k8s takes care of collecting all '/dev/stdout' logs in a centralized format. Currently the glusterfsd logs all go to stdout, but the CSI logs do not go to /dev/stdout. We need to get them into a central place so debugging is easy.

RFE: thin-arbiter (metro cluster) support

GlusterFS 7 is being released with thin-arbiter support. Once it is released, if Kadalu supports thin-arbiter clusters, it will help many users who have only two data centers.

Use Type hints for all Python code

Type hints are available in Python 3 (https://www.python.org/dev/peps/pep-0484). We can use them in all Python files.

Type hints do not add any runtime overhead, since type checks are done with the external tool mypy (pip3 install mypy and check with mypy <python-file>).

Example type hints:

def int2str(val: int) -> str:
    if val > 10:
        return val              # Error: Int is returned here
    return "%d" % val

Quotad error logs in kadalu server pod

It looks like the issue started happening once the quota packaging work was done. Not sure yet (a hedged note on the import follows the traceback below).

Traceback (most recent call last):
  File "/kadalu/server.py", line 33, in <module>
    start_server_process()
  File "/kadalu/server.py", line 26, in start_server_process
    import quotad
  File "/kadalu/quotad.py", line 9, in <module>
    from .kadalulib import execute, logf, CommandException, \
ImportError: attempted relative import with no known parent package
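The traceback suggests quotad.py uses a relative import while being loaded as a top-level module (import quotad). Assuming kadalulib.py sits alongside it in /kadalu, the absolute form would work in both cases:

# In quotad.py: absolute import instead of "from .kadalulib import ..."
from kadalulib import execute, logf, CommandException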

Suggestion: Create a separate base-image

As of now, we use fedora30 as the base image for all our containers, and each container is then prepared separately when we run make build-containers.

On a developer's laptop this is fine, but our CI/CD tests are taking 6 minutes (out of 9 minutes for the total run) to build the containers.

One possibility: we can have a separate Dockerfile for the base image, upload it as 'kadalu/base-image', and then use it and copy the files over when preparing each container.

That way, we save time and avoid the errors that can come from missing something basic in the base image, since that code is currently repeated in three places across the Dockerfiles.

Error when using file as a device

Saw these errors; the brick directory was not created.

[2019-12-16 06:01:42.728811] E [MSGID: 113042] [posix-helpers.c:2264:posix_disk_space_check] 0-storage-pool-test-posix: statvfs failed on /bricks/storage-pool-test/data/brick [No such file or directory]
[2019-12-16 06:01:54.728403] W [MSGID: 113075] [posix-helpers.c:2075:posix_fs_health_check] 0-storage-pool-test-posix: open_for_write() on /bricks/storage-pool-test/data/brick/.glusterfs/health_check returned [No such file or directory]
[2019-12-16 06:01:54.728511] M [MSGID: 113075] [posix-helpers.c:2149:posix_health_check_thread_proc] 0-storage-pool-test-posix: health-check failed, going down
[2019-12-16 06:01:54.728624] M [MSGID: 113075] [posix-helpers.c:2167:posix_health_check_thread_proc] 0-storage-pool-test-posix: still alive! -> SIGTERM

enhancement: Add 'install' option to kubectl_kadalu

With the kubectl-kadalu package, there are possibilities for us to further reduce the 'dependency' on knowing Kubernetes in order to get started with Kadalu.

One such thing is removing the need to run kubectl create -f https://kadalu.io/kadalu-operator.yaml; instead, make it kubectl kadalu install [--version 1.0.0] or something similar. That way, all a user needs to do is pip3 install kubectl_kadalu and then use the kubectl kadalu subcommands to get started.
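A hedged argparse sketch of what such an install subcommand could look like; only https://kadalu.io/kadalu-operator.yaml comes from the text above, the versioned URL layout is hypothetical.

import argparse
import subprocess

def main():
    parser = argparse.ArgumentParser(prog="kubectl-kadalu")
    sub = parser.add_subparsers(dest="command")

    install = sub.add_parser("install", help="Deploy the Kadalu operator")
    install.add_argument("--version", default="latest",
                         help="Operator version to deploy")
    args = parser.parse_args()

    if args.command == "install":
        # Same effect as `kubectl create -f https://kadalu.io/kadalu-operator.yaml`,
        # hidden behind the subcommand. The versioned URL below is a placeholder.
        url = "https://kadalu.io/kadalu-operator.yaml"
        if args.version != "latest":
            url = "https://kadalu.io/%s/kadalu-operator.yaml" % args.version
        subprocess.run(["kubectl", "create", "-f", url], check=True)

if __name__ == "__main__":
    main()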

cleanup stuck

While developing, I keep running ./extras/scripts/cleanup to clean up the kadalu namespace and restart everything.

Most of the time, I end up in the state below, after which I have to kill the pod with --force.

[amar@localhost kadalu]$ kubectl get pods -nkadalu
NAME                                 READY   STATUS        RESTARTS   AGE
server-storage-pool-1-0-minikube-0   0/2     Terminating   0          65m

It would be good to figure out the reason.

Test: install operator from 0.4 version, and upgrade to latest (any release 0.5/0.6 etc)

Ideally, the upgrade should be smooth, without losing any information about previously created PVs, storage config, etc. The process of upgrading the operator is currently as follows:

  1. check if operator and other csi drivers are all running.
  2. just do kubectl apply -f new-operator-yaml-file
  3. see that all the new versions should be installed now.

Correct these steps, and document them if other procedures are required for the upgrade.

logging sidecar not picking the latest log file

After merging #101, if we add new storage to the pool, the CSI mount logs are not logged to stdout.

The expectation is that all client logs go to stdout so they can be persisted and used for debugging without logging into the pods.
