ondat / discoblocks
Open Source declarative disk configuration system for Kubernetes
Home Page: https://discoblocks.io
License: Apache License 2.0
The StorageClass name should be optional, and it would be nice to create a default StorageClass per CSI driver.
The kuttl test fails on GitHub Actions:
2023-01-09T12:57:28.766132542Z stderr F ++ crictl --runtime-endpoint unix:///run/containerd/containerd.sock inspect --output go-template --template '{{.info.pid}}' c6498557205f69edad77ac4c8bdd929eef3032d9a6b32ecbfd7a5ebc37ba0972
2023-01-09T12:57:28.791778323Z stderr F time="2023-01-09T12:57:28Z" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
2023-01-09T12:57:28.79406194Z stderr F + PID=
But it works well on my local box, so maybe there is some version mismatch.
Busybox ships with lots of tools, and that increases the attack surface. We need to replace it with only the specific commands we require, maybe in a custom image.
Discoblocks creates StorageClasses on a topology basis. The created StorageClass has a finalizer, just like any other. But during DiskConfig deletion we remove the finalizers of those SCs. When a customer recreates the config with the same name and StorageClass, Discoblocks re-uses the existing SC but doesn't append the finalizer. Not a big drama, we just lose the protection of those SCs. See the sketch below.
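A minimal sketch of how the missing step could look with controller-runtime's controllerutil helpers; the finalizer name is a placeholder, not the identifier Discoblocks actually uses:

```go
package storageclass

import (
	"context"

	storagev1 "k8s.io/api/storage/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Hypothetical finalizer name, for illustration only.
const storageClassFinalizer = "discoblocks.io/storageclass"

// ensureFinalizer re-appends the finalizer when an already existing
// StorageClass is re-used, so the deletion protection isn't silently lost.
func ensureFinalizer(ctx context.Context, c client.Client, sc *storagev1.StorageClass) error {
	// AddFinalizer reports whether the object changed, so we only issue an
	// Update when the finalizer was actually missing.
	if controllerutil.AddFinalizer(sc, storageClassFinalizer) {
		return c.Update(ctx, sc)
	}
	return nil
}
```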
It would be nice to try out and document how to create a single-disk EBS backend for Ondat.
The current solution expects the mkdir, mknod and mount commands to be available in the container, but this isn't always the case.
https://github.com/ondat/discoblocks/blob/main/drivers/ebs.csi.aws.com/main.go#L66-L69
Currently Discoblocks uses one single cert: it creates a secret per namespace once and doesn't take care of cert rotation or changes.
New disk creation and all the related CRD changes must be documented somewhere. Related PR: #32
Tracking issue for:
It would be nice to detect upscaling issues and automatically pause autoscaling if we were not able to do it a few times in a row (see the sketch below).
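A minimal sketch of how the pausing could work, assuming a per-volume failure counter; the threshold and the key format are made up for illustration:

```go
package autoscaler

// Assumed threshold; the real value would need tuning.
const maxFailedUpscales = 3

// upscaleTracker counts consecutive failed upscale attempts per volume and
// reports when autoscaling should be paused for that volume.
type upscaleTracker struct {
	failures map[string]int // key: "namespace/pvc-name"
}

func newUpscaleTracker() *upscaleTracker {
	return &upscaleTracker{failures: map[string]int{}}
}

// record returns true when the volume has failed often enough in a row that
// autoscaling should be paused for it.
func (t *upscaleTracker) record(key string, err error) bool {
	if err == nil {
		delete(t.failures, key) // a successful upscale resets the counter
		return false
	}
	t.failures[key]++
	return t.failures[key] >= maxFailedUpscales
}
```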
It would be nice to create a pipeline to automatically publish images, manifests, and other release-related artifacts.
The system uses the coolDown time for many wait operations, so if the value is too low it kills the context of provisioning. Because the relation between timeout and cooldown matters and a very low cooldown doesn't make sense at all, I suggest validating and declining low values.
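A minimal sketch of such a validation, assuming the admission webhook can compare the configured cooldown against the time provisioning needs; the function name and the exact relation are assumptions, not the real schema:

```go
package validation

import (
	"fmt"
	"time"
)

// validateCoolDown declines cooldown values that are so low they would cancel
// the provisioning context before it has a chance to finish.
func validateCoolDown(coolDown, provisionTimeout time.Duration) error {
	if coolDown < provisionTimeout {
		return fmt.Errorf("coolDown %s is shorter than the provisioning timeout %s and would kill provisioning", coolDown, provisionTimeout)
	}
	return nil
}
```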
Currently, only the log tells if something fails. It would be nice to produce metrics about the different operations, including failed and successful tasks. Kubernetes events would also be nice.
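A minimal sketch of what that could look like, assuming the controller-runtime metrics registry and an EventRecorder from the manager; the metric name, labels, and event reason are invented for illustration:

```go
package observability

import (
	"github.com/prometheus/client_golang/prometheus"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Hypothetical counter, labelled by operation and result.
var operationsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "discoblocks_operations_total",
		Help: "Number of operations, labelled by operation and result.",
	},
	[]string{"operation", "result"},
)

func init() {
	// Register with the controller-runtime registry so the counter shows up
	// on the manager's /metrics endpoint.
	metrics.Registry.MustRegister(operationsTotal)
}

// reportResize records both a metric and a Kubernetes event for a resize task.
func reportResize(recorder record.EventRecorder, pvc *corev1.PersistentVolumeClaim, err error) {
	result, eventType, msg := "success", corev1.EventTypeNormal, "volume resized"
	if err != nil {
		result, eventType, msg = "failure", corev1.EventTypeWarning, err.Error()
	}
	operationsTotal.WithLabelValues("resize", result).Inc()
	recorder.Event(pvc, eventType, "VolumeResize", msg)
}
```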
We have had design sessions and requirements-gathering meetings to pin down the requirements and design. We have decided to go the CSI driver route to make it possible for users to declaratively add disks from the backend on day 0, and also on day 2 for operations.
The Ondat driver finds the new disk by listing /var/lib/storageos by modification time. It picks the latest, but this could lead to concurrency issues in production. It would also be nice to make /var/lib/storageos configurable.
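A minimal sketch of making the directory configurable via an environment variable; DISCOBLOCKS_STORAGEOS_PATH is a hypothetical name:

```go
package driver

import (
	"os"
	"path/filepath"
)

// Current hard-coded default.
const defaultStorageOSPath = "/var/lib/storageos"

// storageOSPath returns the directory to scan for new devices, falling back
// to the default when the (hypothetical) variable is unset.
func storageOSPath() string {
	if p := os.Getenv("DISCOBLOCKS_STORAGEOS_PATH"); p != "" {
		return filepath.Clean(p)
	}
	return defaultStorageOSPath
}
```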
A Nixery image has been used for the mount and resize jobs and the metrics sidecar. For production it would be nice to replace it with a production-ready solution.
Autoscaling and Pod.Spec.HostPID are not supported together in Discoblocks. We have to figure out how to find, format, mount, and resize a filesystem when hostPID is true.
Currently the containerd and Docker sockets are hard-coded into the mount and resize jobs. It would be nice to make them configurable.
The polling interval has been hard-coded into the codebase. It would be nice to make it configurable.
Tracking issue for:
Automatic snapshots of volumes look like really low-hanging fruit.
Currently, the node exporter exposes HTTP only. It would be nice to use a secure connection and/or a network policy to protect the metrics endpoint.
Currently, the first PVC of a PVC group (/mountpoint-1, /mountpoint-2, /mountpoint-N) is resized by the CSI driver, and the others via a resize job.
It would be nice to double-check what happens when an additional volume needs to be resized after a restart. I guess the CSI driver also resizes the volume. What would happen with our resize job? Does it fail? Is it necessary at all? Should we detect this case to avoid unnecessary job executions?
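One way to detect the case, as a sketch only: compare the PVC's reported capacity with the requested size before launching the resize job; if the CSI driver has already expanded the volume, the job can be skipped. This is an assumption about how the check could look, not the current behaviour:

```go
package resize

import (
	corev1 "k8s.io/api/core/v1"
)

// needsResizeJob reports whether the volume still has to grow; when the
// reported capacity already covers the request, the expansion presumably
// happened elsewhere (e.g. via the CSI driver) and the job is unnecessary.
func needsResizeJob(pvc *corev1.PersistentVolumeClaim) bool {
	requested := pvc.Spec.Resources.Requests[corev1.ResourceStorage]
	actual := pvc.Status.Capacity[corev1.ResourceStorage]
	// Cmp returns -1 when actual < requested, i.e. the volume still has to grow.
	return actual.Cmp(requested) < 0
}
```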
The %d in the mount point makes Discoblocks inflexible. It would be nice to make it optional.
The PVC controller uses a cached client with a custom indexer to find persistent volumes by claim name. Maybe (this needs to be proven) an uncached client would have better performance.
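For comparison, a minimal sketch of the uncached variant with controller-runtime: the manager's APIReader bypasses the cache (and therefore the custom indexer), at the cost of listing and filtering on the client side. Whether this is actually faster would still need measuring:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// findPVByClaim looks up a PersistentVolume by its claim reference using the
// non-cached reader returned by GetAPIReader.
func findPVByClaim(ctx context.Context, mgr ctrl.Manager, claimNamespace, claimName string) (*corev1.PersistentVolume, error) {
	pvs := &corev1.PersistentVolumeList{}
	if err := mgr.GetAPIReader().List(ctx, pvs); err != nil {
		return nil, err
	}
	for i := range pvs.Items {
		ref := pvs.Items[i].Spec.ClaimRef
		if ref != nil && ref.Namespace == claimNamespace && ref.Name == claimName {
			return &pvs.Items[i], nil
		}
	}
	return nil, nil
}
```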
It would be nice to have a proper health check instead of healthz.Ping.
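A minimal sketch of a more meaningful check registered with the manager; what exactly should be probed is an open question, so the API-server call below is just an example:

```go
package health

import (
	"context"
	"net/http"
	"time"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// setupChecks replaces the always-healthy healthz.Ping with a probe that
// actually talks to the API server.
func setupChecks(mgr ctrl.Manager) error {
	apiServerCheck := func(req *http.Request) error {
		ctx, cancel := context.WithTimeout(req.Context(), 2*time.Second)
		defer cancel()
		nsList := &corev1.NamespaceList{}
		// A lightweight, limited list; any error marks the controller unhealthy.
		return mgr.GetAPIReader().List(ctx, nsList, client.Limit(1))
	}
	if err := mgr.AddHealthzCheck("apiserver", apiServerCheck); err != nil {
		return err
	}
	return mgr.AddReadyzCheck("apiserver", apiServerCheck)
}
```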
Currently, the pod mutation webhook has side effects, but we tell Kubernetes it hasn't. It would be nice to set the side effect of the MutatingWebhookConfiguration to Some.
Tracking issue for:
Currently only the develop version of Ondat supports online resize. Once we release the feature it would be nice to use the GA release instead of develop in the e2e tests.
ReadWriteDaemon mode gives back the same volume per node if it exists. If someone deletes the DiskConfig and then restarts the DaemonSet pods, Discoblocks gives the volume back, but that volume doesn't have a finalizer (because of the delete), so autoscaling would be skipped.
Currently, a service for metrics is created for each pod, and the volume monitor uses the service endpoints to fetch disk metrics. It would be nice to make the service optional (for customer usage) and change the monitor to use the pod IP instead.
Currently, the node exporter exposes all the drives. It would be nice to limit the drives with the ignored-mount-points option.
Only PVC names are updated at PVC.Status. It would be nice to update Conditions too.
Once the disk capacity reaches its maximum per PVC, it would be nice to create a new disk.
Initial creation and scaling are solved in the Kube-native way, but a new disk for a running pod isn't trivial.
We need to:
As I see it, this project does something similar with persistent volumes: https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner
Currently, there are two availability options: ReadWriteOnce and ReadWriteSame. In the case of a DaemonSet, ReadWriteSame isn't an option, because all pods would be scheduled to the same node. On the other hand, with ReadWriteOnce the DaemonSet pods get fresh new PVCs, so they lose the connection with the volume after a restart. It would be nice to support DaemonSets.
Currently, volumes created by Discoblocks stay mounted forever, so the termination of PersistentVolume objects gets stuck forever.
Currently, volume info travels in plain text. Some encryption would be nice. It would also be nice to use an RBAC proxy to protect the endpoint.
Currently, both the containerd and Docker sockets are hard-coded into the mount and resize jobs. It works nicely, but it creates unnecessary directories on the host if either of the sockets is missing. It would be nice to mount only the socket available on the host.
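One possible direction, sketched under assumptions: instead of probing the host filesystem, the controller could look at the node's reported container runtime and mount only the matching socket. The paths and the prefix matching are illustrative defaults:

```go
package jobs

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// runtimeSocketForNode returns the host socket path the mount/resize job
// should use on the given node, based on what the kubelet reports.
func runtimeSocketForNode(node *corev1.Node) string {
	// ContainerRuntimeVersion looks like "containerd://1.6.8" or "docker://20.10.21".
	runtime := node.Status.NodeInfo.ContainerRuntimeVersion
	if strings.HasPrefix(runtime, "docker://") {
		return "/var/run/docker.sock"
	}
	return "/run/containerd/containerd.sock"
}
```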