kubernetes-csi / external-attacher
Sidecar container that watches Kubernetes VolumeAttachment objects and triggers ControllerPublish/Unpublish against a CSI endpoint
License: Apache License 2.0
Currently, when changing Kubernetes API objects, the external-attacher uses "Update" calls. These are less efficient than patching, and they can be problematic if the object changes between Get and Update (an accidental revert is possible).
They can also cause issues with mismatched API versions (this should not happen, but we still want to minimize breakage/risk).
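A minimal sketch of the patch approach, using only the standard library: build a JSON merge patch that carries just the field being changed, then hand it to the API server (with client-go this would be a Patch call with types.MergePatchType; that call and the exact field path here are illustrative, not the attacher's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildAttachedPatch returns a JSON merge patch that only touches
// status.attached, so concurrent changes to other fields of the object
// are preserved (unlike a full Update, which writes the whole object
// back and can revert changes made between Get and Update).
func buildAttachedPatch(attached bool) ([]byte, error) {
	patch := map[string]interface{}{
		"status": map[string]interface{}{
			"attached": attached,
		},
	}
	return json.Marshal(patch)
}

func main() {
	p, err := buildAttachedPatch(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p)) // {"status":{"attached":true}}
}
```

The resulting bytes would be sent with a Patch call instead of Update, so only the listed field is modified on the server.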
Make the number of worker threads configurable; currently it is hard-coded to 10.
CSI has supported raw block volumes since version 0.1, and Kubernetes since v1.9, but the external-attacher cannot attach a block volume.
When attaching a raw block volume, the external-attacher logs do not even show an error saying it cannot handle it; it just proceeds as if it were a mount-type attach.
Checked out a fresh master branch and ran "make test":
### test-vendor:
Repo uses 'dep' for vendoring.
# vendor is out of sync:
github.com/googleapis/gnostic: hash of vendored tree not equal to digest in Gopkg.lock
make: *** [release-tools/build.make:132: test-vendor] Error 1
$ dep version
dep:
version : v0.5.1
build date : 2019-03-11
git hash : faa6189
go version : go1.12
go compiler : gc
platform : linux/amd64
features : ImportDuringSolve=false
According to the CSI spec, the CO (or its components) shall invoke this RPC (ControllerProbe) to determine readiness of the service.
A CSI driver that follows the language of the spec will fail if a Probe RPC is not invoked prior to other operations. If the CSI language is correct, then this constitutes a bug here.
As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.
The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".
Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)
Thanks so much, let me know if you have any questions.
(This issue was generated from a tool, apologies for any weirdness.)
[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md
We can also remove the fallback to the Node annotation at the same time. This change will require a major version bump because we would have to raise the minimum Kubernetes version to 1.17, where the object was introduced.
We should make this change at least one, if not two, releases before v1beta1 is removed from Kubernetes.
/lifecycle frozen
Hi everyone,
I am running into a bit of a weird issue.
I have one ppc64le machine on which external-attacher builds without a problem:
$ git clone https://github.com/kubernetes-csi/external-attacher.git
$ cd external-attacher
$ make (there is an error reported trying to build for windows, but the linux build works)
$ ls bin/
csi-attacher
$ go env
GOARCH="ppc64le"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_ppc64le"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/external-attacher/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build135377313=/tmp/go-build -gno-record-gcc-switches"
The client-go library is:
$ ls $GOPATH/pkg/mod/k8s.io/ | grep client
[email protected]
However, the exact same make command on an x86_64 machine produces the following:
$ make
mkdir -p bin
CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-X main.version=v2.0.0-0-g27b83ca9 -extldflags "-static"' -o ./bin/csi-attacher ./cmd/csi-attacher
go: creating new go.mod: module github.com/kubernetes-csi/external-attacher
go: copying requirements from Gopkg.lock
# k8s.io/client-go/rest
../go/pkg/mod/k8s.io/[email protected]+incompatible/rest/request.go:598:31: not enough arguments in call to watch.NewStreamWatcher
have (*versioned.Decoder)
want (watch.Decoder, watch.Reporter)
make: *** [build-csi-attacher] Error 2
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/johngouf/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/johngouf/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/johngouf/external-attacher/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build193746378=/tmp/go-build -gno-record-gcc-switches"
The client-go library is:
$ ls $GOPATH/pkg/mod/k8s.io/ | grep client
[email protected]+incompatible
Any idea what I am doing wrong? Is it a bug?
Thanks!
CSI 1.0 decorates sensitive fields with csi_secret. Let's take advantage of this feature to programmatically ensure no sensitive fields are ever logged by this sidecar container.
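A hedged sketch of the idea, not the sidecar's actual implementation: before logging a request, replace the values of fields known to be secret. The field names and the redaction marker below are assumptions for illustration; the real code would derive the secret field set from the csi_secret protobuf option:

```go
package main

import "fmt"

// redactSecrets returns a copy of the request map with the values of
// secret-flagged fields replaced, so the result is safe to log.
func redactSecrets(req map[string]interface{}, secretFields map[string]bool) map[string]interface{} {
	out := make(map[string]interface{}, len(req))
	for k, v := range req {
		if secretFields[k] {
			out[k] = "***stripped***"
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	req := map[string]interface{}{
		"volume_id": "vol-1",
		"secrets":   map[string]string{"password": "hunter2"},
	}
	safe := redactSecrets(req, map[string]bool{"secrets": true})
	fmt.Println(safe["volume_id"], safe["secrets"])
}
```

Only the redacted copy would ever reach the log lines; the original request is passed to the CSI driver untouched.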
If a node with volumes managed by a CSI plugin is removed from the Kubernetes cluster before the volumes are detached, then detaching will never succeed.
The function csiDetach (in pkg/controller/csi_handler.go, line 296 on master) always fails because it tries to get the nodeID from the Kubernetes API, and since the node no longer exists, the lookup fails.
This behavior was reproduced in OpenStack with the Cinder CSI plugin during a scale-down of worker nodes: the OpenStack cloud provider deleted the node from the Kubernetes cluster when it found that the virtual machine no longer existed.
Volume detach should not automatically fail if the node where the volume was attached is no longer part of the Kubernetes cluster; the attacher should still try to detach the volume. The nodeID could be passed as an empty string, and the actual detach implementation could then decide whether it is possible to detach the volume or whether it is already detached.
In the OpenStack case, volumes are automatically detached when the virtual machine is deleted.
It's still using the alpha CRD instead of the beta in-tree API
/help
It's still using the beta object. I think we can safely make this change without a major version bump since v1.VolumeAttachment was added in 1.13 which is below our min supported version.
It is preferable to use a context instead of a stop channel when controlling the lifetime of a function call (either a long-lived goroutine or just RPC calls). A context makes it easier to propagate cancellation while still allowing operations with short timeouts inside a long-lived routine.
Additionally, our code will be easier to test if functions like Run() wait for their child goroutines to exit before returning. This gives tests a way to guarantee that all side effects of a particular call have been executed.
This is an analysis of the following test flake: https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/4398
Kubelet sees new plugin and registers it:
I1128 12:33:16.974128 6672 csi_plugin.go:116] kubernetes.io/csi: Trying to validate a new CSI Driver with name: pd.csi.storage.gke.io endpoint: /var/lib/kubelet/plugins/pd.csi.storage.gke.io/csi.sock versions: 1.0.0, foundInDeprecatedDir: false
I1128 12:33:16.974342 6672 csi_plugin.go:137] kubernetes.io/csi: Register new plugin with name: pd.csi.storage.gke.io at endpoint: /var/lib/kubelet/plugins/pd.csi.storage.gke.io/csi.sock
attach-detach controller is waiting for attach:
E1128 12:34:31.327964 1 csi_attacher.go:227] kubernetes.io/csi: attachment for projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002 failed: cannot find NodeID for driver "pd.csi.storage.gke.io" for node "bootstrap-e2e-minion-group-r7dh"
E1128 12:34:31.328073 1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\"" failed. No retries permitted until 2018-11-28 12:36:33.328041594 +0000 UTC m=+4563.274204929 (durationBeforeRetry 2m2s). Error: "AttachVolume.Attach failed for volume \"pvc-a2f50d29-f309-11e8-802b-42010a800002\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\") from node \"bootstrap-e2e-minion-group-r7dh\" : cannot find NodeID for driver \"pd.csi.storage.gke.io\" for node \"bootstrap-e2e-minion-group-r7dh\""
external-attacher finally sees the annotation and attaches the volume:
I1128 12:35:35.359295 1 csi_handler.go:193] NodeID annotation added to "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:35.365952 1 csi_handler.go:203] VolumeAttachment "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea" updated with finalizer and/or NodeID annotation
I1128 12:35:35.366006 1 connection.go:235] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I1128 12:35:35.366013 1 connection.go:236] GRPC request: volume_id:"projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002" node_id:"projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/instances/bootstrap-e2e-minion-group-r7dh" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1543408336401-8081-pd.csi.storage.gke.io" >
I1128 12:35:47.982680 1 connection.go:238] GRPC response:
I1128 12:35:47.982733 1 connection.go:239] GRPC error: <nil>
I1128 12:35:47.982744 1 csi_handler.go:131] Attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.982757 1 util.go:33] Marking as attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.998078 1 util.go:43] Marked as attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.998099 1 csi_handler.go:137] Fully attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
Due to exponential back off attach-detach controller won't "retry" attach until 12:36:33.328041594
kubelet is also waiting for attach-detach controller to ack volume is attached:
E1128 12:36:33.114836 6672 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\"" failed. No retries permitted until 2018-11-28 12:38:35.11478352 +0000 UTC m=+4788.780131927 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"pvc-a2f50d29-f309-11e8-802b-42010a800002\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\") pod \"pod-subpath-test-pd-csi-storage-gke-io-dynamicpv-ghvv\" (UID: \"a7db48f2-f309-11e8-802b-42010a800002\") "
attach-detach controller wakes up and sees the volume is attached:
I1128 12:36:33.332854 1 reconciler.go:289] attacherDetacher.AttachVolume started for volume "pvc-a2f50d29-f309-11e8-802b-42010a800002" (UniqueName: "kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002") from node "bootstrap-e2e-minion-group-r7dh"
I1128 12:36:33.345070 1 csi_attacher.go:110] kubernetes.io/csi: attachment [csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea] for volume [projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002] already exists (will not be recreated)
I1128 12:36:33.345100 1 csi_attacher.go:148] kubernetes.io/csi: probing for updates from CSI driver for [attachment.ID=csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea]
I1128 12:36:33.345113 1 csi_attacher.go:157] kubernetes.io/csi: probing VolumeAttachment [id=csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea]
I1128 12:36:33.348650 1 csi_attacher.go:119] kubernetes.io/csi: attacher.Attach finished OK with VolumeAttachment object [csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea]
I1128 12:36:33.348731 1 operation_generator.go:335] AttachVolume.Attach succeeded for volume "pvc-a2f50d29-f309-11e8-802b-42010a800002" (UniqueName: "kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002") from node "bootstrap-e2e-minion-group-r7dh"
Due to exponential back off on kubelet, it will not check for attached volume again until 2018-11-28 12:38:35.11478352
Before kubelet can check, the test times out and gives up:
Nov 28 12:37:23.159: INFO: Deleting pod "pod-subpath-test-pd-csi-storage-gke-io-dynamicpv-ghvv" in namespace "e2e-tests-csi-volumes-65prz"
So for the most part seems like everything is operating as expected.
The outstanding question is why the external-attacher did not see the nodeID annotation until two minutes after it was added to the node.
The attacher logs the names of PVs that were created by other CSI drivers or by in-tree storage provisioners.
logs:
I1204 04:42:58.513497 1 controller.go:205] Started PV processing "pvc-579343b3-663d-45a7-98a9-996983c2f09e"
more info at rook/rook#4365
Right now, each CSI driver must run exactly one external-attacher, and we recommend running it in a StatefulSet.
We should fix the attacher to do the same leader election as the provisioner, and update documentation and examples everywhere so the attacher runs in the same Deployment as the provisioner.
After trying to create the external-attacher example YAML, I see a permission-denied error on the Quay images... I assume there's an easy fix, but I'm not sure what it would be.
[centos@ip-10-0-1-136 demo]$ sudo docker pull quay.io/k8scsi/mock-plugin
Using default tag: latest
Pulling repository quay.io/k8scsi/mock-plugin
Error: Status 403 trying to pull repository k8scsi/mock-plugin: "{\"error\": \"Permission Denied\"}"
(image name change / image deleted) ?
They should be optional, as not all volume drivers will need them: kubernetes/kubernetes#69122
As I was playing around with topology, I ran into an issue: I provisioned a volume and attached it to a pod, which worked as expected. Then I deleted the pod successfully. The strange part is that when I deleted the PVC, the PV went into Terminating state but never got deleted.
The PV still has an external-attach/driverName finalizer preventing its deletion. The associated VolumeAttachment object still exists, and it looks like this:
apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttachment
metadata:
  creationTimestamp: 2018-08-16T01:22:00Z
  finalizers:
  - external-attacher/com-google-csi-gcepd
  name: csi-d4935b83fe8fb00a069d523e71d54c504366d49defd4c6cff33fe41f9cfa4b04
  namespace: ""
  resourceVersion: "460172"
  selfLink: /apis/storage.k8s.io/v1beta1/volumeattachments/csi-d4935b83fe8fb00a069d523e71d54c504366d49defd4c6cff33fe41f9cfa4b04
  uid: c6230d88-a0f2-11e8-a4ee-42010a800002
spec:
  attacher: com.google.csi.gcepd
  nodeName: kubernetes-minion-group-11bk
  source:
    persistentVolumeName: pvc-c39b0473-a0f2-11e8-a4ee-42010a800002
status:
  attachError:
    message: PersistentVolume "pvc-c39b0473-a0f2-11e8-a4ee-42010a800002" is marked for deletion
    time: 2018-08-16T01:24:57Z
  attached: false
The code here says the PV's finalizer won't be removed if the VolumeAttachment object still contains the PV name, so maybe that's not updated properly?
This is similar to the issue we have for the mount operation, kubernetes/kubernetes#82190. I have not fully investigated whether a similar problem exists for the external-attacher.
Especially useful for volume drivers that don't implement attach
My Kubernetes version: 1.9.6
Installation tool: kubeadm
Troubles:
I'm testing with hostpath CSI driver as the example shows. The pod is running well:
[root@node-1 ~]# kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE
csi-pod   4/4     Running   0          4h    172.17.0.3   node-1
And the pvc is created normally:
[root@node-1 ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc Bound pvc-b694d42f-42cc-11e8-bc4f-525400b538e8 1Gi RWO csi-hostpath-sc 2h
[root@node-1 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-b694d42f-42cc-11e8-bc4f-525400b538e8 1Gi RWO Delete Bound default/csi-pvc csi-hostpath-sc 2h
But when I create a pod to test the PVC, the pod stays in ContainerCreating, and the external-attacher container of csi-pod logs:
I0418 08:28:30.703166 1 reflector.go:240] Listing and watching *v1beta1.VolumeAttachment from github.com/kubernetes-csi/external-attacher/vendor/k8s.io/client-go/informers/factory.go:87
E0418 08:28:30.713445 1 reflector.go:205] github.com/kubernetes-csi/external-attacher/vendor/k8s.io/client-go/informers/factory.go:87: Failed to list *v1beta1.VolumeAttachment: the server could not find the requested resource
And while getting the resource directly, here is the result:
[root@node-1 ~]# kubectl get volumeattachments
NAME AGE
csi-ad50ac36b76e55e8621e4f37e9d561bf141ed57cd6c1d5b8d0219fda62ab03cb 2h
Has anyone met the same problem?
From kubernetes-csi/external-snapshotter#239:
$ go get -v github.com/kubernetes-csi/[email protected]
go get github.com/kubernetes-csi/[email protected]: github.com/kubernetes-csi/[email protected]: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2
See kubernetes-csi/external-snapshotter#240 for a fix.
Creating cluster "csi-prow" ...
• Ensuring node image (kindest/node:v1.17.0) ...
✓ Ensuring node image (kindest/node:v1.17.0)
• Preparing nodes ...
✓ Preparing nodes
• Creating kubeadm config ...
✓ Creating kubeadm config
• Starting control-plane ...
panic: runtime error: slice bounds out of range [:109] with capacity 104
goroutine 130 [running]:
bytes.(*Buffer).grow(0xc0004e4c90, 0x6d, 0xc0000bfea0)
/usr/local/go/src/bytes/buffer.go:148 +0x2b8
bytes.(*Buffer).Write(0xc0004e4c90, 0xc000862000, 0x6d, 0x8000, 0x0, 0x0, 0x6d)
/usr/local/go/src/bytes/buffer.go:172 +0xdd
io.(*multiWriter).Write(0xc0004ca820, 0xc000862000, 0x6d, 0x8000, 0x6d, 0x0, 0x0)
/usr/local/go/src/io/multi.go:60 +0x87
io.copyBuffer(0x1454e40, 0xc0004ca820, 0x1455360, 0xc00000e240, 0xc000862000, 0x8000, 0x8000, 0x404d15, 0xc000848420, 0xc0000bffb0)
/usr/local/go/src/io/io.go:404 +0x1fb
io.Copy(...)
/usr/local/go/src/io/io.go:364
os/exec.(*Cmd).writerDescriptor.func1(0xc000848420, 0xc0000bffb0)
/usr/local/go/src/os/exec/exec.go:311 +0x63
os/exec.(*Cmd).Start.func1(0xc0000dc9a0, 0xc0004ca8e0)
/usr/local/go/src/os/exec/exec.go:435 +0x27
created by os/exec.(*Cmd).Start
/usr/local/go/src/os/exec/exec.go:434 +0x608
WARNING: Cluster creation failed. Will try again with higher verbosity.
INFO: Available Docker images:
REPOSITORY TAG IMAGE ID CREATED SIZE
csi-attacher csiprow 4e407f398301 2 minutes ago 46.1MB
csi-attacher latest 4e407f398301 2 minutes ago 46.1MB
kindest/node v1.17.0 675dfb7f800c 10 days ago 1.23GB
golang 1.13.3 dc7582e06f8e 2 months ago 803MB
gcr.io/distroless/static latest e4d2a899b1bf 50 years ago 1.82MB
Mon Dec 30 23:57:21 UTC 2019 go1.13.4 $ kind --loglevel debug create cluster --retain --name csi-prow --config /home/prow/go/pkg/csiprow.Jb6kZgvLDN/kind-config.yaml --wait 5m --image kindest/node:v1.17.0
WARNING: --loglevel is deprecated, please switch to -v and -q!
DEBUG: exec/local.go:88] Running: "docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format '{{.Label "io.k8s.sigs.kind.cluster"}}'"
ERROR: a cluster with the name "csi-prow" already exists
Mon Dec 30 23:57:22 UTC 2019 go1.13.4 $ kind export logs --name csi-prow /logs/artifacts/kind-cluster
Exported logs to: /logs/artifacts/kind-cluster
ERROR: Cluster creation failed again, giving up. See the 'kind-cluster' artifact directory for additional logs.
List of items we should do when we're ready for a 2.0:
/lifecycle frozen
Hi,
Now I want to build the external-attacher image privately, but the image size is much bigger than the one downloaded from quay.io.
Below is the built process:
#pwd
/root/CSF-CMT-IMAGE/bcmt_csi/go/src/github.com/kubernetes-csi/external-attacher
#make csi-attacher
#docker build -t csi-attacher:v1.0.1 -f Dockerfile .
Checking the images, the attacher image downloaded from quay.io is 50.2 MB:
quay.io/k8scsi/csi-attacher v1.0.1 2ef5329b7139 2 months ago 50.2 MB
The image I built myself is 119 MB:
csi-attacher v1.0.1 b7c7323f64b3 10 minutes ago 119 MB
Since I need to modify the base image in the Dockerfile, I have to build the image manually (make first and then docker build).
# docker images|grep csi
csi-snapshotter v1.0.1 f7d7f6914253 About a minute ago 118 MB
csi-provisioner v1.0.1 6498b7fcbf02 5 minutes ago 123 MB
csi-attacher v1.0.1 b7c7323f64b3 10 minutes ago 119 MB
docker.io/k8scloudprovider/cinder-csi-plugin latest ad9688cdab83 7 days ago 326 MB
quay.io/k8scsi/csi-node-driver-registrar v1.0.1 4c7fa144c035 2 months ago 46.9 MB
quay.io/k8scsi/csi-provisioner v1.0.1 2004b031bce2 2 months ago 48 MB
quay.io/k8scsi/csi-attacher v1.0.1 2ef5329b7139 2 months ago 50.2 MB
quay.io/k8scsi/csi-snapshotter v1.0.1 c70168a8d1de 2 months ago 49.2 MB
Do you have any suggestions about the big difference in image size? Thanks!
The README introduces a deprecated flag, retry-interval-end:
--retry-interval-end: The exponential backoff maximum value. See CSI error and timeout handling for details. 5 minutes is used by default.
It seems this flag should be replaced by retry-interval-max, per the following code from the master branch:
retryIntervalMax = flag.Duration("retry-interval-max", 5*time.Minute, "Maximum retry interval of failed create volume or deletion.")
Currently, the attacher calls ControllerPublishVolume() and ControllerUnpublishVolume() with context.TODO(). It would be better to call them with a reasonable deadline and exponential back-off.
Currently, the attacher uses the storage.k8s.io/v1alpha1 API to talk to Kubernetes and work with VolumeAttachment objects. v1alpha1 needs to be explicitly enabled by cluster admins. We should move to v1beta1, which is enabled by default.
I updated my driver to support CSI v1.0.0 and tested it with the K8s release-1.13 branch. I didn't enable the CSINodeInfo feature gate. Attaching a volume failed with the following errors, indicating CSINodeInfo is required.
external-attacher logs:
I1121 17:45:21.698075 1 reflector.go:357] k8s.io/client-go/informers/factory.go:131: Watch close - *v1.Node total 45 items received
I1121 17:46:41.961154 1 controller.go:167] Started VA processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961200 1 csi_handler.go:91] CSIHandler: processing VA "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961214 1 csi_handler.go:118] Attaching "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961224 1 csi_handler.go:257] Starting attach operation for "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961355 1 csi_handler.go:212] PV finalizer is already set on "pvc-990567c7-edb3-11e8-a8c1-000c29e70439"
I1121 17:46:41.968744 1 csi_handler.go:516] Can't get CSINodeInfo 127.0.0.1: csinodeinfos.csi.storage.k8s.io "127.0.0.1" is forbidden: User "system:serviceaccount:default:csi-attacher" cannot get resource "csinodeinfos" in API group "csi.storage.k8s.io" at the cluster scope
I1121 17:46:41.968795 1 csi_handler.go:380] Saving attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972135 1 csi_handler.go:390] Saved attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972155 1 csi_handler.go:101] Error processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3": failed to attach: node "127.0.0.1" has no NodeID annotation
I1121 17:46:41.972177 1 controller.go:167] Started VA processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972182 1 csi_handler.go:91] CSIHandler: processing VA "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972186 1 csi_handler.go:118] Attaching "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972190 1 csi_handler.go:257] Starting attach operation for "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972231 1 csi_handler.go:212] PV finalizer is already set on "pvc-990567c7-edb3-11e8-a8c1-000c29e70439"
I1121 17:46:41.973221 1 csi_handler.go:516] Can't get CSINodeInfo 127.0.0.1: csinodeinfos.csi.storage.k8s.io "127.0.0.1" is forbidden: User "system:serviceaccount:default:csi-attacher" cannot get resource "csinodeinfos" in API group "csi.storage.k8s.io" at the cluster scope
I1121 17:46:41.973257 1 csi_handler.go:380] Saving attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.975052 1 csi_handler.go:390] Saved attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.975087 1 csi_handler.go:101] Error processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3": failed to attach: node "127.0.0.1" has no NodeID annotation
In #184, we had decided that instead of marking the VolumeAttachment as detached, we would just requeue the volume to have the workqueue process it again.
However, this doesn't work in the case where the Node is deleted. In that scenario:
What should happen is:
I'm not sure the best way to fix step 2). Some suggestions I have in order of preference:
markAsDetached if csiAttach failed on the force sync.
Actual behavior:
Delete on a PV results in that PV vanishing, but the volume at the cloud provider stays. The volume at the cloud provider only gets deleted when the PVC is deleted.
Expected behavior:
Currently, the external-attacher creates 10 goroutines when it starts.
In some cases, for instance when creating 1000 Pods with 1000 PVCs, some CSI drivers may execute ControllerPublishVolume slowly.
10 goroutines may not be suitable for every case; some users may need a bigger number of sync workers.
I think it's necessary to make the worker count configurable.
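A minimal sketch of such a flag, parsed with the standard library; the flag name -worker-threads mirrors the style of the sidecar's other flags but is an assumption here, not the project's actual flag definition:

```go
package main

import (
	"flag"
	"fmt"
)

// parseWorkers reads the worker count from the command line, keeping the
// current value of 10 as the default so behavior is unchanged when the
// flag is not set. A FlagSet is used so the function is easy to test.
func parseWorkers(args []string) (int, error) {
	fs := flag.NewFlagSet("csi-attacher", flag.ContinueOnError)
	workers := fs.Int("worker-threads", 10, "Number of VolumeAttachment worker goroutines")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *workers, nil
}

func main() {
	n, err := parseWorkers([]string{"-worker-threads=25"})
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```

The returned count would then replace the hard-coded 10 when spawning the sync goroutines.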
The RBAC permissions listed in the attacher role in the example are different from those bundled into k8s:
https://github.com/kubernetes/kubernetes/blob/4e397d971a72f5bc6dbffbacca7297f17e17c572/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L471
Should they be different? Should most users just use the built-in role?
There is currently a TODO in pkg/connection/util.go:
AccessType: &csi.VolumeCapability_Mount{
Mount: &csi.VolumeCapability_MountVolume{
// TODO: get FsType from somewhere
MountFlags: pv.Spec.MountOptions,
},
},
fstype is now a field in CSIPersistentVolumeSource and should be fetched from there and set.
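A sketch of the intended fix, using simplified stand-in types rather than the real Kubernetes and CSI structs (the real code would read pv.Spec.CSI.FSType and pv.Spec.MountOptions):

```go
package main

import "fmt"

// Simplified stand-ins for the types involved; field names mirror the
// Kubernetes API but the structs here are illustrative only.
type CSIPersistentVolumeSource struct {
	FSType string
}

type PVSpec struct {
	CSI          *CSIPersistentVolumeSource
	MountOptions []string
}

type MountVolume struct {
	FsType     string
	MountFlags []string
}

// mountCapability fills FsType from the PV's CSI source instead of
// leaving the TODO in place; MountFlags keep coming from MountOptions.
func mountCapability(spec PVSpec) MountVolume {
	fsType := ""
	if spec.CSI != nil {
		fsType = spec.CSI.FSType
	}
	return MountVolume{FsType: fsType, MountFlags: spec.MountOptions}
}

func main() {
	c := mountCapability(PVSpec{
		CSI:          &CSIPersistentVolumeSource{FSType: "ext4"},
		MountOptions: []string{"noatime"},
	})
	fmt.Println(c.FsType, c.MountFlags)
}
```

The nil check matters because a PV without a CSI source should fall back to an empty fstype rather than panic.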
Hi,
I have created the CSI plugin in my kubernetes cluster, now the SC, pvc, pv has been created successfully. But when create pod to attach the volume, it always reports the error "AttachVolume.Attach failed for volume "pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826" : node "sdc-bcmt-01-edge-worker-01" has no NodeID annotation"
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc-cinderplugin Bound pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826 1Gi RWO csi-sc-cinderplugin 107m
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-00f6dddd-3994-11e9-91ba-fa163e02d826 1Gi RWO Delete Terminating default/csi-pvc-cinderplugin csi-sc-cinderplugin 7h1m
pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826 1Gi RWO Delete Bound default/csi-pvc-cinderplugin csi-sc-cinderplugin 107m
# kubectl get pod nginx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 34m (x34 over 101m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826" : node "sdc-bcmt-01-edge-worker-01" has no NodeID annotation
Warning FailedMount 3m11s (x44 over 100m) kubelet, sdc-bcmt-01-edge-worker-01 Unable to mount volumes for pod "nginx_default(f39547a0-39c3-11e9-9185-fa163e02d826)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx". list of unmounted volumes=[csi-data-cinderplugin]. list of unattached volumes=[csi-data-cinderplugin default-token-n2jlc]
Warning FailedAttachVolume 66s (x23 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826" : node "sdc-bcmt-01-edge-worker-01" has no NodeID annotation
Below is the detailed log for attacher container:
attacher00.log
Thanks
Hi,
Probe failed with: rpc error: code = DeadlineExceeded desc = context deadline exceeded
In my opinion, when the external-attacher calls Probe, the timeout setting is too short (only one second). We could increase the timeout:
external-attacher/cmd/csi-attacher/main.go, line 198 in 5a4cc3b
external-attacher/cmd/csi-attacher/main.go, line 101 in 5a4cc3b
If a plugin has VolumeExpansion.OFFLINE, then a resize should happen after ControllerUnpublishVolume and before ControllerPublishVolume.
Currently, this is not the case: if I delete a Pod, the external-resizer and the external-attacher race each other. If the external-attacher wins and calls ControllerPublishVolume before the resize operation is finished, then I have to delete the Pod again, which is incredibly frustrating.
I've hacked together a hacky "solution", but I think that OFFLINE resize should work properly on the CSI side, not on the plugin side.
I propose not publishing a volume to a node while its PVC is being resized.
When a pod is deleted before nodePublish is done, but after nodeStage is done, it will not be unmounted by kubelet. This causes stale mounts on host to be left around.
A PersistentVolume can contain multiple access modes:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
"providers will have different capabilities and each PVโs access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PVโs capabilities."
When multiple access modes are given to the CreateVolume call, it is the driver's responsibility to provision a volume that satisfies all access modes:
// The capabilities that the provisioned volume MUST have. SP MUST
// provision a volume that will satisfy ALL of the capabilities
// specified in this list. Otherwise SP MUST return the appropriate
// gRPC error code.
// The Plugin MUST assume that the CO MAY use the provisioned volume
// with ANY of the capabilities specified in this list.
Then, once the driver provisions a volume, the external-provisioner copies the multiple access modes to the PV.
However, when we try to attach the volume, the external-attacher rejects multiple access modes on the PV:
external-attacher/pkg/controller/util.go
Line 190 in 08983ee
The PV supports multiple access modes in Kubernetes, but the CSI ControllerPublishVolume call accepts only one access mode. It is actually the external-attacher's job to "pick" the correct access mode based on other information (or some heuristic) before passing it to the CSI driver. Throwing an error is not the intended behavior.
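One possible heuristic, sketched with simplified mode names rather than the real CSI enum, and explicitly not the attacher's current behavior: rank the PV's modes and pick the strongest one instead of returning an error:

```go
package main

import "fmt"

// pickAccessMode chooses a single access mode to send in
// ControllerPublishVolume when the PV lists several. Preference order
// (an illustrative assumption): multi-writer over multi-reader over
// single-writer, since the driver must support every listed mode anyway.
func pickAccessMode(modes []string) string {
	rank := map[string]int{
		"SINGLE_NODE_WRITER":      1,
		"MULTI_NODE_READER_ONLY":  2,
		"MULTI_NODE_MULTI_WRITER": 3,
	}
	best := ""
	for _, m := range modes {
		if rank[m] > rank[best] {
			best = m
		}
	}
	return best
}

func main() {
	modes := []string{"SINGLE_NODE_WRITER", "MULTI_NODE_MULTI_WRITER"}
	fmt.Println(pickAccessMode(modes))
}
```

A real implementation might instead pick the mode based on how the volume is actually being used by the pod, but any deterministic choice is better than failing the attach outright.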
The contents of extras/docker are used to build the container because they produce an image that is only 3MB, instead of the 300MB image built by the Dockerfile in the repository root.
The version needs to be logged.
When a CSI driver returns a non-transient error, the corresponding VolumeAttachment is marked as detached (i.e. the attacher's finalizer is removed).
I0805 16:05:08.948172 1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerUnpublishVolume
I0805 16:05:08.948182 1 connection.go:181] GRPC request: {"node_id":"i-0eb51d1f90e270d91","volume_id":"vol-0766d9cc3c1966b86"}
I0805 16:05:15.323947 1 connection.go:183] GRPC response: {}
I0805 16:05:15.324406 1 connection.go:184] GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 10ceab6c-4a6d-4da5-add2-d46d4fdb652a
I0805 16:05:15.324422 1 csi_handler.go:369] Detached "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6" with error rpc error: code = Internal desc = Could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 10ceab6c-4a6d-4da5-add2-d46d4fdb652a
I0805 16:05:15.324459 1 util.go:70] Marking as detached "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6"
This is wrong. Basically, any error from ControllerUnpublish means that the volume could still be attached, and the attacher must wait for a successful response to confirm that the volume is detached.
Since the external-attacher uses the CSINode object, which has reached beta in all supported Kubernetes releases, the attacher should stop using the csi.volume.kubernetes.io/nodeid annotation on Node objects.
Reported by a third party:
On detach, detached, err := h.csiConnection.Detach(ctx, volumeHandle, nodeID, secrets) is executed. On any error from Detach, detached is set by return isFinalError(err), err, which is this:
 1  func isFinalError(err error) bool {
 2      // Sources:
 3      // https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
 4      // https://github.com/container-storage-interface/spec/blob/master/spec.md
 5      st, ok := status.FromError(err)
 6      if !ok {
 7          // This is not gRPC error. The operation must have failed before gRPC
 8          // method was called, otherwise we would get gRPC error.
 9          return true
10      }
11      switch st.Code() {
12      case codes.Canceled,          // gRPC: Client Application cancelled the request
13          codes.DeadlineExceeded,   // gRPC: Timeout
14          codes.Unavailable,        // gRPC: Server shutting down, TCP connection broken - previous Attach() or Detach() may be still in progress.
15          codes.ResourceExhausted,  // gRPC: Server temporarily out of resources - previous Attach() or Detach() may be still in progress.
16          codes.FailedPrecondition: // CSI: Operation pending for volume
17          return false
18      }
19      // All other errors mean that the operation (attach/detach) either did not
20      // even start or failed. It is for sure not in progress.
21      return true
22  }
Line 9 above should not return true (detached); the caller of the function needs to do additional checking so it does not end up in this condition.
Also, in the switch st.Code() cases, codes.FailedPrecondition should be replaced with codes.Aborted, since Aborted is the code that indicates an operation pending for the volume, and the function should return false for that case.
At a minimum, it needs to print the git tag it was built from.
The attacher only logs an error when it fails to fetch a secret configured in the PV, and then continues with attaching. It should return the error to the user instead.
Secrets are optional in CSI; however, once they are set in a PV, CSI really wants them to be used. If there is a typo in the secret name, the admin should know about it, and the attach should fail.