kubernetes-csi / external-attacher
Sidecar container that watches Kubernetes VolumeAttachment objects and triggers ControllerPublish/Unpublish against a CSI endpoint
License: Apache License 2.0
Currently, when changing Kubernetes API objects, the external-attacher uses "Update" calls. These are less efficient than patching, and they can be problematic if the object changes between Get and Update (an accidental revert is possible).
They can also cause issues with mismatched API versions (this should not happen, but we still want to minimize breakage/risk).
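A minimal sketch of the patch approach, using only the standard library: build a JSON merge patch that carries just the field being changed, then hand it to the API server (with client-go this would be a Patch call with types.MergePatchType; that call and the exact field path here are illustrative, not the attacher's actual code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildAttachedPatch returns a JSON merge patch that only touches
// status.attached, so concurrent changes to other fields of the object
// are preserved (unlike a full Update, which writes the whole object
// back and can revert changes made between Get and Update).
func buildAttachedPatch(attached bool) ([]byte, error) {
	patch := map[string]interface{}{
		"status": map[string]interface{}{
			"attached": attached,
		},
	}
	return json.Marshal(patch)
}

func main() {
	p, err := buildAttachedPatch(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p)) // {"status":{"attached":true}}
}
```

The resulting bytes would be sent with a Patch call instead of Update, so only the listed field is modified on the server.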
Make the number of worker threads configurable; currently it is hard-coded to 10.
CSI has supported raw block volumes since version 0.1, and Kubernetes since v1.9, but the external-attacher cannot attach a block volume.
When attaching a raw block volume, the external-attacher logs do not even show an error saying it cannot handle it; it just proceeds as if it were a mount-type attach.
Checked out a fresh master branch and ran "make test":
### test-vendor:
Repo uses 'dep' for vendoring.
# vendor is out of sync:
github.com/googleapis/gnostic: hash of vendored tree not equal to digest in Gopkg.lock
make: *** [release-tools/build.make:132: test-vendor] Error 1
$ dep version
dep:
version : v0.5.1
build date : 2019-03-11
git hash : faa6189
go version : go1.12
go compiler : gc
platform : linux/amd64
features : ImportDuringSolve=false
According to the CSI spec, the CO (or its components) shall invoke this RPC (ControllerProbe) to determine readiness of the service.
A CSI driver that follows the language of the spec will fail if a Probe RPC is not invoked prior to other operations. If the CSI language is correct, then this constitutes a bug here.
As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.
The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".
Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)
Thanks so much, let me know if you have any questions.
(This issue was generated from a tool, apologies for any weirdness.)
[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md
We can also remove the fallback to the Node annotation at the same time. This change will require a major version bump because we would have to raise the minimum Kubernetes version to 1.17, where the object was introduced.
We should make this change at least one, if not two, releases before v1beta1 is removed from Kubernetes.
/lifecycle frozen
Hi everyone,
I am running into a bit of a weird issue.
I have one ppc64le machine on which external-attacher builds without a problem:
$ git clone https://github.com/kubernetes-csi/external-attacher.git
$ cd external-attacher
$ make (there is an error reported trying to build for windows, but the linux build works)
$ ls bin/
csi-attacher
$ go env
GOARCH="ppc64le"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_ppc64le"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/external-attacher/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build135377313=/tmp/go-build -gno-record-gcc-switches"
The client-go library is:
$ ls $GOPATH/pkg/mod/k8s.io/ | grep client
[email protected]
However, the exact same make command on an x86_64 machine produces the following:
$ make
mkdir -p bin
CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-X main.version=v2.0.0-0-g27b83ca9 -extldflags "-static"' -o ./bin/csi-attacher ./cmd/csi-attacher
go: creating new go.mod: module github.com/kubernetes-csi/external-attacher
go: copying requirements from Gopkg.lock
# k8s.io/client-go/rest
../go/pkg/mod/k8s.io/[email protected]+incompatible/rest/request.go:598:31: not enough arguments in call to watch.NewStreamWatcher
have (*versioned.Decoder)
want (watch.Decoder, watch.Reporter)
make: *** [build-csi-attacher] Error 2
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/johngouf/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/johngouf/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/johngouf/external-attacher/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build193746378=/tmp/go-build -gno-record-gcc-switches"
The client-go library is:
$ ls $GOPATH/pkg/mod/k8s.io/ | grep client
[email protected]+incompatible
Any idea what I am doing wrong? Is it a bug?
Thanks!
CSI 1.0 decorates sensitive fields with csi_secret. Let's take advantage of this feature to programmatically ensure no sensitive fields are ever logged by this sidecar container.
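A hedged sketch of the idea, not the sidecar's actual implementation: before logging a request, replace the values of fields known to be secret. The field names and the redaction marker below are assumptions for illustration; the real code would derive the secret field set from the csi_secret protobuf option:

```go
package main

import "fmt"

// redactSecrets returns a copy of the request map with the values of
// secret-flagged fields replaced, so the result is safe to log.
func redactSecrets(req map[string]interface{}, secretFields map[string]bool) map[string]interface{} {
	out := make(map[string]interface{}, len(req))
	for k, v := range req {
		if secretFields[k] {
			out[k] = "***stripped***"
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	req := map[string]interface{}{
		"volume_id": "vol-1",
		"secrets":   map[string]string{"password": "hunter2"},
	}
	safe := redactSecrets(req, map[string]bool{"secrets": true})
	fmt.Println(safe["volume_id"], safe["secrets"])
}
```

Only the redacted copy would ever reach the log lines; the original request is passed to the CSI driver untouched.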
If a node with volumes managed by a CSI plugin is removed from the Kubernetes cluster before the volumes are detached, then detaching will never succeed.
The function csiDetach (in pkg/controller/csi_handler.go, line 296 on master) always fails because it tries to get the nodeID from the Kubernetes API, and since the node no longer exists, the lookup fails.
This behavior was reproduced in OpenStack with the Cinder CSI plugin during a scale-down of worker nodes: the OpenStack cloud provider deleted the node from the Kubernetes cluster when it found that the virtual machine no longer existed.
Volume detach should not automatically fail if the node where the volume was attached is no longer part of the Kubernetes cluster; the attacher should still try to detach the volume. The nodeID could be passed as an empty string, and the actual detach implementation could then decide whether it is possible to detach the volume or whether it is already detached.
In the OpenStack case, volumes are automatically detached when the virtual machine is deleted.
It's still using the alpha CRD instead of the beta in-tree API
/help
It's still using the beta object. I think we can safely make this change without a major version bump since v1.VolumeAttachment was added in 1.13 which is below our min supported version.
It is preferable to use a context instead of a stop channel when controlling the lifetime of a function call (either a long-lived goroutine or just RPC calls). A context makes it easier to propagate cancellation while still allowing operations with short timeouts inside a long-lived routine.
Additionally, our code will be easier to test if functions like Run() wait for their child goroutines to exit before returning. This gives tests a way to guarantee that all side effects of a particular call have been executed.
This is an analysis of the following test flake: https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-serial/4398
Kubelet sees new plugin and registers it:
I1128 12:33:16.974128 6672 csi_plugin.go:116] kubernetes.io/csi: Trying to validate a new CSI Driver with name: pd.csi.storage.gke.io endpoint: /var/lib/kubelet/plugins/pd.csi.storage.gke.io/csi.sock versions: 1.0.0, foundInDeprecatedDir: false
I1128 12:33:16.974342 6672 csi_plugin.go:137] kubernetes.io/csi: Register new plugin with name: pd.csi.storage.gke.io at endpoint: /var/lib/kubelet/plugins/pd.csi.storage.gke.io/csi.sock
attach-detach controller is waiting for attach:
E1128 12:34:31.327964 1 csi_attacher.go:227] kubernetes.io/csi: attachment for projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002 failed: cannot find NodeID for driver "pd.csi.storage.gke.io" for node "bootstrap-e2e-minion-group-r7dh"
E1128 12:34:31.328073 1 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\"" failed. No retries permitted until 2018-11-28 12:36:33.328041594 +0000 UTC m=+4563.274204929 (durationBeforeRetry 2m2s). Error: "AttachVolume.Attach failed for volume \"pvc-a2f50d29-f309-11e8-802b-42010a800002\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\") from node \"bootstrap-e2e-minion-group-r7dh\" : cannot find NodeID for driver \"pd.csi.storage.gke.io\" for node \"bootstrap-e2e-minion-group-r7dh\""
external-attacher finally sees the annotation and attaches the volume:
I1128 12:35:35.359295 1 csi_handler.go:193] NodeID annotation added to "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:35.365952 1 csi_handler.go:203] VolumeAttachment "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea" updated with finalizer and/or NodeID annotation
I1128 12:35:35.366006 1 connection.go:235] GRPC call: /csi.v1.Controller/ControllerPublishVolume
I1128 12:35:35.366013 1 connection.go:236] GRPC request: volume_id:"projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002" node_id:"projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/instances/bootstrap-e2e-minion-group-r7dh" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1543408336401-8081-pd.csi.storage.gke.io" >
I1128 12:35:47.982680 1 connection.go:238] GRPC response:
I1128 12:35:47.982733 1 connection.go:239] GRPC error: <nil>
I1128 12:35:47.982744 1 csi_handler.go:131] Attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.982757 1 util.go:33] Marking as attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.998078 1 util.go:43] Marked as attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
I1128 12:35:47.998099 1 csi_handler.go:137] Fully attached "csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea"
Due to exponential back off attach-detach controller won't "retry" attach until 12:36:33.328041594
kubelet is also waiting for attach-detach controller to ack volume is attached:
E1128 12:36:33.114836 6672 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\"" failed. No retries permitted until 2018-11-28 12:38:35.11478352 +0000 UTC m=+4788.780131927 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"pvc-a2f50d29-f309-11e8-802b-42010a800002\" (UniqueName: \"kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002\") pod \"pod-subpath-test-pd-csi-storage-gke-io-dynamicpv-ghvv\" (UID: \"a7db48f2-f309-11e8-802b-42010a800002\") "
attach-detach controller wakes up and sees the volume is attached:
I1128 12:36:33.332854 1 reconciler.go:289] attacherDetacher.AttachVolume started for volume "pvc-a2f50d29-f309-11e8-802b-42010a800002" (UniqueName: "kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002") from node "bootstrap-e2e-minion-group-r7dh"
I1128 12:36:33.345070 1 csi_attacher.go:110] kubernetes.io/csi: attachment [csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea] for volume [projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002] already exists (will not be recreated)
I1128 12:36:33.345100 1 csi_attacher.go:148] kubernetes.io/csi: probing for updates from CSI driver for [attachment.ID=csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea]
I1128 12:36:33.345113 1 csi_attacher.go:157] kubernetes.io/csi: probing VolumeAttachment [id=csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea]
I1128 12:36:33.348650 1 csi_attacher.go:119] kubernetes.io/csi: attacher.Attach finished OK with VolumeAttachment object [csi-1fe974024220d00085419a3757b0ccee5ebadeff7672fc3fe0ed469132a41aea]
I1128 12:36:33.348731 1 operation_generator.go:335] AttachVolume.Attach succeeded for volume "pvc-a2f50d29-f309-11e8-802b-42010a800002" (UniqueName: "kubernetes.io/csi/pd.csi.storage.gke.io^projects/k8s-jkns-gci-gce-flaky/zones/us-central1-f/disks/pvc-a2f50d29-f309-11e8-802b-42010a800002") from node "bootstrap-e2e-minion-group-r7dh"
Due to exponential back off on kubelet, it will not check for attached volume again until 2018-11-28 12:38:35.11478352
Before kubelet can check, the test times out and gives up:
Nov 28 12:37:23.159: INFO: Deleting pod "pod-subpath-test-pd-csi-storage-gke-io-dynamicpv-ghvv" in namespace "e2e-tests-csi-volumes-65prz"
So for the most part seems like everything is operating as expected.
The outstanding question is why the external-attacher did not see the nodeID annotation until two minutes after it was added to the node.
The attacher logs the names of PVs that were created by other CSI drivers or by in-tree storage provisioners.
logs:
I1204 04:42:58.513497 1 controller.go:205] Started PV processing "pvc-579343b3-663d-45a7-98a9-996983c2f09e"
more info at rook/rook#4365
Right now, each CSI driver must run exactly one external-attacher, and we recommend running it in a StatefulSet.
We should fix the attacher to do the same leader election as the provisioner, and update documentation and examples everywhere so the attacher runs in the same Deployment as the provisioner.
After trying to create the external-attacher example YAML, I see a permission-denied error on the Quay images... I assume there's an easy fix, but I'm not sure what it would be.
[centos@ip-10-0-1-136 demo]$ sudo docker pull quay.io/k8scsi/mock-plugin
Using default tag: latest
Pulling repository quay.io/k8scsi/mock-plugin
Error: Status 403 trying to pull repository k8scsi/mock-plugin: "{\"error\": \"Permission Denied\"}"
(image name change / image deleted) ?
They should be optional, as not all volume drivers will need them: kubernetes/kubernetes#69122
As I was playing around with topology, I ran into an issue: I provisioned a volume and attached it to a pod, which worked as expected. Then I deleted the pod successfully. The strange part is that when I deleted the PVC, the PV went into Terminating state but never got deleted.
The PV still has an external-attach/driverName finalizer preventing its deletion. The associated VolumeAttachment object still exists, and it looks like this:
apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttachment
metadata:
  creationTimestamp: 2018-08-16T01:22:00Z
  finalizers:
  - external-attacher/com-google-csi-gcepd
  name: csi-d4935b83fe8fb00a069d523e71d54c504366d49defd4c6cff33fe41f9cfa4b04
  namespace: ""
  resourceVersion: "460172"
  selfLink: /apis/storage.k8s.io/v1beta1/volumeattachments/csi-d4935b83fe8fb00a069d523e71d54c504366d49defd4c6cff33fe41f9cfa4b04
  uid: c6230d88-a0f2-11e8-a4ee-42010a800002
spec:
  attacher: com.google.csi.gcepd
  nodeName: kubernetes-minion-group-11bk
  source:
    persistentVolumeName: pvc-c39b0473-a0f2-11e8-a4ee-42010a800002
status:
  attachError:
    message: PersistentVolume "pvc-c39b0473-a0f2-11e8-a4ee-42010a800002" is marked for deletion
    time: 2018-08-16T01:24:57Z
  attached: false
The code here says the PV's finalizer won't be removed if the VolumeAttachment object still contains the PV name, so maybe that's not updated properly?
This is similar to the issue we have for the mount operation, kubernetes/kubernetes#82190. I have not fully investigated whether a similar problem exists for the external-attacher.
Especially useful for volume drivers that don't implement attach
My Kubernetes version: 1.9.6
Installation tool: kubeadm
Troubles:
I'm testing with hostpath CSI driver as the example shows. The pod is running well:
[root@node-1 ~]# kubectl get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP           NODE
csi-pod   4/4     Running   0          4h    172.17.0.3   node-1
And the pvc is created normally:
[root@node-1 ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc Bound pvc-b694d42f-42cc-11e8-bc4f-525400b538e8 1Gi RWO csi-hostpath-sc 2h
[root@node-1 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-b694d42f-42cc-11e8-bc4f-525400b538e8 1Gi RWO Delete Bound default/csi-pvc csi-hostpath-sc 2h
But when I create a pod to test the PVC, the pod stays in ContainerCreating, and the external-attacher container of csi-pod logs:
I0418 08:28:30.703166 1 reflector.go:240] Listing and watching *v1beta1.VolumeAttachment from github.com/kubernetes-csi/external-attacher/vendor/k8s.io/client-go/informers/factory.go:87
E0418 08:28:30.713445 1 reflector.go:205] github.com/kubernetes-csi/external-attacher/vendor/k8s.io/client-go/informers/factory.go:87: Failed to list *v1beta1.VolumeAttachment: the server could not find the requested resource
And while getting the resource directly, here is the result:
[root@node-1 ~]# kubectl get volumeattachments
NAME AGE
csi-ad50ac36b76e55e8621e4f37e9d561bf141ed57cd6c1d5b8d0219fda62ab03cb 2h
Has anyone met the same problem?
From kubernetes-csi/external-snapshotter#239:
$ go get -v github.com/kubernetes-csi/[email protected]
go get github.com/kubernetes-csi/[email protected]: github.com/kubernetes-csi/[email protected]: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2
See kubernetes-csi/external-snapshotter#240 for a fix.
Creating cluster "csi-prow" ...
• Ensuring node image (kindest/node:v1.17.0) ...
✓ Ensuring node image (kindest/node:v1.17.0)
• Preparing nodes ...
✓ Preparing nodes
• Creating kubeadm config ...
✓ Creating kubeadm config
• Starting control-plane ...
panic: runtime error: slice bounds out of range [:109] with capacity 104
goroutine 130 [running]:
bytes.(*Buffer).grow(0xc0004e4c90, 0x6d, 0xc0000bfea0)
/usr/local/go/src/bytes/buffer.go:148 +0x2b8
bytes.(*Buffer).Write(0xc0004e4c90, 0xc000862000, 0x6d, 0x8000, 0x0, 0x0, 0x6d)
/usr/local/go/src/bytes/buffer.go:172 +0xdd
io.(*multiWriter).Write(0xc0004ca820, 0xc000862000, 0x6d, 0x8000, 0x6d, 0x0, 0x0)
/usr/local/go/src/io/multi.go:60 +0x87
io.copyBuffer(0x1454e40, 0xc0004ca820, 0x1455360, 0xc00000e240, 0xc000862000, 0x8000, 0x8000, 0x404d15, 0xc000848420, 0xc0000bffb0)
/usr/local/go/src/io/io.go:404 +0x1fb
io.Copy(...)
/usr/local/go/src/io/io.go:364
os/exec.(*Cmd).writerDescriptor.func1(0xc000848420, 0xc0000bffb0)
/usr/local/go/src/os/exec/exec.go:311 +0x63
os/exec.(*Cmd).Start.func1(0xc0000dc9a0, 0xc0004ca8e0)
/usr/local/go/src/os/exec/exec.go:435 +0x27
created by os/exec.(*Cmd).Start
/usr/local/go/src/os/exec/exec.go:434 +0x608
WARNING: Cluster creation failed. Will try again with higher verbosity.
INFO: Available Docker images:
REPOSITORY TAG IMAGE ID CREATED SIZE
csi-attacher csiprow 4e407f398301 2 minutes ago 46.1MB
csi-attacher latest 4e407f398301 2 minutes ago 46.1MB
kindest/node v1.17.0 675dfb7f800c 10 days ago 1.23GB
golang 1.13.3 dc7582e06f8e 2 months ago 803MB
gcr.io/distroless/static latest e4d2a899b1bf 50 years ago 1.82MB
Mon Dec 30 23:57:21 UTC 2019 go1.13.4 $ kind --loglevel debug create cluster --retain --name csi-prow --config /home/prow/go/pkg/csiprow.Jb6kZgvLDN/kind-config.yaml --wait 5m --image kindest/node:v1.17.0
WARNING: --loglevel is deprecated, please switch to -v and -q!
DEBUG: exec/local.go:88] Running: "docker ps -q -a --no-trunc --filter label=io.k8s.sigs.kind.cluster --format '{{.Label "io.k8s.sigs.kind.cluster"}}'"
ERROR: a cluster with the name "csi-prow" already exists
Mon Dec 30 23:57:22 UTC 2019 go1.13.4 $ kind export logs --name csi-prow /logs/artifacts/kind-cluster
Exported logs to: /logs/artifacts/kind-cluster
ERROR: Cluster creation failed again, giving up. See the 'kind-cluster' artifact directory for additional logs.
List of items we should do when we're ready for a 2.0:
/lifecycle frozen
Hi,
Now I want to build the external-attacher image privately, but the image size is much bigger than the one downloaded from quay.io.
Below is the built process:
#pwd
/root/CSF-CMT-IMAGE/bcmt_csi/go/src/github.com/kubernetes-csi/external-attacher
#make csi-attacher
#docker build -t csi-attacher:v1.0.1 -f Dockerfile .
Checking the images, the attacher image downloaded from quay.io is 50.2 MB:
quay.io/k8scsi/csi-attacher v1.0.1 2ef5329b7139 2 months ago 50.2 MB
The image I built myself is 119 MB:
csi-attacher v1.0.1 b7c7323f64b3 10 minutes ago 119 MB
Since I need to modify the base image in the Dockerfile, I have to build the image manually (make first and then docker build).
# docker images|grep csi
csi-snapshotter v1.0.1 f7d7f6914253 About a minute ago 118 MB
csi-provisioner v1.0.1 6498b7fcbf02 5 minutes ago 123 MB
csi-attacher v1.0.1 b7c7323f64b3 10 minutes ago 119 MB
docker.io/k8scloudprovider/cinder-csi-plugin latest ad9688cdab83 7 days ago 326 MB
quay.io/k8scsi/csi-node-driver-registrar v1.0.1 4c7fa144c035 2 months ago 46.9 MB
quay.io/k8scsi/csi-provisioner v1.0.1 2004b031bce2 2 months ago 48 MB
quay.io/k8scsi/csi-attacher v1.0.1 2ef5329b7139 2 months ago 50.2 MB
quay.io/k8scsi/csi-snapshotter v1.0.1 c70168a8d1de 2 months ago 49.2 MB
Do you have any suggestions about the big difference in image size? Thanks!
The README introduces a deprecated flag, retry-interval-end:
--retry-interval-end: The exponential backoff maximum value. See CSI error and timeout handling for details. 5 minutes is used by default.
It seems this flag should be replaced by retry-interval-max, per the following code from the master branch:
retryIntervalMax = flag.Duration("retry-interval-max", 5*time.Minute, "Maximum retry interval of failed create volume or deletion.")
Currently, the attacher calls ControllerPublishVolume() and ControllerUnpublishVolume() with context.TODO(). It would be better to call them with a reasonable deadline and exponential back-off.
Currently, the attacher uses the storage.k8s.io/v1alpha1 API to talk to Kubernetes and work with VolumeAttachment objects. v1alpha1 needs to be explicitly enabled by cluster admins. We should move to v1beta1, which is enabled by default.
I updated my driver to support CSI v1.0.0 and tested it with the K8s release-1.13 branch. I didn't enable the CSINodeInfo feature gate. Attaching a volume failed with the following errors, indicating CSINodeInfo is required.
external-attacher logs:
I1121 17:45:21.698075 1 reflector.go:357] k8s.io/client-go/informers/factory.go:131: Watch close - *v1.Node total 45 items received
I1121 17:46:41.961154 1 controller.go:167] Started VA processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961200 1 csi_handler.go:91] CSIHandler: processing VA "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961214 1 csi_handler.go:118] Attaching "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961224 1 csi_handler.go:257] Starting attach operation for "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.961355 1 csi_handler.go:212] PV finalizer is already set on "pvc-990567c7-edb3-11e8-a8c1-000c29e70439"
I1121 17:46:41.968744 1 csi_handler.go:516] Can't get CSINodeInfo 127.0.0.1: csinodeinfos.csi.storage.k8s.io "127.0.0.1" is forbidden: User "system:serviceaccount:default:csi-attacher" cannot get resource "csinodeinfos" in API group "csi.storage.k8s.io" at the cluster scope
I1121 17:46:41.968795 1 csi_handler.go:380] Saving attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972135 1 csi_handler.go:390] Saved attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972155 1 csi_handler.go:101] Error processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3": failed to attach: node "127.0.0.1" has no NodeID annotation
I1121 17:46:41.972177 1 controller.go:167] Started VA processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972182 1 csi_handler.go:91] CSIHandler: processing VA "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972186 1 csi_handler.go:118] Attaching "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972190 1 csi_handler.go:257] Starting attach operation for "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.972231 1 csi_handler.go:212] PV finalizer is already set on "pvc-990567c7-edb3-11e8-a8c1-000c29e70439"
I1121 17:46:41.973221 1 csi_handler.go:516] Can't get CSINodeInfo 127.0.0.1: csinodeinfos.csi.storage.k8s.io "127.0.0.1" is forbidden: User "system:serviceaccount:default:csi-attacher" cannot get resource "csinodeinfos" in API group "csi.storage.k8s.io" at the cluster scope
I1121 17:46:41.973257 1 csi_handler.go:380] Saving attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.975052 1 csi_handler.go:390] Saved attach error to "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3"
I1121 17:46:41.975087 1 csi_handler.go:101] Error processing "csi-287954e86d0185d269332671ff365816b3506d826a6a01ca8c4caf4256d7a8b3": failed to attach: node "127.0.0.1" has no NodeID annotation
In #184, we had decided that instead of marking the VolumeAttachment as detached, we would just requeue the volume to have the workqueue process it again.
However, this doesn't work in the case where the Node is deleted. In that scenario:
What should happen is:
I'm not sure the best way to fix step 2). Some suggestions I have in order of preference:
markAsDetached if csiAttach failed on the force sync.
Actual behavior:
Delete on a PV results in that PV vanishing, but the volume at the cloud provider stays. The volume at the cloud provider only gets deleted when the PVC is deleted.
Expected behavior:
Currently, the external-attacher creates 10 goroutines when it starts.
In some cases, for instance when creating 1000 Pods with 1000 PVCs, some CSI drivers may execute ControllerPublishVolume slowly.
10 goroutines may not be suitable for every case; some users may need a bigger number of sync workers.
I think it's necessary to make the worker count configurable.
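A minimal sketch of such a flag, parsed with the standard library; the flag name -worker-threads mirrors the style of the sidecar's other flags but is an assumption here, not the project's actual flag definition:

```go
package main

import (
	"flag"
	"fmt"
)

// parseWorkers reads the worker count from the command line, keeping the
// current value of 10 as the default so behavior is unchanged when the
// flag is not set. A FlagSet is used so the function is easy to test.
func parseWorkers(args []string) (int, error) {
	fs := flag.NewFlagSet("csi-attacher", flag.ContinueOnError)
	workers := fs.Int("worker-threads", 10, "Number of VolumeAttachment worker goroutines")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *workers, nil
}

func main() {
	n, err := parseWorkers([]string{"-worker-threads=25"})
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```

The returned count would then replace the hard-coded 10 when spawning the sync goroutines.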
The RBAC permissions listed in the attacher role in the example are different from those bundled into k8s:
https://github.com/kubernetes/kubernetes/blob/4e397d971a72f5bc6dbffbacca7297f17e17c572/plugin/pkg/auth/authorizer/rbac/bootstrappolicy/policy.go#L471
Should they be different? Should most users just use the built-in role?
There is currently a TODO in pkg/connection/util.go:
AccessType: &csi.VolumeCapability_Mount{
Mount: &csi.VolumeCapability_MountVolume{
// TODO: get FsType from somewhere
MountFlags: pv.Spec.MountOptions,
},
},
fstype is now a field in CSIPersistentVolumeSource and should be fetched from there and set.
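A sketch of the intended fix, using simplified stand-in types rather than the real Kubernetes and CSI structs (the real code would read pv.Spec.CSI.FSType and pv.Spec.MountOptions):

```go
package main

import "fmt"

// Simplified stand-ins for the types involved; field names mirror the
// Kubernetes API but the structs here are illustrative only.
type CSIPersistentVolumeSource struct {
	FSType string
}

type PVSpec struct {
	CSI          *CSIPersistentVolumeSource
	MountOptions []string
}

type MountVolume struct {
	FsType     string
	MountFlags []string
}

// mountCapability fills FsType from the PV's CSI source instead of
// leaving the TODO in place; MountFlags keep coming from MountOptions.
func mountCapability(spec PVSpec) MountVolume {
	fsType := ""
	if spec.CSI != nil {
		fsType = spec.CSI.FSType
	}
	return MountVolume{FsType: fsType, MountFlags: spec.MountOptions}
}

func main() {
	c := mountCapability(PVSpec{
		CSI:          &CSIPersistentVolumeSource{FSType: "ext4"},
		MountOptions: []string{"noatime"},
	})
	fmt.Println(c.FsType, c.MountFlags)
}
```

The nil check matters because a PV without a CSI source should fall back to an empty fstype rather than panic.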
Hi,
I have created the CSI plugin in my kubernetes cluster, now the SC, pvc, pv has been created successfully. But when create pod to attach the volume, it always reports the error "AttachVolume.Attach failed for volume "pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826" : node "sdc-bcmt-01-edge-worker-01" has no NodeID annotation"
# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc-cinderplugin Bound pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826 1Gi RWO csi-sc-cinderplugin 107m
# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-00f6dddd-3994-11e9-91ba-fa163e02d826 1Gi RWO Delete Terminating default/csi-pvc-cinderplugin csi-sc-cinderplugin 7h1m
pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826 1Gi RWO Delete Bound default/csi-pvc-cinderplugin csi-sc-cinderplugin 107m
# kubectl get pod nginx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 34m (x34 over 101m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826" : node "sdc-bcmt-01-edge-worker-01" has no NodeID annotation
Warning FailedMount 3m11s (x44 over 100m) kubelet, sdc-bcmt-01-edge-worker-01 Unable to mount volumes for pod "nginx_default(f39547a0-39c3-11e9-9185-fa163e02d826)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx". list of unmounted volumes=[csi-data-cinderplugin]. list of unattached volumes=[csi-data-cinderplugin default-token-n2jlc]
Warning FailedAttachVolume 66s (x23 over 31m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-3d1d6dfb-39c3-11e9-9185-fa163e02d826" : node "sdc-bcmt-01-edge-worker-01" has no NodeID annotation
Below is the detailed log for attacher container:
attacher00.log
Thanks
Hi,
Probe failed with: rpc error: code = DeadlineExceeded desc = context deadline exceeded
In my opinion, when the external-attacher calls Probe, the timeout setting is too short (only one second). We could increase the timeout:
external-attacher/cmd/csi-attacher/main.go, line 198 in 5a4cc3b
external-attacher/cmd/csi-attacher/main.go, line 101 in 5a4cc3b
If a plugin has VolumeExpansion.OFFLINE, then a resize should happen after ControllerUnpublishVolume and before ControllerPublishVolume.
Currently, this is not the case: if I delete a Pod, the external-resizer and the external-attacher race each other. If the external-attacher wins and calls ControllerPublishVolume before the resize operation is finished, then I have to delete the Pod again, which is incredibly frustrating.
I've hacked together a hacky "solution", but I think that OFFLINE resize should work properly on the CSI side, not on the plugin side.
I propose not publishing a volume to a node while its PVC is being resized.
When a pod is deleted before nodePublish is done, but after nodeStage is done, it will not be unmounted by kubelet. This causes stale mounts on host to be left around.
A PersistentVolume can contain multiple access modes:
https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
"providers will have different capabilities and each PVโs access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PVโs capabilities."
When multiple access modes are given to the CreateVolume call, it is the driver's responsibility to provision a volume that satisfies all access modes:
// The capabilities that the provisioned volume MUST have. SP MUST
// provision a volume that will satisfy ALL of the capabilities
// specified in this list. Otherwise SP MUST return the appropriate
// gRPC error code.
// The Plugin MUST assume that the CO MAY use the provisioned volume
// with ANY of the capabilities specified in this list.
Then, once the driver provisions a volume, the external-provisioner copies the multiple access modes to the PV.
However, when we try to attach the volume, the external-attacher rejects multiple access modes on the PV:
external-attacher/pkg/controller/util.go
Line 190 in 08983ee
The PV supports multiple access modes in Kubernetes, but the CSI ControllerPublishVolume call accepts only one access mode. It is actually the external-attacher's job to "pick" the correct access mode based on other information (or some heuristic) before passing it to the CSI driver. Throwing an error is not the intended behavior.
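One possible heuristic, sketched with simplified mode names rather than the real CSI enum, and explicitly not the attacher's current behavior: rank the PV's modes and pick the strongest one instead of returning an error:

```go
package main

import "fmt"

// pickAccessMode chooses a single access mode to send in
// ControllerPublishVolume when the PV lists several. Preference order
// (an illustrative assumption): multi-writer over multi-reader over
// single-writer, since the driver must support every listed mode anyway.
func pickAccessMode(modes []string) string {
	rank := map[string]int{
		"SINGLE_NODE_WRITER":      1,
		"MULTI_NODE_READER_ONLY":  2,
		"MULTI_NODE_MULTI_WRITER": 3,
	}
	best := ""
	for _, m := range modes {
		if rank[m] > rank[best] {
			best = m
		}
	}
	return best
}

func main() {
	modes := []string{"SINGLE_NODE_WRITER", "MULTI_NODE_MULTI_WRITER"}
	fmt.Println(pickAccessMode(modes))
}
```

A real implementation might instead pick the mode based on how the volume is actually being used by the pod, but any deterministic choice is better than failing the attach outright.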
The contents of extras/docker are used to build the container because they produce an image that is only 3MB, instead of the 300MB image built by the Dockerfile in the repository root.
The version needs to be logged.
When a CSI driver returns a non-transient error, the corresponding VolumeAttachment is marked as detached (i.e. the attacher's finalizer is removed).
I0805 16:05:08.948172 1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerUnpublishVolume
I0805 16:05:08.948182 1 connection.go:181] GRPC request: {"node_id":"i-0eb51d1f90e270d91","volume_id":"vol-0766d9cc3c1966b86"}
I0805 16:05:15.323947 1 connection.go:183] GRPC response: {}
I0805 16:05:15.324406 1 connection.go:184] GRPC error: rpc error: code = Internal desc = Could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 10ceab6c-4a6d-4da5-add2-d46d4fdb652a
I0805 16:05:15.324422 1 csi_handler.go:369] Detached "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6" with error rpc error: code = Internal desc = Could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": could not detach volume "vol-0766d9cc3c1966b86" from node "i-0eb51d1f90e270d91": RequestLimitExceeded: Request limit exceeded.
status code: 503, request id: 10ceab6c-4a6d-4da5-add2-d46d4fdb652a
I0805 16:05:15.324459 1 util.go:70] Marking as detached "csi-51ad5c68abe99844c08cfce659c0f8375d8b0255391a6b28b543501577b3dee6"
This is wrong. Basically, any error from ControllerUnpublish means that the volume could still be attached, and the attacher must wait for a successful response to confirm that the volume is detached.
Since the external-attacher uses the CSINode object, which has reached beta in all supported Kubernetes releases, the attacher should stop using the csi.volume.kubernetes.io/nodeid annotation on Node objects.
Reported by a third party:
On detach, detached, err := h.csiConnection.Detach(ctx, volumeHandle, nodeID, secrets) is executed. On any error from Detach, detached is set by return isFinalError(err), err, which is this:
 1  func isFinalError(err error) bool {
 2      // Sources:
 3      // https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
 4      // https://github.com/container-storage-interface/spec/blob/master/spec.md
 5      st, ok := status.FromError(err)
 6      if !ok {
 7          // This is not gRPC error. The operation must have failed before gRPC
 8          // method was called, otherwise we would get gRPC error.
 9          return true
10      }
11      switch st.Code() {
12      case codes.Canceled,          // gRPC: Client Application cancelled the request
13          codes.DeadlineExceeded,   // gRPC: Timeout
14          codes.Unavailable,        // gRPC: Server shutting down, TCP connection broken - previous Attach() or Detach() may be still in progress.
15          codes.ResourceExhausted,  // gRPC: Server temporarily out of resources - previous Attach() or Detach() may be still in progress.
16          codes.FailedPrecondition: // CSI: Operation pending for volume
17          return false
18      }
19      // All other errors mean that the operation (attach/detach) either did not
20      // even start or failed. It is for sure not in progress.
21      return true
22  }
Line 9 above should not return true (detached); the caller of the function needs to do additional checking so it does not end up in this condition.
Also, in the switch st.Code() cases, codes.FailedPrecondition should be replaced with codes.Aborted, since Aborted is the code that indicates an operation pending for the volume, and the function should return false for that case.
At a minimum, it needs to print the git tag it was built from.
The attacher only logs an error when it fails to fetch a secret configured in the PV, and then continues with attaching. It should return the error to the user instead.
Secrets are optional in CSI; however, once they are set in a PV, CSI really wants them to be used. If there is a typo in the secret name, the admin should know about it, and the attach should fail.