
vsphere-csi-driver's Introduction

Container Storage Interface (CSI) driver for vSphere

The vSphere CSI Driver is a Kubernetes plugin that allows persistent storage for containerized workloads running on vSphere infrastructure. It enables dynamic provisioning of storage volumes and provides features like snapshots, cloning, and dynamic expansion of volumes. The vSphere CSI Driver replaces the in-tree vSphere volume plugin and offers integration with vSphere with better scale and performance.

This driver is in a stable GA state and is suitable for production use.

It is recommended to install an out-of-tree Cloud Provider Interface like vSphere Cloud Provider Interface in the Kubernetes cluster to keep the Kubernetes cluster fully operational.

Documentation

Documentation for vSphere CSI Driver is available here:

vSphere CSI Driver Releases

Contributing

Please see CONTRIBUTING.md for instructions on how to contribute.

Contact

vsphere-csi-driver's People

Contributors

adikul30, aishwarya-hebbar, akankshapanse, baludontu, bhavyachoudhary, chethanv28, codenrhoden, divyenpatel, dvonthenen, gohilankit, k8s-ci-robot, kavyashree-r, lintongj, lipingxue, marunachalam, mutgia, nigupta1, nikhilbarge, raunakshah, rpanduranga, sandeeppissay, sashrith, shalini-b, shreesha21, sipriyaa, skogta, subramanian-neelakantan, vdkotkar, xing-yang, zhelongp


vsphere-csi-driver's Issues

Replace pkg/common/config with config package from CPI

/kind cleanup

The CNS common library copied the config package from the CPI, which is meant to be re-used as a common library.

We need to remove pkg/common/config and use the one from CPI instead, as it is already a dependency for this repo. That way both CSI and CPI will have common, tested config file handling. The config file types and syntax are an API to the user and require backwards compatibility, deprecation notices, etc., per the K8s project standards.
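For reference, a minimal sketch of the INI-style config handling in question, assuming the gopkg.in/gcfg.v1 parser; the struct fields, tags, and values below are illustrative assumptions, not the actual shared CPI/CSI config types.

// Illustrative only: parse a csi-vsphere.conf-style INI file with gopkg.in/gcfg.v1.
package main

import (
	"fmt"
	"log"

	"gopkg.in/gcfg.v1"
)

type config struct {
	Global struct {
		ClusterID string `gcfg:"cluster-id"`
	}
	VirtualCenter map[string]*struct {
		User        string `gcfg:"user"`
		Password    string `gcfg:"password"`
		Datacenters string `gcfg:"datacenters"`
	}
}

func main() {
	raw := `[Global]
cluster-id = "demo-cluster"

[VirtualCenter "vc.example.com"]
user = "demo-user"
password = "demo-password"
datacenters = "dc-1"
`
	var cfg config
	if err := gcfg.ReadStringInto(&cfg, raw); err != nil {
		log.Fatalf("failed to parse config: %v", err)
	}
	fmt.Println("cluster-id:", cfg.Global.ClusterID)
	for host, vc := range cfg.VirtualCenter {
		fmt.Println("vCenter:", host, "user:", vc.User, "datacenters:", vc.Datacenters)
	}
}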

volume create failed when having two datastores in a zone

When I created a PVC in a zone (zone-a) that has two datastores, the PV creation failed. But when I used another zone (zone-b) that has only a single datastore, the PV was created successfully.
The vsphere-csi-controller log follows:

I0108 17:26:11.574537 1 config.go:265] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I0108 17:26:11.574769 1 config.go:209] Initializing vc server 100.2.51.14
I0108 17:26:11.574777 1 controller.go:66] Initializing CNS controller
I0108 17:26:11.574799 1 virtualcentermanager.go:61] Initializing defaultVirtualCenterManager...
I0108 17:26:11.574804 1 virtualcentermanager.go:63] Successfully initialized defaultVirtualCenterManager
I0108 17:26:11.574812 1 virtualcentermanager.go:105] Successfully registered VC "100.2.51.14"
I0108 17:26:11.574822 1 manager.go:55] Initializing volume.defaultManager...
I0108 17:26:11.574826 1 manager.go:59] volume.defaultManager initialized
I0108 17:26:11.788643 1 virtualcenter.go:130] New session ID for 'VSPHERE.LOCAL\Administrator' = 52040b65-a46d-c864-42db-6093c39b8ab2
I0108 17:26:11.788926 1 manager.go:71] Initializing node.defaultManager...
I0108 17:26:11.788987 1 manager.go:75] node.defaultManager initialized
I0108 17:26:11.789014 1 kubernetes.go:34] k8s client using in-cluster config
time="2020-01-08T17:26:11Z" level=info msg="configured: csi.vsphere.vmware.com" controllerType=VANILLA mode=controller
time="2020-01-08T17:26:11Z" level=info msg="identity service registered"
time="2020-01-08T17:26:11Z" level=info msg="controller service registered"
time="2020-01-08T17:26:11Z" level=info msg=serving endpoint="unix:///var/lib/csi/sockets/pluginproxy/csi.sock"
I0108 17:26:11.793208 1 reflector.go:123] Starting reflector *v1.Node (0s) from pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94
I0108 17:26:11.793353 1 reflector.go:161] Listing and watching *v1.Node from pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94
I0108 17:26:11.804130 1 manager.go:98] Successfully registered node: "k8s-master-zhj" with nodeUUID "421C972B-903B-101C-84AE-D291AB325231"
I0108 17:26:11.804149 1 virtualmachine.go:124] Initiating asynchronous datacenter listing with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893251 1 datacenter.go:146] Publishing datacenter Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]
I0108 17:26:11.893310 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893375 1 virtualmachine.go:161] AsyncGetAllDatacenters with uuid 421C972B-903B-101C-84AE-D291AB325231 sent a dc Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]
I0108 17:26:11.893757 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893791 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893851 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893862 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893867 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.893872 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.925708 1 virtualmachine.go:176] Found VM VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] given uuid 421C972B-903B-101C-84AE-D291AB325231 on DC Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14], canceling context
I0108 17:26:11.925764 1 virtualmachine.go:188] Returning VM VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] for UUID 421C972B-903B-101C-84AE-D291AB325231
I0108 17:26:11.925824 1 manager.go:117] Successfully discovered node with nodeUUID 421C972B-903B-101C-84AE-D291AB325231 in vm VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]]
I0108 17:26:11.925839 1 manager.go:104] Successfully discovered node: "k8s-master-zhj" with nodeUUID "421C972B-903B-101C-84AE-D291AB325231"
I0108 17:26:11.925850 1 manager.go:98] Successfully registered node: "k8s-node1-zhj" with nodeUUID "421C056B-782D-E3FD-758B-9109ABDB9162"
I0108 17:26:11.925860 1 virtualmachine.go:124] Initiating asynchronous datacenter listing with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.023746 1 datacenter.go:146] Publishing datacenter Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]
I0108 17:26:12.027570 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.027628 1 virtualmachine.go:161] AsyncGetAllDatacenters with uuid 421C056B-782D-E3FD-758B-9109ABDB9162 sent a dc Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]
I0108 17:26:12.027859 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.027873 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.027882 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.027892 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.027902 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.027911 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.059761 1 virtualmachine.go:176] Found VM VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] given uuid 421C056B-782D-E3FD-758B-9109ABDB9162 on DC Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14], canceling context
I0108 17:26:12.059796 1 virtualmachine.go:188] Returning VM VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] for UUID 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:26:12.059810 1 manager.go:117] Successfully discovered node with nodeUUID 421C056B-782D-E3FD-758B-9109ABDB9162 in vm VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]]
I0108 17:26:12.059829 1 manager.go:104] Successfully discovered node: "k8s-node1-zhj" with nodeUUID "421C056B-782D-E3FD-758B-9109ABDB9162"
I0108 17:26:12.060261 1 manager.go:98] Successfully registered node: "k8s-node2-zhj" with nodeUUID "421CC765-4F58-5A4B-39DB-2C63908366B0"
I0108 17:26:12.060275 1 virtualmachine.go:124] Initiating asynchronous datacenter listing with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.098502 1 controller.go:350] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0108 17:26:12.154967 1 datacenter.go:146] Publishing datacenter Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]
I0108 17:26:12.155057 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.155073 1 virtualmachine.go:161] AsyncGetAllDatacenters with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0 sent a dc Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]
I0108 17:26:12.155094 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.155119 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.155164 1 virtualmachine.go:140] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.155178 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.155188 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.155197 1 virtualmachine.go:156] AsyncGetAllDatacenters finished with uuid 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.186242 1 virtualmachine.go:176] Found VM VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] given uuid 421CC765-4F58-5A4B-39DB-2C63908366B0 on DC Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14], canceling context
I0108 17:26:12.186275 1 virtualmachine.go:188] Returning VM VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] for UUID 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:26:12.186287 1 manager.go:117] Successfully discovered node with nodeUUID 421CC765-4F58-5A4B-39DB-2C63908366B0 in vm VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]]
I0108 17:26:12.186298 1 manager.go:104] Successfully discovered node: "k8s-node2-zhj" with nodeUUID "421CC765-4F58-5A4B-39DB-2C63908366B0"
I0108 17:26:12.270352 1 controller.go:350] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0108 17:34:12.804690 1 reflector.go:370] pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:94: Watch close - *v1.Node total 26 items received
I0108 17:40:03.944951 1 controller.go:113] CreateVolume: called with args {Name:pvc-53813478-1248-4e0a-a134-250e240b6dbf CapacityRange:required_bytes:10737418240 VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource: AccessibilityRequirements:requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > > XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0108 17:40:03.946435 1 nodes.go:105] GetSharedDatastoresInTopology: called with topologyRequirement: requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > > , zoneCategoryName: k8s-zone, regionCategoryName: k8s-region
I0108 17:40:03.946473 1 manager.go:217] Renewing VM VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]] with new connection: nodeUUID 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:40:04.188345 1 manager.go:227] Updated VM VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] for node with nodeUUID 421CC765-4F58-5A4B-39DB-2C63908366B0
I0108 17:40:04.188421 1 manager.go:214] Renewing VM VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]], no new connection needed: nodeUUID 421C972B-903B-101C-84AE-D291AB325231
I0108 17:40:04.188495 1 manager.go:227] Updated VM VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] for node with nodeUUID 421C972B-903B-101C-84AE-D291AB325231
I0108 17:40:04.188531 1 manager.go:214] Renewing VM VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /Datacenter, VirtualCenterHost: 100.2.51.14]], no new connection needed: nodeUUID 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:40:04.188567 1 manager.go:227] Updated VM VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] for node with nodeUUID 421C056B-782D-E3FD-758B-9109ABDB9162
I0108 17:40:04.188612 1 nodes.go:175] Using preferred topology
I0108 17:40:04.188653 1 nodes.go:137] getSharedDatastoresInTopology: called with topologyArr: [segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > ]
I0108 17:40:04.188723 1 nodes.go:144] Getting list of nodeVMs for zone [zone-a] and region [region-1]
I0108 17:40:04.188741 1 nodes.go:119] getNodesInZoneRegion: called with zoneValue: zone-a, regionValue: region-1
I0108 17:40:04.188792 1 virtualmachine.go:324] IsInZoneRegion: called with zoneCategoryName: k8s-zone, regionCategoryName: k8s-region, zoneValue: zone-a, regionValue: region-1
I0108 17:40:04.188868 1 virtualmachine.go:230] Using plain text username and password
I0108 17:40:04.590415 1 virtualmachine.go:269] GetZoneRegion: called with zoneCategoryName: k8s-zone, regionCategoryName: k8s-region
I0108 17:40:04.590548 1 virtualmachine.go:230] Using plain text username and password
I0108 17:40:05.134616 1 virtualmachine.go:212] Host owning node vm: VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] is 100.2.51.13
I0108 17:40:05.179237 1 virtualmachine.go:263] Ancestors of node vm: VirtualMachine:vm-82 [VirtualCenterHost: 100.2.51.14, UUID: 421cc765-4f58-5a4b-39db-2c63908366b0, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] are : [[{ExtensibleManagedObject:{Self:Folder:group-d1 Value:[] AvailableField:[]} Parent: CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:Datacenters DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:Datacenter:datacenter-2 Value:[] AvailableField:[]} Parent:Folder:group-d1 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:Datacenter DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:Folder:group-h4 Value:[] AvailableField:[]} Parent:Datacenter:datacenter-2 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:host DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:ClusterComputeResource:domain-c86 Value:[] AvailableField:[]} Parent:Folder:group-h4 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:cluster2 DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:HostSystem:host-14 Value:[] AvailableField:[]} Parent:ClusterComputeResource:domain-c86 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:100.2.51.13 DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]}]]
I0108 17:40:05.179317 1 virtualmachine.go:285] Name: host-14, Type: HostSystem
I0108 17:40:05.473059 1 virtualmachine.go:285] Name: domain-c86, Type: ClusterComputeResource
I0108 17:40:05.729980 1 virtualmachine.go:292] Object [{{ClusterComputeResource:domain-c86 [] []} Folder:group-h4 [] [] [] [] cluster2 [] [] [] [] []}] has attached Tags [[urn:vmomi:InventoryServiceTag:c73d04ac-d8d7-4676-a476-c79288c8dfc4:GLOBAL]]
I0108 17:40:05.897122 1 virtualmachine.go:300] Found tag: zone-b for object {{ClusterComputeResource:domain-c86 [] []} Folder:group-h4 [] [] [] [] cluster2 [] [] [] [] []}
I0108 17:40:05.942668 1 virtualmachine.go:306] Found category: k8s-zone for object {{ClusterComputeResource:domain-c86 [] []} Folder:group-h4 [] [] [] [] cluster2 [] [] [] [] []} with tag: zone-b
I0108 17:40:05.942689 1 virtualmachine.go:285] Name: group-h4, Type: Folder
I0108 17:40:06.143389 1 virtualmachine.go:285] Name: datacenter-2, Type: Datacenter
I0108 17:40:06.219155 1 virtualmachine.go:292] Object [{{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []}] has attached Tags [[urn:vmomi:InventoryServiceTag:cbe20733-aed0-4561-b6e5-1fc22e4c1dd3:GLOBAL]]
I0108 17:40:06.393739 1 virtualmachine.go:300] Found tag: region-1 for object {{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []}
I0108 17:40:06.438611 1 virtualmachine.go:306] Found category: k8s-region for object {{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []} with tag: region-1
I0108 17:40:06.667479 1 virtualmachine.go:324] IsInZoneRegion: called with zoneCategoryName: k8s-zone, regionCategoryName: k8s-region, zoneValue: zone-a, regionValue: region-1
I0108 17:40:06.667613 1 virtualmachine.go:230] Using plain text username and password
I0108 17:40:07.049321 1 virtualmachine.go:269] GetZoneRegion: called with zoneCategoryName: k8s-zone, regionCategoryName: k8s-region
I0108 17:40:07.049420 1 virtualmachine.go:230] Using plain text username and password
I0108 17:40:07.638535 1 virtualmachine.go:212] Host owning node vm: VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] is 100.2.126.24
I0108 17:40:07.686361 1 virtualmachine.go:263] Ancestors of node vm: VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] are : [[{ExtensibleManagedObject:{Self:Folder:group-d1 Value:[] AvailableField:[]} Parent: CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:Datacenters DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:Datacenter:datacenter-2 Value:[] AvailableField:[]} Parent:Folder:group-d1 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:Datacenter DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:Folder:group-h4 Value:[] AvailableField:[]} Parent:Datacenter:datacenter-2 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:host DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:ClusterComputeResource:domain-c69 Value:[] AvailableField:[]} Parent:Folder:group-h4 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:cluster1 DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:HostSystem:host-39 Value:[] AvailableField:[]} Parent:ClusterComputeResource:domain-c69 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:100.2.126.24 DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]}]]
I0108 17:40:07.686545 1 virtualmachine.go:285] Name: host-39, Type: HostSystem
I0108 17:40:07.936444 1 virtualmachine.go:285] Name: domain-c69, Type: ClusterComputeResource
I0108 17:40:08.171643 1 virtualmachine.go:292] Object [{{ClusterComputeResource:domain-c69 [] []} Folder:group-h4 [] [] [] [] cluster1 [] [] [] [] []}] has attached Tags [[urn:vmomi:InventoryServiceTag:34d348cf-82a5-4b62-8faa-c0bb5128f9e1:GLOBAL]]
I0108 17:40:08.378222 1 virtualmachine.go:300] Found tag: zone-a for object {{ClusterComputeResource:domain-c69 [] []} Folder:group-h4 [] [] [] [] cluster1 [] [] [] [] []}
I0108 17:40:08.587307 1 virtualmachine.go:306] Found category: k8s-zone for object {{ClusterComputeResource:domain-c69 [] []} Folder:group-h4 [] [] [] [] cluster1 [] [] [] [] []} with tag: zone-a
I0108 17:40:08.587331 1 virtualmachine.go:285] Name: group-h4, Type: Folder
I0108 17:40:08.802722 1 virtualmachine.go:285] Name: datacenter-2, Type: Datacenter
I0108 17:40:09.033697 1 virtualmachine.go:292] Object [{{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []}] has attached Tags [[urn:vmomi:InventoryServiceTag:cbe20733-aed0-4561-b6e5-1fc22e4c1dd3:GLOBAL]]
I0108 17:40:09.237978 1 virtualmachine.go:300] Found tag: region-1 for object {{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []}
I0108 17:40:09.290767 1 virtualmachine.go:306] Found category: k8s-region for object {{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []} with tag: region-1
I0108 17:40:09.346782 1 virtualmachine.go:347] MoRef [VirtualMachine:vm-79] belongs to zone [zone-a] and region [region-1]
I0108 17:40:09.567819 1 virtualmachine.go:324] IsInZoneRegion: called with zoneCategoryName: k8s-zone, regionCategoryName: k8s-region, zoneValue: zone-a, regionValue: region-1
I0108 17:40:09.568110 1 virtualmachine.go:230] Using plain text username and password
I0108 17:40:09.967352 1 virtualmachine.go:269] GetZoneRegion: called with zoneCategoryName: k8s-zone, regionCategoryName: k8s-region
I0108 17:40:09.967629 1 virtualmachine.go:230] Using plain text username and password
I0108 17:40:10.670152 1 virtualmachine.go:212] Host owning node vm: VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] is 100.2.126.24
I0108 17:40:10.718798 1 virtualmachine.go:263] Ancestors of node vm: VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] are : [[{ExtensibleManagedObject:{Self:Folder:group-d1 Value:[] AvailableField:[]} Parent: CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:Datacenters DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:Datacenter:datacenter-2 Value:[] AvailableField:[]} Parent:Folder:group-d1 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:Datacenter DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:Folder:group-h4 Value:[] AvailableField:[]} Parent:Datacenter:datacenter-2 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:host DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:ClusterComputeResource:domain-c69 Value:[] AvailableField:[]} Parent:Folder:group-h4 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:cluster1 DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]} {ExtensibleManagedObject:{Self:HostSystem:host-39 Value:[] AvailableField:[]} Parent:ClusterComputeResource:domain-c69 CustomValue:[] OverallStatus: ConfigStatus: ConfigIssue:[] EffectiveRole:[] Permission:[] Name:100.2.126.24 DisabledMethod:[] RecentTask:[] DeclaredAlarmState:[] TriggeredAlarmState:[] AlarmActionsEnabled: Tag:[]}]]
I0108 17:40:10.720287 1 virtualmachine.go:285] Name: host-39, Type: HostSystem
I0108 17:40:10.962121 1 virtualmachine.go:285] Name: domain-c69, Type: ClusterComputeResource
I0108 17:40:11.049047 1 virtualmachine.go:292] Object [{{ClusterComputeResource:domain-c69 [] []} Folder:group-h4 [] [] [] [] cluster1 [] [] [] [] []}] has attached Tags [[urn:vmomi:InventoryServiceTag:34d348cf-82a5-4b62-8faa-c0bb5128f9e1:GLOBAL]]
I0108 17:40:11.108277 1 virtualmachine.go:300] Found tag: zone-a for object {{ClusterComputeResource:domain-c69 [] []} Folder:group-h4 [] [] [] [] cluster1 [] [] [] [] []}
I0108 17:40:11.160528 1 virtualmachine.go:306] Found category: k8s-zone for object {{ClusterComputeResource:domain-c69 [] []} Folder:group-h4 [] [] [] [] cluster1 [] [] [] [] []} with tag: zone-a
I0108 17:40:11.160578 1 virtualmachine.go:285] Name: group-h4, Type: Folder
I0108 17:40:11.246856 1 virtualmachine.go:285] Name: datacenter-2, Type: Datacenter
I0108 17:40:11.488449 1 virtualmachine.go:292] Object [{{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []}] has attached Tags [[urn:vmomi:InventoryServiceTag:cbe20733-aed0-4561-b6e5-1fc22e4c1dd3:GLOBAL]]
I0108 17:40:11.542314 1 virtualmachine.go:300] Found tag: region-1 for object {{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []}
I0108 17:40:11.595377 1 virtualmachine.go:306] Found category: k8s-region for object {{Datacenter:datacenter-2 [] []} Folder:group-d1 [] [] [] [] Datacenter [] [] [] [] []} with tag: region-1
I0108 17:40:11.645451 1 virtualmachine.go:347] MoRef [VirtualMachine:vm-83] belongs to zone [zone-a] and region [region-1]
I0108 17:40:11.788163 1 nodes.go:150] Obtained list of nodeVMs [[VirtualMachine:vm-79 [VirtualCenterHost: 100.2.51.14, UUID: 421c972b-903b-101c-84ae-d291ab325231, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]] VirtualMachine:vm-83 [VirtualCenterHost: 100.2.51.14, UUID: 421c056b-782d-e3fd-758b-9109abdb9162, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2, VirtualCenterHost: 100.2.51.14]]]] for zone [zone-a] and region [region-1]
I0108 17:40:11.788263 1 nodes.go:219] Getting accessible datastores for node VirtualMachine:vm-79
I0108 17:40:11.888995 1 nodes.go:219] Getting accessible datastores for node VirtualMachine:vm-83
I0108 17:40:11.985369 1 nodes.go:156] Obtained shared datastores : [] for topology: segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" >
I0108 17:40:11.985412 1 controller.go:165] Shared datastores [[Datastore: Datastore:datastore-68, datastore URL: ds:///vmfs/volumes/5df10bce-59da0dfa-df69-6c92bfebd126/ Datastore: Datastore:datastore-71, datastore URL: ds:///vmfs/volumes/vsan:52b98b583dea3365-d797441ba454996d/]] retrieved for topologyRequirement [requisite:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > > preferred:<segments:<key:"failure-domain.beta.kubernetes.io/region" value:"region-1" > segments:<key:"failure-domain.beta.kubernetes.io/zone" value:"zone-a" > > ] with datastoreTopologyMap [+map[ds:///vmfs/volumes/5df10bce-59da0dfa-df69-6c92bfebd126/:[map[failure-domain.beta.kubernetes.io/region:region-1 failure-domain.beta.kubernetes.io/zone:zone-a]] ds:///vmfs/volumes/vsan:52b98b583dea3365-d797441ba454996d/:[map[failure-domain.beta.kubernetes.io/region:region-1 failure-domain.beta.kubernetes.io/zone:zone-a]]]]
I0108 17:40:12.015722 1 vsphereutil.go:115] vSphere CNS driver creating volume pvc-53813478-1248-4e0a-a134-250e240b6dbf with create spec (*types.CnsVolumeCreateSpec)(0xc0007bb810)({
DynamicData: (types.DynamicData) {
},
Name: (string) (len=40) "pvc-53813478-1248-4e0a-a134-250e240b6dbf",
VolumeType: (string) (len=5) "BLOCK",
Datastores: ([]types.ManagedObjectReference) (len=2 cap=2) {
(types.ManagedObjectReference) Datastore:datastore-68,
(types.ManagedObjectReference) Datastore:datastore-71
},
Metadata: (types.CnsVolumeMetadata) {
DynamicData: (types.DynamicData) {
},
ContainerCluster: (types.CnsContainerCluster) {
DynamicData: (types.DynamicData) {
},
ClusterType: (string) (len=10) "KUBERNETES",
ClusterId: (string) (len=8) "cluster1",
VSphereUser: (string) (len=27) "[email protected]"
},
EntityMetadata: ([]types.BaseCnsEntityMetadata)
},
BackingObjectDetails: (*types.CnsBackingObjectDetails)(0xc000859df0)({
DynamicData: (types.DynamicData) {
},
CapacityInMb: (int64) 10240
}),
Profile: ([]types.BaseVirtualMachineProfileSpec)
})
I0108 17:40:12.077553 1 manager.go:91] Update VSphereUser from [email protected] to VSPHERE.LOCAL\Administrator
I0108 17:40:13.525391 1 manager.go:110] CreateVolume: VolumeName: "pvc-53813478-1248-4e0a-a134-250e240b6dbf", opId: "5d599dc2"
E0108 17:40:13.525608 1 manager.go:126] failed to create cns volume. createSpec: "(*types.CnsVolumeCreateSpec)(0xc0007bb810)({\n DynamicData: (types.DynamicData) {\n },\n Name: (string) (len=40) "pvc-53813478-1248-4e0a-a134-250e240b6dbf",\n VolumeType: (string) (len=5) "BLOCK",\n Datastores: ([]types.ManagedObjectReference) (len=2 cap=2) {\n (types.ManagedObjectReference) Datastore:datastore-68,\n (types.ManagedObjectReference) Datastore:datastore-71\n },\n Metadata: (types.CnsVolumeMetadata) {\n DynamicData: (types.DynamicData) {\n },\n ContainerCluster: (types.CnsContainerCluster) {\n DynamicData: (types.DynamicData) {\n },\n ClusterType: (string) (len=10) "KUBERNETES",\n ClusterId: (string) (len=8) "cluster1",\n VSphereUser: (string) (len=27) "VSPHERE.LOCAL\\Administrator"\n },\n EntityMetadata: ([]types.BaseCnsEntityMetadata) \n },\n BackingObjectDetails: (*types.CnsBackingObjectDetails)(0xc000859df0)({\n DynamicData: (types.DynamicData) {\n },\n CapacityInMb: (int64) 10240\n }),\n Profile: ([]types.BaseVirtualMachineProfileSpec) \n})\n", fault: "(*types.CnsFault)(0xc000971700)({\n Fault: (*types.BaseMethodFault)(0xc000623fa0)(),\n LocalizedMessage: (string) (len=81) "CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.SystemError\n"\n})\n", opId: "5d599dc2"
E0108 17:40:13.525714 1 vsphereutil.go:118] Failed to create disk pvc-53813478-1248-4e0a-a134-250e240b6dbf with error CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.SystemError
E0108 17:40:13.525747 1 controller.go:195] Failed to create volume. Error: CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.SystemError

Add e2e tests for plugin that use VMC

The goal here is to have a set of tests that can run on VMC infrastructure and exercise the E2E functionality of the CSI plugin.

To this end, the following tasks need to be completed:

  • Add support to sk8 for deploying the CSI plugin
  • Determine how to build an e2e.test binary from tests housed in this repo that use the existing K8s testing framework
  • Determine which E2E storage tests from the K8s repo can be run on VMC infrastructure. This will be a subset of what is there now, as most "disruptive" tests (e.g. those that reboot vCenter infrastructure components) cannot be run on VMC.
  • Determine what/where/how to run "destructive" tests, or tests that require additional infrastructure that is hard to achieve on VMC (e.g. multi-vcenter testing, zones, etc.).
  • Once prow jobs can launch tests onto VMC, run the E2E tests there and report status back to testgrid

These tests would be the "real" tests that run on VMC against an actual vCenter instance.

This issue was originally filed at kubernetes/cloud-provider-vsphere#75

Cloud provider not initialized properly - RegisterPlugin error

/kind bug

What happened:
Symptom: We are trying to create PVs, PVCs, and Pods with data in one of the datastores, following the examples in https://github.com/kubernetes/examples/tree/master/staging/volumes/vsphere (both Volumes and Persistent Volumes). Examples 1) and 2) fail with errors that the cloud provider is not initialized.

Unable to mount volumes for pod "test-vmdk_default(bd6a832b-a92e-47d4-af1d-ddd7abf2727e)": timeout expired waiting for volumes to attach or mount for pod "default"/"test-vmdk". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-spblv]; skipping pod
Dec 12 18:16:34 ait-kube-5 kubelet[6199]: E1212 18:16:34.805840 6199 pod_workers.go:190] Error syncing pod bd6a832b-a92e-47d4-af1d-ddd7abf2727e ("test-vmdk_default(bd6a832b-a92e-47d4-af1d-ddd7abf2727e)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"test-vmdk". list of unmounted volumes=[test-volume]. list of unattached volumes=[test-volume default-token-spblv]
Dec 12 18:16:48 ait-kube-5 kubelet[6199]: E1212 18:16:48.882631 6199 vsphere_volume_util.go:196] Cloud provider not initialized properly
Dec 12 18:16:48 ait-kube-5 kubelet[6199]: E1212 18:16:48.882656 6199 vsphere_volume_util.go:196] Cloud provider not initialized properly

I also see this error in the syslog of the node that the pod is being created on:

Error: "RegisterPlugin error -- failed to get plugin info using RPC GetInfo at socket /var/lib/kubelet/plugins_registry/csi.vsphere.vmware.com/csi.sock, err: rpc error: code = Unimplemented desc = unknown service pluginregistration.Registration"

At this time we are not sure if this is a driver issue or not. Any thoughts would be helpful.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
Provider IDs have been set
root@ait-kube-1:~# kubectl get nodes -o json | jq '.items[]|[.metadata.name, .spec.providerID, .status.nodeInfo.systemUUID]'
[
"ait-kube-1",
"vsphere://4239a93d-248a-add7-5ee1-c258cf1c4f43",
"3da93942-8a24-d7ad-5ee1-c258cf1c4f43"
]
[
"ait-kube-2",
"vsphere://4239ea73-5a4f-e69c-df1c-ce247f6caa1f",
"73ea3942-4f5a-9ce6-df1c-ce247f6caa1f"
]
[
"ait-kube-3",
"vsphere://423938d1-73ce-04b4-e393-dfcddb81dacb",
"d1383942-ce73-b404-e393-dfcddb81dacb"
]
[
"ait-kube-4",
"vsphere://4239ea4f-1150-6104-72ae-26816515d2e3",
"4fea3942-5011-0461-72ae-26816515d2e3"
]
[
"ait-kube-5",
"vsphere://4239ba10-cb9a-8e2c-9c60-79e2f4e70762",
"10ba3942-9acb-2c8e-9c60-79e2f4e70762"
]
[
"ait-kube-6",
"vsphere://4239ac6e-6dd8-7aaa-9a85-4177d4952057",
"6eac3942-d86d-aa7a-9a85-4177d4952057"
]

  1. disk.EnableUUID has been set on all the nodes

  2. kubectl get CSINodes -o wide
    NAME CREATED AT
    ait-kube-2 2019-12-11T16:41:06Z
    ait-kube-3 2019-12-11T16:41:07Z
    ait-kube-4 2019-12-11T16:41:07Z
    ait-kube-5 2019-12-11T16:41:06Z
    ait-kube-6 2019-12-11T16:41:07Z

  3. kubectl describe csidrivers.storage.k8s.io
    Name: csi.vsphere.vmware.com
    Namespace:
    Labels:
    Annotations: kubectl.kubernetes.io/last-applied-configuration:
    {"apiVersion":"storage.k8s.io/v1beta1","kind":"CSIDriver","metadata":{"annotations":{},"name":"csi.vsphere.vmware.com"},"spec":{"attachReq...
    API Version: storage.k8s.io/v1beta1
    Kind: CSIDriver
    Metadata:
    Creation Timestamp: 2019-12-11T16:40:40Z
    Resource Version: 126126
    Self Link: /apis/storage.k8s.io/v1beta1/csidrivers/csi.vsphere.vmware.com
    UID: caf11984-c90c-47d9-832f-a2beb8636c18
    Spec:
    Attach Required: true
    Pod Info On Mount: false
    Events:

Environment:

  • csi-vsphere version:
    quay.io/k8scsi/csi-attacher:v1.1.1
    gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
    gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
    quay.io/k8scsi/csi-provisioner:v1.2.2
    quay.io/k8scsi/csi-node-driver-registrar:v1.1.0

  • vsphere-cloud-controller-manager version:
    gcr.io/cloud-provider-vsphere/cpi/release/manager:latest

  • Kubernetes version:
    1.15.6

  • vSphere version:
    6.7.0 build 14368073

  • OS (e.g. from /etc/os-release):
    Ubuntu 18.10

  • Kernel (e.g. uname -a):
    18.0-25-generic #26-Ubuntu SMP Mon Jun 24 09:32:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:

vsphere-cloud-controller-manager and vsphere-csi-controller can't be scheduled

I tried following the documentation at https://cloud-provider-vsphere.sigs.k8s.io/tutorials/kubernetes-on-vsphere-with-kubeadm.html and https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-039425C1-597F-46FF-8BAA-C5A46FF10E63.html, which I believe is the current guidance on setting up the vSphere CSI driver. However, when I get to the step for creating the CPI DaemonSet or the StatefulSet for vsphere-csi-controller, the manifest is created but fails to schedule.

The CPI daemonset vsphere-cloud-controller-manager shows up in kube-system but has no pod at all.


The vsphere-csi-controller fails to schedule with the following error.

0/7 nodes are available: 7 node(s) didn't match node selector.

I have a 7 node cluster with 3 master/etcd nodes and 4 worker nodes. Following the documentation, the master nodes were tainted with node-role.kubernetes.io/master=:NoSchedule and the worker nodes do not have any taints.

The cluster was provisioned using Rancher 2.3.3, running Kubernetes 1.16.3. The master nodes come with these taints by default.


Create a SECURITY_CONTACTS file.

As per the email sent to kubernetes-dev[1], please create a SECURITY_CONTACTS
file.

The template for the file can be found in the kubernetes-template repository[2].
A description for the file is in the steering-committee docs[3], you might need
to search that page for "Security Contacts".

Please feel free to ping me on the PR when you make it, otherwise I will see when
you close this issue. :)

Thanks so much, let me know if you have any questions.

(This issue was generated from a tool, apologies for any weirdness.)

[1] https://groups.google.com/forum/#!topic/kubernetes-dev/codeiIoQ6QE
[2] https://github.com/kubernetes/kubernetes-template-project/blob/master/SECURITY_CONTACTS
[3] https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance-template-short.md

Need to fix logic for determining raw block device

CC: @shalini-b

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

/kind bug

What happened:
The current code is not returning the raw block device correctly:
https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/pkg/csi/service/node.go#L810-L835

	for _, m := range mnts {
		if m.Path == target {
			// something is mounted to target, get underlying disk
			d := m.Device
			if m.Device == "devtmpfs" {
				d = m.Source
			}
			dev, err := getDevice(d)
			if err != nil {
				return nil, err
			}
			return dev, nil
		}
	}

This needs to be fixed as below:

	for _, m := range mnts {
		if m.Path == target {
			// something is mounted to target, get underlying disk
			d := m.Device
			if m.Device == "udev" {
				d = m.Source
			}
			dev, err := getDevice(d)
			if err != nil {
				return nil, err
			}
			return dev, nil
		}
	}

example for RAW block device

Device:udev
Path:/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-098a7585-109c-11ea-94c1-005056825b1f
Source:/dev/sdb
Type:devtmpfs
Opts:[rw relatime]}

example for Mounted block device

Device:/dev/sdb
Path:/var/lib/kubelet/pods/c46d6473-0810-11ea-94c1-005056825b1f/volumes/kubernetes.io~csi/pvc-9e3d1d08-080f-11ea-be93-005056825b1f/mount
Source:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-9e3d1d08-080f-11ea-be93-005056825b1f/globalmount
Type:ext4
Opts:[rw relatime]

What you expected to happen:
The raw block volume should be successfully removed from the node when the pod using it is deleted.

How to reproduce it (as minimally and precisely as possible):
Create a raw block volume, create a pod, delete the pod, and observe the logs on the node.

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

When a VMware node goes down, Kubernetes is unaware and the PV isn't released

Migrated from kubernetes/cloud-provider-vsphere#185

Initially filed by: @d4larso

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
If a VMware node with a PV crashes, is shut down, or is halted, Kubernetes is unaware and doesn't release the pod's claim (PVC) on the PV.

What you expected to happen:
The PV should become "unattached" so that it can be picked up for use by a Pod on another node.

How to reproduce it (as minimally and precisely as possible):
Shut down or halt a VMware VM and note that the associated PV can't be used by other nodes until the claim on the PV is deleted.

Anything else we need to know?:

Environment:

vsphere-cloud-controller-manager version: vSphere Client version 6.7.0.30000
OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
Kernel (e.g. uname -a): Linux us03479-vsp1-m01 4.4.0-145-generic #171-Ubuntu SMP Tue Mar 26 12:43:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Install tools:
Others:

No supportability statement present here or in the docs

There doesn't appear to be any statement around supportability for CNS either here or in the public documentation (https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-10E170E2-79E7-4E72-A117-BC3A4725494D.html). Currently, support is community-based, outside of any errors/issues noted in vSphere/VSAN. This is causing confusion for both customers and VMware GSS around how to best resolve issues with configuration/instantiation.

Passwords cannot use escape characters

/kind bug

What happened:
The CSI StatefulSet is unable to start when the password contains a backslash (and possibly other special characters, but I didn't do extensive testing).

When following the guide, in csi-vsphere.conf I'm unable to use a username containing a backslash, which is discussed in issue 214 and seems to have a workaround. I also can't use a backslash within a password, which is a bigger problem.

The error message looks like this:
time="2020-01-06T21:43:52Z" level=fatal msg="grpc failed" error="6:12: unquoted '\\' must be followed by new line or double quote"

I've tried escaping the backslash, encoding it (base64), and passing with single quotes, none of which work.

What you expected to happen:
The username and password should be accepted, even with special characters.

How to reproduce it (as minimally and precisely as possible):

$ cat csi-vsphere.conf 
[Global]
cluster-id = "k8s-cluster1"
[VirtualCenter "10.1.1.14"]
insecure-flag = "true"
user = "[email protected]"
password = "}sso\d$2!UsO"
port = "443"
datacenters = "Vancouver"
ubuntu@k8s-master:~$ kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=kube-system
ubuntu@k8s-master:~$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/master/manifests/1.14/deploy/vsphere-csi-controller-ss.yaml

Anything else we need to know?:
Single quotes don't seem to work at all. Even once I simplified the password, surrounding it with single quotes failed with a permission-denied error.

Interestingly, single quotes do seem to handle backslashes when deploying the CPI, but that's in YAML format, as opposed to this conf file. (Under "Create a CPI secret" in the documentation.)
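For context, the error above matches what an INI parser such as gopkg.in/gcfg.v1 reports when a quoted value contains a bare backslash. A minimal repro sketch, assuming that parser is what reads csi-vsphere.conf (the section and field below are illustrative):

package main

import (
	"fmt"

	"gopkg.in/gcfg.v1"
)

type conf struct {
	VirtualCenter map[string]*struct {
		Password string `gcfg:"password"`
	}
}

func main() {
	var c conf
	// The quoted value contains a literal backslash, as in the reported password.
	raw := "[VirtualCenter \"10.1.1.14\"]\npassword = \"}sso\\d$2!UsO\"\n"
	err := gcfg.ReadStringInto(&c, raw)
	fmt.Println("err:", err) // expected: a parse error for the bare '\' in the quoted value
}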

Environment:

  • vsphere-cloud-controller-manager version: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS
  • Kernel (e.g. uname -a): 4.15.0-72-generic
  • Install tools:
  • Others:

CSI fails to find a VM and subsequently attach container volumes

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

  1. Deployed CCM using https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-10E170E2-79E7-4E72-A117-BC3A4725494D.html

  2. Deployed CSI using : https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-039425C1-597F-46FF-8BAA-C5A46FF10E63.html

Created an example storage class, pvc and a pod to use it.

Error

Name:         csi-9dbe0b32239d939135cee41386052a354ca3ab232b797319c0d1332dd056dc11
Namespace:
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: d8dfbbd3-da96-4d11-9821-74161b8f881d
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:  2019-10-07T18:22:24Z
  Finalizers:
    external-attacher/csi-vsphere-vmware-com
  Resource Version:  11322
  Self Link:         /apis/storage.k8s.io/v1/volumeattachments/csi-9dbe0b32239d939135cee41386052a354ca3ab232b797319c0d1332dd056dc11
  UID:               68e9999c-e92f-11e9-aa2f-00505691fed6
Spec:
  Attacher:   csi.vsphere.vmware.com
  Node Name:  30.1.0.5
  Source:
    Persistent Volume Name:  pvc-af46fcb2-e92e-11e9-aa2f-00505691fed6
Status:
  Attach Error:
    Message:  rpc error: code = Internal desc = Failed to find VirtualMachine for node:"d8dfbbd3-da96-4d11-9821-74161b8f881d". Error: node wasn't found
    Time:     2019-10-07T18:26:30Z
  Attached:   false
Events:       <none>

What you expected to happen:
The volumeattachment should have been created.
The details are evidenced below.

CSI DS

kubo@jumper:~/ccm+csi/csideploy$ kubectl get daemonsets vsphere-csi-node --namespace=kube-system
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
vsphere-csi-node   3         3         3       3            3           <none>          30m

CSI DRIVER

kubo@jumper:~/ccm+csi/csideploy$ kubectl get csidrivers
kubectl get nodes -o wide
NAME                     CREATED AT
csi.vsphere.vmware.com   2019-10-07T18:02:53Z

Nodes

kubo@jumper:~/ccm+csi/csideploy$ kubectl get nodes -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
30.1.0.3   Ready    <none>   54m   v1.14.5   30.1.0.3      30.1.0.3      Ubuntu 16.04.6 LTS   4.15.0-64-generic   docker://18.9.8
30.1.0.4   Ready    <none>   53m   v1.14.5   30.1.0.4      30.1.0.4      Ubuntu 16.04.6 LTS   4.15.0-64-generic   docker://18.9.8
30.1.0.5   Ready    <none>   52m   v1.14.5   30.1.0.5      30.1.0.5      Ubuntu 16.04.6 LTS   4.15.0-64-generic   docker://18.9.8

Provider IDs (lower-cased)

kubo@jumper:~/ccm+csi/csideploy$ kubectl describe nodes | grep -i "provider"
ProviderID:                  vsphere://4211af5a-7b1b-f972-a394-e3a41eecc96b
ProviderID:                  vsphere://4211d4c6-68d8-9dc7-5df9-8dd5dfdbb7a2
ProviderID:                  vsphere://421199e0-cfe0-301d-d817-9112dae5d0a9
kubo@jumper:~/ccm+csi/csideploy$

How to reproduce it (as minimally and precisely as possible):
Deploy a PKS environment. Deploy CCM and CSI. Need to change the default locations for sockets.

Anything else we need to know?:
It is trying to find the VM by DNS name in spite of the provider-id (BIOS UUID) being set.
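One way to sanity-check this outside the driver, assuming govmomi, is to look the VM up in vCenter by the instance UUID embedded in the node's providerID (vsphere://<uuid>). A sketch only; the URL and credentials are placeholders, and the UUID is taken from the report above:

package main

import (
	"context"
	"fmt"
	"log"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/object"
)

func main() {
	ctx := context.Background()

	// Placeholder vCenter URL and credentials.
	u, err := url.Parse("https://user:password@vc.example.com/sdk")
	if err != nil {
		log.Fatal(err)
	}

	// true = skip certificate verification, matching insecure-flag = "true".
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		log.Fatal(err)
	}
	defer c.Logout(ctx)

	// UUID from one of the providerIDs listed above (vsphere://<uuid>).
	uuid := "4211af5a-7b1b-f972-a394-e3a41eecc96b"

	si := object.NewSearchIndex(c.Client)
	instanceUUID := true
	// dc == nil searches all datacenters; vmSearch == true restricts the search to VMs.
	ref, err := si.FindByUuid(ctx, nil, uuid, true, &instanceUUID)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("found:", ref) // <nil> means no VM with that instance UUID was found
}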

Environment:

  • csi-vsphere version:
csi-driver-new.yaml
24:          image: quay.io/k8scsi/csi-attacher:v1.1.1
36:          image: vmware/vsphere-block-csi-driver:v1.0.0
58:          image: quay.io/k8scsi/livenessprobe:v1.1.0
68:          image: vmware/volume-metadata-syncer:v1.0.0
82:          image: quay.io/k8scsi/csi-provisioner:v1.2.1
132:          image: quay.io/k8scsi/csi-node-driver-registrar:v1.1.0
154:          image: vmware/vsphere-block-csi-driver:v1.0.0
185:          image: quay.io/k8scsi/livenessprobe:v1.1.0
  • vsphere-cloud-controller-manager version:
ccm_deploy_mgr_ds.yaml
35:          image: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.0.0
  • Kubernetes version: v1.14.5
  • vSphere version: 6.7u3 GA
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
  • Kernel (e.g. uname -a): Linux bea8bee0-8fe4-4e34-963d-63ee576dd78b 4.15.0-64-generic #73~16.04.1-Ubuntu SMP Fri Sep 13 09:56:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

CSI Controller (ss) and CSI Node (ds) can't run on the same node

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
According to the recommended manifest, both the CSI controller (StatefulSet) and the CSI node driver (DaemonSet) run with hostNetwork: true and with the livenessprobe container. As a result, whichever node the StatefulSet pod runs on, there is a port conflict on the host network, since the liveness probe port is shared by both pods.

Example:

$ kubectl -n kube-system get po
NAME                                                    READY   STATUS    RESTARTS   AGE
calico-kube-controllers-7bfdd87774-b745l                1/1     Running   0          3m10s
calico-node-6n4sd                                       1/1     Running   0          3m10s
calico-node-bkvhj                                       1/1     Running   0          3m10s
coredns-fb8b8dccf-gq746                                 1/1     Running   0          9m33s
coredns-fb8b8dccf-hqhj6                                 1/1     Running   0          9m33s
etcd-target-cluster-controlplane-0                      1/1     Running   0          8m42s
kube-apiserver-target-cluster-controlplane-0            1/1     Running   0          8m51s
kube-controller-manager-target-cluster-controlplane-0   1/1     Running   0          8m26s
kube-proxy-4xbk2                                        1/1     Running   0          4m1s
kube-proxy-5sbgj                                        1/1     Running   0          9m32s
kube-scheduler-target-cluster-controlplane-0            1/1     Running   0          8m42s
vsphere-cloud-controller-manager-bhchn                  1/1     Running   0          5m41s
vsphere-csi-controller-0                                5/5     Running   0          29s
vsphere-csi-node-drmdm                                  3/3     Running   0          3m1s
vsphere-csi-node-r9knf                                  2/3     Error     1          4s
$ kubectl -n kube-system logs -f vsphere-csi-node-r9knf -c liveness-probe
I0927 22:39:29.297558       1 connection.go:151] Connecting to unix:///csi/csi.sock
I0927 22:39:29.298291       1 main.go:86] calling CSI driver to discover driver name
I0927 22:39:29.298975       1 main.go:91] CSI driver name: "csi.vsphere.vmware.com"
I0927 22:39:29.298986       1 main.go:100] Serving requests to /healthz on: 0.0.0.0:9808
F0927 22:39:29.299074       1 main.go:103] failed to start http server with error: listen tcp 0.0.0.0:9808: bind: address already in use

What you expected to happen:
Both the CSI controller and CSI node driver should be able to run on the same host without erroring due to port conflicts.
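For illustration, the collision in the liveness-probe log above boils down to two processes trying to bind the same host port. A minimal sketch in plain Go, nothing driver-specific:

package main

import (
	"fmt"
	"net"
)

func main() {
	// With hostNetwork: true, both liveness-probe containers try to bind
	// the node's port 9808. The first bind wins; the second fails.
	first, err := net.Listen("tcp", "0.0.0.0:9808")
	if err != nil {
		fmt.Println("first bind failed:", err)
		return
	}
	defer first.Close()

	if _, err := net.Listen("tcp", "0.0.0.0:9808"); err != nil {
		fmt.Println("second bind failed:", err) // bind: address already in use
	}
}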

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Support for Metrics

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:

The vSphere CSI driver currently doesn't support exposing vSphere volume metrics (capacity, usage, ...) via the NodeGetVolumeStats endpoint.
The NodeGetVolumeStats endpoint is used by kubelets to expose the kubelet_volume_stats_* Prometheus metrics.

What you expected to happen:

The CSI driver should implement this feature.
These metrics could then be used by monitoring/alerting stacks.
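A minimal sketch of what such a handler could look like, assuming a Linux node, the CSI Go bindings (github.com/container-storage-interface/spec/lib/go/csi), golang.org/x/sys/unix, and gRPC status/codes; the receiver type and validation are simplified placeholders, not the driver's actual code:

// Sketch only: report capacity/usage for the filesystem mounted at the
// requested volume path via statfs.
func (s *service) NodeGetVolumeStats(
	ctx context.Context,
	req *csi.NodeGetVolumeStatsRequest) (
	*csi.NodeGetVolumeStatsResponse, error) {

	path := req.GetVolumePath()
	if path == "" {
		return nil, status.Error(codes.InvalidArgument, "volume path is required")
	}

	var fs unix.Statfs_t
	if err := unix.Statfs(path, &fs); err != nil {
		return nil, status.Errorf(codes.Internal, "statfs %s failed: %s", path, err)
	}

	bsize := int64(fs.Bsize)
	return &csi.NodeGetVolumeStatsResponse{
		Usage: []*csi.VolumeUsage{
			{
				Unit:      csi.VolumeUsage_BYTES,
				Total:     int64(fs.Blocks) * bsize,
				Available: int64(fs.Bavail) * bsize,
				Used:      int64(fs.Blocks-fs.Bfree) * bsize,
			},
			{
				Unit:      csi.VolumeUsage_INODES,
				Total:     int64(fs.Files),
				Available: int64(fs.Ffree),
				Used:      int64(fs.Files - fs.Ffree),
			},
		},
	}, nil
}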

How to reproduce it (as minimally and precisely as possible):
Use Prometheus, or query the kubelet Prometheus endpoint directly with curl.
No kubelet_volume_stats_* metrics are exposed for vSphere volumes.

Anything else we need to know?:

Environment:

  • csi-vsphere version: 1.0.1
  • vsphere-cloud-controller-manager version: 1.0.0
  • Kubernetes version: 1.16.1
  • vSphere version: 6.7U3
  • OS (e.g. from /etc/os-release): CentOS 7
  • Kernel (e.g. uname -a):
  • Install tools: Kubeadm
  • Others:

Implement max_volumes_per_node so the scheduler can determine whether to schedule pods that need PVs

/kind feature

The CSI driver should expose volume attach limits, as per the docs:

For CSI, any driver that advertises volume attach limits via CSI specs will have those limits available as the Node's allocatable property and the Scheduler will not schedule Pods with volumes on any Node that is already at its capacity. Refer to the CSI specs for more details.

What happened:

The driver only implements node_id; it does not return data on max volumes.

Current code:

func (s *service) NodeGetInfo(
	ctx context.Context,
	req *csi.NodeGetInfoRequest) (
	*csi.NodeGetInfoResponse, error) {

	id, err := os.Hostname()
	if err != nil {
		return nil, status.Errorf(codes.Internal,
			"Unable to retrieve Node ID, err: %s", err)
	}

	return &csi.NodeGetInfoResponse{
		NodeId: id,
	}, nil
}

needs:

  // Maximum number of volumes that controller can publish to the node.
  // If value is not set or zero CO SHALL decide how many volumes of
  // this type can be published by the controller to the node. The
  // plugin MUST NOT set negative values here.
  // This field is OPTIONAL.
  int64 max_volumes_per_node = 2;

What you expected to happen:

The driver calculates the max volumes available per node so the scheduler can make an informed decision on whether a pod can land on that node.
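A minimal sketch of the change, under the assumption that a static limit is acceptable as a first step; the real limit should be derived from the VM's SCSI controller configuration, and 59 here is only an illustrative value (4 PVSCSI controllers x 15 disks, minus the boot disk):

func (s *service) NodeGetInfo(
	ctx context.Context,
	req *csi.NodeGetInfoRequest) (
	*csi.NodeGetInfoResponse, error) {

	id, err := os.Hostname()
	if err != nil {
		return nil, status.Errorf(codes.Internal,
			"Unable to retrieve Node ID, err: %s", err)
	}

	return &csi.NodeGetInfoResponse{
		NodeId: id,
		// Assumed static attach limit for illustration only; a real
		// implementation should discover this per node.
		MaxVolumesPerNode: 59,
	}, nil
}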

controller cannot connect with credentials from the secret

/kind bug

What happened:
The controller cannot connect with credentials from the secret.

What you expected to happen:
I created a secret as described in the example: https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/docs/deploying_csi_vsphere_with_rbac.md#2-optional-but-recommended-storing-vcenter-credentials-in-a-kubernetes-secret

apiVersion: v1
data:
  dc-vsphere.local.password: cGFzc3dvcmQK=
  dc-vsphere.local.username: dXNlcm5hbWUK
kind: Secret
metadata:
  name: vsphere-csi-driver-credentials
type: Opaque

And the configuration file:

apiVersion: v1
data:
  vsphere.conf: |
    [Global]
    secret-name = "vsphere-csi-driver-credentials"
    secret-namespace = "kube-system"
    service-account = "vsphere-csi-controller"

    port = "443"
    insecure-flag = "1"
    [VirtualCenter "dc-vsphere.local"]
    datacenters = "X2"
kind: ConfigMap
metadata:
  name: csi-config

In controller logs:

time="2019-07-05T06:50:58Z" level=debug msg="enabled context injector"
time="2019-07-05T06:50:58Z" level=debug msg="init req & rep validation" withSpec=false
time="2019-07-05T06:50:58Z" level=debug msg="init implicit rep validation" withSpecRep=false
time="2019-07-05T06:50:58Z" level=debug msg="init req validation" withSpecReq=false
time="2019-07-05T06:50:58Z" level=debug msg="enabled request ID injector"
time="2019-07-05T06:50:58Z" level=debug msg="enabled request logging"
time="2019-07-05T06:50:58Z" level=debug msg="enabled response logging"
time="2019-07-05T06:50:58Z" level=debug msg="enabled serial volume access"
time="2019-07-05T06:50:58Z" level=info msg="Initializing CSI for Kubernetes"
I0705 06:50:58.877520       1 reflector.go:202] Starting reflector *v1.Secret (0s) from pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99
I0705 06:50:58.877542       1 reflector.go:240] Listing and watching *v1.Secret from pkg/mod/k8s.io/[email protected]+incompatible/tools/cache/reflector.go:99
E0705 06:51:02.906278       1 connection.go:63] Failed to create govmomi client. err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
time="2019-07-05T06:51:03Z" level=info msg="configured: vsphere.csi.vmware.com" api=FCD mode=controller
time="2019-07-05T06:51:03Z" level=info msg="identity service registered"
time="2019-07-05T06:51:03Z" level=info msg="controller service registered"
time="2019-07-05T06:51:03Z" level=info msg=serving endpoint="unix:///var/lib/csi/sockets/pluginproxy/csi.sock"
time="2019-07-05T06:51:03Z" level=debug msg="/csi.v1.Identity/GetPluginInfo: REQ 0001: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2019-07-05T06:51:03Z" level=debug msg="/csi.v1.Identity/GetPluginInfo: REP 0001: Name=vsphere.csi.vmware.com, VendorVersion=v0.2.0, XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"

Anything else we need to know?:
Controller version: gcr.io/cloud-provider-vsphere/vsphere-csi:v0.2.0

Environment:

  • csi-vsphere version: 0.2.0
  • vsphere-cloud-controller-manager version: without controller-manager
  • Kubernetes version: 1.15.0
  • vSphere version: 6.7.0.30000
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
  • Kernel (e.g. uname -a): 4.15.0-54-generic
  • Install tools: helm

Error: No Virtual Center hosts defined

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
After deploying vsphere-csi-controller, the pod is in CrashLoopBackOff. The logs read:
I0910 21:54:10.733816 1 config.go:265] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
E0910 21:54:10.734044 1 config.go:205] No Virtual Center hosts defined
E0910 21:54:10.734119 1 config.go:272] Error reading vsphere.conf
E0910 21:54:10.734183 1 service.go:125] Failed to read cnsconfig. Error: No Virtual Center hosts defined

What you expected to happen:
Successful deploy
How to reproduce it (as minimally and precisely as possible):
Deploy the CSI driver following the instructions in the docs.

Anything else we need to know?:
Where is vsphere.conf supposed to be located? Instructions don't mention csi-vsphere.conf.

Environment:

  • csi-vsphere version: v1.2.1-0-g971feacb
  • vsphere-cloud-controller-manager version:
  • Kubernetes version: 1.15.0
  • vSphere version: 6.7.0 U3
  • OS (e.g. from /etc/os-release):18.04.3 LTS
  • Kernel (e.g. uname -a):Linux db-vm-csi-master 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

Explore replacing pkg/common/cns-lib/vsphere with vclib from CPI

Is this a BUG REPORT or FEATURE REQUEST?:

/kind cleanup

There is a lot of code in pkg/common/cns-lib/vsphere that is duplicated or copied from the vclib package in the CPI. We need to explore whether vclib from the CPI can be used directly instead. If not, what is missing? Can the common package be enhanced?

When CSI finds sharedDatastores that it lacks create access to, it will fail to provision the disk.

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Got an error about "ServerFaultCode: NoPermission" for a datastore that the storage policy is not even going to provision the disk on.
This happens when there are visible but irrelevant datastores (ones we only have the "Browse Datastore" permission on).

In this log you will first see the "NoPermission" error; later, when we removed visibility ("Browse Datastore") of the other irrelevant datastores, provisioning succeeds.
dev-csi-controller.log

What you expected to happen:
Expect the disk to be provisioned on the datastore that the storage policy is set up for, and the other datastores to be ignored.

How to reproduce it (as minimally and precisely as possible):
Create a StorageClass with a StoragePolicy that will only provision to datastore A.
Make datastore B visible ("Browse Datastore" permission).
Create a PVC for the policy that only schedules/matches to datastore A.
You will get the error.

Anything else we need to know?:
When we enable write access to both (even though datastore B is not going to get any disks), the disk is successfully created on datastore A.
Alternatively, we can make datastore B not visible and it will also start working.
Environment:

  • csi-vsphere version: csi-provisioner:v1.2.2
  • vsphere-cloud-controller-manager version: 933c60515902 | latest v1.0.0
  • Kubernetes version:v1.16.3
  • vSphere version: vSphere 6.7 U3 / VM HWVersion: 15
  • OS (e.g. from /etc/os-release):Ubuntu 18.04.3 LTS
  • Kernel (e.g. uname -a): Linux k8s-master-01 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: rke 1.0
  • Others:

VolumeMetadata has invalid cluster Id

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: Volume failed to provision (stuck as Pending)

What you expected to happen: Volume to provision

How to reproduce it (as minimally and precisely as possible):
Create example-sc.yaml, apply it to the cluster, and then create a PVC.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vcsi-fcd-primary
provisioner: csi.vsphere.vmware.com
parameters:
  datastoreURL: "ds:///vmfs/volumes/5a517b2e-14fbf45e-4466-a0369fa168ac/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-sc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: vcsi-fcd-primary

Anything else we need to know?:
Logs from controller:

csi-provisioner W1006 15:04:35.786897       1 controller.go:886[] Retrying syncing claim "6b70a95b-b4d3-4185-a78f-20131cc8cada", failure 0
csi-provisioner E1006 15:04:35.788745       1 controller.go:908[] error syncing claim "6b70a95b-b4d3-4185-a78f-20131cc8cada": failed to provision volume with StorageClass "vcsi-fcd-primary": rpc error: code = Internal desc = Failed to create volume. Error: ServerFaultCode: VolumeMetadata has invalid cluster Id
csi-provisioner I1006 15:04:35.788469       1 event.go:209[] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"example-sc", UID:"6b70a95b-b4d3-4185-a78f-20131cc8cada", APIVersion:"v1", ResourceVersion:"40115228", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "vcsi-fcd-primary": rpc error: code = Internal desc = Failed to create volume. Error: ServerFaultCode: VolumeMetadata has invalid cluster Id
csi-provisioner I1006 15:05:45.870913       1 reflector.go:370[] sigs.k8s.io/sig-storage-lib-external-provisioner/controller/controller.go:800: Watch close - *v1.PersistentVolumeClaim total 0 items received
csi-provisioner I1006 15:09:35.789161       1 controller.go:1196[] provision "default/example-sc" class "vcsi-fcd-primary": started
csi-provisioner I1006 15:09:35.791504       1 event.go:209[] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"example-sc", UID:"6b70a95b-b4d3-4185-a78f-20131cc8cada", APIVersion:"v1", ResourceVersion:"40115228", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/example-sc"
csi-provisioner W1006 15:09:35.793554       1 topology.go:331[] No topology keys found on any node
csi-provisioner I1006 15:09:35.879482       1 controller.go:979[] Final error received, removing PVC 6b70a95b-b4d3-4185-a78f-20131cc8cada from claims in progress

Environment:

  • csi-vsphere version: v1.0.0 (using latest manifests in this repo)
  • vsphere-cloud-controller-manager version: v1.0.0 (using latest manifests in this repo)
  • Kubernetes version: 1.15.2
  • vSphere version: 6.7U3
  • OS (e.g. from /etc/os-release): Fedora 30
  • Kernel (e.g. uname -a): 5.0.10-300.fc30.x86_64
  • Install tools: kubeadm
  • Others:

Failed to provision volume: NoPermission

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Configured Cloud Native Storage with the CPI and CSI driver, using the topology-aware approach and assigning region/zone tags at the vSphere datacenter/cluster level.
Applied the permissions mask on vSphere as per https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-AEB07597-F303-4FDD-87D9-0FDA4836E5BB.html
Tried to deploy the example stateful set from https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-B5438870-39F7-4697-B1D0-639F7F11AED4.html

As a result, the volumes are not attached to VMs, the error in csi-provisioner container:

I1127 01:25:28.970664       1 controller.go:1196] provision "default/www-web-0" class "topology-aware-standard": started
W1127 01:25:28.975610       1 controller.go:415] "fstype" is deprecated and will be removed in a future release, please use "csi.storage.k8s.io/fstype" instead
I1127 01:25:28.975729       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"www-web-0", UID:"a8ced1a8-10b4-11ea-a6f4-00505686f29b", APIVersion:"v1", ResourceVersion:"1844026", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/www-web-0"
I1127 01:25:33.111905       1 controller.go:1196] provision "default/logs-web-0" class "topology-aware-standard": started
W1127 01:25:33.117419       1 controller.go:415] "fstype" is deprecated and will be removed in a future release, please use "csi.storage.k8s.io/fstype" instead
I1127 01:25:33.117548       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"logs-web-0", UID:"a8cf7eb9-10b4-11ea-a6f4-00505686f29b", APIVersion:"v1", ResourceVersion:"1844023", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/logs-web-0"
I1127 01:25:38.843360       1 controller.go:979] Final error received, removing PVC a8ced1a8-10b4-11ea-a6f4-00505686f29b from claims in progress
W1127 01:25:38.843395       1 controller.go:886] Retrying syncing claim "a8ced1a8-10b4-11ea-a6f4-00505686f29b", failure 4
E1127 01:25:38.843413       1 controller.go:908] error syncing claim "a8ced1a8-10b4-11ea-a6f4-00505686f29b": failed to provision volume with StorageClass "topology-aware-standard": rpc error: code = Internal desc = ServerFaultCode: NoPermission
I1127 01:25:38.843413       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"www-web-0", UID:"a8ced1a8-10b4-11ea-a6f4-00505686f29b", APIVersion:"v1", ResourceVersion:"1844026", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "topology-aware-standard": rpc error: code = Internal desc = ServerFaultCode: NoPermission
I1127 01:25:43.329574       1 controller.go:979] Final error received, removing PVC a8cf7eb9-10b4-11ea-a6f4-00505686f29b from claims in progress
W1127 01:25:43.329591       1 controller.go:886] Retrying syncing claim "a8cf7eb9-10b4-11ea-a6f4-00505686f29b", failure 4
E1127 01:25:43.329608       1 controller.go:908] error syncing claim "a8cf7eb9-10b4-11ea-a6f4-00505686f29b": failed to provision volume with StorageClass "topology-aware-standard": rpc error: code = Internal desc = ServerFaultCode: NoPermission
I1127 01:25:43.329645       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"logs-web-0", UID:"a8cf7eb9-10b4-11ea-a6f4-00505686f29b", APIVersion:"v1", ResourceVersion:"1844023", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "topology-aware-standard": rpc error: code = Internal desc = ServerFaultCode: NoPermission

At the same time, on the vSphere datastore level new FCD volumes are created on each attempt to provision a volume, filling up the corresponding vsan folder.

Right now the following permissions get enforced on all Kubernetes VMs:

Virtual machine
-- Change Configuration
---- Add existing disk
---- Add new disk
---- Add or remove device
---- Remove disk

and on the datastore level:

Datastore
-- Allocate space
-- Low level file operations

What permissions are missing to allow the provisioner to attach the volumes to VMs?

What you expected to happen:
Persistent volumes attached correctly to k8s cluster VMs.

How to reproduce it (as minimally and precisely as possible):
Follow https://cloud-provider-vsphere.sigs.k8s.io/tutorials/kubernetes-on-vsphere-with-kubeadm.html step by step and try to provision a test stateful set.

Anything else we need to know?:
The setup looks correct, as zone/regions were propagated to node labels.

Storage class:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: topology-aware-standard
provisioner: csi.vsphere.vmware.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  fstype: xfs
  storagePolicyName: "vSAN Default Storage Policy"

Environment:

  • csi-vsphere version:
    1.0.1

  • vsphere-cloud-controller-manager version:
    latest

  • Kubernetes version:
    v1.14

  • vSphere version:
    6.7u3

  • OS (e.g. from /etc/os-release):
    CentOS Linux release 7.7.1908 (Core)

  • Kernel (e.g. uname -a):
    3.10.0-1062.4.1.el7.x86_64

  • Install tools:
    kubeadm

  • Others:

Attach disk should not depend on fault message string returned by CNS server to decide the workflow

Code - https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/pkg/common/cns-lib/volume/manager.go#L193

CNS does not guarantee that the fault error message will not change in the future. The fault error message could also differ if a different locale is configured on the server, which can easily break the CSI code. Instead, the CSI code should check whether the volume is already attached to the VM.

I'm filing this issue to remove such checks that may fail in the future.
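A rough sketch of the suggested direction, assuming govmomi's object and vim25/types packages; the helper name and wiring are illustrative only, not the actual manager.go change (assumed imports: "context", "strings", "github.com/vmware/govmomi/object", "github.com/vmware/govmomi/vim25/types"):

// isDiskAttached decides attachment by inspecting the VM's virtual devices
// and comparing backing disk UUIDs, instead of parsing the localized CNS
// fault message.
func isDiskAttached(ctx context.Context, vm *object.VirtualMachine, diskUUID string) (bool, error) {
	devices, err := vm.Device(ctx)
	if err != nil {
		return false, err
	}
	for _, device := range devices.SelectByType((*types.VirtualDisk)(nil)) {
		disk, ok := device.(*types.VirtualDisk)
		if !ok {
			continue
		}
		if backing, ok := disk.Backing.(*types.VirtualDiskFlatVer2BackingInfo); ok {
			if strings.EqualFold(backing.Uuid, diskUUID) {
				return true, nil
			}
		}
	}
	return false, nil
}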

Err: No vSphere disk ID/Name found

Migrated from kubernetes/cloud-provider-vsphere#178

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Volume failed to attach to pod after it was restarted.

What you expected to happen:
Volume to successfully reattach as the PersistentVolumeClaim and PersistentVolume still exist as well as the vmdk.

How to reproduce it (as minimally and precisely as possible):
After a grace period (a day or two?), scale to 0 the replicas of a Deployment/StatefulSet that has a storage volume attached through the CSI driver. Set the scale back to its original value and the CSI driver will fail to attach the volume to the new pod(s):

Warning FailedAttachVolume 5s (x4 over 6m12s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-2e07359a-4f90-11e9-a939-000c29616bad" : rpc error: code = Internal desc = WhichVCandDCByFCDId(927ef76f-2312-4a4e-b634-8fbb13134462) failed. Err: No vSphere disk ID/Name found

Please note that this is independent of the node that the pod previously resided on. The disk does not appear to be detached from the node in vCenter either, which seems problematic.

Anything else we need to know?:
csi-attacher logs:

I0403 05:37:54.655419 1 controller.go:173] Started VA processing "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655541 1 csi_handler.go:93] CSIHandler: processing VA "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655565 1 csi_handler.go:120] Attaching "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.655915 1 csi_handler.go:259] Starting attach operation for "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.656107 1 csi_handler.go:388] Saving attach error to "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.660047 1 controller.go:139] Ignoring VolumeAttachment "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63" change
I0403 05:37:54.660383 1 csi_handler.go:398] Saved attach error to "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63"
I0403 05:37:54.660413 1 csi_handler.go:103] Error processing "csi-94a7031ea7c96543a069a44a5cba3b8f4db0aa7f2cd7f6800610578ba01e3b63": failed to attach: persistentvolume "pvc-7df261c8-4c50-11e9-a573-000c29616bad" not found

vsphere-csi-controller logs:

I0403 05:43:21.955579 1 datacenter.go:615] DoesFirstClassDiskExist(927ef76f-2312-4a4e-b634-8fbb13134462): NOT FOUND
E0403 05:43:21.955607 1 search.go:329] Error while looking for FCD= in vc=10.0.10.25 and datacenter=homelab: No vSphere disk ID/Name found
time="2019-04-03T05:43:21Z" level=error msg="WhichVCandDCByFCDId(927ef76f-2312-4a4e-b634-8fbb13134462) failed. Err: No vSphere disk ID/Name found"

Environment:

  • vsphere-cloud-controller-manager version: v0.1.1 and master
  • OS (e.g. from /etc/os-release): Fedora 28
  • Kernel (e.g. uname -a): 4.18.8-200.fc28.x86_64
  • Install tools:
  • Others:

@codenrhoden:

Hi @Elegant996, thanks for the report.

Just to clarify, you had an existing Deployment that was working as expected, then scaled it down to 0 and then back up?

And you noticed that when scaling to 0, the existing PV was not detached from the node? Just trying to get a clearer picture for when I try to recreate.

@Elegant996

Hi @codenrhoden,

That is correct, this happens with both my Deployments and StatefulSets when scaling. The PVC and PV are still marked as bound in kubectl.

vCenter does not appear to detach the volumes after scaling as it no longer seems to be able to determine the associated vmdk. This issue appears to occur in 90% of my pods.

One StatefulSet (replicas: 1) survived as the CSI driver was able to determine the correct vmdk but for whatever reason was unable to detach it from the node/VM despite being given administrator credentials for testing. Once the affected vmdk was forcefully detached using vCenter, the driver quickly attached it the pod's current node and mounted the volume without issue.

In another scenario with a StatefulSet (replicas: 2), only one survived while the other produced the above warnings/errors.

This will become an issue down the road when attempting to upgrade nodes as draining them will render most if not all of the pods unusable.

Thanks!

Node daemonset fails without csi-vsphere.conf file

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
vSphere CSI Node daemonset fails when csi-vsphere.conf is not mounted as a volume to this container.

What you expected to happen:
It shouldn't be mandatory to mount this configuration on the nodes unless the cluster is topology-aware. For non-topology-aware clusters, the node DaemonSet should run without any errors.

How to reproduce it (as minimally and precisely as possible):
Remove csi-vsphere.conf as a mounted file from the vSphere CSI node DaemonSet.

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Failed to create volume. Error: CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.NotSupported

Can anyone help me? I'm getting this error when I try to create a PVC.

kubectl describe sc
Name: vsphere-sc
IsDefaultClass: Yes
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"vsphere-sc"},"parameters":{"storagepolicyname":"Space-Efficient"},"provisioner":"csi.vsphere.vmware.com"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner: csi.vsphere.vmware.com
Parameters: storagepolicyname=Space-Efficient
AllowVolumeExpansion:
MountOptions:
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events:

kubectl describe pvc
Name: vsphere-disk
Namespace: default
StorageClass: vsphere-sc
Status: Pending
Volume:
Labels:
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"vsphere-disk","namespace":"default"},"spec":{"acces...
volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message


Normal ExternalProvisioning 7s persistentvolume-controller waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
Normal Provisioning 3s (x3 over 7s) csi.vsphere.vmware.com_vsphere-csi-controller-0_ac8572f4-f7d7-11e9-837c-0a580ae94203 External provisioner is provisioning volume for claim "default/vsphere-disk"
Warning ProvisioningFailed 3s (x3 over 6s) csi.vsphere.vmware.com_vsphere-csi-controller-0_ac8572f4-f7d7-11e9-837c-0a580ae94203 failed to provision volume with StorageClass "vsphere-sc": rpc error: code = Internal desc = Failed to create volume. Error: CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.NotSupported
Mounted By:

kubectl describe CSINode
Name: k8s-worker-0
Namespace:
Labels:
Annotations:
API Version: storage.k8s.io/v1beta1
Kind: CSINode
Metadata:
Creation Timestamp: 2019-10-26T10:01:57Z
Owner References:
API Version: v1
Kind: Node
Name: k8s-worker-0
UID: d0254cbb-f7d6-11e9-b9d2-0050568f2859
Resource Version: 1805
Self Link: /apis/storage.k8s.io/v1beta1/csinodes/k8s-worker-0
UID: a504f73b-f7d7-11e9-8aaf-0050568fe318
Spec:
Drivers:
Name: csi.vsphere.vmware.com
Node ID: k8s-worker-0
Topology Keys:
Events:

kubectl describe csidrivers
Name: csi.vsphere.vmware.com
Namespace:
Labels:
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"storage.k8s.io/v1beta1","kind":"CSIDriver","metadata":{"annotations":{},"name":"csi.vsphere.vmware.com"},"spec":{"attachReq...
API Version: storage.k8s.io/v1beta1
Kind: CSIDriver
Metadata:
Creation Timestamp: 2019-10-26T10:01:42Z
Resource Version: 1737
Self Link: /apis/storage.k8s.io/v1beta1/csidrivers/csi.vsphere.vmware.com
UID: 9c166a56-f7d7-11e9-8aaf-0050568fe318
Spec:
Attach Required: true
Pod Info On Mount: false
Events:

kubectl describe nodes | grep "ProviderID"
ProviderID: vsphere://420f28e3-01b2-983e-4caa-0813c8c98c82
ProviderID: vsphere://420fe8c4-766a-c2b3-e372-4fd463ab9a95
ProviderID: vsphere://420fa168-7aca-b02c-8348-7f1eb807ce1f
ProviderID: vsphere://420f73e9-b233-71a8-abc1-3a9dc508f2cd

uname -a
Linux k8s-master-0 4.4.0-165-generic #193-Ubuntu SMP Tue Sep 17 17:42:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial

Kubernetes 1.14.3
vSphere version 6.7.0.41000 Build Number 14836122
ESXi version 6.7.0, 10302608

In this case I have a single ESXi host, with vCenter running on the same host.

datastore migration / storage vmotion renamed volumes

/kind feature

What happened:

I have successfully deployed services using vSphere CSI on a local datastore. I am planning on migrating everything to an iSCSI LUN. When migrating the first node to the iSCSI datastore using Storage vMotion, the volumes are renamed using the node name.

[screenshot: renamed volume in the datastore browser]

Now when a pod is restarted I get the following error because the volume path has changed

Warning  FailedAttachVolume  39s (x16 over 17m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-8a0d07c8-7989-11e9-837e-005056bb3b01" : File []/vmfs/volumes/5ccd2965-c10c950c-feba-94c691af21f8/kubevols/kubernetes-dynamic-pvc-8a0d07c8-7989-11e9-837e-005056bb3b01.vmdk was not found

What you expected to happen:

I understand that this is a feature of Storage vMotion and not strictly a driver bug. Is there a procedure that will allow me to migrate my existing volumes to a new datastore? I have searched around and read the docs, but am unable to find anything specific to my situation.

I have previously backed up the volumes using velero (formerly ark), so restoring may be an option if I am forced to recreate. This is my lab cluster so it is not critical for me. However, I would like to clarify the expected behaviour for the future.

How to reproduce it (as minimally and precisely as possible):

  1. create a pod which depends on a pvc using the vsphere storage class
  2. migrate the nodes storage to a different datastore with vmotion (storage only)
  3. restart the pod and the volume attach fails

Anything else we need to know?:

Environment:

  • csi-vsphere version: not sure how to check this though the cluster was built with rke v0.2.2
  • vsphere-cloud-controller-manager version: as above
  • Kubernetes version: v1.13.5
  • vSphere version: 6.7
  • OS (e.g. from /etc/os-release): Ubuntu 19.10
  • Kernel (e.g. uname -a): 4.18
  • Install tools: rke v0.2.2
  • Others:

Failed to create Persistent volume vmodl.fault.NotSupported

/kind bug

What happened:
Can't create persistent volume due to error
What you expected to happen:
Creation of Persistent Volume
How to reproduce it (as minimally and precisely as possible):
Create any volume based on default StorageClass

Anything else we need to know?:
Logs:
CreateVolume: VolumeName: "pvc-6ad079f1-a805-4080-80aa-2dffaaba605d", opId: "0bb24060" E1223 22:54:04.229271 1 manager.go:130] failed to create cns volume. createSpec: "(*types.CnsVolumeCreateSpec)(0xc0004d0c60)({\n DynamicData: (types.DynamicData) {\n },\n Name: (string) (len=40) \"pvc-6ad079f1-a805-4080-80aa-2dffaaba605d\",\n VolumeType: (string) (len=5) \"BLOCK\",\n Datastores: ([]types.ManagedObjectReference) (len=7 cap=8) {\n (types.ManagedObjectReference) Datastore:datastore-512,\n (types.ManagedObjectReference) Datastore:datastore-890,\n (types.ManagedObjectReference) Datastore:datastore-914,\n (types.ManagedObjectReference) Datastore:datastore-915,\n (types.ManagedObjectReference) Datastore:datastore-916,\n (types.ManagedObjectReference) Datastore:datastore-917,\n (types.ManagedObjectReference) Datastore:datastore-953\n },\n Metadata: (types.CnsVolumeMetadata) {\n DynamicData: (types.DynamicData) {\n },\n ContainerCluster: (types.CnsContainerCluster) {\n DynamicData: (types.DynamicData) {\n },\n ClusterType: (string) (len=10) \"KUBERNETES\",\n ClusterId: (string) (len=27) \"default/workload-cluster-01\",\n VSphereUser: (string) (len=27) \"VSPHERE.LOCAL\\\\Administrator\"\n },\n EntityMetadata: ([]types.BaseCnsEntityMetadata) <nil>\n },\n BackingObjectDetails: (*types.CnsBackingObjectDetails)(0xc000024b20)({\n DynamicData: (types.DynamicData) {\n },\n CapacityInMb: (int64) 1024\n }),\n Profile: ([]types.BaseVirtualMachineProfileSpec) (len=1 cap=1) {\n (*types.VirtualMachineDefinedProfileSpec)(0xc000427640)({\n VirtualMachineProfileSpec: (types.VirtualMachineProfileSpec) {\n DynamicData: (types.DynamicData) {\n }\n },\n ProfileId: (string) (len=36) \"538532cc-6549-4d03-8731-4beec2baf0b8\",\n ReplicationSpec: (*types.ReplicationSpec)(<nil>),\n ProfileData: (*types.VirtualMachineProfileRawData)(<nil>),\n ProfileParams: ([]types.KeyValue) <nil>\n })\n }\n})\n", fault: "(*types.CnsFault)(0xc0004e0f40)({\n Fault: (*types.BaseMethodFault)(0xc000428b00)(<nil>),\n LocalizedMessage: (string) (len=82) \"CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.NotSupported\\n\"\n})\n", opId: "0bb24060" E1223 22:54:04.229515 1 vsphereutil.go:118] Failed to create disk pvc-6ad079f1-a805-4080-80aa-2dffaaba605d with error CnsFault error: CNS: Failed to create disk.:Fault cause: vmodl.fault.NotSupported

Environment:

  • csi-vsphere version: 1.0.1
  • vsphere-cloud-controller-manager version: 1.1.0
  • Kubernetes version: 1.16.3
  • vSphere version :6.7U3
  • OS (e.g. from /etc/os-release): Ubuntu 18.04 LTS
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Use Secret Lister versus Mounting Secrets

Is this a BUG REPORT or FEATURE REQUEST?:
/kind cleanup

Highly recommend using a Secret lister, both to remove the delete-secret permission currently required in order to get updates to secrets, and to minimize the potential security exposure caused by mounting secrets to the filesystem.

What happened:
Secrets are mounted as part of the pod specs.

What you expected to happen:
Use client-go to listen for updates to secrets.
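A sketch of the suggested approach with client-go, watching the credentials secret instead of mounting it. The namespace and secret name below are assumptions based on the deployment manifests, and error handling is trimmed for brevity (assumed imports: "time", v1 "k8s.io/api/core/v1", "k8s.io/client-go/informers", "k8s.io/client-go/kubernetes", "k8s.io/client-go/rest", "k8s.io/client-go/tools/cache"):

func watchCredentials(stopCh <-chan struct{}) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	// Watch only the namespace that holds the credentials secret
	// ("kube-system" is an assumption based on the deployment manifests).
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute, informers.WithNamespace("kube-system"))
	secretInformer := factory.Core().V1().Secrets().Informer()
	secretInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			secret, ok := newObj.(*v1.Secret)
			// "vsphere-config-secret" is the name used in the published
			// manifests; adjust to whatever the deployment actually uses.
			if ok && secret.Name == "vsphere-config-secret" {
				// Reload the vCenter credentials from secret.Data here.
			}
		},
	})
	factory.Start(stopCh)
	return nil
}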

Firewall settings for CSI

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:

We have a firewall between the Kubernetes cluster and the vSphere API (vCenter). I was trying to find in the documentation which connections we need to allow - whether masters -> vSphere API is enough, or whether the nodes also need access.

Could you please clarify this in the docs?

err: node "NODENAME" has no NodeID annotation

I followed the document - https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/docs/deploying_csi_vsphere_with_rbac.md

The StorageClass and PVC are deployed without any problems, but when I deploy a pod I get the error below.

  1. describe pod
Events:
  Type     Reason              Age                From                     Message
  ----     ------              ----               ----                     -------
  Warning  FailedScheduling    91s (x6 over 93s)  default-scheduler        pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled           91s                default-scheduler        Successfully assigned default/web2-0 to worker01
  Warning  FailedAttachVolume  83s (x5 over 91s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-47b8628b-762f-11e9-bc24-000c292efb20" : node "worker01" has no NodeID annotation
  2. csi-attacher logs
I0514 09:53:47.407461       1 main.go:152] CSI driver supports ControllerPublishUnpublish, using real CSI handler
I0514 09:53:47.407619       1 controller.go:111] Starting CSI attacher
I0514 09:53:47.507809       1 controller.go:203] Started PV processing "pvc-5bc33246-7623-11e9-bc24-000c292efb20"
I0514 09:53:47.507854       1 csi_handler.go:418] CSIHandler: processing PV "pvc-5bc33246-7623-11e9-bc24-000c292efb20"
I0514 09:53:47.507865       1 csi_handler.go:422] CSIHandler: processing PV "pvc-5bc33246-7623-11e9-bc24-000c292efb20": no deletion timestamp, ignoring
I0514 09:53:47.507811       1 controller.go:173] Started VA processing "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262"
I0514 09:53:47.507882       1 csi_handler.go:93] CSIHandler: processing VA "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262"
I0514 09:53:47.507887       1 csi_handler.go:120] Attaching "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262"
I0514 09:53:47.507896       1 csi_handler.go:259] Starting attach operation for "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262"
I0514 09:53:47.507966       1 csi_handler.go:214] PV finalizer is already set on "pvc-5bc33246-7623-11e9-bc24-000c292efb20"
I0514 09:53:47.508756       1 csi_handler.go:524] Can't get CSINodeInfo worker01: the server could not find the requested resource (get csinodeinfos.csi.storage.k8s.io worker01)
I0514 09:53:47.508773       1 csi_handler.go:388] Saving attach error to "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262"
I0514 09:53:47.515029       1 csi_handler.go:398] Saved attach error to "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262"
I0514 09:53:47.515048       1 csi_handler.go:103] Error processing "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262": failed to attach: node "worker01" has no NodeID annotation
I0514 09:53:47.515316       1 controller.go:139] Ignoring VolumeAttachment "csi-110f4283c16ebd61324b6671f9b677628d14075d94e4eded580cafaa57221262" change

  3. Then I checked the annotations on the nodes, but the CSI annotations are nonexistent (see below), even though I added all the options (--allow-privileged=true --feature-gates=KubeletPluginsWatcher=true,CSINodeInfo=true,CSIDriverRegistry=true) to the kubelet and apiserver:
      kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
      node.alpha.kubernetes.io/ttl: "0"
      volumes.kubernetes.io/controller-managed-attach-detach: "true"

Anything else we need to know?:

Environment:

  • csi-provisioner: v1.0.1
  • vsphere-csi: v0.1.1
  • csi-attacher: v1.0.1
  • Kubernetes version: v1.13.4
  • vSphere version: 6.5
  • OS (e.g. from /etc/os-release): Centos 7.4
  • Kernel (e.g. uname -a): 3.10.0-693.e17
  • Install tools: kubeADM
  • Others:

csi-controller SIGSEGV

/kind bug

What happened:
We're trying to set up the vSphere CPI/CSI following this doc.

The vsphere-csi-controller pod is in a CrashLoopBackOff state. The liveness check fails because the vsphere-csi-controller container cannot start due to a SIGSEGV.

kubectl logs -n kube-system vsphere-csi-controller-0 vsphere-csi-controller 
I1206 07:39:54.069404       1 config.go:261] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I1206 07:39:54.069647       1 config.go:206] Initializing vc server 160.98.220.50
I1206 07:39:54.069698       1 controller.go:67] Initializing CNS controller
I1206 07:39:54.069734       1 virtualcentermanager.go:63] Initializing defaultVirtualCenterManager...
I1206 07:39:54.069758       1 virtualcentermanager.go:65] Successfully initialized defaultVirtualCenterManager
I1206 07:39:54.069791       1 virtualcentermanager.go:107] Successfully registered VC "160.98.220.50"
I1206 07:39:54.069819       1 manager.go:60] Initializing volume.volumeManager...
I1206 07:39:54.069842       1 manager.go:64] volume.volumeManager initialized
time="2019-12-06T07:40:15Z" level=info msg="received signal; shutting down" signal=terminated
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x128 pc=0x867dc7]

goroutine 24 [running]:
google.golang.org/grpc.(*Server).GracefulStop(0x0)
        /go/pkg/mod/google.golang.org/[email protected]/server.go:1393 +0x37
github.com/rexray/gocsi.(*StoragePlugin).GracefulStop.func1()
        /go/pkg/mod/github.com/rexray/[email protected]/gocsi.go:333 +0x35
sync.(*Once).Do(0xc0002e080c, 0xc0003eeef8)
        /usr/local/go/src/sync/once.go:44 +0xb3
github.com/rexray/gocsi.(*StoragePlugin).GracefulStop(0xc0002e0780, 0x21183a0, 0xc0000ae010)
        /go/pkg/mod/github.com/rexray/[email protected]/gocsi.go:332 +0x56
github.com/rexray/gocsi.Run.func3()
        /go/pkg/mod/github.com/rexray/[email protected]/gocsi.go:121 +0x4e
github.com/rexray/gocsi.trapSignals.func1(0xc000437380, 0xc0004737a0, 0xc000473770)
        /go/pkg/mod/github.com/rexray/[email protected]/gocsi.go:502 +0x143
created by github.com/rexray/gocsi.trapSignals
        /go/pkg/mod/github.com/rexray/[email protected]/gocsi.go:487 +0x107

vsphere-csi-controller-0

kubectl describe pod -n kube-system vsphere-csi-controller-0 
Name:         vsphere-csi-controller-0
Namespace:    kube-system
Priority:     0
Node:         node1/160.98.236.80
Start Time:   Fri, 06 Dec 2019 07:31:50 +0000
Labels:       app=vsphere-csi-controller
              controller-revision-hash=vsphere-csi-controller-78bb4df5f7
              role=vsphere-csi
              statefulset.kubernetes.io/pod-name=vsphere-csi-controller-0
Annotations:  <none>
Status:       Running
IP:           10.245.0.5
IPs:
  IP:           10.245.0.5
Controlled By:  StatefulSet/vsphere-csi-controller
Containers:
  csi-attacher:
    Container ID:  docker://286d6b04f3cf300256681855bcd1f98903cb01d8c2da627b5952f1c642c34dae
    Image:         quay.io/k8scsi/csi-attacher:v1.1.1
    Image ID:      docker-pullable://quay.io/k8scsi/csi-attacher@sha256:e4db94969e1d463807162a1115192ed70d632a61fbeb3bdc97b40fe9ce78c831
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=4
      --timeout=300s
      --csi-address=$(ADDRESS)
    State:          Running
      Started:      Fri, 06 Dec 2019 07:31:51 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:  /csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  vsphere-csi-controller:
    Container ID:  docker://2b79e2f2a63ede8245da723f90c4f9b6e4cabdcb204d3843be01c1c3f1ec8bbf
    Image:         gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
    Image ID:      docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:fae6806f5423a0099cdf60cf53cff474b228ee4846a242d025e4833a66f91b3f
    Port:          9808/TCP
    Host Port:     0/TCP
    Args:
      --v=4
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 06 Dec 2019 07:36:47 +0000
      Finished:     Fri, 06 Dec 2019 07:37:10 +0000
    Ready:          False
    Restart Count:  7
    Liveness:       http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
    Environment:
      CSI_ENDPOINT:        unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      X_CSI_MODE:          controller
      VSPHERE_CSI_CONFIG:  /etc/cloud/csi-vsphere.conf
    Mounts:
      /etc/cloud from vsphere-config-volume (ro)
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  liveness-probe:
    Container ID:  docker://dd27a57a7e3d3d9350685c6505498237dc6fc101e8a7dcf6af35b0cd99ce7d92
    Image:         quay.io/k8scsi/livenessprobe:v1.1.0
    Image ID:      docker-pullable://quay.io/k8scsi/livenessprobe@sha256:dde617756e0f602adc566ab71fd885f1dad451ad3fb063ac991c95a2ff47aea5
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=$(ADDRESS)
    State:          Running
      Started:      Fri, 06 Dec 2019 07:31:53 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  vsphere-syncer:
    Container ID:  docker://cea6e98a429f7deb145ef885ddf3238a23d6eeb595e164107c3ddf75f3b9341a
    Image:         gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
    Image ID:      docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/syncer@sha256:fc80ec77a2ab4b58ddfa259a938f6d741933566011d56e5ffcc8680cc83538fe
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=2
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 06 Dec 2019 07:37:12 +0000
      Finished:     Fri, 06 Dec 2019 07:37:42 +0000
    Ready:          False
    Restart Count:  5
    Environment:
      FULL_SYNC_INTERVAL_MINUTES:  30
      VSPHERE_CSI_CONFIG:          /etc/cloud/csi-vsphere.conf
    Mounts:
      /etc/cloud from vsphere-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
  csi-provisioner:
    Container ID:  docker://c3f788e2030f386d71252b2c63b50e1992f46eef2a4e0675cf856997d12dde2e
    Image:         quay.io/k8scsi/csi-provisioner:v1.2.2
    Image ID:      docker-pullable://quay.io/k8scsi/csi-provisioner@sha256:e3239de37c06d2bcd0e9e9648fe9a8b418d5caf9e89f243c649ff2394d3cbfef
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=4
      --timeout=300s
      --csi-address=$(ADDRESS)
      --feature-gates=Topology=true
      --strict-topology
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      Lost connection to CSI driver, exiting
      Exit Code:    255
      Started:      Fri, 06 Dec 2019 07:36:47 +0000
      Finished:     Fri, 06 Dec 2019 07:37:08 +0000
    Ready:          False
    Restart Count:  5
    Environment:
      ADDRESS:  /csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from vsphere-csi-controller-token-h9bqc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  vsphere-config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vsphere-config-secret
    Optional:    false
  socket-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com
    HostPathType:  DirectoryOrCreate
  vsphere-csi-controller-token-h9bqc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vsphere-csi-controller-token-h9bqc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m17s                  default-scheduler  Successfully assigned kube-system/vsphere-csi-controller-0 to node1
  Normal   Pulled     7m15s                  kubelet, node1     Container image "quay.io/k8scsi/csi-attacher:v1.1.1" already present on machine
  Normal   Created    7m15s                  kubelet, node1     Created container csi-attacher
  Normal   Started    7m15s                  kubelet, node1     Started container csi-attacher
  Normal   Pulled     7m13s                  kubelet, node1     Container image "quay.io/k8scsi/livenessprobe:v1.1.0" already present on machine
  Normal   Pulling    7m13s                  kubelet, node1     Pulling image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1"
  Normal   Created    7m13s                  kubelet, node1     Created container liveness-probe
  Normal   Started    7m13s                  kubelet, node1     Started container liveness-probe
  Normal   Started    7m12s                  kubelet, node1     Started container vsphere-syncer
  Normal   Pulled     7m12s                  kubelet, node1     Container image "quay.io/k8scsi/csi-provisioner:v1.2.2" already present on machine
  Normal   Pulled     7m12s                  kubelet, node1     Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1"
  Normal   Created    7m12s                  kubelet, node1     Created container vsphere-syncer
  Normal   Created    7m11s                  kubelet, node1     Created container csi-provisioner
  Normal   Started    7m11s                  kubelet, node1     Started container csi-provisioner
  Normal   Pulling    6m51s (x2 over 7m15s)  kubelet, node1     Pulling image "gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1"
  Normal   Killing    6m51s                  kubelet, node1     Container vsphere-csi-controller failed liveness probe, will be restarted
  Normal   Started    6m50s (x2 over 7m13s)  kubelet, node1     Started container vsphere-csi-controller
  Normal   Created    6m50s (x2 over 7m14s)  kubelet, node1     Created container vsphere-csi-controller
  Normal   Pulled     6m50s (x2 over 7m14s)  kubelet, node1     Successfully pulled image "gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1"
  Warning  Unhealthy  2m6s (x22 over 7m1s)   kubelet, node1     Liveness probe failed: Get http://10.245.0.5:9808/healthz: dial tcp 10.245.0.5:9808: connect: connection refused

What you expected to happen:
The vsphere-csi-controller pod should start successfully.
How to reproduce it (as minimally and precisely as possible):
Follow the documentation in the link above.

Anything else we need to know?:

Environment:

  • vsphere-cloud-controller-manager version: 1.0.1
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS, Kubernetes v1.16.3
  • Kernel (e.g. uname -a): 4.15.0-72-generic
  • Install tools: kubeadm
  • Others: vCenter Appliance 6.7 Update 3 (6.7.0.40000)

Invalid patch version value

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
I followed the instructions on this page to install the driver on my kubernetes cluster.

When I apply the vSphere CSI driver configuration, two containers in the vsphere-csi-controller-0 pod don't come up:
vsphere-csi-controller
csi-provisioner

This is what I see when I inspect the container logs:

kube@kube-master01:~/kubeManifests/storage$ kubectl logs --namespace=kube-system vsphere-csi-controller-0 csi-provisioner
I0905 15:24:01.982184       1 feature_gate.go:226] feature gates: &{map[Topology:true]}
I0905 15:24:01.982359       1 csi-provisioner.go:98] Version: v1.2.1-0-g971feacb
I0905 15:24:01.982406       1 csi-provisioner.go:112] Building kube configs for running in cluster...
E0905 15:24:01.994364       1 csi-provisioner.go:137] OnConnectionLoss callback only supported for unix:// addresses


kube@kube-master01:~/kubeManifests/storage$ kubectl logs --namespace=kube-system vsphere-csi-controller-0 vsphere-csi-controller
I0905 15:24:00.150395       1 config.go:265] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I0905 15:24:00.150597       1 config.go:209] Initializing vc server 10.3.1.52
I0905 15:24:00.150609       1 controller.go:66] Initializing CNS controller
I0905 15:24:00.150658       1 virtualcentermanager.go:61] Initializing defaultVirtualCenterManager...
I0905 15:24:00.150666       1 virtualcentermanager.go:63] Successfully initialized defaultVirtualCenterManager
I0905 15:24:00.150673       1 virtualcentermanager.go:105] Successfully registered VC "10.3.1.52"
I0905 15:24:00.150683       1 manager.go:55] Initializing volume.defaultManager...
I0905 15:24:00.150686       1 manager.go:59] volume.defaultManager initialized
I0905 15:24:00.179278       1 virtualcenter.go:130] New session ID for 'root' = 52658661-45de-92dc-33e3-515260399620
E0905 15:24:00.179295       1 controller.go:96] checkAPI failed for vcenter API version: 6.7.2, err=Invalid patch version value
time="2019-09-05T15:24:00Z" level=error msg="Failed to init controller" error="Invalid patch version value"
time="2019-09-05T15:24:00Z" level=info msg="configured: csi.vsphere.vmware.com" controllerType=VANILLA mode=controller
time="2019-09-05T15:24:00Z" level=info msg="removed sock file" path=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2019-09-05T15:24:00Z" level=fatal msg="grpc failed" error="Invalid patch version value"

I'm confused because version 6.7.0 is running, so I don't know where 6.7.2 is coming from.
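For reference, a hedged sketch of how a vCenter API version gate like the checkAPI call in the log could be written more defensively; the function name and the chosen minimum are assumptions, not the driver's actual code (assumed imports: "fmt", "strconv", "strings"):

// meetsMinimumAPIVersion reports whether an API version string such as
// "6.7.2" or "6.7.3" satisfies a required major.minor.patch minimum,
// treating a missing patch component as zero instead of erroring out.
func meetsMinimumAPIVersion(apiVersion string, minMajor, minMinor, minPatch int) (bool, error) {
	parts := strings.Split(apiVersion, ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected API version format: %q", apiVersion)
	}
	nums := []int{0, 0, 0}
	for i := 0; i < len(parts) && i < 3; i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return false, fmt.Errorf("invalid component %q in API version %q", parts[i], apiVersion)
		}
		nums[i] = n
	}
	want := []int{minMajor, minMinor, minPatch}
	for i := 0; i < 3; i++ {
		if nums[i] != want[i] {
			return nums[i] > want[i], nil
		}
	}
	return true, nil
}

With a minimum of (6, 7, 3), an API version of "6.7.2" would then be rejected with a clear "version too old" result rather than an "Invalid patch version value" error.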

What you expected to happen:
Everything to come up correctly without issue

How to reproduce it (as minimally and precisely as possible):
follow install steps in guide

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version: 6.7.0
  • Kubernetes version: 1.15.3
  • vSphere version:
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.2 LTS
  • Kernel (e.g. uname -a): Linux kube-master01 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

Attached PVC vmdks get deleted when a node is removed

/kind bug

This is in reference to a closed issue on the older vsphere in-tree cloud provider. That issue was closed with a recommendation to re-file the issue, I believe here.

The core issue is that when a node gets removed, apparently sometimes by the provider, but also in vSphere itself, the attached VMDKs get deleted, causing a loss of data. Per our conversation at KubeCon, I'm filing this so we can hopefully come up with a good solution to help prevent data loss scenarios.

Note: I'm not yet experiencing this bug as I'm blocked by another issue, but I'll try to replicate it more concretely when I get to that point; perhaps @RaceFPV from the original issue can comment further if they're still experiencing this situation.

Current provisioning RBAC does not support volume creation requirements for 1.14

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
When applying a PVC provisioning request, the log of the csi-provisioner pod displays a permissions error: the service account is forbidden from viewing csinodes.

Log entries showing the issue:
I0509 04:16:54.362365 1 controller.go:1196] provision "default/data-1" class "csi-vsphere": started
I0509 04:16:54.375655 1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-1", UID:"d7e192f4-7210-11e9-b8f4-00505680214b", APIVersion:"v1", ResourceVersion:"28701", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/data-1"
I0509 04:16:54.380026 1 controller.go:979] Final error received, removing PVC d7e192f4-7210-11e9-b8f4-00505680214b from claims in progress
W0509 04:16:54.380077 1 controller.go:886] Retrying syncing claim "d7e192f4-7210-11e9-b8f4-00505680214b", failure 4
E0509 04:16:54.380297 1 controller.go:908] error syncing claim "d7e192f4-7210-11e9-b8f4-00505680214b": failed to provision volume with StorageClass "csi-vsphere": error generating accessibility requirements: error listing CSINodes: csinodes.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:vsphere-csi-provisioner" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
I0509 04:16:54.380371 1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-1", UID:"d7e192f4-7210-11e9-b8f4-00505680214b", APIVersion:"v1", ResourceVersion:"28701", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-vsphere": error generating accessibility requirements: error listing CSINodes: csinodes.storage.k8s.io is forbidden: User "system:serviceaccount:kube-system:vsphere-csi-provisioner" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope

What you expected to happen:
Nodes identified, volume created, etc.

How to reproduce it (as minimally and precisely as possible):
kubeadm deployment of k8s 1.14.1; apply the latest updates as per the manifest updated on 9th May 2019, following the deployment guide in this repo.

Anything else we need to know?:
With the release of 1.14.1, the CSINode object is a native object and not a CRD as in prior releases. As such, permission to view the object needs to be granted as part of the storage.k8s.io API group.
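For reference, the missing permission is just a read rule on csinodes in the storage.k8s.io group. Expressed here as a client-go rbacv1.PolicyRule for illustration; the published manifests would carry the equivalent rule in the provisioner ClusterRole YAML (assumed import: rbacv1 "k8s.io/api/rbac/v1"):

// csiNodesReadRule is the rule the vsphere-csi-provisioner ClusterRole needs
// on Kubernetes 1.14+, where CSINode is served from storage.k8s.io instead of
// the old csi.storage.k8s.io CSINodeInfo CRD.
var csiNodesReadRule = rbacv1.PolicyRule{
	APIGroups: []string{"storage.k8s.io"},
	Resources: []string{"csinodes"},
	Verbs:     []string{"get", "list", "watch"},
}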

Environment:

  • csi-vsphere version: latest release 9th may
  • vsphere-cloud-controller-manager version: running 1.14.1 so excluded
  • Kubernetes version: 1.14.1
  • vSphere version: 6.7U2
  • OS (e.g. from /etc/os-release): ubuntu 18.04
  • Kernel (e.g. uname -a): 4.15.0-48-generic
  • Install tools: kubeadm
  • Others:

Label updates on statically provisioned PVs are not getting updated in the CNS

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

Label updates on statically provisioned PVs are not getting updated in the CNS.

The issue is that on each label update of a statically created PV, the syncer calls the CreateVolume API instead of UpdateVolumeMetadata.

https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/pkg/syncer/metadatasyncer.go#L334

Refer to the following code block.

if oldPv.Status.Phase == v1.VolumeAvailable || newPv.Spec.StorageClassName != "" {
	updateSpec := &cnstypes.CnsVolumeMetadataUpdateSpec{
		VolumeId: cnstypes.CnsVolumeId{
			Id: newPv.Spec.CSI.VolumeHandle,
		},
		Metadata: cnstypes.CnsVolumeMetadata{
			ContainerCluster: cnsvsphere.GetContainerCluster(metadataSyncer.cfg.Global.ClusterID, metadataSyncer.cfg.VirtualCenter[metadataSyncer.vcenter.Config.Host].User),
			EntityMetadata:   metadataList,
		},
	}

	klog.V(4).Infof("PVUpdated: Calling UpdateVolumeMetadata for volume %s with updateSpec: %+v", updateSpec.VolumeId.Id, spew.Sdump(updateSpec))
	if err := volumes.GetManager(metadataSyncer.vcenter).UpdateVolumeMetadata(updateSpec); err != nil {
		klog.Errorf("PVUpdated: UpdateVolumeMetadata failed with err %v", err)
	}
} else {
	createSpec := &cnstypes.CnsVolumeCreateSpec{
		Name:       oldPv.Name,
		VolumeType: common.BlockVolumeType,
		Metadata: cnstypes.CnsVolumeMetadata{
			ContainerCluster: cnsvsphere.GetContainerCluster(metadataSyncer.cfg.Global.ClusterID, metadataSyncer.cfg.VirtualCenter[metadataSyncer.vcenter.Config.Host].User),
			EntityMetadata:   metadataList,
		},
		BackingObjectDetails: &cnstypes.CnsBlockBackingDetails{
			CnsBackingObjectDetails: cnstypes.CnsBackingObjectDetails{},
			BackingDiskId:           oldPv.Spec.CSI.VolumeHandle,
		},
	}
	volumeOperationsLock.Lock()
	defer volumeOperationsLock.Unlock()
	klog.V(4).Infof("PVUpdated: vSphere provisioner creating volume %s with create spec %+v", oldPv.Name, spew.Sdump(createSpec))
	_, err := volumes.GetManager(metadataSyncer.vcenter).CreateVolume(createSpec)

	if err != nil {
		klog.Errorf("PVUpdated: Failed to create disk %s with error %+v", oldPv.Name, err)
	}
}

What you expected to happen:

Correct the logic so that a call is made to the update volume metadata API instead.
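A rough sketch of the expected direction, reusing only the UpdateVolumeMetadata call already shown above; volumeExistsInCNS is a hypothetical helper (e.g. backed by a CNS query), not existing driver code:

// Hypothetical condition: fall back to CreateVolume only when the volume is
// not yet known to CNS; otherwise always update its metadata, which covers
// label edits on statically provisioned PVs as well.
if volumeExistsInCNS(newPv.Spec.CSI.VolumeHandle) {
	updateSpec := &cnstypes.CnsVolumeMetadataUpdateSpec{
		VolumeId: cnstypes.CnsVolumeId{
			Id: newPv.Spec.CSI.VolumeHandle,
		},
		Metadata: cnstypes.CnsVolumeMetadata{
			ContainerCluster: cnsvsphere.GetContainerCluster(metadataSyncer.cfg.Global.ClusterID, metadataSyncer.cfg.VirtualCenter[metadataSyncer.vcenter.Config.Host].User),
			EntityMetadata:   metadataList,
		},
	}
	if err := volumes.GetManager(metadataSyncer.vcenter).UpdateVolumeMetadata(updateSpec); err != nil {
		klog.Errorf("PVUpdated: UpdateVolumeMetadata failed with err %v", err)
	}
} else {
	// ... existing CreateVolume path for volumes CNS has not seen yet ...
}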

How to reproduce it (as minimally and precisely as possible):
Create a PV statically and attempt to update labels on the PV.

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Wrong images posted in README

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

Images posted in the README don't match the ones specified in the manifests we publish for deploying vsphere-csi-driver.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Failed to verify volume attachment

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Volume attachment succeeds, but verification fails. The disk is created, but no UUID is associated with it.

What you expected to happen:
Volume gets mounted to Pod.

How to reproduce it (as minimally and precisely as possible):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-vsphere
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: vsphere
---
kind: Pod
apiVersion: v1
metadata:
  name: vsphere-app
spec:
  containers:
  - image: centos
    name: vsphere-app
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo $(date -u) >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: persistent-storage
      mountPath: "/data"
  restartPolicy: Never
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: pvc-vsphere

Anything else we need to know?:

$ kubectl -n kube-system logs vsphere-csi-node-dqmz6 vsphere-csi-node
[...]
I1029 10:38:58.697826       1 node.go:61] Checking if volume: 5dce72a7-d64f-4998-ad66-9d2c08b29218 with diskID: 6000c29777f036e89335428498f7d13f is attached
E1029 10:38:58.698349       1 node.go:64] Failed to verify volume attachment. Error: rpc error: code = NotFound desc = disk: 6000c29777f036e89335428498f7d13f not attached to node

dmesg output shows /dev/sdc is added:

[Tue Oct 29 12:06:06 2019] vmw_pvscsi: msg type: 0x0 - MSG RING: 9/8 (5)
[Tue Oct 29 12:06:06 2019] vmw_pvscsi: msg: device added at scsi0:0:0
[Tue Oct 29 12:06:06 2019] scsi 33:0:0:0: Direct-Access     VMware   Virtual disk     2.0  PQ: 0 ANSI: 6
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: Attached scsi generic sg3 type 0
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: [sdc] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: [sdc] Write Protect is off
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: [sdc] Mode Sense: 61 00 00 00
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: [sdc] Cache data unavailable
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: [sdc] Assuming drive cache: write through
[Tue Oct 29 12:06:06 2019] sd 33:0:0:0: [sdc] Attached SCSI disk

However, blkid does not show that UUID associated with the drive.

$ blkid
/dev/sda2: UUID="d27dd092-e59e-41fb-9d24-d12dac41cd34" TYPE="ext4" PARTUUID="cdfe7e56-1ef6-4f40-893e-518520986ff3"
/dev/sda3: UUID="yQVTll-Z3d6-zOT7-3cta-v4c6-T63t-hdE5WF" TYPE="LVM2_member" PARTUUID="e2d8ecd3-90f0-4b43-b00c-1e7860eebc83"
/dev/mapper/ubuntu--vg-ubuntu--lv: UUID="052afc94-eae9-45d8-b4c3-2606432fb469" TYPE="ext4"
/dev/sdb: UUID="YhMR4y-7Uij-eZr9-K7bf-1mOq-nhR2-WpQKpl" TYPE="LVM2_member"
/dev/loop0: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/sda1: PARTUUID="8c2eb04d-3e2e-4f4e-8a93-2ecebcd2aa7e"
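
Note that blkid reports filesystem UUIDs, not the SCSI serial the driver matches on; the page83 value (6000c297...) should instead appear under /dev/disk/by-id once the VM exposes disk serials to the guest. A rough Go sketch of that kind of check from the node, purely for illustration (the serial below is the one from the log above):

package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// findDiskBySerial looks for a block device link under /dev/disk/by-id whose
// name contains the vSphere page83 serial. If nothing matches even though the
// device shows up in dmesg/lsblk, the guest is most likely not being given
// the disk serial (e.g. disk.EnableUUID not set on the VM).
func findDiskBySerial(serial string) (string, error) {
	links, err := filepath.Glob("/dev/disk/by-id/*")
	if err != nil {
		return "", err
	}
	needle := strings.ToLower(serial)
	for _, link := range links {
		if strings.Contains(strings.ToLower(filepath.Base(link)), needle) {
			return link, nil
		}
	}
	return "", fmt.Errorf("no device with serial %s found", serial)
}

func main() {
	dev, err := findDiskBySerial("6000c29777f036e89335428498f7d13f")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("found device link:", dev)
}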

Environment:

  • csi-vsphere version: v1.0.1-4-g20e40e1
  • vsphere-cloud-controller-manager version: v1.0.0-16-g5514104
  • Kubernetes version: v1.16.0
  • vSphere version: 6.7.0.40000
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.3 LTS (Bionic Beaver)
  • Kernel (e.g. uname -a): Linux pke-cloud-vmware01 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Others: CSI driver is set up based on: https://docs.vmware.com/en/VMware-vSphere/6.7/Cloud-Native-Storage/GUID-039425C1-597F-46FF-8BAA-C5A46FF10E63.html hostNetwork: true is removed from yaml.

attacher.MountDevice failed: rpc error: code = NotFound

/kind bug

What happened:
Installed vsphere-csi-driver in the cluster 1.15.
The volume was provisioned and everything looks fine, but the disk cannot be mounted. The kubelet logs show this error:

Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.883814 1172944 csi_client.go:800] kubernetes.io/csi: creating new gRPC connection for [unix:///var/lib/kubelet/plugins_registry/vsphere.csi.vmware.com/csi.sock]
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.883844 1172944 clientconn.go:440] parsed scheme: ""
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.883858 1172944 clientconn.go:440] scheme "" not registered, fallback to default scheme
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.883935 1172944 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{/var/lib/kubelet/plugins_registry/vsphere.csi.vmware.com/csi.sock 0  <nil>}]
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.883952 1172944 clientconn.go:796] ClientConn switching balancer to "pick_first"
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.884021 1172944 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc000bd3b00, CONNECTING
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: I0711 20:09:04.884250 1172944 balancer_conn_wrappers.go:131] pickfirstBalancer: HandleSubConnStateChange: 0xc000bd3b00, READY
Jul 11 20:09:04 prod-kube-node-3 kubelet[1172944]: E0711 20:09:04.885622 1172944 csi_attacher.go:337] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = NotFound desc = disk: 6000c2917fad296049d3923d725e773e not attached to node

The disk shows up in the controller logs:

kubectl -n kube-system logs vsphere-csi-driver-controller-0 -c controller | grep 6000c2917fad296049d3923d725e773e
time="2019-07-11T17:05:15Z" level=info msg="AttachDisk([3par_4_Lun101] fcd/f15d121856e14273907c756a4cf925ad.vmdk) succeeded with: VolID=6fbf477f-a670-4080-917c-897efef937b6 UUID=6000c2917fad296049d3923d725e773e"

I can see that the disk is attached to the node:

# lsblk | grep sdd
sdd                           8:48   0    8G  0 disk 

I asked for advice in the Kubernetes Slack: https://kubernetes.slack.com/archives/C8EJ01Z46/p1562865408192700

But in another cluster I have no such problems.

Error in vsphere-csi-node:

time="2019-07-11T18:03:40Z" level=debug msg="/csi.v1.Node/NodeGetCapabilities: REQ 0225: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2019-07-11T18:03:40Z" level=debug msg="/csi.v1.Node/NodeGetCapabilities: REP 0225: Capabilities=[rpc:<type:STAGE_UNSTAGE_VOLUME > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2019-07-11T18:03:40Z" level=debug msg="/csi.v1.Node/NodeStageVolume: REQ 0226: VolumeId=6fbf477f-a670-4080-917c-897efef937b6, PublishContext=map[datacenter:X2 name:pvc-97db6e50-14b7-428f-8581-9c937068cc56 page83data:6000c2917fad296049d3923d725e773e parent_name:LUN101 parent_type:Datastore type:First Class Disk vcenter:vcenter.domain.com], StagingTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-97db6e50-14b7-428f-8581-9c937068cc56/globalmount, VolumeCapability=mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > , VolumeContext=map[datacenter:X2 name:pvc-97db6e50-14b7-428f-8581-9c937068cc56 parent_name:LUN101 parent_type:Datastore storage.kubernetes.io/csiProvisionerIdentity:1562863661502-8081-vsphere.csi.vmware.com type:First Class Disk vcenter:vcenter.example.com], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2019-07-11T18:03:40Z" level=debug msg="checking if volume is attached" diskID=6000c2917fad296049d3923d725e773e volID=6fbf477f-a670-4080-917c-897efef937b6
time="2019-07-11T18:03:40Z" level=debug msg="/csi.v1.Node/NodeStageVolume: REP 0226: rpc error: code = NotFound desc = disk: 6000c2917fad296049d3923d725e773e not attached to node"

# cat /var/lib/kubelet/pods/bd61aa8e-edf6-4fc9-be81-01db19e7aecd/volumes/kubernetes.io~csi/pvc-97db6e50-14b7-428f-8581-9c937068cc56/vol_data.json
{"attachmentID":"csi-8cb09e0b96e0373f8590ff57ddf7bacc0b85d33dc0b07c2f68498b8ac447e4ea","driverMode":"persistent","driverName":"vsphere.csi.vmware.com","nodeName":"prod-kube-node-3","specVolID":"pvc-97db6e50-14b7-428f-8581-9c937068cc56","volumeHandle":"6fbf477f-a670-4080-917c-897efef937b6"}

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • csi-vsphere version: 0.2.0
  • vsphere-cloud-controller-manager version: none
  • Kubernetes version: 1.15.0
  • vSphere version: 6.7.0.30000
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
  • Kernel (e.g. uname -a): 4.18.0-20-generic
  • Install tools:
  • Others: Docker version: 18.06.3-ce

CSI cannot access the datastore

BUG REPORT:

Uncomment only one, leave it on its own line:

/kind bug

What happened:

vSphere CSI cannot create VMDKs.

What you expected to happen:

CSI, CPI, and CNS all working together.

How to reproduce it (as minimally and precisely as possible):

I have been working to get CSI/CNS running, but it can never access the datastores.
I followed all the instructions in https://cloud-provider-vsphere.sigs.k8s.io/tutorials/kubernetes-on-vsphere-with-kubeadm.html step by step, over and over again, and I hit the same problem every time.

Anything else we need to know?:
The installation itself goes smoothly and I don't hit any problems, but when I deploy a test application to verify the result, the volume is never created.

ubuntu@kik-master1:~$ kubectl describe pvc
Name: mongodb-persistent-storage-claim-mongod-0
Namespace: default
StorageClass: mongodb-sc
Status: Pending
Volume:
Labels: environment=test
replicaset=MainRepSet
role=mongo
Annotations: volume.beta.kubernetes.io/storage-class: mongodb-sc
volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Events:
Type Reason Age From Message


Normal Provisioning 109s (x8 over 3m56s) csi.vsphere.vmware.com_kik-master1_2eb0be9e-1bfd-11ea-91ec-005056a6d84f External provisioner is provisioning volume for claim "default/mongodb-persistent-storage-claim-mongod-0"
Warning ProvisioningFailed 109s (x8 over 3m56s) csi.vsphere.vmware.com_kik-master1_2eb0be9e-1bfd-11ea-91ec-005056a6d84f failed to provision volume with StorageClass "mongodb-sc": rpc error: code = Internal desc = Failed to get shared datastores in kubernetes cluster. Error: Empty List of Node VMs returned from nodeManager
Normal ExternalProvisioning 11s (x16 over 3m56s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
Mounted By: mongod-0
ubuntu@kik-master1:~$


ubuntu@kik-master1:~$ kubectl describe pod mongod-0
Name: mongod-0
Namespace: default
Priority: 0
PriorityClassName:
Node:
Labels: controller-revision-hash=mongod-b4564d9dd
environment=test
replicaset=MainRepSet
role=mongo
statefulset.kubernetes.io/pod-name=mongod-0
Annotations:
Status: Pending
IP:
Controlled By: StatefulSet/mongod
Containers:
mongod-container:
Image: mongo:3.4
Port: 27017/TCP
Host Port: 0/TCP
Command:
numactl
--interleave=all
mongod
--bind_ip
0.0.0.0
--replSet
MainRepSet
--auth
--clusterAuthMode
keyFile
--keyFile
/etc/secrets-volume/internal-auth-mongodb-keyfile
--setParameter
authenticationMechanisms=SCRAM-SHA-1
Requests:
cpu: 200m
memory: 200Mi
Environment:
Mounts:
/data/db from mongodb-persistent-storage-claim (rw)
/etc/secrets-volume from secrets-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kr9br (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
mongodb-persistent-storage-claim:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mongodb-persistent-storage-claim-mongod-0
ReadOnly: false
secrets-volume:
Type: Secret (a volume populated by a Secret)
SecretName: shared-bootstrap-data
Optional: false
default-token-kr9br:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kr9br
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Warning FailedScheduling 78s (x13 over 16m) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
ubuntu@kik-master1:~$

I would really appreciate it if someone could help me fix this problem.
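
"Empty List of Node VMs returned from nodeManager" generally means the controller has not discovered any node VMs yet; this commonly comes down to the vsphere-csi-node DaemonSet pods not running on the workers, or the nodes not having been initialized by the CPI (no vsphere:// providerID, or the node.cloudprovider.kubernetes.io/uninitialized taint still present). A small client-go sketch to check the nodes from inside the cluster (assuming a recent client-go and in-cluster config; this is a diagnostic aid, not part of the driver):

package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Prints each node's providerID and whether the CPI "uninitialized" taint is
// still present. Nodes without a providerID have not been initialized by the
// cloud provider and cannot be resolved to vSphere VMs.
func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	nodes, err := cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes.Items {
		uninitialized := false
		for _, t := range n.Spec.Taints {
			if t.Key == "node.cloudprovider.kubernetes.io/uninitialized" {
				uninitialized = true
			}
		}
		fmt.Printf("%s providerID=%q uninitializedTaint=%v\n",
			n.Name, n.Spec.ProviderID, uninitialized)
	}
}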

Environment:
vSphere:
VMware ESXi, 6.7.0, 14320388
vSphere Client version 6.7.0.40000

Docker:
Docker version 18.06.0-ce, build 0ffa825

Kubernetes:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:23:09Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.2", GitCommit:"66049e3b21efe110454d67df4fa62b08ea79a19b", GitTreeState:"clean", BuildDate:"2019-05-16T16:14:56Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
batch/v1
batch/v1beta1
certificates.k8s.io/v1beta1
coordination.k8s.io/v1
coordination.k8s.io/v1beta1
events.k8s.io/v1beta1
extensions/v1beta1
networking.k8s.io/v1
networking.k8s.io/v1beta1
node.k8s.io/v1beta1
policy/v1beta1
rbac.authorization.k8s.io/v1
rbac.authorization.k8s.io/v1beta1
scheduling.k8s.io/v1
scheduling.k8s.io/v1beta1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1

OS:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic

csi controller fails to parse vcenter version 7.0.1.0 string

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Deployed a CAPV cluster against a 7.0 vCenter; the CSI controller keeps crashing because it cannot parse the vCenter API version string "7.0.1.0".

What you expected to happen:
The CSI controller is up and running.

How to reproduce it (as minimally and precisely as possible):
Deploy a CAPV cluster on vSphere 7.0.

Anything else we need to know?:

E1217 23:39:03.047983       1 controller.go:97] checkAPI failed for vcenter API version: 7.0.1.0, err=Invalid API Version format
E1217 23:39:03.048003       1 service.go:107] Failed to init controller. Error: Invalid API Version format
I1217 23:39:03.048011       1 service.go:88] configured: csi.vsphere.vmware.com with map[mode:controller]
time="2019-12-17T23:39:03Z" level=info msg="removed sock file" path=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2019-12-17T23:39:03Z" level=fatal msg="grpc failed" error="Invalid API Version format"

Environment:

  • csi-vsphere version:
  - containerID: containerd://50a092b267bc6984ffa5d129bb02bf42aa4ccaa9d43ef80e5866fe2c01bdca54
    image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
    imageID: gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:fae6806f5423a0099cdf60cf53cff474b228ee4846a242d025e4833a66f91b3f
    lastState:
      terminated:
        containerID: containerd://50a092b267bc6984ffa5d129bb02bf42aa4ccaa9d43ef80e5866fe2c01bdca54
        exitCode: 1
        finishedAt: "2019-12-18T00:03:54Z"
        reason: Error
        startedAt: "2019-12-18T00:03:54Z"
    name: vsphere-csi-controller
    ready: false
    restartCount: 10
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=vsphere-csi-controller
          pod=vsphere-csi-controller-0_kube-system(3012b517-5e15-4e5c-a5ad-817b0e075bca)
        reason: CrashLoopBackOff
  • vsphere-cloud-controller-manager version:
  containerStatuses:
  - containerID: containerd://8c7689bfb8f6219978b8dac2f7cc7834742a7c152ceb5e19eb76d60edb9dd2c0
    image: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.0.0
    imageID: gcr.io/cloud-provider-vsphere/cpi/release/manager@sha256:933c605159021ab4423b076893aa0a2e1666d40643989672bea76cd930038d97
    lastState: {}
    name: vsphere-cloud-controller-manager
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2019-12-17T23:21:39Z"
  • Kubernetes version:
โžœ  cluster-api-provider-vsphere git:(master) kubectl --kubeconfig=/tmp/k version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:09:08Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • vSphere version:
Name:         VMware vCenter Server
Vendor:       VMware, Inc.
Version:      7.0.0
Build:        31520061
OS type:      linux-x64
API type:     VirtualCenter
API version:  7.0.1.0
Product ID:   vpx
UUID:         6d96c06a-ce0e-4db0-a96a-414ac112514f

Implement single watcher per process

Is this a BUG REPORT or FEATURE REQUEST?:
Implement a single watcher to be used for all created informers. Currently, a new watcher is created per call to func NewInformer(client clientset.Interface) *InformerManager:

func NewInformer(client clientset.Interface) *InformerManager {

An example implementation of a single watcher can be found here, since the CPI and CSI projects shared common code at one point:
https://github.com/kubernetes/cloud-provider-vsphere/blob/master/pkg/common/kubernetes/informers.go
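
A minimal sketch of what a process-wide singleton could look like: a sync.Once around a shared informer factory so that repeated NewInformer calls return the same instance. The names mirror the existing signature, but this is just one possible shape, not the CPI implementation:

package informermanager

import (
	"sync"

	"k8s.io/client-go/informers"
	clientset "k8s.io/client-go/kubernetes"
)

var (
	once     sync.Once
	instance *InformerManager
)

// InformerManager wraps a single shared informer factory for the whole
// process so that every caller shares one watch connection per resource.
type InformerManager struct {
	factory informers.SharedInformerFactory
	stopCh  chan struct{}
}

// NewInformer keeps the existing signature but returns a process-wide
// singleton instead of constructing a new watcher on every call.
func NewInformer(client clientset.Interface) *InformerManager {
	once.Do(func() {
		instance = &InformerManager{
			factory: informers.NewSharedInformerFactory(client, 0),
			stopCh:  make(chan struct{}),
		}
	})
	return instance
}

// Listen starts all informers that have been requested from the factory.
func (im *InformerManager) Listen() {
	im.factory.Start(im.stopCh)
}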

/kind feature

What happened:
Multiple watchers are generated

What you expected to happen:
Only generate a single watcher for each process.

How to reproduce it (as minimally and precisely as possible):
NA

Anything else we need to know?:
NA

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Helm Chart Recommendation

FEATURE REQUEST?:

An official Helm chart for the CSI driver would be great!

Uncomment only one, leave it on its own line:

/kind feature

vCenter/vpxd crash

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
The vSphere CSI driver crashed our vCenter (vpxd service) twice.

The first crash happened right after installation, while experimenting with the first PVC.
That led me to shut down the Kubernetes cluster because I was not sure whether the driver was really the culprit.
I then restarted all vCenter services.

The second crash occurred this morning when I started the Kubernetes cluster up again.

What you expected to happen:

vCenter not crashing

How to reproduce it (as minimally and precisely as possible):

Currently, vCenter keeps crashing as long as the cluster is running.

Anything else we need to know?:

Environment:

  • csi-vsphere version: 0.2.0
  • vsphere-cloud-controller-manager version: 0.2.0
  • Kubernetes version: v1.14.2
  • vSphere version: 6.7.0.31000 / Build: 13643870
  • OS (e.g. from /etc/os-release): VERSION="18.04.2 LTS (Bionic Beaver)"
  • Kernel (e.g. uname -a): 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others: vpxd crash log
    vpxd-45955-13639312.txt

Explore replacing pkg/kubernetes with kubernetes package from CPI

/kind cleanup

The package at pkg/kubernetes was based off of https://github.com/kubernetes/cloud-provider-vsphere/tree/master/pkg/common/kubernetes. It's worth exploring why that code was copied, and whether or not it can be used directly instead. It is probably missing functionality, but can that functionality be added to the common package?

The code under k8s.io/cloud-provider-vsphere/pkg/common is explicitly meant to be imported and re-used.

RBAC Issue on k8s 1.14

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
Attempted to deploy the csi plugin following this guide. https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/docs/deploying_csi_vsphere_with_rbac.md

When deploying the nodes and CRD, I receive this error...

xadmin@fhckube01:~$ kubectl logs vsphere-csi-node-4srck --namespace=kube-system -c driver-registrar
I0425 16:27:02.963868       1 main.go:111] Version: v1.0.1-0-g27703026
I0425 16:27:02.963976       1 main.go:118] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0425 16:27:02.963993       1 connection.go:69] Connecting to /csi/csi.sock
I0425 16:27:02.964272       1 connection.go:96] Still trying, connection is CONNECTING
I0425 16:27:02.964571       1 connection.go:93] Connected
I0425 16:27:02.964588       1 main.go:126] Calling CSI driver to discover driver name.
I0425 16:27:02.964602       1 connection.go:137] GRPC call: /csi.v1.Identity/GetPluginInfo
I0425 16:27:02.964611       1 connection.go:138] GRPC request: {}
I0425 16:27:02.966230       1 connection.go:140] GRPC response: {"name":"io.k8s.cloud-provider-vsphere.vsphere","vendor_version":"v0.1.1"}
I0425 16:27:02.966907       1 connection.go:141] GRPC error: <nil>
I0425 16:27:02.966918       1 main.go:134] CSI driver name: "io.k8s.cloud-provider-vsphere.vsphere"
I0425 16:27:02.966929       1 main.go:138] Loading kubeconfig.
I0425 16:27:02.967192       1 node_register.go:55] Calling CSI driver to discover node ID.
I0425 16:27:02.967218       1 connection.go:137] GRPC call: /csi.v1.Node/NodeGetInfo
I0425 16:27:02.967229       1 connection.go:138] GRPC request: {}
I0425 16:27:02.970665       1 connection.go:140] GRPC response: {"node_id":"fhcklnode02"}
I0425 16:27:02.971244       1 connection.go:141] GRPC error: <nil>
I0425 16:27:02.971252       1 node_register.go:63] CSI driver node ID: "fhcklnode02"
I0425 16:27:02.971354       1 node_register.go:86] Starting Registration Server at: /registration/io.k8s.cloud-provider-vsphere.vsphere-reg.sock
I0425 16:27:02.971479       1 node_register.go:93] Registration Server started at: /registration/io.k8s.cloud-provider-vsphere.vsphere-reg.sock
I0425 16:27:03.290286       1 main.go:84] Received GetInfo call: &InfoRequest{}
I0425 16:27:05.756268       1 main.go:94] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:plugin registration failed with err: error updating CSINode object with CSI driver node info: error updating CSINode: timed out waiting for the condition; caused by: csinodes.storage.k8s.io "fhcklnode02" is forbidden: User "system:node:fhcklnode02" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope,}
E0425 16:27:05.756349       1 main.go:96] Registration process failed with error: plugin registration failed with err: error updating CSINode object with CSI driver node info: error updating CSINode: timed out waiting for the condition; caused by: csinodes.storage.k8s.io "fhcklnode02" is forbidden: User "system:node:fhcklnode02" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope, restarting registration container.

What you expected to happen:
All pods should report Ready. Instead, the registration container goes into CrashLoopBackOff.

How to reproduce it (as minimally and precisely as possible):
Follow this guide.
https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/docs/deploying_csi_vsphere_with_rbac.md

Anything else we need to know?:
I originally had the csi plugin deployed on 1.13 and it was working fine. Upgrading to 1.14 broke it. Attempting to redeploy has only resulted in the above error.

Environment:

  • csi-vsphere version: master or latest?
  • vsphere-cloud-controller-manager version: unsure where to find this?
  • Kubernetes version: 1.14.1
  • vSphere version: 6.7.0 build 11675023
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.2 LTS bionic
  • Kernel (e.g. uname -a): Linux fhckube01 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:

Update the README file with better documentation

Filing this issue to revamp the vSphere CSI driver documentation. A few improvements/sections I would like to see are:

  1. Call out latest stable images and corresponding deployment yaml files.
  2. CSI version compatibility
  3. Kubernetes version compatibility
  4. Features supported and the stage (Alpha, Beta, GA etc)
  5. Example yamls to try out features like static provisioning, dynamic provisioning, using SPBM policy in storage class, volume topology etc
  6. Driver development/contribution instructions.

Windows CSI Proxy

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

Some folks at sig-windows are currently implementing a privileged proxy to enable csi-driver support on windows server nodes. See https://github.com/kubernetes/enhancements/blob/master/keps/sig-windows/20190714-windows-csi-support.md and https://github.com/kubernetes-csi/csi-proxy and https://kubernetes.slack.com/archives/CN5JCCW31 for details. The alpha csi-proxy release is expected roughly around the k8s 1.18 release.

The vsphere-csi-driver could be ported to work with the Windows csi-proxy to support container storage operations on Windows Server nodes. The in-tree VCP supports vSphere volumes, so this would bring the out-of-tree CSI driver back into alignment with the in-tree functionality.

Thoughts?

Environment:

  • csi-vsphere version:
  • vsphere-cloud-controller-manager version:
  • Kubernetes version:
  • vSphere version:
  • OS (e.g. from /etc/os-release): Windows 1809+
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
