Comments (9)
A previous mount operation for this volume is still in progress (and stuck). This can happen for various reasons, such as the network being unreachable or the NFS client not responding.
You can collect some debug info for further investigation: exec into the driver pod that is scheduled on the problematic node and run the mount command manually to see what's going on.
For example: mount -t nfs -v nfs.nfs-server-domain-name:/pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8 -o nfsvers=4.1 /mnt
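The manual check described above could be scripted roughly as follows. This is a minimal sketch, not the driver's own tooling: the kube-system namespace, the app=csi-nfs-node label, and the nfs container name are assumptions about a default install, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumed defaults; adjust to match your install.
NAMESPACE="kube-system"      # assumption: namespace the driver runs in
SELECTOR="app=csi-nfs-node"  # assumption: label on the node-plugin pods
CONTAINER="nfs"              # assumption: container that performs mounts

# Print the manual mount command for a server, export path, and mount point.
nfs_mount_cmd() {
    printf 'mount -t nfs -v %s:%s -o nfsvers=4.1 %s\n' "$1" "$2" "$3"
}

# Run the mount inside the driver pod scheduled on the given node.
# $1 = node name, $2 = server, $3 = export path
debug_mount_on_node() {
    pod=$(kubectl -n "$NAMESPACE" get pods -l "$SELECTOR" \
        --field-selector "spec.nodeName=$1" -o name)
    kubectl -n "$NAMESPACE" exec "$pod" -c "$CONTAINER" -- \
        sh -c "$(nfs_mount_cmd "$2" "$3" /mnt)"
}

# Show the command that would be run (no cluster access needed).
nfs_mount_cmd nfs.nfs-server-domain-name /pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8 /mnt
```

To run it against a cluster, call debug_mount_on_node with the node name, the server name, and the export path. Nothing touches the cluster until that function is called.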
from csi-driver-nfs.
I ran the command and got the following output. The command stays stuck after that output.
# mount -t nfs -v nfs.nfs-dns-entry:/pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8 -o nfsvers=4.1 /mnt
mount.nfs: timeout set for Wed Apr 17 12:48:37 2024
mount.nfs: trying text-based options 'nfsvers=4.1,addr=172.20.0.14,clientaddr=10.0.2.227'
172.20.0.14 - Service IP of the NFS server
10.0.2.227 - Node IP address
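Because the mount hangs right after printing the resolved address, it may help to probe that address directly from the node. A rough sketch, assuming nc is installed on the node and that the server listens on the standard NFS port 2049; mount_server_addr is a hypothetical helper name:

```shell
#!/bin/sh
# Extract the server address (addr=...) from mount.nfs -v output on stdin.
# The [,'] prefix keeps clientaddr= from matching.
mount_server_addr() {
    sed -n "s/.*[,']addr=\([^,']*\).*/\1/p"
}

line="mount.nfs: trying text-based options 'nfsvers=4.1,addr=172.20.0.14,clientaddr=10.0.2.227'"
addr=$(printf '%s\n' "$line" | mount_server_addr)
echo "server address: $addr"

# On the node, probe the NFS port with a 5-second timeout
# (assumes nc is available and supports -z and -w):
#   nc -z -w 5 "$addr" 2049 && echo reachable || echo unreachable
```

If the probe times out from the node but succeeds from a pod, that would point at routing between the host network and the service network rather than at the NFS server itself.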
from csi-driver-nfs.
It seems DNS is fine. Assuming 172.20.0.14 is reachable, is there any useful log from the kernel? For example, check the host's /var/log/messages.
from csi-driver-nfs.
dmesg may also help. Since the mount -t nfs ... command does not work either, I don't think this issue is related to this CSI driver.
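One way to narrow the kernel log down to NFS-related lines is sketched below. The pattern is an assumption about which messages are interesting (for example, the kernel's "nfs: server ... not responding" warnings) and is not an exhaustive list; nfs_kernel_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only kernel log lines that mention NFS or the RPC layer.
nfs_kernel_lines() {
    grep -iE 'nfs|rpc'
}

# Example on two sample lines; on the node, pipe real output instead:
#   dmesg | nfs_kernel_lines
printf '%s\n' \
    '[ 3.920847] Bridge firewalling registered' \
    '[ 4.724943] RPC: Registered tcp NFSv4.1 backchannel transport module.' \
    | nfs_kernel_lines
```

If only module-registration lines come back, the kernel has not logged an NFS error. A connection that hangs silently can look like that.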
from csi-driver-nfs.
Last few messages from dmesg:
[ 3.725480] xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
[ 3.786293] systemd-journald[1624]: Received request to flush runtime journal from PID 1
[ 3.911448] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 3.920847] Bridge firewalling registered
[ 3.950107] pps_core: LinuxPPS API ver. 1 registered
[ 3.953863] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 3.972306] PTP clock support registered
[ 3.991454] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.12.0g
[ 4.006003] ena 0000:00:05.0: ENA device version: 0.10
[ 4.009369] ena 0000:00:05.0: ENA controller version: 0.0.1 implementation version 1
[ 4.016205] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[ 4.027867] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input4
[ 4.037933] ACPI: Power Button [PWRF]
[ 4.040982] input: Sleep Button as /devices/LNXSYSTM:00/LNXSLPBN:00/input/input5
[ 4.041137] mousedev: PS/2 mouse device common for all mice
[ 4.049986] ACPI: Sleep Button [SLPF]
[ 4.070505] cryptd: max_cpu_qlen set to 1000
[ 4.088076] AVX2 version of gcm_enc/dec engaged.
[ 4.091253] AES CTR mode by8 optimization enabled
[ 4.093898] ena 0000:00:05.0: Forcing large headers and decreasing maximum TX queue size to 512
[ 4.100441] ena 0000:00:05.0: ENA Large LLQ is enabled
[ 4.112498] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 0e:3e:57:4f:8f:91
[ 4.714978] RPC: Registered named UNIX socket transport module.
[ 4.718603] RPC: Registered udp transport module.
[ 4.721759] RPC: Registered tcp transport module.
[ 4.724943] RPC: Registered tcp NFSv4.1 backchannel transport module.
/var/log/messages is full of log entries like the following, repeating over and over:
Apr 18 12:59:48 ip-10-0-2-227 containerd: time="2024-04-18T12:59:48.486079974Z" level=info msg="StopPodSandbox for \"a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91\""
Apr 18 12:59:48 ip-10-0-2-227 containerd: time="2024-04-18T12:59:48.486503462Z" level=info msg="Kill container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\""
Apr 18 13:00:30 ip-10-0-2-227 kubelet: E0418 13:00:30.434631 3643 kubelet.go:1884] "Unable to attach or mount volumes for pod; skipping pod" err="unmounted volumes=[home], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition" pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5"
Apr 18 13:00:30 ip-10-0-2-227 kubelet: E0418 13:00:30.434663 3643 pod_workers.go:1294] "Error syncing pod, skipping" err="unmounted volumes=[home], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition" pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5" podUID=096e1ea2-4c1e-46eb-8006-ebcc57e34563
Apr 18 13:00:42 ip-10-0-2-227 kubelet: I0418 13:00:42.604054 3643 reconciler_common.go:231] "operationExecutor.MountVolume started for volume \"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8\" (UniqueName: \"kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##\") pod \"env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5\" (UID: \"096e1ea2-4c1e-46eb-8006-ebcc57e34563\") " pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5"
Apr 18 13:00:42 ip-10-0-2-227 kubelet: E0418 13:00:42.606815 3643 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8## podName: nodeName:}" failed. No retries permitted until 2024-04-18 13:02:44.606800725 +0000 UTC m=+230683.392009574 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8" (UniqueName: "kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##") pod "env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5" (UID: "096e1ea2-4c1e-46eb-8006-ebcc57e34563") : rpc error: code = Aborted desc = An operation with the given Volume ID nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8## already exists
Apr 18 13:00:51 ip-10-0-2-227 dhclient[2781]: XMT: Solicit on eth0, interval 125350ms.
Apr 18 13:01:01 ip-10-0-2-227 systemd: Created slice User Slice of root.
Apr 18 13:01:01 ip-10-0-2-227 systemd: Started Session 73 of user root.
Apr 18 13:01:01 ip-10-0-2-227 systemd: Removed slice User Slice of root.
Apr 18 13:01:22 ip-10-0-2-227 kubelet: I0418 13:01:22.323364 3643 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"home\" (UniqueName: \"kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143##\") pod \"b330aada-c335-4561-a123-ec543faf2bdd\" (UID: \"b330aada-c335-4561-a123-ec543faf2bdd\") "
Apr 18 13:01:22 ip-10-0-2-227 kubelet: E0418 13:01:22.324408 3643 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143## podName:b330aada-c335-4561-a123-ec543faf2bdd nodeName:}" failed. No retries permitted until 2024-04-18 13:03:24.324390297 +0000 UTC m=+230723.109599146 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "home" (UniqueName: "kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143##") pod "b330aada-c335-4561-a123-ec543faf2bdd" (UID: "b330aada-c335-4561-a123-ec543faf2bdd") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Aborted desc = An operation with the given Volume ID nfs.nfs-dns-entry##pvc-4fca47f1-0977-4635-9602-a4674e852143## already exists
Apr 18 13:01:48 ip-10-0-2-227 containerd: time="2024-04-18T13:01:48.486369925Z" level=error msg="StopPodSandbox for \"a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91\" failed" error="rpc error: code = DeadlineExceeded desc = failed to stop container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": an error occurs during waiting for container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" to be killed: wait container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": context deadline exceeded"
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486710 3643 remote_runtime.go:205] "StopPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to stop container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": an error occurs during waiting for container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" to be killed: wait container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": context deadline exceeded" podSandboxID="a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91"
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486754 3643 kuberuntime_manager.go:1325] "Failed to stop sandbox" podSandboxID={Type:containerd ID:a02d527d47aa52c2f2bbcfc441dc41ff4ff1589e0e96b7e769a8a07f6e7d2c91}
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486814 3643 kubelet.go:1973] [failed to "KillContainer" for "env-0f06babb-df89-4ad8-be63-b1e4585cf766" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "914f1fa5-fa96-4509-940c-027e0aba7768" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = failed to stop container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": an error occurs during waiting for container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" to be killed: wait container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\": context deadline exceeded"]
Apr 18 13:01:48 ip-10-0-2-227 kubelet: E0418 13:01:48.486843 3643 pod_workers.go:1294] "Error syncing pod, skipping" err="[failed to \"KillContainer\" for \"env-0f06babb-df89-4ad8-be63-b1e4585cf766\" with KillContainerError: \"rpc error: code = DeadlineExceeded desc = context deadline exceeded\", failed to \"KillPodSandbox\" for \"914f1fa5-fa96-4509-940c-027e0aba7768\" with KillPodSandboxError: \"rpc error: code = DeadlineExceeded desc = failed to stop container \\\"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\\\": an error occurs during waiting for container \\\"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\\\" to be killed: wait container \\\"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\\\": context deadline exceeded\"]" pod="my-namespace/env-0f06babb-df89-4ad8-be63-b1e4585cf766-575f5988bb-lgtbg" podUID=914f1fa5-fa96-4509-940c-027e0aba7768
Apr 18 13:01:49 ip-10-0-2-227 kubelet: I0418 13:01:49.128598 3643 kuberuntime_container.go:742] "Killing container with a grace period" pod="my-namespace/env-0f06babb-df89-4ad8-be63-b1e4585cf766-575f5988bb-lgtbg" podUID=914f1fa5-fa96-4509-940c-027e0aba7768 containerName="env-0f06babb-df89-4ad8-be63-b1e4585cf766" containerID="containerd://32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295" gracePeriod=30
Apr 18 13:01:49 ip-10-0-2-227 containerd: time="2024-04-18T13:01:49.128841097Z" level=info msg="StopContainer for \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" with timeout 30 (s)"
Apr 18 13:01:49 ip-10-0-2-227 containerd: time="2024-04-18T13:01:49.129231594Z" level=info msg="Skipping the sending of signal terminated to container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\" because a prior stop with timeout>0 request already sent the signal"
Apr 18 13:02:19 ip-10-0-2-227 containerd: time="2024-04-18T13:02:19.130119329Z" level=info msg="Kill container \"32ba16985a8ca641d384a131b484c9b1f59123c6ebf6d01b09192cc0bcdcd295\""
Apr 18 13:02:44 ip-10-0-2-227 kubelet: I0418 13:02:44.686092 3643 reconciler_common.go:231] "operationExecutor.MountVolume started for volume \"pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8\" (UniqueName: \"kubernetes.io/csi/nfs.csi.k8s.io^nfs.nfs-dns-entry##pvc-ff98ceee-5581-4e29-9213-bc6c8e131de8##\") pod \"env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5\" (UID: \"096e1ea2-4c1e-46eb-8006-ebcc57e34563\") " pod="my-namespace/env-0ca4db1c-c09f-4383-a8c0-cb38f18263cf-c5dcbc7db-jb7d5"
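The "already exists" errors in these logs identify which volumes still have an operation in flight. A small sketch that tallies them per volume ID, assuming the error text matches the lines above; stuck_volume_ids is a hypothetical helper name and the sample lines are shortened stand-ins:

```shell
#!/bin/sh
# Count "operation already exists" errors per volume ID from log lines on stdin.
stuck_volume_ids() {
    sed -n 's/.*An operation with the given Volume ID \(.*\) already exists.*/\1/p' \
        | sort | uniq -c | awk '{print $1, $2}'
}

# Example with shortened sample lines; on the node:
#   stuck_volume_ids < /var/log/messages
printf '%s\n' \
    'kubelet: desc = An operation with the given Volume ID nfs.example##pvc-a## already exists' \
    'kubelet: desc = An operation with the given Volume ID nfs.example##pvc-a## already exists' \
    'kubelet: desc = An operation with the given Volume ID nfs.example##pvc-b## already exists' \
    | stuck_volume_ids
```

Each output line is a count followed by a volume ID. A volume whose count keeps growing is one whose earlier mount or unmount never returned.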
Do you think running the NFS server inside the cluster has anything to do with this? I noticed the host network trying to connect to the service network here: mount.nfs: trying text-based options 'nfsvers=4.1,addr=172.20.0.14,clientaddr=10.0.2.227'. Does anyone have a similar setup that works without any issues?
from csi-driver-nfs.
I've been noticing inconsistent behavior, especially when using Helm chart 4.7.0.
from csi-driver-nfs.