Comments (9)
I added a blacklist entry in multipath.conf and the mount succeeded.
Not sure whether this is expected to be new behavior, but I'm glad it works now. I've been using Longhorn for a while with LOTS of volumes on several clusters and have never seen this happen, despite never explicitly configuring multipath or fundamentally changing how I create volumes.
I'm tempted to close this issue myself, but I'm not sure whether your team wants to add further comments.
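For anyone landing here with the same symptom, here is a minimal sketch of what such a blacklist entry can look like. The exact entry used above was not posted, so this is an assumption: the `devnode "^sd[a-z0-9]+"` pattern is, as far as I know, the one suggested in Longhorn's multipathd troubleshooting note. It excludes every `sd*` device from multipath, so it only suits nodes that do not rely on multipath for real storage. The function name `print_blacklist_stanza` is purely illustrative.

```shell
# Print a multipath.conf blacklist stanza that tells multipathd to ignore sd* devices.
# Assumption: nothing on this node needs multipath for its sd* disks.
print_blacklist_stanza() {
    cat <<'EOF'
blacklist {
    devnode "^sd[a-z0-9]+"
}
EOF
}

# Typical use on the affected node (run manually, as root):
#   print_blacklist_stanza >> /etc/multipath.conf
#   systemctl restart multipathd.service
#   multipath -t    # dump the effective configuration to confirm the entry
```

If /etc/multipath.conf already contains a blacklist section, it is probably cleaner to add the devnode line to that section by hand than to append a second one.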
from longhorn.
I have not tried downgrading longhorn back to 1.5.3
I am sorry that downgrading is not possible.
I noticed that you don't have any worker nodes, right?
Did you try to scale down the workload, wait a few minutes, and then scale up?
Could you provide the support bundle so we can investigate?
from longhorn.
I have cordoned and drained the node and shut it down a couple of times, completely powering the instance off and back on. The behavior is the same.
It is probably worth noting that this only seems to happen on this one node, even though the nodes are all identical. I have checked all the filesystems. There are no errors. It is baffling.
from longhorn.
The support bundle is over 200MB, so it cannot be attached. Also, I'm not sure my employer would allow it to be sent anyway.
from longhorn.
I also tried completely reformatting the NVMe volume that Longhorn is using: I erased it, created a whole new filesystem, and let Longhorn rebuild the replicas from the other nodes. It still didn't help.
from longhorn.
You can send the SB to [email protected] if allowed.
Can you provide the output of:
- kubectl get volumes -n longhorn-system -o yaml pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
- Find the status.currentNodeID field in the above output, SSH into that node, and run lsblk.
Then check whether it is caused by the [multipathd issue].
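The lookup of status.currentNodeID can be scripted. A small sketch, assuming the YAML is piped in from the kubectl command above; the helper name `current_node_id` is made up for illustration.

```shell
# Read `kubectl get volumes ... -o yaml` output on stdin and print the
# value of the currentNodeID field (first match only).
current_node_id() {
    awk '$1 == "currentNodeID:" { print $2; exit }'
}

# Usage (assumes kubectl access to the cluster):
#   kubectl get volumes -n longhorn-system -o yaml <volume-name> | current_node_id
# kubectl can also extract the field directly:
#   kubectl get volumes -n longhorn-system <volume-name> -o jsonpath='{.status.currentNodeID}'
```

Matching on the exact key avoids picking up similarly named fields such as currentMigrationNodeID.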
from longhorn.
kubectl get volumes -n longhorn-system -o yaml pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  creationTimestamp: "2024-04-11T07:08:26Z"
  finalizers:
  - longhorn.io
  generation: 16
  labels:
    longhornvolume: pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
    recurring-job-group.longhorn.io/default: enabled
    setting.longhorn.io/remove-snapshots-during-filesystem-trim: ignored
    setting.longhorn.io/replica-auto-balance: ignored
    setting.longhorn.io/snapshot-data-integrity: ignored
  name: pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
  namespace: longhorn-system
  resourceVersion: "38620568"
  uid: 747354cc-a8bc-4ced-980d-de1474960dbf
spec:
  Standby: false
  accessMode: rwo
  backendStoreDriver: ""
  backingImage: ""
  backupCompressionMethod: lz4
  dataEngine: v1
  dataLocality: disabled
  dataSource: ""
  disableFrontend: false
  diskSelector:
  - nvme
  encrypted: false
  engineImage: ""
  fromBackup: ""
  frontend: blockdev
  image: longhornio/longhorn-engine:v1.6.1
  lastAttachedBy: ""
  migratable: false
  migrationNodeID: ""
  nodeID: apollo
  nodeSelector: []
  numberOfReplicas: 2
  offlineReplicaRebuilding: disabled
  replicaAutoBalance: ignored
  replicaDiskSoftAntiAffinity: ignored
  replicaSoftAntiAffinity: ignored
  replicaZoneSoftAntiAffinity: ignored
  restoreVolumeRecurringJob: ignored
  revisionCounterDisabled: false
  size: "8589934592"
  snapshotDataIntegrity: ignored
  snapshotMaxCount: 250
  snapshotMaxSize: "0"
  staleReplicaTimeout: 30
  unmapMarkSnapChainRemoved: ignored
status:
  actualSize: 199016448
  cloneStatus:
    snapshot: ""
    sourceVolume: ""
    state: ""
  conditions:
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:27Z"
    message: ""
    reason: ""
    status: "False"
    type: WaitForBackingImage
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:27Z"
    message: ""
    reason: ""
    status: "False"
    type: TooManySnapshots
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:27Z"
    message: ""
    reason: ""
    status: "True"
    type: Scheduled
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:28Z"
    message: ""
    reason: ""
    status: "False"
    type: Restore
  currentImage: longhornio/longhorn-engine:v1.6.1
  currentMigrationNodeID: ""
  currentNodeID: apollo
  expansionRequired: false
  frontendDisabled: false
  isStandby: false
  kubernetesStatus:
    lastPVCRefAt: ""
    lastPodRefAt: ""
    namespace: gitlab
    pvName: pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
    pvStatus: Bound
    pvcName: data-gitlab-postgresql-0
    workloadsStatus:
    - podName: gitlab-postgresql-0
      podStatus: Pending
      workloadName: gitlab-postgresql
      workloadType: StatefulSet
  lastBackup: ""
  lastBackupAt: ""
  lastDegradedAt: ""
  offlineReplicaRebuildingRequired: false
  ownerID: apollo
  pendingNodeID: ""
  remountRequestedAt: "2024-04-12T01:40:28Z"
  restoreInitiated: false
  restoreRequired: false
  robustness: healthy
  shareEndpoint: ""
  shareState: ""
  state: attached
sudo lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0                          7:0  0  39.1M  1 loop  /snap/snapd/21184
loop1                          7:1  0    87M  1 loop  /snap/lxd/27428
loop2                          7:2  0  40.4M  1 loop  /snap/snapd/20671
loop3                          7:3  0  63.9M  1 loop  /snap/core20/2182
loop4                          7:4  0 170.7M  1 loop  /snap/microk8s/6542
loop5                          7:5  0 169.8M  1 loop  /snap/microk8s/6239
loop6                          7:6  0    87M  1 loop  /snap/lxd/27948
loop7                          7:7  0  63.9M  1 loop  /snap/core20/2264
sda                            8:0  0  18.2T  0 disk
└─sda1                         8:1  0  18.2T  0 part
  └─vg--sata-lv--sata1       252:1  0  18.2T  0 lvm   /var/lib/longhorn-sata
sdb                           8:16  0     8G  0 disk
└─mpathb                     252:2  0     8G  0 mpath
sdc                           8:32  0     8G  0 disk  /var/snap/microk8s/common/var/lib/kubelet/pods/99bc9488-d81f-4b78-ae87-50bb3ad3d00f/volumes/kubernetes.io~csi/pvc-90df0847-e118-44f9-aa52-8030375019fd/mount
                                                      /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/2cead03e5ec7c668621ceea9a2b9c14608ca66563d92f000d16861cdda7980ea/globalmount
sde                           8:64  0    50G  0 disk  /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volumes/kubernetes.io~csi/pvc-43020953-8b8e-43c4-870d-5af9e4552327/mount
                                                      /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/7c516ada76d5831740be910e8f226e44162f379bfc9b297cc833b6f8964bae4b/globalmount
nvme0n1                      259:0  0   7.3T  0 disk
├─nvme0n1p1                  259:1  0   488M  0 part  /boot/efi
├─nvme0n1p2                  259:2  0 232.8G  0 part  /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/metrics/0
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/sentinel/0
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/redis/5
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/redis/4
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/cc93231a-db4a-41ab-9115-2a28319cbbe8/volume-subpaths/gitlab-exporter-config/gitlab-exporter/0
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/4b6cba62-db23-4aee-a343-4d2055d1fef0/volume-subpaths/sshd-config/gitlab-shell/3
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/4b6cba62-db23-4aee-a343-4d2055d1fef0/volume-subpaths/shell-config/gitlab-shell/2
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/08c19342-28c6-4a22-994e-9a0d93911eb5/volume-subpaths/nvidia-device-plugin-entrypoint/nvidia-device-plugin/0
│                                                     /var/snap/microk8s/common/var/lib/kubelet/pods/bc63cc88-8537-4e15-9874-1595dd0a5714/volume-subpaths/nvidia-container-toolkit-entrypoint/nvidia-container-toolkit-ctr/0
│                                                     /
└─nvme0n1p3                  259:3  0     7T  0 part
  └─vg--nvme-lv--nvme1       252:0  0     7T  0 lvm   /var/lib/longhorn-nvme
from longhorn.
I am suspicious of multipath now because lsblk mentions mpath. I'm reading that documentation page now. Is that something you would expect to see in 1.6.1 but not in 1.5.3?
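Spotting the mpath entry by eye gets tedious on nodes with many volumes. A rough sketch of an automated check follows, assuming `lsblk -rno NAME,TYPE,PKNAME` is available on the node; the helper name `claimed_by_multipath` is hypothetical.

```shell
# Read lines of "NAME TYPE PKNAME" on stdin (raw lsblk output) and print
# "<parent-disk> <mpath-name>" for every device-mapper multipath map.
claimed_by_multipath() {
    awk '$2 == "mpath" { print $3, $1 }'
}

# Usage on the node:
#   lsblk -rno NAME,TYPE,PKNAME | claimed_by_multipath
```

Any disk it reports has been claimed by multipathd. If that disk is a Longhorn volume device, as sdb appears to be in the output above, the claim is a likely reason the mount fails.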
from longhorn.
Feel free to close or reopen it if you have any concerns in the future.
from longhorn.