Comments (9)

linuxfreakus commented on September 26, 2024

I added a blacklist to multipath.conf and the mount succeeded.

Not sure whether this should be expected as new behavior, but I'm glad it works now. I've been using Longhorn for a while with LOTS of volumes on several clusters and have never seen this happen, despite never explicitly configuring multipath or fundamentally changing how I create volumes.

I'm tempted to close this issue myself, but I'm not sure whether your team wants to add further comments.
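For readers hitting the same mount failure, here is a minimal sketch of what a multipath.conf blacklist can look like. The commenter does not say which pattern was used; the devnode pattern below is an assumption, and the helper name multipath_blacklist_stanza is hypothetical. The pattern excludes every /dev/sd* device from multipathd, so it would need narrowing on a node that also relies on real multipath storage.

```shell
# Hypothetical helper: print a blacklist stanza for multipath.conf.
# Assumption: excluding all /dev/sd* block devices is acceptable on this node.
multipath_blacklist_stanza() {
  cat <<'EOF'
blacklist {
    devnode "^sd[a-z0-9]+"
}
EOF
}

# Example (run as root on the affected node):
#   multipath_blacklist_stanza >> /etc/multipath.conf
#   systemctl restart multipathd.service
#   multipath -t   # dump the effective configuration to confirm the blacklist
```

After restarting multipathd, the affected workload may still need to be rescheduled before the volume mounts.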

from longhorn.

mantissahz commented on September 26, 2024

"I have not tried downgrading longhorn back to 1.5.3"

I am sorry that downgrading is not possible.

I noticed that you don't have any worker nodes, right?
Did you try to scale down the workload, wait a few minutes, and then scale up?
Could you provide a support bundle so we can investigate?

linuxfreakus commented on September 26, 2024

I have cordoned and drained the node and shut it down a couple of times, completely powering the instance down and back on each time. The behavior is the same.

It is probably worth noting that this only seems to happen on this one node, even though the nodes are all identical. I have checked all the filesystems and found no errors. It is baffling.

linuxfreakus commented on September 26, 2024

The support bundle is over 200MB, so it cannot be attached. Also, I'm not sure my employer would allow it to be sent anyway.

linuxfreakus commented on September 26, 2024

I also tried completely reformatting the NVMe volume that Longhorn is using: I erased it, created a whole new filesystem, and then let Longhorn create new replicas from the other nodes. It still didn't help.

mantissahz commented on September 26, 2024

You can send the support bundle (SB) to [email protected] if allowed.
Can you provide the output of:

  1. kubectl get volumes -n longhorn-system -o yaml pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
  2. Find the status.currentNodeID field in the above output, SSH into that node, and run lsblk.

Also check whether this is caused by the [multipathd issue].
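The two steps above can be combined into a small sketch. The function name longhorn_volume_node is hypothetical, and the example assumes the Kubernetes node name is also a hostname reachable over SSH, which may not hold in every cluster.

```shell
# Hypothetical helper: print the node a Longhorn volume is currently attached to.
# $1 is the Longhorn volume name (the same as the PV name).
longhorn_volume_node() {
  kubectl get volumes.longhorn.io "$1" -n longhorn-system \
    -o jsonpath='{.status.currentNodeID}'
}

# Example (assumes the node name resolves as an SSH host):
#   node=$(longhorn_volume_node pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa)
#   ssh "$node" lsblk
```

Using jsonpath avoids searching the full YAML by hand, though the complete output is still useful for the other status fields.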

linuxfreakus commented on September 26, 2024

kubectl get volumes -n longhorn-system -o yaml pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  creationTimestamp: "2024-04-11T07:08:26Z"
  finalizers:
  - longhorn.io
  generation: 16
  labels:
    longhornvolume: pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
    recurring-job-group.longhorn.io/default: enabled
    setting.longhorn.io/remove-snapshots-during-filesystem-trim: ignored
    setting.longhorn.io/replica-auto-balance: ignored
    setting.longhorn.io/snapshot-data-integrity: ignored
  name: pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
  namespace: longhorn-system
  resourceVersion: "38620568"
  uid: 747354cc-a8bc-4ced-980d-de1474960dbf
spec:
  Standby: false
  accessMode: rwo
  backendStoreDriver: ""
  backingImage: ""
  backupCompressionMethod: lz4
  dataEngine: v1
  dataLocality: disabled
  dataSource: ""
  disableFrontend: false
  diskSelector:
  - nvme
  encrypted: false
  engineImage: ""
  fromBackup: ""
  frontend: blockdev
  image: longhornio/longhorn-engine:v1.6.1
  lastAttachedBy: ""
  migratable: false
  migrationNodeID: ""
  nodeID: apollo
  nodeSelector: []
  numberOfReplicas: 2
  offlineReplicaRebuilding: disabled
  replicaAutoBalance: ignored
  replicaDiskSoftAntiAffinity: ignored
  replicaSoftAntiAffinity: ignored
  replicaZoneSoftAntiAffinity: ignored
  restoreVolumeRecurringJob: ignored
  revisionCounterDisabled: false
  size: "8589934592"
  snapshotDataIntegrity: ignored
  snapshotMaxCount: 250
  snapshotMaxSize: "0"
  staleReplicaTimeout: 30
  unmapMarkSnapChainRemoved: ignored
status:
  actualSize: 199016448
  cloneStatus:
    snapshot: ""
    sourceVolume: ""
    state: ""
  conditions:
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:27Z"
    message: ""
    reason: ""
    status: "False"
    type: WaitForBackingImage
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:27Z"
    message: ""
    reason: ""
    status: "False"
    type: TooManySnapshots
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:27Z"
    message: ""
    reason: ""
    status: "True"
    type: Scheduled
  - lastProbeTime: ""
    lastTransitionTime: "2024-04-11T07:08:28Z"
    message: ""
    reason: ""
    status: "False"
    type: Restore
  currentImage: longhornio/longhorn-engine:v1.6.1
  currentMigrationNodeID: ""
  currentNodeID: apollo
  expansionRequired: false
  frontendDisabled: false
  isStandby: false
  kubernetesStatus:
    lastPVCRefAt: ""
    lastPodRefAt: ""
    namespace: gitlab
    pvName: pvc-ead36e0a-c7e3-48b8-940b-f8c1ea47bbaa
    pvStatus: Bound
    pvcName: data-gitlab-postgresql-0
    workloadsStatus:
    - podName: gitlab-postgresql-0
      podStatus: Pending
      workloadName: gitlab-postgresql
      workloadType: StatefulSet
  lastBackup: ""
  lastBackupAt: ""
  lastDegradedAt: ""
  offlineReplicaRebuildingRequired: false
  ownerID: apollo
  pendingNodeID: ""
  remountRequestedAt: "2024-04-12T01:40:28Z"
  restoreInitiated: false
  restoreRequired: false
  robustness: healthy
  shareEndpoint: ""
  shareState: ""
  state: attached

sudo lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0              7:0    0  39.1M  1 loop  /snap/snapd/21184
loop1              7:1    0    87M  1 loop  /snap/lxd/27428
loop2              7:2    0  40.4M  1 loop  /snap/snapd/20671
loop3              7:3    0  63.9M  1 loop  /snap/core20/2182
loop4              7:4    0 170.7M  1 loop  /snap/microk8s/6542
loop5              7:5    0 169.8M  1 loop  /snap/microk8s/6239
loop6              7:6    0    87M  1 loop  /snap/lxd/27948
loop7              7:7    0  63.9M  1 loop  /snap/core20/2264
sda                8:0    0  18.2T  0 disk
└─sda1             8:1    0  18.2T  0 part
  └─vg--sata-lv--sata1
                 252:1    0  18.2T  0 lvm   /var/lib/longhorn-sata
sdb                8:16   0     8G  0 disk
└─mpathb         252:2    0     8G  0 mpath
sdc                8:32   0     8G  0 disk  /var/snap/microk8s/common/var/lib/kubelet/pods/99bc9488-d81f-4b78-ae87-50bb3ad3d00f/volumes/kubernetes.io~csi/pvc-90df0847-e118-44f9-aa52-8030375019fd/mount
                                            /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/2cead03e5ec7c668621ceea9a2b9c14608ca66563d92f000d16861cdda7980ea/globalmount
sde                8:64   0    50G  0 disk  /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volumes/kubernetes.io~csi/pvc-43020953-8b8e-43c4-870d-5af9e4552327/mount
                                            /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/7c516ada76d5831740be910e8f226e44162f379bfc9b297cc833b6f8964bae4b/globalmount
nvme0n1          259:0    0   7.3T  0 disk
├─nvme0n1p1      259:1    0   488M  0 part  /boot/efi
├─nvme0n1p2      259:2    0 232.8G  0 part  /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/metrics/0
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/sentinel/0
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/redis/5
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/b2a92acb-e219-43e0-8b7f-09b109e2d3df/volume-subpaths/empty-dir/redis/4
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/cc93231a-db4a-41ab-9115-2a28319cbbe8/volume-subpaths/gitlab-exporter-config/gitlab-exporter/0
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/4b6cba62-db23-4aee-a343-4d2055d1fef0/volume-subpaths/sshd-config/gitlab-shell/3
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/4b6cba62-db23-4aee-a343-4d2055d1fef0/volume-subpaths/shell-config/gitlab-shell/2
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/08c19342-28c6-4a22-994e-9a0d93911eb5/volume-subpaths/nvidia-device-plugin-entrypoint/nvidia-device-plugin/0
│                                           /var/snap/microk8s/common/var/lib/kubelet/pods/bc63cc88-8537-4e15-9874-1595dd0a5714/volume-subpaths/nvidia-container-toolkit-entrypoint/nvidia-container-toolkit-ctr/0
│                                           /
└─nvme0n1p3      259:3    0     7T  0 part
  └─vg--nvme-lv--nvme1
                 252:0    0     7T  0 lvm   /var/lib/longhorn-nvme

linuxfreakus commented on September 26, 2024

I am suspicious of multipath now because lsblk mentions mpath; I'm reading that documentation page now. Is that something you would expect to see in 1.6.1 but not in 1.5.3?
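In the lsblk output posted earlier, the 8G disk sdb (the same size as the affected 8589934592-byte volume) has a child device mpathb of type mpath, which suggests multipathd has claimed it. A minimal sketch for spotting such disks follows; the function name mpath_claimed_disks is hypothetical, and it expects lsblk's raw output with the NAME, TYPE, and PKNAME columns on stdin.

```shell
# Hypothetical helper: read `lsblk -rno NAME,TYPE,PKNAME` output on stdin and
# print the parent disk of every device-mapper multipath (mpath) device.
mpath_claimed_disks() {
  awk '$2 == "mpath" && $3 != "" { print $3 }' | sort -u
}

# Example (run on the node reported in status.currentNodeID):
#   lsblk -rno NAME,TYPE,PKNAME | mpath_claimed_disks
```

Any disk this prints that backs a Longhorn volume is a candidate for the multipathd conflict, since the mount may fail while the mpath device holds the disk.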

mantissahz commented on September 26, 2024

Feel free to close it, or reopen it if you have any concerns in the future.
