Comments (10)
Could you help to create a separate issue for this
Created #8208 for it. Thank you!
from longhorn.
cc @c3y1huang
from longhorn.
The same test case, another type of failure:
ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+-
{'content-length': '29', 'content-type': 'text/plain; charset=utf-8', 'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+-
b'container not found ("sleep")'
https://ci.longhorn.io/job/private/job/longhorn-e2e-test/427/
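For triage, the reason string above appears to carry three fields separated by `-+-+-`: a status line, the response headers, and the response body. A minimal sketch for splitting it follows, assuming the headers and body are always printed as Python literals as they are here. The helper name `parse_handshake_reason` is hypothetical and is not part of any test library.

```python
import ast

def parse_handshake_reason(reason):
    """Split a reason string of the form 'status -+-+- headers -+-+- body'."""
    parts = [part.strip() for part in reason.split("-+-+-")]
    status = parts[0]
    # Assumption: the headers and body fields are valid Python literals,
    # as they are in the report quoted above.
    headers = ast.literal_eval(parts[1]) if len(parts) > 1 else {}
    body = ast.literal_eval(parts[2]) if len(parts) > 2 else b""
    return status, headers, body

# Reason string copied from the failure report above.
reason = (
    "Handshake status 500 Internal Server Error -+-+- "
    "{'content-length': '29', 'content-type': 'text/plain; charset=utf-8', "
    "'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+- "
    "b'container not found (\"sleep\")'"
)

status, headers, body = parse_handshake_reason(reason)
print(status)
print(headers["date"])
print(body.decode())
```

Having the body and the `date` header as separate values makes it easier to line the failure up against node logs.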
from longhorn.
Running the negative test case Reboot Node One By One While Workload Heavy Writing, the test gets stuck waiting for a pod of a deployment to become stable:
But the pod has been running:
[DEBUG] pod e2e-test-deployment-0-b957b9f54-4rt5p retry 59 != 60
Waiting for e2e-test-deployment-0 pods ['e2e-test-deployment-0-b957b9f54-4rt5p', 'e2e-test-deployment-0-b957b9f54-tnpzq'] stable, retry (59) ...
Waiting for e2e-test-deployment-0 pods ['e2e-test-deployment-0-b957b9f54-tnpzq'] stable, retry (60) ...
[DEBUG] pod e2e-test-deployment-0-b957b9f54-4rt5p retry 61 != 60
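Somehow the retry count skipped 60, so the `!=` comparison against the maximum shown in the debug output never reports a match and the wait never ends. To address this, we can check whether the retry count is greater than or equal to the maximum retry count.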
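A minimal sketch of the difference between the two checks follows. It is not the actual longhorn-tests code: the function names and the `observed` sequence are hypothetical, chosen only to mimic a counter that jumps from 59 to 61 as in the log above.

```python
def reached_limit_strict(retry_count, max_retry):
    """Equality check: misses the limit if the counter ever jumps past it."""
    return retry_count == max_retry

def reached_limit_safe(retry_count, max_retry):
    """Greater-or-equal check: still fires when the exact value is skipped."""
    return retry_count >= max_retry

def first_stop(counts, max_retry, reached_limit):
    """Return the first count at which the loop would stop, or None if never."""
    for count in counts:
        if reached_limit(count, max_retry):
            return count
    return None

# Hypothetical sequence in which 60 is skipped, as in the debug output.
observed = [58, 59, 61, 62]

print(first_stop(observed, 60, reached_limit_strict))  # None
print(first_stop(observed, 60, reached_limit_safe))    # 61
```

With the equality check, the loop has no terminating condition once the exact value is skipped. The greater-or-equal check stops at the first count past the limit.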
from longhorn.
The same test case, another type of failure:
ApiException: (0) Reason: Handshake status 500 Internal Server Error -+-+- {'content-length': '29', 'content-type': 'text/plain; charset=utf-8', 'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+- b'container not found ("sleep")'
https://ci.longhorn.io/job/private/job/longhorn-e2e-test/427/
I don't know what has triggered this.
- Volume attached at 12:31:39.
Mar 18 12:31:39 ip-10-0-2-28 k3s[1352]: I0318 12:31:39.291688 1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
- Deployment verified stable at 12:37:59 (showing 20:37:59.360 in the report).
deployment . And Wait for deployment 0 pods stable
Start / End / Elapsed: 20240318 20:36:53.003 / 20240318 20:37:59.360 / 00:01:06.357
- The volume was unmounted.
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: E0318 12:41:49.612041 1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:43:51.612014825 +0000 UTC m=+986.522755894 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0") pod "e2e-test-deployment-2-7ddccb49f4-l47jl" (UID: "89d256e9-5f00-4ac5-a768-94c41fe9f365") : rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: E0318 12:41:49.611852 1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: I0318 12:41:49.593178 1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: I0318 12:41:49.589369 1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:40:54 ip-10-0-2-28 k3s[1352]: E0318 12:40:54.336071 1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-statefulset-2-0" podUID="51755125-cf24-4164-814c-ff779da0505c"
Mar 18 12:40:53 ip-10-0-2-28 k3s[1352]: E0318 12:40:53.336090 1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl" podUID="89d256e9-5f00-4ac5-a768-94c41fe9f365"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.768586 1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:41:49.768566096 +0000 UTC m=+864.679307165 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-fb423261-1200-48a9-8077-e177d56746e2" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2") pod "e2e-test-statefulset-2-0" (UID: "51755125-cf24-4164-814c-ff779da0505c") : rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.768380 1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.745025 1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"csi-e3372656429ba5ddd2e47ad9bd23d10b758e716cac78ab1764808b39e824c032\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.741015 1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.562797 1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:41:49.562778041 +0000 UTC m=+864.473519756 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0") pod "e2e-test-deployment-2-7ddccb49f4-l47jl" (UID: "89d256e9-5f00-4ac5-a768-94c41fe9f365") : rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.562639 1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.543889 1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.540153 1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:38:36 ip-10-0-2-28 k3s[1352]: E0318 12:38:36.336254 1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl" podUID="89d256e9-5f00-4ac5-a768-94c41fe9f365"
Mar 18 12:38:36 ip-10-0-2-28 k3s[1352]: E0318 12:38:36.336255 1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-statefulset-2-0" podUID="51755125-cf24-4164-814c-ff779da0505c"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.734339 1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:39:47.734319757 +0000 UTC m=+742.645061587 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-fb423261-1200-48a9-8077-e177d56746e2" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2") pod "e2e-test-statefulset-2-0" (UID: "51755125-cf24-4164-814c-ff779da0505c") : rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.734181 1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.717154 1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"csi-e3372656429ba5ddd2e47ad9bd23d10b758e716cac78ab1764808b39e824c032\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.713199 1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.534198 1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:39:47.534177625 +0000 UTC m=+742.444919394 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0") pod "e2e-test-deployment-2-7ddccb49f4-l47jl" (UID: "89d256e9-5f00-4ac5-a768-94c41fe9f365") : rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.534030 1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.515715 1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.511729 1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
- Hit container not found error at 12:42:25.
ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- {'content-length': '29', 'content-type': 'text/plain; charset=utf-8', 'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+- b'container not found ("sleep")'
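To line up the timestamps above, the following sketch recomputes the retry windows implied by `durationBeforeRetry 2m2s` for pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 and checks whether the 12:42:25 error falls inside one of them. It is only arithmetic on the quoted log lines, not a diagnosis. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# "durationBeforeRetry 2m2s" as printed in the k3s log above.
BACKOFF = timedelta(minutes=2, seconds=2)

# MountDevice failure times for pvc-9b4adf15-..., copied from the log (UTC).
failures = ["12:37:45.534198", "12:39:47.562797", "12:41:49.612041"]

def backoff_windows(times, day="2024-03-18"):
    """Return (start, end) pairs during which no mount retry is permitted."""
    windows = []
    for t in times:
        start = datetime.strptime(f"{day} {t}", "%Y-%m-%d %H:%M:%S.%f")
        windows.append((start, start + BACKOFF))
    return windows

def in_backoff(moment, windows):
    """True if the moment falls inside any no-retry window."""
    return any(start <= moment < end for start, end in windows)

windows = backoff_windows(failures)
for start, end in windows:
    print(start.time(), "->", end.time())

# Time of the "container not found" error, from the response date header.
error_time = datetime(2024, 3, 18, 12, 42, 25)
print(in_backoff(error_time, windows))
```

The computed window ends agree with the "No retries permitted until" values in the log to within a millisecond. The error time lands in the third window, which runs from 12:41:49 to 12:43:51.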
@yangchiu, any idea? Do we hit this often?
from longhorn.
Do we hit this often?
I have never seen this in previous release testing phases. Since this is the first release test after the refactoring, we need more time to determine how reproducible it is.
from longhorn.
Another type of failure:
Got /data/random-data checksum = d41d8cd98f00b204e9800998ecf8427e
Expected checksum = dd: can't open '/data/random-data': Input/output error
d41d8cd98f00b204e9800998ecf8427e
https://ci.longhorn.io/job/private/job/longhorn-e2e-test/430/
Need to check whether it's a real issue or test case defect.
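One detail that may help with that check: the reported value d41d8cd98f00b204e9800998ecf8427e is the MD5 digest of empty input. This suggests, though it does not prove, that the checksum was computed over zero bytes after `dd` failed with the I/O error. That reading assumes the test computes an MD5 checksum. The helper name below is hypothetical.

```python
import hashlib

# MD5 digest of zero bytes of input.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

def looks_like_empty_input(checksum):
    """True if the checksum equals the MD5 of empty input."""
    return checksum.strip().lower() == EMPTY_MD5

# Checksum copied from the failure report above.
reported = "d41d8cd98f00b204e9800998ecf8427e"
print(looks_like_empty_input(reported))  # True
```

A test helper could use a check like this to flag a read failure explicitly instead of comparing two empty-input digests.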
from longhorn.
Another type of failure:
Got /data/random-data checksum = d41d8cd98f00b204e9800998ecf8427e Expected checksum = dd: can't open '/data/random-data': Input/output error d41d8cd98f00b204e9800998ecf8427e
https://ci.longhorn.io/job/private/job/longhorn-e2e-test/430/
Need to check whether it's a real issue or test case defect.
Could you help to create a separate issue for this, because this is a different kind of failure? Folding newly discovered failures back into the original issue makes its complexity very hard to weigh. For this issue, I will fix only the looping problem described in the description. Thank you.
from longhorn.
Pre Ready-For-Testing Checklist
- Where is the reproduce steps/test steps documented?
  The reproduce steps/test steps are at: issue description
- Is there a workaround for the issue? If so, where is it documented?
  The workaround is at:
- Does the PR include the explanation for the fix or the feature?
- Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
  The PR for the YAML change is at:
  The PR for the chart change is at:
- Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
  The PR is at
- Which areas/issues this PR might have potential impacts on?
  Area negative testing
  Issues
- If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
  The LEP PR is at
- If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
  The UI issue/PR is at
- If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
  The documentation issue/PR is at
- If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
  The automation skeleton PR is at
  The automation test case PR is at longhorn/longhorn-tests#1821
  The issue of automation test case implementation is at (please create by the template)
- If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
  The engine automation PR is at
- If labeled: require/manual-test-plan Has the manual test plan been documented?
  The updated manual test plan is at
- If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
  The compatibility issue is filed at
from longhorn.
Closing issue because PR merged.
from longhorn.