Comments (8)
It's flaky across all jobs, not just this one.
I think it's related to running on kind, where hostpath volumes go to the host. There may be races due to another test grabbing the same device.
from csi-driver-host-path.
Although if that were the case, shouldn't the in-tree parallel tests be flaky too? Maybe it's a probability game, since there are a lot more tests running in-tree.
> There may be races due to another test grabbing the same device
Creating a loop device is done by kubelet and the CSI hostpath driver with https://github.com/kubernetes/kubernetes/blob/8a09460c2f7ba8f6acd8a6fb7603ed3ac4805eb6/pkg/volume/util/volumepathhandler/volume_path_handler_linux.go#L96-L112.
The command invocation itself (= "losetup -f --show") shouldn't be racy. Let's look at the system calls it makes:
openat(AT_FDCWD, "/dev/loop-control", O_RDWR|O_CLOEXEC) = 3
ioctl(3, LOOP_CTL_GET_FREE) = 0
close(3) = 0
lstat("/tmp", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=260, ...}) = 0
lstat("/tmp/test", {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
openat(AT_FDCWD, "/tmp/test", O_RDWR|O_CLOEXEC) = 3
openat(AT_FDCWD, "/dev/loop0", O_RDWR|O_CLOEXEC) = 4
ioctl(4, LOOP_SET_FD, 3) = 0
ioctl(4, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=0, lo_file_name="/tmp/test", ...}) = 0
losetup first determines the next available loop device, then opens it and configures it. At first glance this looks like it might be racy, but that depends on how LOOP_SET_FD and losetup behave when someone else grabs the device in parallel.
I tested that by running losetup under gdb and pausing it in the LOOP_SET_FD ioctl, then taking the loop device with some other losetup call. The first invocation recovered gracefully from that by retrying with another loop device.
So my conclusion is that we don't have a race around loop device creation.
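To make the retry behavior concrete, here is a minimal Go sketch of the pattern I observed. It is not losetup's actual code: the `loopOps` type, `attachWithRetry` and `simulatedOps` are hypothetical names, the two kernel interactions are replaced by plain functions so the logic runs without root, and I'm assuming the losing caller sees EBUSY from LOOP_SET_FD when the device is already bound.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// loopOps abstracts the two kernel interactions from the strace output.
// Hypothetical type for this sketch; real code would issue the ioctls.
type loopOps struct {
	getFree func() (int, error)   // stands in for ioctl(LOOP_CTL_GET_FREE)
	setFD   func(index int) error // stands in for open(/dev/loopN) + ioctl(LOOP_SET_FD)
}

// attachWithRetry asks for a free loop device and tries to bind it.
// If another process bound the device in between (EBUSY), it asks again.
func attachWithRetry(ops loopOps, maxAttempts int) (int, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		index, err := ops.getFree()
		if err != nil {
			return -1, fmt.Errorf("finding free loop device: %w", err)
		}
		err = ops.setFD(index)
		if err == nil {
			return index, nil
		}
		if !errors.Is(err, syscall.EBUSY) {
			// Any other error (for example ENOENT) is not a lost race.
			return -1, fmt.Errorf("attaching loop%d: %w", index, err)
		}
		// Lost the race for this device; loop around and pick another one.
	}
	return -1, fmt.Errorf("no loop device attached after %d attempts", maxAttempts)
}

// simulatedOps fakes a kernel in which a competing process steals the
// first device between getFree and setFD, as in the gdb experiment.
func simulatedOps() loopOps {
	taken := map[int]bool{}
	raced := false
	return loopOps{
		getFree: func() (int, error) {
			for i := 0; ; i++ {
				if !taken[i] {
					return i, nil
				}
			}
		},
		setFD: func(index int) error {
			taken[index] = true
			if !raced {
				raced = true // the competitor got there first
				return syscall.EBUSY
			}
			return nil
		},
	}
}

func main() {
	index, err := attachWithRetry(simulatedOps(), 3)
	fmt.Println(index, err) // prints "1 <nil>": loop0 was stolen, loop1 was used
}
```

Note that only EBUSY triggers a retry here. A missing device node would surface as a different error and abort immediately.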
Perhaps KinD containers have the same issue that we also had in the CSI hostpath container, where /dev is a static copy of the host's /dev and new loop devices therefore don't show up?
Indeed, when I just ran the CSI hostpath prow tests on my development machine after using up all existing loop devices, the blockvolume tests failed because losetup failed:
Jan 13 08:31:51.654: INFO: At 2020-01-13 08:27:28 +0100 CET - event for security-context-6ade56c3-0fd9-4080-a3b5-e46a57584a4c: {kubelet csi-prow-worker} FailedMapVolume: MapVolume.AttachFileDevice failed for volume "pvc-228ac164-bec1-43fe-a3d1-7671ef0549f6" : exit status 1
I haven't found more about it in the logs, but I can reproduce the problem manually by creating loop devices with docker exec f049e80f378a sh -c 'truncate -s 1G /tmp/testfile; losetup -f --show /tmp/testfile' until eventually all the ones that existed when the KinD container was started are in use and the command fails with:
losetup: /tmp/testfile: failed to set up loop device: No such file or directory
I'll file an issue against KinD.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/close
@msau42: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.