Indeed, after unmounting /proc/xen it does work. I wonder if anything in Qubes still uses /proc/xen... AFAIR it's a legacy location and the new one is /dev/xen. There were more problems with /proc/xen (where "normal files" behave like character devices...). The fact that I could unmount it without killing anything suggests it isn't used anymore :)
Looks like the combination of --unshare-user, --unshare-pid, and --proc /proc is causing this. Test case:
bwrap --ro-bind / / --unshare-user --unshare-pid --proc /proc /bin/bash
If I remove any of those options, /bin/bash is started. Otherwise, it throws an error:
Can't mount proc on /newroot/proc: Operation not permitted
Running with strace doesn't say much more; indeed, the mount syscall fails with EPERM:
mount("proc", "/newroot/proc", "proc", MS_MGC_VAL|MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EPERM (Operation not permitted)
Any idea?
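To take bwrap out of the picture, the same namespace sequence can be reproduced with raw syscalls. What follows is a minimal sketch, assuming Linux with glibc; probe_fresh_proc_mount and mount_fresh_proc are hypothetical helper names, not part of bwrap. It returns 0 when the fresh procfs mount succeeds and otherwise the errno of the first failing step. On an affected kernel the expectation would be EPERM from the proc mount itself; note that the returned value does not say which step failed, and EPERM can also come from unshare() when unprivileged user namespaces are disabled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs as PID 1 of the new PID namespace, inside the new mount namespace. */
int mount_fresh_proc(void)
{
    /* Keep every mount below local to this throwaway namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return errno;
    /* Same flags as the failing call in the strace (MS_MGC_VAL is a legacy no-op). */
    if (mount("proc", "/proc", "proc", MS_NOSUID | MS_NODEV | MS_NOEXEC, NULL) != 0)
        return errno;
    return 0;
}

/* Hypothetical helper: 0 if a fresh procfs can be mounted after
 * unshare(user + mount + pid), otherwise the errno of the first failing step
 * (255 if a child terminated abnormally). */
int probe_fresh_proc_mount(void)
{
    int status;
    pid_t outer = fork();

    if (outer < 0)
        return errno;
    if (outer == 0) {
        /* Roughly what --unshare-user --unshare-pid asks bwrap to set up. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS | CLONE_NEWPID) != 0)
            _exit(errno);
        /* CLONE_NEWPID only applies to children, so fork once more. */
        pid_t inner = fork();
        if (inner < 0)
            _exit(errno);
        if (inner == 0)
            _exit(mount_fresh_proc());
        if (waitpid(inner, &status, 0) < 0)
            _exit(errno);
        _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 255);
    }
    if (waitpid(outer, &status, 0) < 0)
        return errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 255;
}
```

Printing strerror(probe_fresh_proc_mount()) is enough for a quick check on a given kernel.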
So, the kernel disallows mounting proc in the user + pid namespace. That is weird. Clearly it has mount capabilities, because earlier mounts succeeded.
In the upstream kernel, procfs has:
static struct file_system_type proc_fs_type = {
.name = "proc",
.mount = proc_mount,
.kill_sb = proc_kill_sb,
.fs_flags = FS_USERNS_MOUNT,
};
This flag (FS_USERNS_MOUNT) should allow mounting a new proc instance in a user namespace. Does the Qubes kernel change this in any way?
And anyway, the Debian build of bubblewrap uses setuid, so it should have capabilities in the parent namespace too. Very weird.
Does Qubes itself use namespaces?
I wonder if it's related to this: https://lwn.net/Articles/644932/
I.e. maybe your /proc has some mount flag, or some covering mount.
How does your /proc/self/mounts look?
How does your /proc/self/mounts look?
sudo cat /proc/self/mounts
/dev/mapper/dmroot / ext4 rw,noatime,data=ordered 0 0
/dev/xvdd /lib/modules/4.4.31-11.pvops.qubes.x86_64 ext3 ro,relatime,data=ordered 0 0
sysfs /sys sysfs rw,relatime 0 0
proc /proc proc rw,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=149600k,nr_inodes=37400,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,size=1048576k,nr_inodes=39133 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=156532k,nr_inodes=39133,mode=755 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,nr_inodes=39133 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=156532k,nr_inodes=39133,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
tmpfs /tmp tmpfs rw,size=1048576k,nr_inodes=39133 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
xen /proc/xen xenfs rw,relatime 0 0
/dev/xvdb /rw ext4 rw,relatime,discard,data=ordered 0 0
/dev/xvdb /home ext4 rw,relatime,discard,data=ordered 0 0
/dev/xvdb /var/spool/cron ext4 rw,relatime,discard,data=ordered 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=31308k,nr_inodes=39134,mode=700,uid=1000,gid=1000 0 0
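Following the covering-mount theory from the LWN article, a table like this can be scanned for mounts that sit strictly below /proc. A minimal sketch, assuming glibc's <mntent.h>; is_below_proc and list_proc_submounts are hypothetical names. It only lists candidates: it cannot tell which directories the kernel treats as permanently empty mount points, so on the table above it would report /proc/xen together with both mounts on /proc/sys/fs/binfmt_misc.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* True if dir is strictly below /proc (e.g. "/proc/xen"), not /proc itself. */
int is_below_proc(const char *dir)
{
    return strncmp(dir, "/proc/", 6) == 0 && dir[6] != '\0';
}

/* Prints every mount strictly below /proc and returns how many there are.
 * fp is an open mount table in /proc/self/mounts format. */
int list_proc_submounts(FILE *fp)
{
    struct mntent *ent;
    int count = 0;

    while ((ent = getmntent(fp)) != NULL) {
        if (is_below_proc(ent->mnt_dir)) {
            printf("%s on %s (%s)\n", ent->mnt_fsname, ent->mnt_dir, ent->mnt_type);
            count++;
        }
    }
    return count;
}
```

To scan the live table, pass it a stream from setmntent("/proc/self/mounts", "r") and close that stream with endmntent().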
I don't have a Xen build, but from reading the code it seems this is the problem:
xen /proc/xen xenfs rw,relatime 0 0
This is created if you have the XEN_COMPAT_XENFS config option enabled in the kernel, and it is created by:
proc_mkdir("xen", NULL);
However, as far as I can see in the kernel, that isn't enough to make it realize that this is an "empty" directory, and thus that the /proc/xen mount is not covering anything. It should really call proc_create_mount_point("xen") for this to work.
Can you try disabling that kernel config option (or fixing the mountpoint as described above)?
If you want to be conservative, it might work to add a patch to bwrap to unmount it?
(Just in the new mount namespace)
No, we can't unmount it. That's essentially the problem. If /foo and /foo/bar are mountpoints when we create an unprivileged user namespace, then we inherit the two as a unit, and we cannot unmount /foo/bar, because that may expose files under it that were not visible in the parent namespace. The same is actually true for mounting a new procfs instance: if /proc/foo was overmounted in the host, then we can't mount a fresh /proc, because we could see into foo where we couldn't before.
Of course, in some cases we know it is safe, because foo is always empty: the only reason it's there is to serve as a mountpoint. In such cases the kernel marks these directories as "always-empty", and mounts on top of them are not considered to cover anything, thus allowing a fresh proc to be mounted.
Changing proc_mkdir("xen", NULL) to proc_create_mount_point("xen") in the kernel would fix it, as the xen directory is then not considered covered.
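The locking described above can be observed directly. A minimal sketch, assuming Linux with glibc; probe_umount_in_userns is a hypothetical name. It enters a new user and mount namespace and tries to detach one inherited mount. For a mount such as /proc/xen, which is inherited together with /proc, the expected result is EINVAL, since the kernel refuses to unmount locked child mounts; as with any probe of this kind, EPERM more likely means the user namespace could not be created at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: tries to detach target from inside a fresh
 * user + mount namespace. Returns 0 on success, otherwise the errno of the
 * first failing step (255 if the child terminated abnormally). */
int probe_umount_in_userns(const char *target)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return errno;
    if (pid == 0) {
        /* An unprivileged user namespace plus its own mount namespace. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0)
            _exit(errno);
        /* Keep the experiment local to this throwaway namespace. */
        if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
            _exit(errno);
        /* Inherited child mounts are locked to their parent, so the expected
         * failure for a mount such as /proc/xen is EINVAL. */
        _exit(umount2(target, MNT_DETACH) == 0 ? 0 : errno);
    }
    if (waitpid(pid, &status, 0) < 0)
        return errno;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 255;
}
```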
@alexlarsson Can we take advantage of the fact that we are suid to forcibly unmount /proc/xen in the child? That does mean hardcoding /proc/xen, but I consider that safe.
The suid path isn't the future though. Based on comment #134 (comment) it sounds like Qubes is going to disable the legacy mountpoint which should address this issue, right?
A quick git log -G 'proc.*mkdir.*xen' hits this commit, which is in 4.10. So, anyone affected: upgrade your kernel.
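Since the fix is reported to be in 4.10, a rough userspace check is to compare the running kernel release with that version. This is a sketch under the assumption that the release string starts with "major.minor", as in the 4.4.31-11.pvops.qubes.x86_64 module path in the mount table above; kernel_at_least and running_kernel_at_least are hypothetical helpers. A version comparison is only a heuristic: a distribution kernel may carry the fix as a backport, or be built without XEN_COMPAT_XENFS in the first place.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a release string such as "4.4.31-11.pvops.qubes.x86_64"
 * is at least major.minor, 0 if it is older, -1 if it cannot be parsed. */
int kernel_at_least(const char *release, int major, int minor)
{
    int maj, min;

    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;
    if (maj != major)
        return maj > major;
    return min >= minor;
}

/* Same check against the running kernel, as reported by uname(2). */
int running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return kernel_at_least(u.release, major, minor);
}
```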
@cgwalters bwrap is suid at least on my system, and it would be nice to use it to solve this problem.
Also, apparently several legacy scripts in Qubes rely on /proc/xen.
Also, apparently several legacy scripts in Qubes rely on /proc/xen.
Not that many. There is only one thing still used from there: /proc/xen/capabilities, to detect dom0. Once that is replaced, we can get rid of the /proc/xen mount.
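The dom0 check that still depends on /proc/xen conventionally looks for the control_d capability in /proc/xen/capabilities. A hedged sketch of that check, assuming dom0 lists control_d while other domains do not; caps_indicate_dom0 and running_in_dom0 are hypothetical names, not the actual Qubes code. Whatever replaces it has to answer the same question without relying on the /proc/xen mount.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the contents of /proc/xen/capabilities contain the
 * "control_d" capability (assumed to mark dom0), 0 otherwise. */
int caps_indicate_dom0(const char *contents)
{
    return strstr(contents, "control_d") != NULL;
}

/* Returns 1 in dom0, 0 in another domain, and -1 if the file cannot be
 * read (for example when /proc/xen is not mounted at all). */
int running_in_dom0(void)
{
    char buf[256];
    FILE *fp = fopen("/proc/xen/capabilities", "r");

    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    int failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    buf[n] = '\0';
    return caps_indicate_dom0(buf);
}
```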
This is still broken as of today in Qubes 3.2 with the Fedora 27 template. Notably, it breaks video thumbnailing in Nautilus, where video thumbnails do not show up (and presumably in other programs too):
[pid 8531] execve("/usr/bin/bwrap", ["bwrap", "--ro-bind", "/usr", "/usr", "--ro-bind", "/lib", "/lib", "--ro-bind", "/lib64", "/lib64", "--proc", "/proc", "--dev", "/dev", "--symlink", "usr/bin", "/bin", "--symlink", "usr/sbin", "/sbin", "--chdir", "/", "--setenv", "GIO_USE_VFS", "local", "--unshare-all", "--die-with-parent", "--bind", "/tmp/gnome-desktop-thumbnailer-0"..., "/tmp", "--ro-bind", "/home/user/sshfs/WhatsApp/Media/"..., ...], 0x58e7594331f0 /* 17 vars */ <unfinished ...>
[pid 3896] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 3896] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 3896] write(23, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 3948] write(23, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 3896] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 3948] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 8531] <... execve resumed> ) = 0
strace: Process 8532 attached
[pid 8531] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 8532] write(6, "0 1000 1\n", 9) = 9
[pid 8532] write(6, "deny\n", 5) = 5
[pid 8532] write(6, "0 1000 1\n", 9) = 9
[pid 8532] write(2, "bwrap: ", 7) = 7
[pid 8532] write(2, "Can't mount proc on /newroot/pro"..., 33) = 33
[pid 8532] write(2, ": Operation not permitted\n", 26) = 26
[pid 8532] +++ exited with 1 +++
[pid 8531] +++ exited with 1 +++
[pid 3947] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8531, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
Please fix this.
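The three writes just before the error appear to be the usual unprivileged user-namespace setup: uid 1000 mapped to root in uid_map, "deny" written to setgroups, then gid 1000 mapped in gid_map. All three return the full byte count, so the namespace setup itself succeeds and the failure is the later proc mount, as in the original report. A minimal sketch of that sequence, assuming Linux; map_to_root, format_id_map and write_proc_file are hypothetical helpers, not bwrap's own functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Formats a single-line ID map: "<id inside> <id outside> <count>\n". */
int format_id_map(char *buf, size_t size, unsigned inside,
                  unsigned outside, unsigned count)
{
    return snprintf(buf, size, "%u %u %u\n", inside, outside, count);
}

/* Writes content to /proc/<pid>/<name>; returns 0 or an errno value. */
int write_proc_file(pid_t pid, const char *name, const char *content)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/%s", (long)pid, name);

    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return errno;
    size_t len = strlen(content);
    ssize_t n = write(fd, content, len);
    int err = (n < 0) ? errno : ((size_t)n == len ? 0 : EIO);
    close(fd);
    return err;
}

/* Maps uid/gid to root inside the user namespace of pid, in the order seen
 * in the strace: uid_map, then setgroups "deny", then gid_map. */
int map_to_root(pid_t pid, uid_t uid, gid_t gid)
{
    char line[64];
    int err;

    format_id_map(line, sizeof line, 0, (unsigned)uid, 1);
    if ((err = write_proc_file(pid, "uid_map", line)) != 0)
        return err;
    /* An unprivileged writer must disable setgroups() before writing gid_map. */
    if ((err = write_proc_file(pid, "setgroups", "deny\n")) != 0)
        return err;
    format_id_map(line, sizeof line, 0, (unsigned)gid, 1);
    return write_proc_file(pid, "gid_map", line);
}
```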