Comments (19)

marmarek commented on June 1, 2024

Indeed, after unmounting /proc/xen it does work. I wonder if anything still uses /proc/xen in Qubes... AFAIR it's the legacy location and the new one is /dev/xen. There were more problems with /proc/xen (where "normal files" behave like character devices...). The fact that I could unmount it without killing anything suggests it isn't used anymore :)

from bubblewrap.

marmarek commented on June 1, 2024

Looks like the combination of --unshare-user, --unshare-pid, and --proc /proc is causing this. Test case:

bwrap --ro-bind / /  --unshare-user --unshare-pid   --proc /proc  /bin/bash

If I remove any of those options, /bin/bash is started. Otherwise, it throws an error:

Can't mount proc on /newroot/proc: Operation not permitted

Running with strace doesn't say much more. Indeed, the mount syscall fails with EPERM:

mount("proc", "/newroot/proc", "proc", MS_MGC_VAL|MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = -1 EPERM (Operation not permitted)

Any idea?

alexlarsson commented on June 1, 2024

So, the kernel disallows mounting proc in the user + pid namespace. That is weird. Clearly it has mount capabilities, because earlier mounts succeeded.

In the upstream kernel, procfs has:

static struct file_system_type proc_fs_type = {
        .name           = "proc",
        .mount          = proc_mount,
        .kill_sb        = proc_kill_sb,
        .fs_flags       = FS_USERNS_MOUNT,
};

This flag (FS_USERNS_MOUNT) should allow mounting a new proc instance in a user namespace. Does the Qubes kernel change this in any way?

alexlarsson commented on June 1, 2024

And anyway, the Debian build of bubblewrap uses setuid, so it should have capabilities in the parent namespace too. Very weird.

Does Qubes itself use namespaces?

marmarek commented on June 1, 2024

alexlarsson commented on June 1, 2024

I wonder if it's related to this: https://lwn.net/Articles/644932/
I.e. maybe your /proc has some mount flag, or some covering mount.
What does your /proc/self/mounts look like?

adrelanos commented on June 1, 2024

How does your /proc/self/mounts look?

sudo cat /proc/self/mounts
/dev/mapper/dmroot / ext4 rw,noatime,data=ordered 0 0
/dev/xvdd /lib/modules/4.4.31-11.pvops.qubes.x86_64 ext3 ro,relatime,data=ordered 0 0
sysfs /sys sysfs rw,relatime 0 0
proc /proc proc rw,relatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,size=149600k,nr_inodes=37400,mode=755 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,size=1048576k,nr_inodes=39133 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=156532k,nr_inodes=39133,mode=755 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,nr_inodes=39133 0 0
tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,size=156532k,nr_inodes=39133,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
tmpfs /tmp tmpfs rw,size=1048576k,nr_inodes=39133 0 0
mqueue /dev/mqueue mqueue rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
xen /proc/xen xenfs rw,relatime 0 0
/dev/xvdb /rw ext4 rw,relatime,discard,data=ordered 0 0
/dev/xvdb /home ext4 rw,relatime,discard,data=ordered 0 0
/dev/xvdb /var/spool/cron ext4 rw,relatime,discard,data=ordered 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,size=31308k,nr_inodes=39134,mode=700,uid=1000,gid=1000 0 0

alexlarsson commented on June 1, 2024

I don't have a xen build, but reading the code it seems this is the problem:

xen /proc/xen xenfs rw,relatime 0 0

This is created if you have the XEN_COMPAT_XENFS config option enabled in the kernel, and it is created by:

        proc_mkdir("xen", NULL);

However, as far as I can see in the kernel, that isn't enough to mark this as an "empty" directory, so the kernel does not realize that the /proc/xen mount is not covering anything. It should really call proc_create_mount_point("xen") for this to work.

Can you try disabling that kernel config option? (or fixing the mountpoint as per the above).
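
To check another system for the same situation, here is a minimal sketch that lists every mount sitting below /proc in the text of /proc/self/mounts. The function name mounts_below and the SAMPLE string are made up for illustration; the sample lines are copied from the listing above.

```python
def mounts_below(mounts_text, parent="/proc"):
    """Return (source, mountpoint, fstype) for each mount strictly below parent."""
    prefix = parent.rstrip("/") + "/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype = fields[:3]
        if mountpoint.startswith(prefix):
            found.append((source, mountpoint, fstype))
    return found


# A few lines taken from the /proc/self/mounts output posted earlier in the thread.
SAMPLE = """\
proc /proc proc rw,relatime 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=22,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
xen /proc/xen xenfs rw,relatime 0 0
/dev/xvdb /home ext4 rw,relatime,discard,data=ordered 0 0
"""
```

On a live system the same function can be fed open("/proc/self/mounts").read(). It only reports what is mounted under /proc; it does not know which of those directories the kernel treats as empty mount points.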

cgwalters commented on June 1, 2024

If you want to be conservative, it might work to add a patch to bwrap to unmount it?

cgwalters commented on June 1, 2024

(Just in the new mount namespace)

adrelanos commented on June 1, 2024

alexlarsson commented on June 1, 2024

No, we can't unmount it. That's the problem, essentially. If /foo and /foo/bar are mountpoints when we create an unprivileged user namespace, then we get the two inherited as a unit, and we cannot unmount /foo/bar, because that may expose files under it that were not visible in the parent namespace. The same is actually true for mounting a new procfs instance: if /proc/foo was overmounted in the host, then we can't mount a fresh /proc, because we could see into foo where we couldn't before.

Of course, in some cases we know it is safe, because foo is always empty, because the only reason it's there is as a mountpoint. In such cases the kernel marks these directories as "always-empty", and mounts on top of them are not considered to cover anything, thus allowing a fresh proc to be mounted.

Changing proc_mkdir("xen", NULL) to proc_create_mount_point("xen") in the kernel would fix it, as the xen directory is then not considered covered.
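
The rule described above can be written down as a toy model. This is not the kernel's code, and it ignores everything else the kernel may take into account; the function name and the contents of the two sets are assumptions chosen to mirror this thread.

```python
def fresh_proc_allowed(submounts, always_empty):
    """Toy model: allow a fresh proc only if every existing mount below /proc
    sits on a directory that is marked always-empty."""
    return all(mountpoint in always_empty for mountpoint in submounts)


# Mounts below /proc as seen in the /proc/self/mounts output posted earlier.
submounts = ["/proc/sys/fs/binfmt_misc", "/proc/xen"]

# Assumption for illustration: with proc_mkdir("xen", NULL), /proc/xen is not marked.
before_fix = {"/proc/sys/fs/binfmt_misc"}
# Assumption for illustration: with proc_create_mount_point("xen"), it is marked.
after_fix = {"/proc/sys/fs/binfmt_misc", "/proc/xen"}

print(fresh_proc_allowed(submounts, before_fix))  # False
print(fresh_proc_allowed(submounts, after_fix))   # True
```

With /proc/xen unmarked, the xenfs mount counts as covering something, so the model refuses the new proc, which matches the EPERM reported at the top of the thread.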

DemiMarie commented on June 1, 2024

@alexlarsson Can we take advantage of the fact that we are suid to forcibly unmount /proc/xen in the child? That does mean hardcoding /proc/xen, but I consider that safe.

cgwalters commented on June 1, 2024

The suid path isn't the future though. Based on comment #134 (comment) it sounds like Qubes is going to disable the legacy mountpoint which should address this issue, right?

cgwalters commented on June 1, 2024

A quick git log -G proc.*mkdir.*xen hits this commit which is in 4.10. So - anyone affected, upgrade your kernel.
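
For anyone who wants to check this from a script, here is a rough sketch that compares a kernel release string against 4.10. The helper name kernel_at_least is made up, and a plain version comparison is only an approximation: it assumes the fix has not been backported to, or removed from, the kernel in question.

```python
import re


def kernel_at_least(release, minimum=(4, 10)):
    """Return True if the major.minor part of a kernel release string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# The release string below is taken from the /lib/modules mount posted earlier.
print(kernel_at_least("4.4.31-11.pvops.qubes.x86_64"))  # False
print(kernel_at_least("4.10.0"))                        # True
```

On a running system, platform.release() from the standard library returns a string in this form.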

DemiMarie commented on June 1, 2024

@cgwalters bwrap is suid at least on my system, and it would be nice to use it to solve this problem.

DemiMarie commented on June 1, 2024

Also, apparently several legacy scripts in Qubes rely on /proc/xen.

marmarek commented on June 1, 2024

Also, apparently several legacy scripts in Qubes rely on /proc/xen.

Not that many. There is only one thing that is still used from that - /proc/xen/capabilities, to detect dom0. Once replaced, we can get rid of /proc/xen mount.
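
For context, the check in question can be sketched roughly like this. It assumes that /proc/xen/capabilities lists the token control_d in dom0 and not in other domains; that is my understanding of xenfs rather than something stated in this thread, and the function names are made up.

```python
def read_capabilities(path="/proc/xen/capabilities"):
    """Return the file's text, or an empty string if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def looks_like_dom0(capabilities_text):
    """Assumption: dom0 is indicated by the token 'control_d' in the file."""
    return "control_d" in capabilities_text.split()
```

Calling looks_like_dom0(read_capabilities()) reports False when /proc/xen is not mounted, so a script relying on this check would need another source of the same information before the mount goes away.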

Rudd-O commented on June 1, 2024

This is still broken as of today in Qubes 3.2 with the Fedora 27 template. Notably, it breaks video thumbnailing in Nautilus (and presumably in other programs, whose video thumbnails do not show up):

[pid  8531] execve("/usr/bin/bwrap", ["bwrap", "--ro-bind", "/usr", "/usr", "--ro-bind", "/lib", "/lib", "--ro-bind", "/lib64", "/lib64", "--proc", "/proc", "--dev", "/dev", "--symlink", "usr/bin", "/bin", "--symlink", "usr/sbin", "/sbin", "--chdir", "/", "--setenv", "GIO_USE_VFS", "local", "--unshare-all", "--die-with-parent", "--bind", "/tmp/gnome-desktop-thumbnailer-0"..., "/tmp", "--ro-bind", "/home/user/sshfs/WhatsApp/Media/"..., ...], 0x58e7594331f0 /* 17 vars */ <unfinished ...>
[pid  3896] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  3896] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  3896] write(23, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  3948] write(23, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  3896] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  3948] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  8531] <... execve resumed> )      = 0
strace: Process 8532 attached
[pid  8531] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  8532] write(6, "0 1000 1\n", 9)   = 9
[pid  8532] write(6, "deny\n", 5)       = 5
[pid  8532] write(6, "0 1000 1\n", 9)   = 9
[pid  8532] write(2, "bwrap: ", 7)      = 7
[pid  8532] write(2, "Can't mount proc on /newroot/pro"..., 33) = 33
[pid  8532] write(2, ": Operation not permitted\n", 26) = 26
[pid  8532] +++ exited with 1 +++
[pid  8531] +++ exited with 1 +++
[pid  3947] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8531, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---

Please fix this.
