Comments (19)
So, here is the issue. When we're using user namespaces we need to have the privileged parent set the uid mapping in the child user namespace, because there is not necessarily a 1-1 mapping. However, the child, as well as the parent are non-dumpable due to the initial setuid bit.
Non-dumpable processes have uid 0 owning /proc/pid/' files though, so the parent (running as the user uid) isn't allowed to set the uid map unless the child is made dumpable.
However, this is only needed in the case when we need user namespaces. If we're not, then being non-dumpable is fine. So, my solution is to only make the child non-dumpable in the case of using user namespaces.
But, i hear you, then we'll be able to ptrace the child in the case of using user namespaces, causing this bug to re-appear! That is not true though, because in a user namespace we lose all capabilities in the parent namespace, so we will not be able to do something affecting the host namespace anyway (i.e. sethostname gets EPERM unless you unshared UTS ns). In fact, there is not much we can do here, because any process in the parent user namespace (i.e. the regular host namespace) have all caps in the user namespace, including CAP_SYS_PTRACE, so we override the value of dumpable anyway.
from bubblewrap.
As a short-term fix for Debian, I've reverted #94 so there is no way to induce the privileged process to issue the sethostname()
syscall, but I suspect a better long-term answer is to isolate the use of PR_SET_DUMPABLE
to precisely where it is needed.
See also http://www.openwall.com/lists/oss-security/2016/10/13/4
For some reason it sets the PR_SET_DUMPABLE flag, as seen below. The comment about
it looks strange to me. If thats really true, suid programs shouldn't
be forced to play with the dumpable flag to achieve their goal.I assume the developers of Bubblewrap wouldn't have done this if the
kernel (or at least a kernel they care about) didn't require it.
But hopefully becoming dumpable can be restricted to smaller sections
of the code.If it's only for write_uid_gid_map(), then one way would be for that
function to fork(), with the child making itself dumpable and writing
the map files, and the parent just waiting for the child?
...
I believe the intention is that none of the operations that can pass over
the privilege separation socket are problematic, because they only affect
the sandboxed processes, and if you can ptrace bwrap then you can certainly
ptrace and subvert the sandboxed processes too. SETUP_SET_HOSTNAME clearly
doesn't match that intention, if Bubblewrap didn't unshare the UTS namespace
beforehand; Bubblewrap does insist that --hostname can't be used without
--unshare-uts, but you're right that a user outside the sandbox can ptrace
it and bypass that check by making it behave as though --hostname had been
used.As a quick temporary fix for Debian, I'm reverting the addition of
SETUP_SET_HOSTNAME.Am I right in thinking that this pseudocode would fix it while reinstating
the feature?case PRIV_SEP_OP_SET_HOSTNAME: if (!opt_unshare_uts) die_with_error ("Refusing to set hostname in original namespace"); else if (!sethostname (... as before)) die_with_error (... as before);
from bubblewrap.
The check described above should be fine for that exact exploit, but we should also try to figure out the minimal part that needs DUMPABLE.
from bubblewrap.
So, for the dumpable, if you remove the code that sets this, then you get:
bwrap --unshare-user --bind / / sh
setting up uid map: Permission denied
Because its trying to write to /proc/self/uid_map, which is root owned for a non-dumpable app.
from bubblewrap.
See #109
from bubblewrap.
Here is my approach instead:
#110
from bubblewrap.
Let's call #109 the "fsuid" patch and #110 the "move-dumpable" patch.
I'd argue that the fsuid patch is a lot shorter, and more obviously correct than move-dumpable. Why is the latter better?
from bubblewrap.
Because the fsuid patch gives somewhat increases privileges (fsuid 0) which are not needed.
from bubblewrap.
In particular it is doing file writes as uid 0.
from bubblewrap.
So the risk with fsuid is things like - what if somehow the pid got reused and we happened to write to the uid map for another pid? Another aspect is that suddenly we're trusting /proc
to be the real /proc
. But the latter is true of all setuid binaries really.
The risk with dumpable is - now all of a sudden we have to be concerned about the fact that someone could ptrace us. And yes, we are in a user namespace...but in fact, your argument:
In fact any process from the parent user
namespace have ptrace access anyway, due to parent ns having
CAP_SYS_PTRACE (and all other caps) in the child ns.
appears to me to be wrong, because the dumpable flag isn't set, and we need that and the capability to ptrace. If I add in to your patch:
diff --git a/bubblewrap.c b/bubblewrap.c
index 5c7999b..06bc029 100644
--- a/bubblewrap.c
+++ b/bubblewrap.c
@@ -1652,11 +1652,14 @@ main (int argc,
if (is_privileged && opt_unshare_user)
{
+ fprintf (stderr, "** BEFORE SETTING DUMPABLE; pid=%u **\n", getpid());
+ sleep (15);
/* We have to be dumpable for the parent to be able to set the
uid map for us. This enables ptracing for the child, but that
is not really a major*/
if (prctl (PR_SET_DUMPABLE, 1, 0, 0, 0) < 0)
die_with_error ("prctl(PR_SET_DUMPABLE) failed");
+ fprintf (stderr, "** AFTER SETTING DUMPABLE **\n");
/* Let parent continue to set the uid map */
val = 1;
And try to strace before setting dumpable, I get EPERM
as expected.
So the aspect of being in a user namespace obviously mitigates things a lot...but we need to consider risks such as having an open file descriptor outside of the namespace. I doubt we have anything open that's writable, but we'd have to be careful in the future.
If you're arguing that we can enable dumpable, it follows that everything else after that could be arbitrary user code, and we could do:
--- a/bubblewrap.c
+++ b/bubblewrap.c
@@ -1728,7 +1728,7 @@ main (int argc,
if (chdir ("/") != 0)
die_with_error ("chdir / (base path)");
- if (is_privileged)
+ if (is_privileged && !opt_unshare_user)
{
pid_t child;
int privsep_sockets[2];
Or actually, hypothetically:
@@ -1807,7 +1807,8 @@ main (int argc,
die_with_error ("chdir /");
/* Now we have everything we need CAP_SYS_ADMIN for, so drop it */
- drop_caps ();
+ if (opt_unshare_user && !opt_retain_caps)
+ drop_caps ();
if (opt_block_fd != -1)
{
Which I think is where some of the patches in #101 are going. Basically, in the userns path (whether privileged or not) we rely solely on the kernel for security post-CLONE_NEWUSER
.
from bubblewrap.
Here's a good way to compare the patches - on kernels vulnerable to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-3135 (the one we cite in README.md
as an example userns vulnerability) - I'm pretty sure your patch exposes the vulnerability, since a caller could inject code to do the iptables syscalls, even though we don't expose that in the binary. But mine does not, because we drop CAP_NET_ADMIN
(really all caps) in the namespace before executing the user code.
Basically, your patch makes bubblewrap's security equivalent to that of the full kernel user namespace feature set, whereas mine keeps things restricted to what we expose on the bwrap command line.
(But on the other hand, exposing the full userns feature set is where #101 is going)
from bubblewrap.
Basically, your patch makes bubblewrap's security equivalent to that of the full kernel user namespace feature set, whereas mine keeps things restricted to what we expose on the bwrap command line.
...
(But on the other hand, exposing the full userns feature set is where #101 is going)
This sounds like it's heading towards a sysadmin configuration mechanism for "how much exposure to kernel userns bugs am I willing to tolerate?"...
on kernels vulnerable to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-3135 (the one we cite in README.md as an example userns vulnerability) - I'm pretty sure your patch exposes the vulnerability, since a caller could inject code to do the iptables syscalls, even though we don't expose that in the binary
This sounds to me like a good argument for taking the fsuid approach, at least short-term.
from bubblewrap.
@cgwalters Your test case is wrong, because getpid() is cached so returns the old value
Try it again with this instead:
fprintf (stderr, "** BEFORE SETTING DUMPABLE; pid=%u real pid=%ld **\n", getpid(), syscall(SYS_getpid));
sleep (15);
from bubblewrap.
So, yeah, i get your point, but even with the fsuid approach we are ptraceable.
However, i mentioned this to @eparis and eric biederman, and he said:
ebiederm: alex, eparis That corner case where we let user namespace caps override dumpable looks buggy. I am not certain how that got in there.
So, maybe this will change over time.
from bubblewrap.
Looks like it changed in http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8409cca7056113bee3236cb6a8e4d8d4d1eef102
from bubblewrap.
Maybe the end result is that we just can't expose user namespaces securely (above full kernel access to user namespaces) at all?
from bubblewrap.
OK, let's just paper over this rabbit hole for now and try to replace it after the release with something stronger. I'm fine going with #110 - I'll close #109 .
from bubblewrap.
Closed via #110
from bubblewrap.
https://github.com/projectatomic/bubblewrap/releases/tag/v0.1.3
from bubblewrap.
Related Issues (20)
- does bubblewrap blocks syscall utimensat ? HOT 2
- bwrap: Can't find source path /root/.cache/at-spi: Permission denied HOT 6
- bwrap with --unshare-pid runs twice and leaves a zombie process when ran inside a docker container HOT 4
- Directory at /proc/{PID}/root doesn't match root of the sandbox HOT 2
- [How-to] Handle 'chroot' system calls as an unprivileged user HOT 2
- Binding of joystick inside bubblewrap HOT 2
- bubblewrap should fall back to MS_MOVE if pivot_root() fails HOT 3
- What is a proper way to have a regular user with sudo and root in container? HOT 3
- "pivot_root: Invalid argument" when running on a SLURM cluster node from NFS HOT 12
- Overlayfs masking/whiteout layer
- Bubblewrap trying to access `/proc/sys/kernel/overflowuid` HOT 1
- Assessment of the difficulty in porting CPU architecture for bubblewrap HOT 1
- Best practices for running games on Linux with Nvidia HOT 6
- Fails to build with meson 1.3.0 rc1 due to broken bash-completion handling HOT 7
- Please specify the license in Github HOT 1
- [Question] How does bwrap handle nested bindings? HOT 3
- enhancement: --daemonize-with-child option
- not immediately obvious that `--file` can overwrite a file mounted rw from outside the container HOT 4
- bwrap processes not exiting cleanly under Linux 6.8 (likely kernel regression) HOT 24
- Is there like a native C Library?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bubblewrap.