selinuxproject / selinux-kernel

GitHub mirror of the SELinux kernel repository

Home Page: https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git

License: Other

selinux-kernel's Introduction

SELinux Kernel Subsystem

https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git
https://github.com/SELinuxProject/selinux-kernel

SELinux is a security enhancement to Linux which provides users and administrators a more granular, powerful access control mechanism. It consists of a kernel component which enforces the security policy, a set of userspace tools to manage and manipulate SELinux security policies, and the SELinux security policies themselves.

The main Linux Kernel README can be found at Documentation/admin-guide/README.rst

Online Resources

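The canonical SELinux kernel repository is hosted by kernel.org:

https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux.git

There is also an officially maintained GitHub mirror:

https://github.com/SELinuxProject/selinux-kernel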

Kernel Source Branches and Development Process

Kernel Source Branches

There are four primary git branches associated with the development process: stable-X.Y, dev, dev-staging, and next. In addition to these four primary branches there are also topic specific, work in progress branches that start with a "working-" prefix; these branches can generally be ignored unless you happen to be involved in the development of that particular topic. The management of these topic branches can vary depending on a number of factors, but the details of each branch will be communicated in the relevant discussion threads on the upstream mailing list.

stable-X.Y branch

The stable-X.Y branch is intended for stable kernel patches and is based on Linus' X.Y-rc1 tag, or a later X.Y.Z stable kernel release tag as needed. If serious problems are identified and a patch is developed during the kernel's release candidate cycle, it may be a candidate for stable kernel marking and inclusion into the stable-X.Y branch. The main Linux kernel's documentation on stable kernel patches has more information both on what patches may be stable kernel candidates, and how to mark those patches appropriately; upstream mailing list discussions on the merits of marking the patch for stable can also be expected. Once a patch has been merged into the stable-X.Y branch and spent a day or two in the next branch (see the next branch notes), it will be sent to Linus for merging into the next release candidate or final kernel release (see the notes on pull requests in this document). If the patch has been properly marked for stable, the other stable kernel trees will attempt to backport the patch as soon as it is present in Linus' tree; see the main Linux kernel documentation for more details.

Unless specifically requested, developers should not base their patches on the stable-X.Y branch. Any merge conflicts that arise from merging patches submitted upstream will be handled by the maintainer, although help may be requested in extreme cases.

dev branch

The dev branch is intended for development patches targeting the upcoming merge window, and is based on Linus' latest X.Y-rc1 tag, or a later rc tag as needed to avoid serious bugs, merge conflicts, or other significant problems. This branch is the primary development branch where the majority of patches are merged during the normal kernel development cycle. Patches merged into the dev branch will be present in the next branch (see the next branch notes) and will be sent to Linus during the next merge window.

Developers should use the dev branch as a stable basis for their own development work. Only under extreme circumstances will the dev branch be rebased during the X.Y-rc cycle, and the maintainer will be responsible for resolving any merge conflicts, although help may be requested in extreme cases.

dev-staging branch

The dev-staging branch is intended for development patches that are not targeting a specific merge window. The dev-staging branch exists as a staging area for the main dev branch and as such its use will be unpredictable and it will be rebased as needed. Patches merged into the dev-staging branch should find their way into the primary dev branch at some point in the future, although that is not guaranteed.

Unless specifically requested, developers should not use the dev-staging branch as a basis for any development work.

next branch

The next branch is a composite branch built by merging the latest stable-X.Y and dev branches in that order. The main focus of the next branch is to provide a single branch for linux-next integration testing that contains all of the commits from the component branches. The next branch will be updated whenever there is a change to any one of the component branches, but it will remain frozen during the merge window so as to cooperate with the wishes of the linux-next team.

While developers can use the next branch as a basis for development, the dev branch would likely be a more suitable, and stable, base.

Kernel Development Process

After the kernel merge window closes upstream, the stable-X.Y branch associated with the current kernel release candidate, the dev branch, and potentially the dev-staging branch (see the dev-staging branch notes) will be reset to match the latest vX.Y-rc1 tag in Linus' tree. The next branch, as a composite branch composed from these branches, will be updated as a result.

During the development cycle that starts with the close of the kernel merge window and ends with the tagged kernel release, patches will be accepted into the stable-X.Y and dev branches as described in their respective sections in this document. While patches will be accepted into the stable-X.Y branch at any point in time, significant changes will likely not be accepted into the dev branch when there are two or fewer weeks left in the development cycle; this typically means that only critical bugfixes are accepted once the vX.Y-rc6 kernel is released. During this time the next branch will be regenerated on an as-needed basis based on changes in the component branches, and pull requests will be sent as needed to Linus for patches in the stable-X.Y branch.

Once Linus releases the final vX.Y kernel and the merge window opens, two things will happen. The first is that the dev branch will be duplicated into a new stable-X'.Y' branch, representing the new upcoming kernel release, and the second is that a pull request will be sent from this branch for inclusion into the current merge window. During the merge window process the dev and next branches should be frozen, although there is a possibility that some patches may be merged into dev-staging for testing or process related reasons.

Pull Requests for Linus

In order to send a pull request to Linus, either for a critical bugfix or as part of the merge window, a signed git tag must be created that points to the pull request point. The tag should be named using the "{subsystem}-pr-{date}" format and can be generated with the following git command:

% git tag -s -m "{subsystem}/stable-X'.Y' PR {date}" {subsystem}-pr-{date}

Once the signed tag has been created, it should be used as the basis for the pull request.

Reference Policy, Userspace Tools, and Test Suites

The SELinux reference policy, userspace tools, and test suites are hosted by GitHub:

selinux-kernel's Issues

RFE: Initial SIDs cannot be added or deleted without breaking compatibility

We need dynamic discovery of initial SIDs to fix this problem. Similar to dynamic discovery of classes/perms, map kernel initial SIDs to policy initial SIDs by string name rather than requiring identical index values, handle unknown initial SIDs cleanly (map to unlabeled), and allow future extensibility without causing problems (start regular SIDs at some fixed offset, e.g. 100, or start from the highest legal value and decrement, so that policy reload that changes the number of initial SIDs won't affect them). Even with this, we'll be limited by compatibility for a while until kernels without this feature are so old they no longer matter, but otherwise we'll never be free of it.
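
The name-based mapping described above can be sketched in userspace C. This is only an illustration of the idea, not kernel code: the enum values, the `ksid_names` table, and the `map_initial_sids()` helper are all hypothetical names invented for this example. Kernel initial SIDs that the loaded policy does not define fall back to the policy's "unlabeled" entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel-side initial SIDs; names and values are illustrative. */
enum { KSID_KERNEL, KSID_SECURITY, KSID_UNLABELED, KSID_NEWFEATURE, KSID_COUNT };

static const char *const ksid_names[KSID_COUNT] = {
    "kernel", "security", "unlabeled", "newfeature"
};

/*
 * Fill map[k] with the index of the policy initial SID whose name matches
 * kernel initial SID k. Kernel SIDs unknown to the policy are mapped to the
 * policy's "unlabeled" entry. Returns -1 if the policy has no "unlabeled".
 */
int map_initial_sids(const char *const *policy_names, size_t n_policy,
                     int map[KSID_COUNT])
{
    int unlabeled = -1;

    assert(policy_names != NULL && map != NULL);

    for (size_t i = 0; i < n_policy; i++)
        if (strcmp(policy_names[i], "unlabeled") == 0)
            unlabeled = (int)i;
    if (unlabeled < 0)
        return -1;

    for (int k = 0; k < KSID_COUNT; k++) {
        map[k] = unlabeled; /* default for names the policy lacks */
        for (size_t i = 0; i < n_policy; i++) {
            if (strcmp(policy_names[i], ksid_names[k]) == 0) {
                map[k] = (int)i;
                break;
            }
        }
    }
    return 0;
}
```

With a policy that orders its initial SIDs differently and omits "newfeature", the kernel-side indices no longer need to match the policy's, which is the compatibility property the issue asks for.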

BUG: selinuxfs class directory not updated atomically on reload

As reported in https://lore.kernel.org/selinux/[email protected]/, selinuxfs does not atomically update its class subdirectory upon a policy reload, thereby creating a window during which userspace lookups of classes/permissions will fail. This can break userspace object managers like systemd or dbusd especially after more recent userspace changes to flush the class/perm cache upon a policy load notification from the kernel. If handle_unknown=deny, this can yield extraneous denials during the race window. Instead of deleting the old class subdirectory and then creating the new one in place, selinuxfs should create an unattached class directory tree from the new policy and then atomically exchange the old and new directories (ala RENAME_EXCHANGE). This is part of a broader set of issues around policy reload.

BUG: fsconfig(2) does not work with SELinux context options on btrfs

As reported in https://lore.kernel.org/selinux/[email protected]/ and https://lore.kernel.org/selinux/[email protected]/, the new fsconfig(2) system call will fail if one attempts to specify one or more of the SELinux context mount options on an nfs or btrfs mount. Both nfs and btrfs make their own calls to the security_sb_set_mnt_opts() LSM hook in addition to the call made from the vfs (fs/super.c:vfs_get_tree). As a result, multiple calls are made to the hook with different options (NULL versus non-NULL), which triggers an error from SELinux due to inconsistent application of security labeling options to a single superblock. It is not clear how to fix this in a manner that still preserves normal nfs and btrfs option handling (aside from dropping the SELinux checks and letting last-call-to-hook "win"). No response from the vfs developers to date. Adding this issue to keep track of this bug until such a time as it gets fixed.

RFE: dac_override false positives and inadequate audit info

At present, CAP_DAC_OVERRIDE is checked by the kernel first even if only read/search access is requested, and then CAP_DAC_READ_SEARCH is checked if CAP_DAC_OVERRIDE is not allowed. This causes SELinux to audit dac_override denials in many cases where dac_override is not truly required, which leads to overly liberal policy. Also, since capable() does not provide the inode information, dac_override and dac_read_search denials do not provide information about the relevant path unless system call auditing is enabled and at least one syscall audit filter is defined. However, the kernel now calls capable_wrt_inode_uidgid() for these checks, so we could pass down the inode to the security hook in those cases and allow auditing of the file with the avc denial itself.

RFE: Split open permission

Split the open permission into open_read and open_write so that we can better distinguish them in policy. Presently we rely upon the fact that we already check read and write permissions in addition to open; however, this is not sufficient because we sometimes have to allow read or write permission for a descriptor inherited across execve or received over IPC, but still do not want to allow direct open(2) with those permissions.
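
As a rough illustration of the proposed split, the sketch below derives which of the two new permissions an open(2) call would need from its access mode. The permission bit values and the `open_perms()` helper are hypothetical; the issue does not specify how the kernel would compute them.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical permission bits for the proposed split. */
enum { PERM_OPEN_READ = 1u << 0, PERM_OPEN_WRITE = 1u << 1 };

static_assert((PERM_OPEN_READ & PERM_OPEN_WRITE) == 0,
              "permission bits must be distinct");

/* Map the access mode in open(2) flags to the permissions to check. */
unsigned int open_perms(int flags)
{
    switch (flags & O_ACCMODE) {
    case O_RDONLY:
        return PERM_OPEN_READ;
    case O_WRONLY:
        return PERM_OPEN_WRITE;
    case O_RDWR:
        return PERM_OPEN_READ | PERM_OPEN_WRITE;
    default:
        return 0; /* unusual access mode: nothing to add in this sketch */
    }
}
```

A descriptor inherited across execve or received over IPC would not pass through this check, so policy could allow read or write on it without granting open_read or open_write.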

RFE: add netmasks to the SELinux network node cache

Currently the SELinux network node cache doesn't factor in the address mask provided with the policy; it maintains a cache entry for each IP address. Expose the address mask via security_node_sid() and use it to increase the efficiency of the network node cache.
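
A minimal sketch of what a mask-aware cache lookup could look like, assuming IPv4 addresses held as host-order integers. The `node_entry` layout and `node_cache_find()` are hypothetical and are not the kernel's actual data structures; the point is only that one entry can then serve every address in its subnet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: addr is stored already masked. */
struct node_entry {
    uint32_t addr;
    uint32_t mask;
    uint32_t sid;
};

/* Return the first entry whose subnet contains ip, or NULL on a miss. */
const struct node_entry *node_cache_find(const struct node_entry *cache,
                                         size_t n, uint32_t ip)
{
    assert(cache != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if ((ip & cache[i].mask) == cache[i].addr)
            return &cache[i];
    return NULL;
}
```

A real implementation would also need to decide how overlapping subnets are ordered; this sketch simply returns the first match.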

RFE: enable changing the number of AVC hash buckets at runtime

At present the number of AVC hash buckets is hard coded to 512; we should look into making this tunable at runtime. While 512 buckets tends to work well for most workloads, it is proving to be too small for systems with a large number of unique labels such as container hosts using MCS/sVirt.

BUG: False positives on CAP_WAKE_ALARM

The kernel checks CAP_WAKE_ALARM before testing whether it is truly needed (i.e. for CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM) in timerfd_create() and do_timerfd_settime(). This generates avc denials of wake_alarm permission when it is not truly required, which in turn will lead to either unnecessarily permissive policy (allowing it) or pervasive dontaudits. Should flip the order of the tests in those conditionals so we only perform capable(CAP_WAKE_ALARM) when needed. That's more efficient too in the common case.
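
The reordering suggested above can be sketched in userspace C as follows. Everything here is a stand-in: `struct task`, `capable_wake_alarm()`, and the `clock_kind` enum are hypothetical and only model the order of the two tests, with a counter recording how often the capability check (and therefore any audit) would fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clock identifiers standing in for the kernel's clock IDs. */
enum clock_kind { CLK_REALTIME, CLK_MONOTONIC, CLK_REALTIME_ALARM, CLK_BOOTTIME_ALARM };

/* Hypothetical task state; cap_checks counts capability hook invocations. */
struct task {
    bool has_wake_alarm;
    int cap_checks;
};

/* Stand-in for capable(CAP_WAKE_ALARM): each call could emit a denial. */
static bool capable_wake_alarm(struct task *t)
{
    t->cap_checks++;
    return t->has_wake_alarm;
}

static bool is_alarm_clock(enum clock_kind c)
{
    return c == CLK_REALTIME_ALARM || c == CLK_BOOTTIME_ALARM;
}

/* Returns 0 if allowed, -1 (an EPERM stand-in) otherwise. */
int timer_clock_check(struct task *t, enum clock_kind c)
{
    assert(t != NULL);

    /* Test the cheap condition first; && short-circuits the capability check. */
    if (is_alarm_clock(c) && !capable_wake_alarm(t))
        return -1;
    return 0;
}
```

With this ordering, a task using an ordinary clock never reaches the capability check, so no wake_alarm denial is generated for it.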

RFE: always return a value from the netport/netnode/netif caches

Currently under memory pressure the netport/netnode/netif caches can fail to return a SID to the caller, thereby causing an operation to fail. However, the cache can always return the value obtained from the security server, even if it cannot allocate a cache node to save it for future lookups (just as the AVC does). Fix the caches to do so.
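
The behaviour requested here can be illustrated with a small userspace sketch. The `port_node` list, the `simulate_oom` switch, and `security_server_port_sid()` are hypothetical stand-ins rather than the kernel's code; the point is that an allocation failure only skips caching and never fails the lookup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache node for a port -> SID mapping. */
struct port_node {
    unsigned short port;
    unsigned int sid;
    struct port_node *next;
};

struct port_node *port_cache; /* head of the cache list */
int simulate_oom;             /* test knob: pretend allocation fails */

/* Stand-in for the security server query; always succeeds in this sketch. */
static unsigned int security_server_port_sid(unsigned short port)
{
    return 1000u + port;
}

/* Look up the SID for a port; a failed cache insert is not an error. */
int port_sid(unsigned short port, unsigned int *sid)
{
    struct port_node *hit, *fresh;

    assert(sid != NULL);

    for (hit = port_cache; hit; hit = hit->next) {
        if (hit->port == port) {
            *sid = hit->sid;
            return 0;
        }
    }

    /* Cache miss: the answer comes from the security server. */
    *sid = security_server_port_sid(port);

    /* Best effort: remember the result if memory is available. */
    fresh = simulate_oom ? NULL : malloc(sizeof(*fresh));
    if (fresh) {
        fresh->port = port;
        fresh->sid = *sid;
        fresh->next = port_cache;
        port_cache = fresh;
    }
    return 0;
}
```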

BUG: selinux-testsuite fails on binder tests in v5.1-rc1

When running the selinux-testsuite, the binder tests cause a kernel panic/BUG which causes the test to block.

The test output:

Running as user root with context unconfined_u:unconfined_r:unconfined_t

domain_trans/test ........... ok   
...
netlink_socket/test ......... ok   
prlimit/test ................ ok   
binder/test ................. 1/6
<test hang> 

The relevant console output:

[  823.210062] binder: release 3645:3645 transaction 2 out, still active
[  823.214047] binder: 3644:3644 transaction failed 29189/0, size 24-8 line 2926
[  823.218009] binder: send failed reply for transaction 2, target dead
[  823.221329] binder: 3646:3646 transaction failed 29201/-1, size 24-8 line 3002
[  823.232432] ------------[ cut here ]------------
[  823.234746] kernel BUG at drivers/android/binder_alloc.c:1141!
[  823.237447] invalid opcode: 0000 [#1] SMP PTI
[  823.239421] CPU: 1 PID: 3644 Comm: test_binder Not tainted 5.1.0-0.rc1.git0.1.2.secnext.fc31.x86_64 #1
[  823.243538] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  823.246079] RIP: 0010:binder_alloc_do_buffer_copy+0x34/0x210
[  823.248613] Code: 0a 41 55 49 89 fb 41 54 41 89 f4 48 8d 77 38 48 8b 42 58 55 53 48 39 f1 0f 84 17 01 00 00 48 8b 49 58 48 29 c1 49 39 c9 76 02 <0f> 0b 4c 29 c9 49 39 ca 77 f6 41 f6 c2 03 75 f0 0f b6 4a 28 f6 c1
[  823.256404] RSP: 0018:ffffb04e41093b68 EFLAGS: 00010202
[  823.258513] RAX: 00007fb600c52000 RBX: a0d48e24a0213e28 RCX: 0000000000000020
[  823.261375] RDX: ffff9c09b058a9c0 RSI: ffff9c09189165b0 RDI: ffff9c0918916578
[  823.264225] RBP: ffff9c09b058a9c0 R08: ffffb04e41093c80 R09: 0000000000000028
[  823.267044] R10: a0d48e24a0213e28 R11: ffff9c0918916578 R12: 0000000000000000
[  823.269758] R13: ffff9c09b67c9660 R14: ffff9c09b116fb40 R15: ffffffff8acd4d08
[  823.272482] FS:  00007fbeb3438800(0000) GS:ffff9c09b7a80000(0000) knlGS:0000000000000000
[  823.275595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  823.277676] CR2: 000055b102d31cc9 CR3: 0000000234648000 CR4: 00000000001406e0
[  823.280347] Call Trace:
[  823.281287]  binder_get_object+0x60/0xf0
[  823.282728]  binder_transaction+0xc2e/0x2370
[  823.284268]  ? __check_object_size+0x41/0x15d
[  823.285849]  ? binder_thread_read+0x9e2/0x1460
[  823.287342]  ? binder_update_ref_for_handle+0x83/0x1a0
[  823.289066]  binder_thread_write+0x2ae/0xfc0
[  823.290513]  ? finish_wait+0x80/0x80
[  823.291729]  binder_ioctl+0x659/0x836
[  823.292980]  do_vfs_ioctl+0x40a/0x670
[  823.294234]  ksys_ioctl+0x5e/0x90
[  823.295364]  __x64_sys_ioctl+0x16/0x20
[  823.296609]  do_syscall_64+0x5b/0x150
[  823.297796]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  823.299423] RIP: 0033:0x7fbeb35e782b
[  823.300580] Code: 0f 1e fa 48 8b 05 5d 96 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 96 0c 00 f7 d8 64 89 01 48
[  823.306473] RSP: 002b:00007ffdfae2f198 EFLAGS: 00000287 ORIG_RAX: 0000000000000010
[  823.308868] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbeb35e782b
[  823.311029] RDX: 00007ffdfae2f1b0 RSI: 00000000c0306201 RDI: 0000000000000003
[  823.313206] RBP: 00007ffdfae30210 R08: 00000000010fa330 R09: 0000000000000000
[  823.315379] R10: 0000000000400644 R11: 0000000000000287 R12: 0000000000401190
[  823.317459] R13: 00007ffdfae304c0 R14: 0000000000000000 R15: 0000000000000000
[  823.319510] Modules linked in: crypto_user nfnetlink xt_multiport bluetooth ecdh_generic rfkill sctp overlay ip6table_security xt_CONNSECMARK xt_SECMARK xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_security ah6 xfrm6_mode_transport ah4 xfrm4_mode_transport ip6table_mangle ip6table_filter ip6_tables iptable_mangle xt_mark xt_AUDIT ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser ib_umad ib_ipoib rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_uverbs ib_core sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon i2c_piix4 drm_kms_helper virtio_net net_failover failover ttm drm mlx5_core crc32c_intel virtio_blk ata_generic virtio_console mlxfw serio_raw pata_acpi qemu_fw_cfg [last unloaded: arp_tables]
[  823.339786] ---[ end trace 6f761f654b297775 ]---

The related code in Linus' tree (it's the BUG_ON(...) at the top):

static void binder_alloc_do_buffer_copy(struct binder_alloc *alloc,
                                        bool to_buffer,
                                        struct binder_buffer *buffer,
                                        binder_size_t buffer_offset,
                                        void *ptr,
                                        size_t bytes)
{
        /* All copies must be 32-bit aligned and 32-bit size */
        BUG_ON(!check_buffer(alloc, buffer, buffer_offset, bytes));

        while (bytes) {
                unsigned long size;
                struct page *page;
                pgoff_t pgoff;
                void *tmpptr;
                void *base_ptr;

                page = binder_alloc_get_page(alloc, buffer,
                                             buffer_offset, &pgoff);
                size = min_t(size_t, bytes, PAGE_SIZE - pgoff);
                base_ptr = kmap_atomic(page);
                tmpptr = base_ptr + pgoff;
                if (to_buffer)
                        memcpy(tmpptr, ptr, size);
                else
                        memcpy(ptr, tmpptr, size);
                /*
                 * kunmap_atomic() takes care of flushing the cache
                 * if this device has VIVT cache arch
                 */
                kunmap_atomic(base_ptr);
                bytes -= size;
                pgoff = 0;
                ptr = ptr + size;
                buffer_offset += size;
        }
}

RFE: SELinux should not contain hardcoded tests of filesystem type names

Generalize the current hardcoded tests of specific filesystem type names used to determine whether to support setting per-file security contexts via setxattr on a genfscon-labeled filesystem and whether to initially label the files from policy based on pathname from the root of the filesystem. The former is only safe if the filesystem either implements its own setxattr handler for security labels or the filesystem pins its inodes in memory, as otherwise the label may not be preserved for the lifetime of the file. The latter is only safe if the filesystem does not permit userspace to modify the directory tree (i.e. no .create/.link/.rename methods or filesystem is not mountable by userspace), as otherwise userspace can potentially cause files to move in and out of a given label or to be accessible under different labels depending on which path is first looked up. We currently permit the former for sysfs (implements its own handler that saves/restores the value when the inode is evicted and later re-created from a backing data structure), and for pstore, debugfs, and rootfs (all of which pin their inodes in memory). We currently permit the latter for debugfs, sysfs, and pstore, as the first two do not permit any userspace manipulation of directories and the latter only permits unlink, which causes no issues by itself. We either need some way to detect which filesystems are safe to use in the kernel or specify the whitelists of filesystem type names in the policy.

BUG: Normalize input to /sys/fs/selinux/enforce

At present, one can write any signed integer value to /sys/fs/selinux/enforce and it will be stored, e.g. echo -1 > /sys/fs/selinux/enforce or echo 1999 > /sys/fs/selinux/enforce. This makes no real difference to the kernel, since it only ever cares if it is zero or non-zero, but some userspace code compares it with 1 to decide if SELinux is enforcing, and this could confuse it. Only a process that is already root and is allowed the setenforce permission in SELinux policy can write to /sys/fs/selinux/enforce, so this is not considered to be a security issue, but it should be fixed.
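
One possible shape for the fix is to collapse any non-zero input to 1 before storing it. The sketch below is a userspace illustration only; `parse_enforce()` is a hypothetical helper, not the selinuxfs write handler.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a decimal integer from buf and normalize it to 0 or 1.
 * Returns 0 on success, -1 if buf does not start with a number.
 */
int parse_enforce(const char *buf, int *enforcing)
{
    char *end;
    long value;

    assert(buf != NULL && enforcing != NULL);

    value = strtol(buf, &end, 10);
    if (end == buf)
        return -1;

    *enforcing = (value != 0); /* any non-zero value means enforcing */
    return 0;
}
```

Userspace code that compares the stored value with 1 would then see a consistent answer for inputs such as -1 or 1999.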

BUG: sel_write_load error handling/logging

As discussed on the list, sel_write_load() can fail after successfully loading the new policy when re-creating the selinuxfs boolean, class/perm, or policy capability files, leaving the system in a broken state, and does not even log those failures to help with diagnosing/resolving them. Optimally, we'd make the policy load and selinuxfs file regeneration atomic but that's a stretch goal. At the least, we ought to be logging an error message when we have a failure in that code, and possibly the code could make certain errors non-fatal (for example, if security_genfs_sid() fails, we could just assign SECINITSID_SECURITY to the inode as the default rather than failing altogether).

RFE: Add SCM_SECURITY support to IPv6

As reported by Richard Haines, IPv6 stream sockets support SO_PEERCON, but IPv6 datagram sockets do not currently support SCM_SECURITY, unlike IPv4 datagram sockets. For IPv4, the support is implemented in net/ipv4/ip_sockglue.c:ip_cmsg_rcv_security(). We would need to implement similar support in the ipv6 code.

RFE: add genfscon support for regex paths

Currently it is not possible to further restrict access to the kernel pseudo filesystem sysfs.
Paths like /sys/bus/usb/devices/ or /sys/class/net/eth0 could be labeled, but these files are symlinks to hardware dependent files, e.g. /sys/class/net/eth0 -> ../../devices/pci0000:00/0000:00:1c.5/0000:05:00.0/net/eth0 or /sys/bus/usb/devices/usb1 -> ../../../devices/pci0000:00/0000:00:1a.0/usb1.
If genfscon supported regular expressions in the path argument, one could label these files:

genfscon sysfs /devices/(.*/)+usb[0-9]* gen_context(system_u:object_r:sysfs_usb_t,s0)
genfscon sysfs /devices/(.*/)+net gen_context(system_u:object_r:sysfs_net_t,s0)
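
To check that the two patterns above match the symlink targets quoted earlier, here is a small userspace sketch using POSIX regcomp(3) and regexec(3). It assumes the pattern is anchored at the start of the path and treated as a prefix match, which is an assumption of this example; the kernel has no POSIX regex engine, so an in-kernel implementation would look different. `path_matches()` is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if path matches the POSIX extended regex in pattern,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int path_matches(const char *pattern, const char *path)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && path != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, path, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `path_matches("^/devices/(.*/)+usb[0-9]*", "/devices/pci0000:00/0000:00:1a.0/usb1")` reports a match, while a path outside /devices does not.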

BUG: selinuxfs removes its directory entries in an unsafe way, causing soft lockups when tasks read selinuxfs directories

Letting the following set of commands run long enough on a multi-core machine causes soft lockups in the kernel:

(cd /sys/fs/selinux/; while true; do find >/dev/null 2>&1; done) &
(cd /sys/fs/selinux/; while true; do find >/dev/null 2>&1; done) &
(cd /sys/fs/selinux/; while true; do find >/dev/null 2>&1; done) &

while true; do load_policy; echo -n .; sleep 0.1; done

This happens on the upstream kernel, as well as on selinux/next.

The problem appears to be that sel_remove_entries calls d_genocide to remove the whole contents of certain subdirectories in /sys/fs/selinux/. This function is apparently only intended for removing filesystem entries that are no longer accessible to userspace, because it doesn't follow the rule that any code removing entries from a directory must hold the lock on the directory's inode RW semaphore (formerly this was a mutex, see 9902af7).

Note that before commit ad52184, SELinux used its own open-coded functions for removing entries, but these also had the same bug (they were not locking the directory inodes).

I think the best way to fix this will be to open code sel_remove_entries to remove the entries in a proper and robust way (similar to how it was done before ad52184 but with locking of the parent inode before removal). Either way, it looks like a really bad idea to call d_genocide on a tree that is mounted in userspace (no parent inode locks, no fsnotify events, ...). Based on its usage it looks like an internal function that is not at all designed for this purpose:
https://elixir.bootlin.com/linux/latest/ident/d_genocide

I have a patch prepared that seems to work and passes the above stress test. Let me polish it up and I'll post it to the list for review.

RFE: Ensure that SELinux is kept in sync with new capability definitions

SELinux tends to get belatedly updated when new capabilities are added to the kernel.
This can yield difficult scenarios where we do not get useful audit messages (because there is no string representation for the capability bit in include/classmap.h) and we may not even be able to allow the capability in policy (because there is nothing to which we can map it, even * won't always work in this scenario due to dynamic class/perm discovery and mapping).
Make it a build error to add a capability without updating SELinux, just as has been done in recent times for adding new netlink RTM_ or XFRM_MSG values via BUILD_BUG_ON() calls.
This should be possible just by adding a BUILD_BUG_ON(CAP_LAST_CAP > CAP_AUDIT_READ); statement along with appropriate information about how to update classmap.h.
We already have a guard against adding more than 64 capabilities (see the #if CAP_LAST_CAP > 63 in hooks.c, along with the default case in the switch statement) since there we have to define a new security class, but we do not presently catch when adding new capabilities within the current 64 bits.
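
The compile-time guard idea can be shown outside the kernel with C11 static_assert, which plays the same role as BUILD_BUG_ON(). The capability enum and name table below are hypothetical miniatures, not the real capability list or classmap.h; the sketch only shows how adding an enum value without a matching name breaks the build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature capability list; DEMO_CAP_LAST tracks the newest. */
enum { DEMO_CAP_CHOWN, DEMO_CAP_KILL, DEMO_CAP_AUDIT_READ,
       DEMO_CAP_LAST = DEMO_CAP_AUDIT_READ };

/* Hypothetical name table standing in for the SELinux permission strings. */
static const char *const demo_cap_names[] = { "chown", "kill", "audit_read" };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Fails to compile if a capability is added without a matching name. */
static_assert(ARRAY_SIZE(demo_cap_names) == DEMO_CAP_LAST + 1,
              "new capability added: update demo_cap_names as well");

/* Return the name for a capability, or NULL if it is out of range. */
const char *demo_cap_name(int cap)
{
    if (cap < 0 || cap > DEMO_CAP_LAST)
        return NULL;
    return demo_cap_names[cap];
}
```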

BUG: selinux-testsuite overlayfs failures on v4.19-rc1

Running the selinux-testsuite results in the following failure:

# uname -r
4.19.0-0.rc1.git0.1.1.secnext.fc30.x86_64
# make test
{ ...snip... }
Running as user root with context unconfined_u:unconfined_r:unconfined_t

domain_trans/test ........... ok   
entrypoint/test ............. ok   
execshare/test .............. ok   
exectrace/test .............. ok   
execute_no_trans/test ....... ok   
fdreceive/test .............. ok   
inherit/test ................ ok   
link/test ................... ok   
mkdir/test .................. ok   
msg/test .................... ok     
open/test ................... ok   
ptrace/test ................. ok   
readlink/test ............... ok   
relabel/test ................ ok   
rename/test ................. ok   
rxdir/test .................. ok   
sem/test .................... ok     
setattr/test ................ ok   
setnice/test ................ ok   
shm/test .................... ok     
sigkill/test ................ ok     
stat/test ................... ok   
sysctl/test ................. ok   
task_create/test ............ ok   
task_setnice/test ........... ok   
task_setscheduler/test ...... ok   
task_getscheduler/test ...... ok   
task_getsid/test ............ ok   
task_getpgid/test ........... ok   
task_setpgid/test ........... ok   
file/test ................... ok     
ioctl/test .................. ok   
capable_file/test ........... ok     
capable_net/test ............ ok   
capable_sys/test ............ ok   
dyntrans/test ............... ok   
dyntrace/test ............... ok   
bounds/test ................. ok     
nnp_nosuid/test ............. ok     
mmap/test ................... ok     
unix_socket/test ............ ok   
inet_socket/test ............ ok     
overlay/test ................ 63/121 
#   Failed test at overlay/test line 592.
# Looks like you failed 1 test of 121.
overlay/test ................ Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/121 subtests 
checkreqprot/test ........... ok   
mqueue/test ................. ok     
mac_admin/test .............. ok   
atsecure/test ............... ok   
cap_userns/test ............. ok   
extended_socket_class/test .. ok     
sctp/test ................... ok     
netlink_socket/test ......... ok   
prlimit/test ................ ok   
binder/test ................. ok   
infiniband_endport/test ..... ok   
infiniband_pkey/test ........ ok   

Test Summary Report
-------------------
overlay/test              (Wstat: 256 Tests: 121 Failed: 1)
  Failed test:  111
  Non-zero exit status: 1
Files=55, Tests=628, 140 wallclock secs ( 0.30 usr  0.09 sys + 12.64 cusr 13.57 csys = 26.60 CPU)
Result: FAIL
Failed 1/55 test programs. 1/628 subtests failed.

BUG: overlay test failures on v4.20-rc1

From the selinux-testsuite:

Running as user root with context unconfined_u:unconfined_r:unconfined_t

domain_trans/test ........... ok   
entrypoint/test ............. ok   
execshare/test .............. ok   
exectrace/test .............. ok   
execute_no_trans/test ....... ok   
fdreceive/test .............. ok   
inherit/test ................ ok   
link/test ................... ok   
mkdir/test .................. ok   
msg/test .................... ok     
open/test ................... ok   
ptrace/test ................. ok   
readlink/test ............... ok   
relabel/test ................ ok   
rename/test ................. ok   
rxdir/test .................. ok   
sem/test .................... ok     
setattr/test ................ ok   
setnice/test ................ ok   
shm/test .................... ok     
sigkill/test ................ ok     
stat/test ................... ok   
sysctl/test ................. ok   
task_create/test ............ ok   
task_setnice/test ........... ok   
task_setscheduler/test ...... ok   
task_getscheduler/test ...... ok   
task_getsid/test ............ ok   
task_getpgid/test ........... ok   
task_setpgid/test ........... ok   
file/test ................... ok     
ioctl/test .................. ok   
capable_file/test ........... ok     
capable_net/test ............ ok   
capable_sys/test ............ ok   
dyntrans/test ............... ok   
dyntrace/test ............... ok   
bounds/test ................. ok     
nnp_nosuid/test ............. ok     
mmap/test ................... ok     
unix_socket/test ............ ok   
inet_socket/test ............ ok     
overlay/test ................ 76/119 
#   Failed test at overlay/test line 275.

#   Failed test at overlay/test line 293.

#   Failed test at overlay/test line 547.

#   Failed test at overlay/test line 622.
# Looks like you failed 4 tests of 119.
overlay/test ................ Dubious, test returned 4 (wstat 1024, 0x400)
Failed 4/119 subtests 
checkreqprot/test ........... ok   
mqueue/test ................. ok     
mac_admin/test .............. ok   
atsecure/test ............... ok   
cap_userns/test ............. ok   
extended_socket_class/test .. ok     
sctp/test ................... ok     
netlink_socket/test ......... ok   
prlimit/test ................ ok   
binder/test ................. ok   
infiniband_endport/test ..... ok   
infiniband_pkey/test ........ ok   

Test Summary Report
-------------------
overlay/test              (Wstat: 1024 Tests: 119 Failed: 4)
  Failed tests:  81, 83, 107, 112
  Non-zero exit status: 4
Files=55, Tests=626, 139 wallclock secs ( 0.30 usr  0.08 sys + 12.89 cusr 13.41 csys = 26.68 CPU)
Result: FAIL
Failed 1/55 test programs. 4/626 subtests failed.

RFE: display bad/deferred file labels in AVC audit records

If a file's on-disk SELinux label cannot be represented, it is mapped to the unlabeled initial SID, which generally causes access denials because policy prohibits access to unlabeled resources. When this happens, add the on-disk SELinux label to the AVC audit records to help diagnose the problem.

RFE: Extend SELinux /proc/pid labeling

Extend SELinux /proc/pid labeling to support derived types on specific /proc/pid files based on both the associated task context and the file name, e.g. name-based type transitions. This would allow applying different restrictions to different /proc/pid files of the same process via SELinux.

BUG: calipso_req_setattr() calls into _copy_from_user()

While running tests with the selinux-testsuite, a kernel WARNING was uncovered with the following backtrace:

[ 3050.288154] WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
[ 3050.294409] Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_security ip6_tables xt_CONNSECMARK xt_SECMARK nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c iptable_security ah6 xfrm6_mode_transport ah4 xfrm4_mode_transport ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_umad rpcrdma rdma_ucm ib_iser ib_ipoib rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_uverbs ib_core sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon i2c_piix4 crc32c_intel mlx5_core drm_kms_helper mlxfw ttm serio_raw devlink drm virtio_net virtio_console virtio_blk net_failover failover qemu_fw_cfg ata_generic pata_acpi
[ 3050.321805] CPU: 1 PID: 3144 Comm: client Not tainted 4.18.0-0.rc1.git1.1.1.secnext.fc29.x86_64 #1
[ 3050.325220] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 3050.327076] RIP: 0010:_copy_from_user+0x85/0x90
[ 3050.328569] Code: de 89 ea e8 4d 19 52 00 89 c0 eb e5 48 29 c5 49 01 ec 48 89 c5 48 89 ea 4c 89 e7 31 f6 e8 c3 3e 52 00 48 89 e8 5b 5d 41 5c c3 <0f> 0b eb a3 0f 1f 80 00 00 00 00 41 54 49 89 f4 be 19 00 00 00 55 
[ 3050.334450] RSP: 0018:ffff89603b4038b8 EFLAGS: 00010206
[ 3050.336047] RAX: 0000000080000305 RBX: ffff89602288b000 RCX: ffff895f00000000
[ 3050.338248] RDX: 0000000000000010 RSI: 000000000000000a RDI: ffffffff9334ead5
[ 3050.340357] RBP: 0000000000000010 R08: 0000000000000010 R09: 0000000000000060
[ 3050.342421] R10: 0000000000000040 R11: 0000000000000040 R12: ffff895f42e37160
[ 3050.344493] R13: ffff895f42e37130 R14: ffff89603b403928 R15: 0000000000000000
[ 3050.346557] FS:  00007fab6bde0f80(0000) GS:ffff89603b400000(0000) knlGS:0000000000000000
[ 3050.348943] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3050.350540] CR2: 00007fab6b6f7ae0 CR3: 0000000057750000 CR4: 00000000001406e0
[ 3050.352583] Call Trace:
[ 3050.353288]  <IRQ>
[ 3050.353886]  ipv6_renew_option+0xb2/0xf0
[ 3050.354973]  ipv6_renew_options+0x26a/0x340
[ 3050.356141]  ipv6_renew_options_kern+0x2c/0x40
[ 3050.357404]  calipso_req_setattr+0x72/0xe0
[ 3050.358553]  netlbl_req_setattr+0x126/0x1b0
[ 3050.359710]  selinux_netlbl_inet_conn_request+0x80/0x100
[ 3050.361105]  selinux_inet_conn_request+0x6d/0xb0
[ 3050.362337]  security_inet_conn_request+0x32/0x50
[ 3050.363588]  tcp_conn_request+0x35f/0xe00
[ 3050.364655]  ? __lock_acquire+0x250/0x16c0
[ 3050.365764]  ? selinux_socket_sock_rcv_skb+0x1ae/0x210
[ 3050.367149]  ? tcp_rcv_state_process+0x289/0x106b
[ 3050.368400]  tcp_rcv_state_process+0x289/0x106b
[ 3050.369594]  ? tcp_v6_do_rcv+0x1a7/0x3c0
[ 3050.370590]  tcp_v6_do_rcv+0x1a7/0x3c0
[ 3050.371543]  tcp_v6_rcv+0xc82/0xcf0
[ 3050.372445]  ip6_input_finish+0x10d/0x690
[ 3050.373462]  ip6_input+0x45/0x1e0
[ 3050.374334]  ? ip6_rcv_finish+0x1d0/0x1d0
[ 3050.375352]  ipv6_rcv+0x32b/0x880
[ 3050.376206]  ? ip6_make_skb+0x1e0/0x1e0
[ 3050.377191]  __netif_receive_skb_core+0x6f2/0xdf0
[ 3050.378401]  ? process_backlog+0x85/0x250
[ 3050.379409]  ? process_backlog+0x85/0x250
[ 3050.380388]  ? process_backlog+0xec/0x250
[ 3050.381371]  process_backlog+0xec/0x250
[ 3050.382390]  net_rx_action+0x153/0x480
[ 3050.383304]  __do_softirq+0xd9/0x4f7
[ 3050.384186]  do_softirq_own_stack+0x2a/0x40
[ 3050.385203]  </IRQ>
[ 3050.385736]  ? ip6_finish_output2+0x267/0x990
[ 3050.386792]  do_softirq.part.12+0x68/0x70
[ 3050.387783]  __local_bh_enable_ip+0xce/0xe0
[ 3050.388804]  ip6_finish_output2+0x290/0x990
[ 3050.389789]  ? __lock_is_held+0x5a/0xa0
[ 3050.390685]  ? ip6_output+0x7a/0x2b0
[ 3050.391506]  ip6_output+0x7a/0x2b0
[ 3050.392299]  ? ip6_fragment+0xb30/0xb30
[ 3050.393184]  ip6_xmit+0x2ec/0x860
[ 3050.393956]  ? ip6_append_data+0x150/0x150
[ 3050.394906]  ? inet6_csk_xmit+0x67/0x230
[ 3050.395792]  ? __lock_is_held+0x5a/0xa0
[ 3050.396691]  inet6_csk_xmit+0x10b/0x230
[ 3050.397586]  tcp_transmit_skb+0x4fd/0xb30
[ 3050.398512]  tcp_connect+0xcad/0x1080
[ 3050.399363]  tcp_v6_connect+0x65d/0x950
[ 3050.400228]  ? __inet_stream_connect+0xd1/0x370
[ 3050.401230]  ? tcp_v6_pre_connect+0x70/0x70
[ 3050.402159]  __inet_stream_connect+0xd1/0x370
[ 3050.403112]  ? mark_held_locks+0x57/0x80
[ 3050.403989]  ? __local_bh_enable_ip+0x80/0xe0
[ 3050.404948]  inet_stream_connect+0x36/0x50
[ 3050.405851]  __sys_connect+0xd3/0x100
[ 3050.406665]  ? trace_hardirqs_on_caller+0xed/0x180
[ 3050.407723]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 3050.408750]  __x64_sys_connect+0x16/0x20
[ 3050.409625]  do_syscall_64+0x60/0x1f0
[ 3050.410400]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3050.411465] RIP: 0033:0x7fab6b6da304
[ 3050.412292] Code: 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 01 44 2c 00 8b 00 85 c0 75 13 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 41 89 d4 55 48 89 f5 53 
[ 3050.416293] RSP: 002b:00007ffe529be1a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[ 3050.417875] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fab6b6da304
[ 3050.419351] RDX: 000000000000001c RSI: 0000000000af72b0 RDI: 0000000000000003
[ 3050.420776] RBP: 00007ffe529be340 R08: 0000000000000010 R09: 0000000000000100
[ 3050.422201] R10: fffffffffffff3ad R11: 0000000000000246 R12: 0000000000400a10
[ 3050.423643] R13: 00007ffe529be420 R14: 0000000000000000 R15: 0000000000000000

... the issue appears to be that calipso_req_setattr() ends up calling a function which assumes the IPv6 option data is coming from userspace, and which therefore calls _copy_from_user() to safely copy the data. Unfortunately, in this particular case the IPv6 option is not coming from userspace, which triggers the warning we see above.

BUG: policy reload can trigger transient userspace failures with ENOMEM errors

As reported by Li Kun on the selinux list, a policy reload can cause a transient failure in allocating a new context/SID, thereby breaking userspace, e.g. docker calls setexeccon(3) and this interleaves with a policy reload such that it fails with ENOMEM. This is due to sidtab_context_to_sid() returning ENOMEM when the sidtab has been shut down. During policy reload, we shut down the sidtab to prevent allocating any new SIDs/contexts before we copy and remap all of the existing sidtab entries for the new policy. A short term fix for this would be to return EINTR in this case instead, which is already handled by libselinux and will cause the operation to be retried, allowing it to proceed once the policy switch has completed. A longer term fix is to rework the way policy reload occurs to avoid any need to shut down the sidtab.
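The short term fix depends on the caller retrying when it sees EINTR. A minimal userspace sketch of such a retry loop follows; retry_on_eintr() and the op_fn callback type are hypothetical names chosen for illustration and are not part of libselinux.

```c
#include <errno.h>

/* Hypothetical callback: returns 0 on success, -1 with errno set on failure. */
typedef int (*op_fn)(void *arg);

/*
 * Call op until it succeeds, fails with something other than EINTR,
 * or max_tries attempts have been made (max_tries must be >= 1).
 * Errors such as ENOMEM are returned to the caller immediately.
 */
int retry_on_eintr(op_fn op, void *arg, int max_tries)
{
    int rc = -1;

    for (int i = 0; i < max_tries; i++) {
        rc = op(arg);
        if (rc == 0 || errno != EINTR)
            break;
    }
    return rc;
}
```

With a loop of this shape, a transient EINTR during the policy switch is retried transparently, whereas the ENOMEM currently returned is reported to the application as a hard failure.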

RFE: add controls over userfaultfd

From @stephensmalley on the SELinux mailing list:

Commit cefdca0 introduced a vm.unprivileged_userfaultfd sysctl that can be set to 0 to restrict use of userfaultfd to processes with CAP_SYS_PTRACE (hence SELinux sys_ptrace), but that only restricts the userfaultfd() system call itself, not subsequent operations on the returned file.

This likely depends on issue #47.

RFE: add LSM/SELinux hooks for bus1

The bus1 effort seems to be on a path to upstreaming. The bus1 developers are including a set of LSM hooks based on the binder hooks and discussions with them; we will ultimately need to validate those hooks and implement them for SELinux, along with corresponding policy changes.

BUG: Missing checks on prlimit()

When SELinux was first added to the kernel, a process could only get and set its own resource limits via getrlimit(2) and setrlimit(2), so no MAC checks were required for those operations. Later, SELinux added a conditional check on setrlimit(2) if the hard limit (rlim_max) was being changed in order to be able to rely on the hard limit value as a safe reset point upon context transitions when rlimitinh permission (resource limit inherit) is not allowed between the two contexts.
Later on, prlimit(2) was added to the kernel with the ability to get or set resource limits (hard or soft) of another process. SELinux wasn't updated for the introduction of prlimit() other than to pass down the task being changed (actually the task's group leader since resource limits are per-process rather than per-task) to its setrlimit hook. So there is no MAC check on using prlimit() to get the limits of another process, and there is only a MAC check on using prlimit() to set the limit of another process if the hard limit is being changed (no check for setting soft limit). The security_task_setrlimit() hook is called from do_prlimit() but only if changing the limit and while the task lock is held since it compares to see if the limit is changing. In comparison, the DAC checks for prlimit() are performed earlier by check_prlimit_permission() before calling do_prlimit(), without holding the task lock.
Probably the simplest fix would be to leave the existing security_task_setrlimit() hook unchanged so it can remain atomic with setting the limits, and add a new security_cred_prlimit() hook called from check_prlimit_permission() after the DAC checks that unconditionally checks a permission between the same two creds used for the DAC checks. We could just reuse the setrlimit permission there too, although it might be slightly misleading since prlimit() can be used to get resource limits as well. If we wanted to distinguish get vs set, we would need to pass a bool into check_prlimit_permission() to indicate whether the caller passed a new_rlim and add a new getrlimit permission.
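For reference, the two prlimit(2) operations described above as lacking a MAC check (reading limits, and changing only the soft limit) look roughly like this from userspace. This is a sketch assuming glibc's prlimit() wrapper; the helper names get_nofile_limit() and set_nofile_soft_limit() are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/resource.h>

/*
 * Read the RLIMIT_NOFILE limits of process pid (0 means the caller).
 * Passing NULL as the new limit makes this a pure "get".
 * Returns 0 on success, -1 with errno set on failure.
 */
int get_nofile_limit(pid_t pid, struct rlimit *out)
{
    return prlimit(pid, RLIMIT_NOFILE, NULL, out);
}

/*
 * Change only the soft limit of process pid. The hard limit (rlim_max)
 * is read back and written unchanged, so this is the "soft limit only"
 * case discussed in the text.
 */
int set_nofile_soft_limit(pid_t pid, rlim_t soft)
{
    struct rlimit lim;

    if (prlimit(pid, RLIMIT_NOFILE, NULL, &lim) != 0)
        return -1;
    lim.rlim_cur = soft;
    return prlimit(pid, RLIMIT_NOFILE, &lim, NULL);
}
```

When pid names another process, both helpers are subject to the DAC checks in check_prlimit_permission(); per the description above, neither currently reaches an SELinux permission check.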

RFE: Support namespacing of policy / security contexts

At present, usage of SELinux with containers is limited to using SELinux to isolate containers from each other, not to enforce any security goals within the container. Consequently, moving your apache web server instance from a host to a container costs you the ability to limit that apache web server to least privilege, and possibly to prevent exploitation altogether. Similarly, the use of MCS to isolate containers means we can't readily use MCS within containers to isolate/sandbox individual applications within the container. This is too limiting especially as many migrate from virtualization to containers. We need to investigate ways of supporting namespaced security contexts (so that category c1 within container A is not the same as category c1 within container B, and type T1 in container A is not the same as type T1 in container B) and policy (so that container admins can only affect policy for their container).

BUG: sporadic selinux-testsuite/inet_socket failures in Linux v4.14-rcX kernels

The initial Linux v4.14-rc1 kernel release caused a regression in the selinux-testsuite's inet_socket test (see below). After a few -rcX releases it appeared that the regression had been corrected, but with v4.14-rc5, and perhaps earlier releases as well, the regression appears to be sporadic and not reliably triggered.

inet_socket/test ......... 24/33 inet_socket/client: no reply from server
inet_socket/test ......... 25/33 
#   Failed test at inet_socket/test line 222.
inet_socket/test ......... 33/33 # Looks like you failed 1 test of 33.
inet_socket/test ......... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/33 subtests

Test system configuration:

# uname -r
4.14.0-0.rc5.git2.1.1.secnext.fc28.x86_64
# rpm -q selinux-policy
selinux-policy-3.13.1-298.fc28.1.noarch
# git log -n 1 --oneline --no-decorate
e0a361b selinux-testsuite: Stop Infiniband building if not enabled

BUG: open check can trigger recvfrom denials on sockets

open permission is currently only defined for files in the kernel (COMMON_FILE_PERMS rather than COMMON_FILE_SOCK_PERMS). Construction of an artificial test case that tries to open a socket via /proc/pid/fd will generate a recvfrom avc denial because recvfrom and open happen to map to the same permission bit in socket vs file classes.

Now, technically, open of a socket via /proc/pid/fd is not supported by the kernel regardless and will ultimately return ENXIO. But we hit the permission check first and can thus produce these odd/misleading denials.

Options:

  • Move open to COMMON_FILE_SOCK_PERMS so that it is defined for socket classes too. Would also require defining it in all of the socket classes in refpolicy. Seems kind of pointless given that the kernel doesn't support open() of sockets anyway.

  • Test to see if we are dealing with a socket in the code and don't bother checking FILE__OPEN in that case. Seems more logical to me, and avoids any compatibility headaches.
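The bit collision can be modeled in a few lines of userspace C. The permission lists below are abridged and only modeled on the layout of classmap.h (shared permissions first, then class-specific ones); their exact contents vary by kernel version and should be treated as an assumption. perm_bit() and perm_name() are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Permissions shared by file and socket classes; they take the low bits. */
static const char *const common_file_sock_perms[] = {
    "ioctl", "read", "write", "create", "getattr", "setattr",
    "lock", "relabelfrom", "relabelto", "append",
};
/* Class-specific permissions follow the shared ones (illustrative lists). */
static const char *const file_only_perms[] = {
    "unlink", "link", "rename", "execute", "quotaon", "mounton",
    "audit_access", "open", "execmod",
};
static const char *const sock_only_perms[] = {
    "bind", "connect", "listen", "accept", "getopt", "setopt",
    "shutdown", "recvfrom", "sendto", "name_bind",
};

enum obj_kind { OBJ_FILE, OBJ_SOCKET };

/* Return the bit index of perm in the given class, or -1 if undefined. */
int perm_bit(enum obj_kind kind, const char *perm)
{
    const char *const *tail = kind == OBJ_FILE ? file_only_perms : sock_only_perms;
    size_t ntail = kind == OBJ_FILE ? ARRAY_LEN(file_only_perms)
                                    : ARRAY_LEN(sock_only_perms);
    size_t ncommon = ARRAY_LEN(common_file_sock_perms);

    for (size_t i = 0; i < ncommon; i++)
        if (strcmp(common_file_sock_perms[i], perm) == 0)
            return (int)i;
    for (size_t i = 0; i < ntail; i++)
        if (strcmp(tail[i], perm) == 0)
            return (int)(ncommon + i);
    return -1;
}

/* Return the name a bit index has in the given class, or NULL. */
const char *perm_name(enum obj_kind kind, int bit)
{
    const char *const *tail = kind == OBJ_FILE ? file_only_perms : sock_only_perms;
    size_t ntail = kind == OBJ_FILE ? ARRAY_LEN(file_only_perms)
                                    : ARRAY_LEN(sock_only_perms);
    size_t ncommon = ARRAY_LEN(common_file_sock_perms);

    if (bit < 0)
        return NULL;
    if ((size_t)bit < ncommon)
        return common_file_sock_perms[bit];
    if ((size_t)bit - ncommon < ntail)
        return tail[(size_t)bit - ncommon];
    return NULL;
}
```

In this model perm_name(OBJ_SOCKET, perm_bit(OBJ_FILE, "open")) yields "recvfrom", which is the misleading denial described above, while perm_bit(OBJ_SOCKET, "open") is -1 because open is not defined for sockets.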

RFE: improve handling of anonymous inodes

From @stephensmalley on the SELinux mailing list:

... the more general problem of how anonymous inodes are used and handled in the kernel. Presently they are marked S_PRIVATE and exempted by the security framework because they have no per-instance state and a single anon inode is typically shared by many users. Setting another label in the file security struct and using that instead for permission checks may be the only option, but that requires the callers of anon_inode_getfd/anon_inode_getfile to pass in additional information about the object being represented so we can label it meaningfully.

RFE: add network addresses to the network port objects

At present the SELinux network port objects only include the protocol/port information and not the address/protocol/port information traditionally used to specify a network communication endpoint. This can be problematic for policy writers who wish to differentiate between the same port with different addresses; external networks vs localhost is a common example.

RFE: Improve support for the different network address families with more socket classes

Extend SELinux to support distinctions among more (all?) address families by defining new socket security classes in policy and updating the kernel logic to map them correctly. In the kernel, add the classes to security/selinux/include/classmap.h and update security/selinux/hooks.c:socket_type_to_security_class() to map the socket domain to its class. In the policy, add the classes to security_classes and access_vectors and add allow rules as appropriate. Otherwise, many sockets get mapped to the generic socket class and are indistinguishable in policy. This came up recently with a patch to add a class for AF_ALG sockets and was previously raised as a concern with AF_BLUETOOTH; it requires a new policy capability to provide compatibility.
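The mapping step can be pictured with a simplified userspace model. This is not the kernel's socket_type_to_security_class(): the real function also considers the protocol and splits netlink into several classes. socket_class_name() is a hypothetical helper, and the class names are used here only as illustrative strings.

```c
#include <sys/socket.h>

/*
 * Simplified model: map an address family and socket type to a class name.
 * Families without a dedicated case fall through to the generic "socket"
 * class, which is the situation the RFE wants to reduce.
 */
const char *socket_class_name(int family, int type)
{
    switch (family) {
    case AF_UNIX:
        return type == SOCK_DGRAM ? "unix_dgram_socket" : "unix_stream_socket";
    case AF_INET:
    case AF_INET6:
        if (type == SOCK_STREAM)
            return "tcp_socket";
        if (type == SOCK_DGRAM)
            return "udp_socket";
        return "rawip_socket";
    case AF_NETLINK:
        return "netlink_socket";
    case AF_PACKET:
        return "packet_socket";
    case AF_ALG:            /* the kind of per-family class the RFE adds */
        return "alg_socket";
    case AF_BLUETOOTH:
        return "bluetooth_socket";
    default:
        return "socket";    /* indistinguishable in policy */
    }
}
```

Removing the AF_ALG and AF_BLUETOOTH cases reproduces the older behavior in which those sockets land in the generic class; gating the new cases is what the policy capability mentioned above is for.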

BUG: kernel softlockup due to too many SIDs/contexts

As reported by yangjhong1 on selinux list, when too many SIDs/contexts have been allocated (e.g. 300000+ as a result of repeated docker container creations for 2 days), sidtab_search_context becomes very slow and can cause a kernel softlockup warning.
docker randomly selects a category pair for every container creation, so this can occur just from creating containers over time, even if old containers are removed promptly (category set reuse for removed containers will eventually occur but each selection is random). It can also occur from any other activity that allocates SIDs/contexts, even those that simply probe for context validity.
sidtab_search_context() is a reverse lookup in the sidtab and presently just walks the entire hash table.
At a minimum, we need to add a reverse hash table to help mitigate this, possibly using a SELinux hashtab or the core kernel's hashtable.h or rhashtable.h data structures. We might also want a fast check of the context category set to see if it has ever been previously used (i.e. maintain an ebitmap of used categories, and check whether it contains the context's category set) so that we can fail fast on a lookup of a new category set. However, the fact that we might need to support 300000+ SIDs/contexts also suggests that we should likely revisit the sidtab forward hash table since it is too small to efficiently handle that. That too is a candidate to be replaced by e.g. hashtable or rhashtable.
I can see both short term and long term fixes for this bug; short term might just be adding a simple reverse hash table and perhaps a category ebitmap test; longer term might be reworking the forward hash and switching over to hashtable or rhashtable structures.
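To make the reverse hash table idea concrete, here is a minimal userspace sketch that maps a context string to a SID with a chained hash table, so a lookup touches one bucket instead of walking every entry. It is not kernel code and does not use hashtab, hashtable.h, or rhashtable.h; all names (rsidtab, rsid_node, the bucket count, the FNV-1a hash) are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RSIDTAB_BUCKETS 1024u   /* illustrative size; must be a power of two */

struct rsid_node {
    char *context;              /* owned copy of the context string */
    uint32_t sid;
    struct rsid_node *next;
};

struct rsidtab {
    struct rsid_node *buckets[RSIDTAB_BUCKETS];
};

/* FNV-1a string hash reduced to a bucket index. */
static uint32_t ctx_hash(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h & (RSIDTAB_BUCKETS - 1);
}

/* Add a context -> SID mapping (no duplicate check). 0 on success, -1 on OOM. */
int rsidtab_insert(struct rsidtab *t, const char *context, uint32_t sid)
{
    size_t len = strlen(context) + 1;
    struct rsid_node *n = malloc(sizeof(*n));
    uint32_t b = ctx_hash(context);

    if (!n)
        return -1;
    n->context = malloc(len);
    if (!n->context) {
        free(n);
        return -1;
    }
    memcpy(n->context, context, len);
    n->sid = sid;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    return 0;
}

/* Return the SID for context, or 0 (treated as "no SID" in this sketch). */
uint32_t rsidtab_lookup(const struct rsidtab *t, const char *context)
{
    const struct rsid_node *n;

    for (n = t->buckets[ctx_hash(context)]; n; n = n->next)
        if (strcmp(n->context, context) == 0)
            return n->sid;
    return 0;
}

/* Free every node and reset the table to empty. */
void rsidtab_destroy(struct rsidtab *t)
{
    for (size_t b = 0; b < RSIDTAB_BUCKETS; b++) {
        struct rsid_node *n = t->buckets[b];

        while (n) {
            struct rsid_node *next = n->next;

            free(n->context);
            free(n);
            n = next;
        }
        t->buckets[b] = NULL;
    }
}
```

A fixed bucket count still degrades once the table holds hundreds of thousands of entries, which is one reason a resizable structure such as rhashtable is suggested for the longer term.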

RFE: Add a map permission check for mmap

Add a 'map' check on mmap so that we can distinguish memory mapped access (since it has different implications for revocation). When a file is opened and then read or written via syscalls like read(2)/write(2), we revalidate access on each read/write operation via selinux_file_permission() and therefore can revoke access if the process context, the file context, or the policy changes in such a manner that access is no longer allowed. When a file is opened and then memory mapped via mmap(2) and then subsequently read or written directly in memory, we presently have no way to revalidate or revoke access. The purpose of a separate map permission check on mmap(2) is to permit policy to prohibit memory mapping of specific files for which we need to ensure that every access is revalidated, which is particularly useful for scenarios where we expect the file to be relabeled at runtime in order to reflect state changes (e.g. cross-domain solution, assured pipeline without data copying).
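The difference between the two access paths is easy to see from userspace. In the sketch below (map_file_readonly() is a hypothetical helper), the only system calls are open, fstat, mmap and close; every later read through the returned pointer is an ordinary memory access with no hook at which access could be revalidated.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only and return a pointer to its contents,
 * storing the length in *len. Returns NULL on error or for an empty file.
 * The caller releases the mapping with munmap(ptr, *len).
 */
void *map_file_readonly(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

With the proposed map permission, policy could deny the mmap(2) call itself for files that must have every access revalidated, forcing such readers back onto read(2).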

BUG: type bounds does not limit xperms

The type bounds logic has not yet been updated to deal with extended permissions (xperms) aka ioctl whitelisting. Consequently, a bounded type may be allowed more extended permissions / ioctls than its bounding type. Need to update the security server logic to do this to preserve the bounding relationship.
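The missing step amounts to masking the bounded type's extended permissions with those of its bounding type. A minimal sketch, assuming a flat 256-bit bitmap per type (the real xperms data is organized per ioctl driver and function, and the struct and function names here are hypothetical):

```c
#include <stdint.h>

#define XPERMS_WORDS 8          /* 8 x 32 bits = 256 command numbers */

/* Hypothetical flat bitmap of allowed extended permissions. */
struct xperms_sketch {
    uint32_t p[XPERMS_WORDS];
};

/*
 * Restrict child to the permissions also granted to parent.
 * Returns 1 if child held any permission that parent lacks
 * (a bounds violation), 0 otherwise.
 */
int xperms_apply_bounds(struct xperms_sketch *child,
                        const struct xperms_sketch *parent)
{
    int violated = 0;

    for (int i = 0; i < XPERMS_WORDS; i++) {
        if (child->p[i] & ~parent->p[i])
            violated = 1;
        child->p[i] &= parent->p[i];
    }
    return violated;
}
```

After the call, the bounded type's set is a subset of its bounding type's set, which is the invariant the bounds check already maintains for regular permissions.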

Q: investigate the use of ns_capable_noaudit() in the network bonding driver

The bonding driver has a number of CAP_NET_ADMIN checks which may not need to be audited, see the capable(CAP_NET_ADMIN) calls in the following commit:

commit 4cd6b4754492c08f00e6237fd7e5c8b443370d15
Author: Mahesh Bandewar <[email protected]>
Date:   Thu Jun 18 11:30:54 2015 -0700

bonding: Display LACP info only to CAP_NET_ADMIN capable user

Actor and Partner details can be accessed via proc-fs, sys-fs
entries or netlink interface. These interfaces are world readable
at this moment. The earlier patch-series made the LACP communication
secure to avoid nuisance attack from within the same L2 domain but
it did not prevent "someone unprivileged" looking at that information
on host and perform the same act.

This patch essentially avoids spitting those entries if the user
in question does not have enough privileges.

Signed-off-by: Mahesh Bandewar <[email protected]>
Signed-off-by: Andy Gospodarek <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

BUG: NFS mount point temporarily unlabeled with NFSv4.2 + security_label

As discussed in https://lore.kernel.org/selinux/[email protected]/ and https://bugzilla.redhat.com/show_bug.cgi?id=1625955, the security label of a NFSv4.2 mount point shows up with the unlabeled context for a brief period of time after mounting, and is then subsequently refreshed to the correct label. This can yield permission denials on unlabeled_t and can break the selinux-testsuite on NFS (with not-yet-merged patches to perform tests/filesystem,fs_filesystem tests on the filesystem in which the testsuite is located rather than on separate ext4 filesystems created for the tests). My reply to the email thread which unfortunately didn't reach the list was:

When does NFS set the security label for the root inode / mounted directory (this would be done via nfs_setsecurity() in the labeled NFS case)? This needs to happen before it is passed to any permission calls or exposed to userspace.

For a local filesystem using xattrs (SECURITY_FS_USE_XATTR in SELinux), the initialization of the security label of the root inode happens during security_sb_set_mnt_opts() -> selinux_set_mnt_opts() -> sb_finish_set_opts() -> inode_doinit_with_dentry(root_inode, root). In that situation, SELinux calls __vfs_getxattr() to fetch the security.selinux attribute, map it to a SID, and set it in the incore inode security structure. However, for labeled NFS, which uses "native labeling" aka SECURITY_FS_USE_NATIVE within SELinux aka SECURITY_LSM_NATIVE_LABELS, we need NFS to pass the label into SELinux rather than having SELinux fetch an xattr since MAC labels are directly supported in the protocol and not merely extended or named attributes. This would normally happen via nfs_setsecurity() -> security_inode_notifysecctx() -> selinux_inode_notifysecctx(). However, it isn't clear to me that this occurs for the root inode prior to any further use of it by NFS.

Opening an issue here to track upstream since there hasn't been any response or fix yet from the NFS developers, the RHEL7 bug was closed wontfix, and the RHEL8 bug is not open for viewing.

RFE: better labeling control over cgroupfs

Taken from RHBZ 1553803 created by @rhatdan:

When creating a new directory in a cgroup file system, the new directory by default should get the label of the parent directory.

If I label a directory

/sys/fs/cgroup/unified/system.slice/docker-UUID

system_u:object_r:container_file_t:s0:c1,c2

Now I go into this directory and create a new directory.

The directory ends up labeled as

system_u:object_r:cgroup_t:s0

Where it should have been labeled

system_u:object_r:container_file_t:s0:c1,c2

This bug is preventing us from further locking down containers by allowing them to modify partial hierarchies.

BUG: NFSv4.2 does not handle context mounts correctly

NFSv4.2 introduces support for file security labels. However, context mounts should still operate in the same manner as before, i.e. all files in the NFS filesystem should appear to be labeled with the context mount label on the client, and the NFS client filesystem should not try to set the context of any newly created files on the server. At present, we get mixed behavior with NFSv4.2: the top-level mount directory is labeled with the context mount, files that already existed on the server show up with the server file labels, newly created files on the client have their contexts set on the server to the context mount value. It is unclear whether this was ever correct for NFSv4.2, probably not.

$ cat nfs-bug.sh
#!/bin/sh
#Remove security_label if testing with an older nfs-utils, e.g. RHEL7.
cat > /etc/exports <<EOF
/home localhost(rw,no_root_squash,security_label)
EOF
exportfs -a
systemctl start nfs-server
mkdir -p /mnt/home
mount -t nfs -o vers=4.0,context=system_u:object_r:etc_t:s0 localhost:/home /mnt/home
echo "Under NFSv4.0:"
#Everything should be labeled with the context mount value.
ls -Z /mnt/home
#When we create a new file, it should appear to be labeled with the context mount value on the client.
touch /mnt/home/foo
ls -Z /mnt/home/foo
#But we should not try to set it on the server.
ls -Z /home/foo
rm /mnt/home/foo
umount /mnt/home
mount -t nfs -o vers=4.2,context=system_u:object_r:etc_t:s0 localhost:/home /mnt/home
echo "Under NFSv4.2:"
ls -Z /mnt/home
touch /mnt/home/foo
ls -Z /mnt/home/foo
ls -Z /home/foo
rm /home/foo
umount /mnt/home
rmdir /mnt/home
rm /etc/exports
exportfs -ua
systemctl stop nfs-server

$ sudo ./nfs-bug.sh
./nfs-bug.sh
Under NFSv4.0:
system_u:object_r:etc_t:s0 lost+found
system_u:object_r:etc_t:s0 sds
system_u:object_r:etc_t:s0 /mnt/home/foo
system_u:object_r:home_root_t:s0 /home/foo
Under NFSv4.2:
system_u:object_r:lost_found_t:s0 lost+found
unconfined_u:object_r:user_home_dir_t:s0 sds
system_u:object_r:etc_t:s0 /mnt/home/foo
system_u:object_r:etc_t:s0 /home/foo
