redhat-plumbers / systemd-rhel7
7️⃣ systemd source-git for RHEL7
License: GNU General Public License v2.0
systemd System and Service Manager

DETAILS:
        http://0pointer.de/blog/projects/systemd.html

WEB SITE:
        http://www.freedesktop.org/wiki/Software/systemd

GIT:
        git://anongit.freedesktop.org/systemd/systemd
        ssh://git.freedesktop.org/git/systemd/systemd

GITWEB:
        http://cgit.freedesktop.org/systemd/systemd

MAILING LIST:
        http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        http://lists.freedesktop.org/mailman/listinfo/systemd-commits

IRC:
        #systemd on irc.freenode.org

BUG REPORTS:
        https://bugs.freedesktop.org/enter_bug.cgi?product=systemd

AUTHOR:
        Lennart Poettering
        Kay Sievers
        ...and many others

LICENSE:
        LGPLv2.1+ for all code
        - except sd-readahead.[ch] which is MIT
        - except src/shared/MurmurHash2.c which is Public Domain
        - except src/shared/siphash24.c which is CC0 Public Domain
        - except src/journal/lookup3.c which is Public Domain
        - except src/udev/* which is (currently still) GPLv2, GPLv2+

REQUIREMENTS:
        Linux kernel >= 3.7
        Linux kernel >= 3.8 for Smack support

        Kernel Config Options:
          CONFIG_DEVTMPFS
          CONFIG_CGROUPS (it is OK to disable all controllers)
          CONFIG_INOTIFY_USER
          CONFIG_SIGNALFD
          CONFIG_TIMERFD
          CONFIG_EPOLL
          CONFIG_NET
          CONFIG_SYSFS
          CONFIG_PROC_FS
          CONFIG_FHANDLE (libudev, mount and bind mount handling)

        udev will fail to work with the legacy sysfs layout:
          CONFIG_SYSFS_DEPRECATED=n

        Legacy hotplug slows down the system and confuses udev:
          CONFIG_UEVENT_HELPER_PATH=""

        Userspace firmware loading is not supported and should
        be disabled in the kernel:
          CONFIG_FW_LOADER_USER_HELPER=n

        Some udev rules and virtualization detection relies on it:
          CONFIG_DMIID

        Support for some SCSI devices serial number retrieval, to
        create additional symlinks in /dev/disk/ and /dev/tape:
          CONFIG_BLK_DEV_BSG

        Required for PrivateNetwork and PrivateDevices in service units:
          CONFIG_NET_NS
          CONFIG_DEVPTS_MULTIPLE_INSTANCES
        Note that systemd-localed.service and other systemd units use
        PrivateNetwork and PrivateDevices so this is effectively required.

        Optional but strongly recommended:
          CONFIG_IPV6
          CONFIG_AUTOFS4_FS
          CONFIG_TMPFS_XATTR
          CONFIG_{TMPFS,EXT4,XFS,BTRFS_FS,...}_POSIX_ACL
          CONFIG_SECCOMP

        Required for CPUShares in resource control unit settings
          CONFIG_CGROUP_SCHED
          CONFIG_FAIR_GROUP_SCHED

        Required for CPUQuota in resource control unit settings
          CONFIG_CFS_BANDWIDTH

        For systemd-bootchart, several proc debug interfaces are required:
          CONFIG_SCHEDSTATS
          CONFIG_SCHED_DEBUG

        For UEFI systems:
          CONFIG_EFIVAR_FS
          CONFIG_EFI_PARTITION

        Note that kernel auditing is broken when used with systemd's
        container code. When using systemd in conjunction with
        containers, please make sure to either turn off auditing at
        runtime using the kernel command line option "audit=0", or
        turn it off at kernel compile time using:
          CONFIG_AUDIT=n
        If systemd is compiled with libseccomp support on
        architectures which do not use socketcall() and where seccomp
        is supported (this effectively means x86-64 and ARM, but
        excludes 32-bit x86!), then nspawn will now install a
        work-around seccomp filter that makes containers boot even
        with audit being enabled. This works correctly only on kernels
        3.14 and newer though. TL;DR: turn audit off, still.

        glibc >= 2.14
        libcap
        libmount >= 2.20 (from util-linux)
        libseccomp >= 1.0.0 (optional)
        libblkid >= 2.20 (from util-linux) (optional)
        libkmod >= 15 (optional)
        PAM >= 1.1.2 (optional)
        libcryptsetup (optional)
        libaudit (optional)
        libacl (optional)
        libselinux (optional)
        liblzma (optional)
        liblz4 >= 119 (optional)
        libgcrypt (optional)
        libqrencode (optional)
        libmicrohttpd (optional)
        libpython (optional)
        libidn (optional)
        gobject-introspection > 1.40.0 (optional)
        elfutils >= 158 (optional)
        make, gcc, and similar tools

        During runtime, you need the following additional
        dependencies:

        util-linux >= v2.19 (requires fsck -l, agetty -s),
                      v2.21 required for tests in test/
        dbus >= 1.4.0 (strictly speaking optional, but recommended)
        dracut (optional)
        PolicyKit (optional)

        When building from git, you need the following additional
        dependencies:

        docbook-xsl
        xsltproc
        automake
        autoconf
        libtool
        intltool
        gperf
        gtkdocize (optional)
        python (optional)
        python-lxml (optional, but required to build the indices)
        sphinx (optional)

        When systemd-hostnamed is used, it is strongly recommended to
        install nss-myhostname to ensure that, in a world of
        dynamically changing hostnames, the hostname stays resolvable
        under all circumstances. In fact, systemd-hostnamed will warn
        if nss-myhostname is not installed.

        To build HTML documentation for python-systemd using sphinx,
        please first install systemd (using 'make install'), and then
        invoke sphinx-build with 'make sphinx-<target>', with <target>
        being 'html' or 'latexpdf'. If using DESTDIR for installation,
        pass the same DESTDIR to 'make sphinx-html' invocation.

USERS AND GROUPS:
        Default udev rules use the following standard system group
        names, which need to be resolvable by getgrnam() at any time,
        even in the very early boot stages, where no other databases
        and network are available:

        audio, cdrom, dialout, disk, input, kmem, lp, tape, tty, video

        During runtime, the journal daemon requires the
        "systemd-journal" system group to exist. New journal files will
        be readable by this group (but not writable), which may be used
        to grant specific users read access. In addition, system
        groups "wheel" and "adm" will be given read-only access to
        journal files using systemd-tmpfiles.service.

        The journal gateway daemon requires the
        "systemd-journal-gateway" system user and group to
        exist. During execution this network facing service will drop
        privileges and assume this uid/gid for security reasons.

        Similarly, the NTP daemon requires the "systemd-timesync" system
        user and group to exist.

        Similarly, the network management daemon requires the
        "systemd-network" system user and group to exist.

        Similarly, the name resolution daemon requires the
        "systemd-resolve" system user and group to exist.

NSS:
        systemd ships with three NSS modules:

        nss-myhostname resolves the local hostname to locally
        configured IP addresses, as well as "localhost" to
        127.0.0.1/::1.

        nss-resolve enables DNS resolution via the systemd-resolved
        DNS/LLMNR caching stub resolver "systemd-resolved".

        nss-mymachines enables resolution of all local containers
        registered with machined to their respective IP addresses.

        To make use of these NSS modules, please add them to the
        "hosts: " line in /etc/nsswitch.conf. The "resolve" module
        should replace the glibc "dns" module in this file.

        The three modules should be used in the following order:

                hosts: files mymachines resolve myhostname

WARNINGS:
        systemd will warn you during boot if /etc/mtab is not a
        symlink to /proc/mounts. Please ensure that /etc/mtab is a
        proper symlink.

        systemd will warn you during boot if /usr is on a different
        file system than /. While in systemd itself very little will
        break if /usr is on a separate partition, many of its
        dependencies very likely will break sooner or later in one
        form or another. For example, udev rules tend to refer to
        binaries in /usr, binaries that link to libraries in /usr or
        binaries that refer to data files in /usr. Since these
        breakages are not always directly visible, systemd will warn
        about this, since this kind of file system setup is not really
        supported anymore by the basic set of Linux OS components.

        systemd requires that the /run mount point exists. systemd also
        requires that /var/run is a symlink to /run.

        For more information on this issue consult
        http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken

        To run systemd under valgrind, compile with VALGRIND defined
        (e.g. ./configure CPPFLAGS='... -DVALGRIND=1'). Otherwise,
        false positives will be triggered by code which violates
        some rules but is actually safe.

ENGINEERING AND CONSULTING SERVICES:
        ENDOCODE <https://endocode.com/> offers professional
        engineering and consulting services for systemd. Please
        contact Chris Kühl <[email protected]> for more information.
1. Suppose a service sets properties such as CPUQuota, MemoryLimit, and TasksMax, for example:
[root@iZbp16y9caqsjb7sl49vyfZ ~]# cat /usr/lib/systemd/system/chronyd.service
...
[Service]
Type=forking
PIDFile=/var/run/chrony/chronyd.pid
EnvironmentFile=-/etc/sysconfig/chronyd
ExecStart=/usr/sbin/chronyd $OPTIONS
ExecStartPost=/usr/libexec/chrony-helper update-daemon
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full
CPUQuota=20%
MemoryLimit=100M
TasksMax=10
…
2. After the system starts, TasksCurrent and the CGroup mask report correct values:
# systemctl show -p TasksCurrent chronyd
TasksCurrent=1
# systemd-analyze dump
…
-> Unit chronyd.service:
Description: NTP client/server
Instance: n/a
Unit Load State: loaded
Unit Active State: active
Inactive Exit Timestamp: Sat 2020-09-12 17:43:22 CST
Active Enter Timestamp: Sat 2020-09-12 17:43:22 CST
Active Exit Timestamp: Sat 2020-09-12 17:43:21 CST
Inactive Enter Timestamp: Sat 2020-09-12 17:43:21 CST
May GC: no
Need Daemon Reload: no
Transient: no
Slice: system.slice
CGroup: /system.slice/chronyd.service
CGroup realized: yes
CGroup mask: 0x2b
CGroup members mask: 0x0
Name: chronyd.service
...
3. If systemctl daemon-reload is executed, these values are silently replaced with invalid ones:
# systemctl daemon-reload
# systemctl show -p TasksCurrent chronyd
TasksCurrent=18446744073709551615    <-- changed to (uint64_t) -1
# systemd-analyze dump
…
-> Unit chronyd.service:
Description: NTP client/server
Instance: n/a
Unit Load State: loaded
Unit Active State: active
Inactive Exit Timestamp: Sat 2020-09-12 17:43:22 CST
Active Enter Timestamp: Sat 2020-09-12 17:43:22 CST
Active Exit Timestamp: Sat 2020-09-12 17:43:21 CST
Inactive Enter Timestamp: Sat 2020-09-12 17:43:21 CST
May GC: no
Need Daemon Reload: no
Transient: no
Slice: system.slice
CGroup: /system.slice/chronyd.service
CGroup realized: yes
CGroup mask: 0x0    <-- changed to 0
CGroup members mask: 0x0
Name: chronyd.service
This is a copy of the following issue:
systemd/systemd#15221
We may need to port the following patch to RHEL's systemd to resolve the denial-of-service failure:
commit ba0d56f ("mount: don't propagate errors from mount_setup_unit() further up")
Thanks,
We recently merged #135, which backports 43b4e30. The upstream code now contains the following block:
suffix = strrchr(de->d_name, '.');
if (!STRPTR_IN_SET(suffix, ".wants", ".requires"))
continue;
Because our RHEL-7 version lacks the STRPTR_IN_SET() macro, the code was rewritten as follows:
suffix = strrchr(de->d_name, '.');
if (!streq(suffix, ".wants") && !streq(suffix, ".requires"))
continue;
However, the code above has a bug. If de->d_name does not contain a '.', strrchr() returns NULL, and the streq() macro, which in turn calls strcmp(), receives NULL as its first argument and crashes. We should fix this by replacing the calls to streq() with streq_ptr().
I see in the bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1770158 that this repo is referenced. I want to understand how the RHEL systemd development cycle works in relation to github systemd projects. How does the code in the RHEL RPM systemd-219-67.el7_7.2.src.rpm relate to the systemd projects in github? Is this Repo used for generating the upstream RHEL systemd RPMs?
Linux boot sometimes hangs waiting for /sysroot on a RAID0 btrfs filesystem (10 × 1.92T SSDs)
Console output while it hangs:
A start job is running for dev-disk-by\x2dlabel-OS_T604.device (3min28s / no limit)
OS: rhel 7.8/7.9
Frequency: about 10%
After we add "rd.retry=30 rd.timeout=90" to GRUB_CMDLINE_LINUX, the job times out instead:
[ 102.916820] localhost systemd[1]: Job dev-disk-by\x2dlabel-OS_T640.device/start timed out.
[ 102.917137] localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-OS_T640.device.
[ 102.917393] localhost systemd[1]: Dependency failed for File System Check on /dev/disk/by-label/OS_T640.
[ 102.917626] localhost systemd[1]: Dependency failed for /sysroot.
[ 102.917863] localhost systemd[1]: Dependency failed for Initrd Root File System.
[ 102.918103] localhost systemd[1]: Dependency failed for Reload Configuration from the Real Root.
However, 'mount -L OS_T640 /dir1' works in the dracut shell, and because the failure occurs only about 10% of the time, it looks like a race in systemd or udev.
We also attach these files:
/run/initramfs/rdsosreport.txt
/boot/config-5.4.83-1.1.el7.x86_64
Hopefully this is the right place for this... wasn't received well at systemd/systemd#17841
It seems that after I upgraded from CentOS 7.8 (systemd-219-73.el7_8.9.x86_64) to 7.9 (systemd-219-78.el7_9.2.x86_64), this issue began.
I use systemd-journal-upload on my clients and systemd-journal-remote on a central journal server. When I want to review logs from all clients, I use journalctl -D /var/log/journal/remote. I expect this command to show journal entries from all remote clients. After updating, only some remote clients' output appears.
By comparing ls -l /var/log/journal/remote/ against the files "added" in the following command, I can see that all of the expected files are supposedly being included in the journalctl query:
$ sudo SYSTEMD_LOG_LEVEL=debug journalctl -D /var/log/journal/remote
Considering root directory '/var/log/journal/remote'.
Root directory /var/log/journal/remote added.
File /var/log/journal/remote/remote-10.0.0.1.journal added.
File /var/log/journal/remote/remote-10.0.0.2.journal added.
File /var/log/journal/remote/remote-10.0.0.3.journal added.
File /var/log/journal/remote/remote-10.0.0.4.journal added.
....
If I generate journal entries on 10.0.0.1 or 10.0.0.3, for example by running echo test | systemd-cat, those entries do not appear in journalctl -D /var/log/journal/remote. However, if I inspect the journal file directly, for example with journalctl --file=/var/log/journal/remote/remote-10.0.0.1.journal, I see the entries just fine.
Version information:
$ journalctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN
$ yum list installed | grep systemd
systemd.x86_64 219-78.el7_9.2
$ head -1 /etc/*release
==> /etc/centos-release <==
CentOS Linux release 7.9.2009 (Core)
systemd version: 219
Distribution: CentOS7
Kernel: 4.19.48-006
Architecture: aarch64
Component: systemd
Expected behaviour: systemctl daemon-reload reloads the daemon successfully without any exception, and the unit linked list is intact.
Actual behaviour: systemctl daemon-reload caused a segmentation fault in systemd.
The coredump suggests it crashes in
unit.c: unit_free(Unit *u)
if (u->type != _UNIT_TYPE_INVALID)
LIST_REMOVE(units_by_type, u->manager->units_by_type[u->type], u);
Here, the linked list is broken.
u->units_by_type_prev points to a wrong address, 0xaaaa87549290, whereas the correct one is 0xaaaad7549290.
Comparing 0xaaaa87549290 and 0xaaaad7549290, the two addresses differ in only 2 bits (XOR 0x000050000000).
I know systemd 219 dates from the last decade, and I'm sorry to ask for help this way, but I have been struggling with this for weeks. I searched the whole commit history and found nothing related. I would be grateful for any suggestions. Thanks a lot.
Unknown
No response
#0 0x00007f644f19e8c7 in kill () from /lib64/libc.so.6
#1 0x00005556566edcdd in crash (sig=6) at src/core/main.c:206
#2 <signal handler called>
#3 0x00007f644f19e5f7 in raise () from /lib64/libc.so.6
#4 0x00007f644f19fce8 in abort () from /lib64/libc.so.6
#5 0x0000555656756882 in log_assert_failed (text=text@entry=0x5556567fc545 "dev_autofs_fd >= 0",
file=file@entry=0x5556567fc3b4 "src/core/automount.c", line=line@entry=370,
func=func@entry=0x5556567fd0b4 <__PRETTY_FUNCTION__.17397> "open_ioctl_fd") at src/shared/log.c:754
#6 0x00005556567b064a in open_ioctl_fd (dev_autofs_fd=-1, where=<optimized out>, devid=<optimized out>) at src/core/automount.c:370
#7 0x00005556567b10f6 in automount_send_ready (a=a@entry=0x555656b79110, tokens=0x555656c8b560, status=status@entry=0)
at src/core/automount.c:469
#8 0x00005556567b360e in automount_update_mount (a=0x555656b79110, old_state=old_state@entry=MOUNT_DEAD,
state=state@entry=MOUNT_MOUNTED) at src/core/automount.c:509
#9 0x00005556567ac9e8 in mount_notify_automount (state=MOUNT_MOUNTED, old_state=MOUNT_DEAD, m=0x555656b77000) at src/core/mount.c:588
#10 mount_set_state (m=m@entry=0x555656b77000, state=MOUNT_MOUNTED) at src/core/mount.c:619
#11 0x00005556567ad068 in mount_coldplug (u=0x555656b77000, deferred_work=<optimized out>) at src/core/mount.c:671
#12 0x000055565679c589 in unit_coldplug (u=0x555656b77000, deferred_work=deferred_work@entry=0x555656d3e070) at src/core/unit.c:2886
#13 0x00005556566f031e in manager_coldplug (m=m@entry=0x555656ac5980) at src/core/manager.c:1125
#14 0x00005556566f4a7a in manager_startup (m=0x555656ac5980, serialization=0x555656ac5230, fds=<optimized out>)
at src/core/manager.c:1288
#15 0x00005556566ea4e3 in main (argc=4, argv=0x7ffe78ac9848) at src/core/main.c:1798
(gdb) p *a
$11 = {meta = {manager = 0x555656ac5980, type = UNIT_AUTOMOUNT, load_state = UNIT_LOADED, merged_into = 0x0,
id = 0x555656b29ce0 "proc-sys-fs-binfmt_misc.automount", instance = 0x0, names = 0x555656b79450, dependencies = {0x555656b78500,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x555656b794f0, 0x0, 0x0, 0x0, 0x0, 0x555656b76980, 0x555656b784c0, 0x0, 0x555656b76710,
0x0, 0x0, 0x0, 0x0, 0x555656b769f0, 0x555656b79530}, requires_mounts_for = 0x555656b76750,
description = 0x555656b76eb0 "Arbitrary Executable File Formats File System Automount Point", documentation = 0x555656b76960,
fragment_path = 0x555656b6e540 "/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount", source_path = 0x0, dropin_paths = 0x0,
fragment_mtime = 1595213181000000, source_mtime = 0, dropin_mtime = 0, job = 0x0, nop_job = 0x0, job_timeout = 0,
job_timeout_action = EMERGENCY_ACTION_NONE, job_timeout_reboot_arg = 0x0, refs_by_target = 0x0, conditions = 0x555656b769c0,
asserts = 0x0, condition_timestamp = {realtime = 1591608832758220, monotonic = 70060990954163}, assert_timestamp = {
realtime = 1591608832758232, monotonic = 70060990954175}, inactive_exit_timestamp = {realtime = 1591608832758666,
monotonic = 70060990954609}, active_enter_timestamp = {realtime = 1591608832758666, monotonic = 70060990954609},
active_exit_timestamp = {realtime = 1591608832758141, monotonic = 70060990954084}, inactive_enter_timestamp = {
realtime = 1591608832758141, monotonic = 70060990954084}, slice = {source = 0x0, target = 0x0, refs_by_target_next = 0x0,
refs_by_target_prev = 0x0}, units_by_type_next = 0x0, units_by_type_prev = 0x0, has_requires_mounts_for_next = 0x0,
has_requires_mounts_for_prev = 0x0, load_queue_next = 0x0, load_queue_prev = 0x0, dbus_queue_next = 0x0, dbus_queue_prev = 0x0,
cleanup_queue_next = 0x0, cleanup_queue_prev = 0x0, gc_queue_next = 0x555656b78840, gc_queue_prev = 0x555656b796c0,
cgroup_queue_next = 0x0, cgroup_queue_prev = 0x0, target_deps_queue_next = 0x0, target_deps_queue_prev = 0x0, pids = 0x0,
sigchldgen = 0, gc_marker = 0, auto_stop_ratelimit = {interval = 10000000, begin = 0, burst = 16, num = 0}, deserialized_job = -1,
load_error = 0, unit_file_state = _UNIT_FILE_STATE_INVALID, unit_file_preset = -1, cgroup_path = 0x0, cgroup_realized_mask = 0,
cgroup_subtree_mask = 0, cgroup_members_mask = 0, on_failure_job_mode = JOB_REPLACE, stop_when_unneeded = false,
default_dependencies = false, refuse_manual_start = false, refuse_manual_stop = false, allow_isolate = false,
ignore_on_isolate = true, ignore_on_snapshot = false, condition_result = true, assert_result = true, transient = false,
in_load_queue = false, in_dbus_queue = false, in_cleanup_queue = false, in_gc_queue = true, in_cgroup_queue = false,
in_target_deps_queue = false, sent_dbus_new_signal = true, no_gc = false, in_audit = false, cgroup_realized = false,
cgroup_members_mask_valid = true, cgroup_subtree_mask_valid = true}, state = AUTOMOUNT_DEAD,
deserialized_state = AUTOMOUNT_RUNNING, where = 0x555656b76fd0 "/proc/sys/fs/binfmt_misc", timeout_idle_usec = 0, pipe_fd = 24,
pipe_event_source = 0x0, directory_mode = 493, dev_id = 1048609, tokens = 0x555656c8b560, expire_tokens = 0x0,
expire_event_source = 0x0, result = AUTOMOUNT_SUCCESS}
(gdb) p *a->tokens
$10 = {b = {hash_ops = 0x555656a4a6d0 <trivial_hash_ops>, {indirect = {storage = 0x3 <Address 0x3 out of bounds>,
hash_key = '\000' <repeats 15 times>, n_entries = 0, n_buckets = 0, idx_lowest_entry = 4294967040, _pad = "\000\000"},
direct = {storage = "\003", '\000' <repeats 32 times>, "\377\377\377\000\000"}}, type = HASHMAP_TYPE_SET, has_indirect = false,
n_direct_entries = 1, from_pool = false}}
a->tokens is non-empty and dev_autofs_fd == -1, so the assertion fails:
458 static int automount_send_ready(Automount *a, Set *tokens, int status) {
459 _cleanup_close_ int ioctl_fd = -1;
460 unsigned token;
461 int r;
462
463 assert(a);
464 assert(status <= 0);
465
466 if (set_isempty(tokens))
467 return 0;
468
469 ioctl_fd = open_ioctl_fd(UNIT(a)->manager->dev_autofs_fd, a->where, a->dev_id);
2. Analysis of how a->tokens changes
Precondition:
We can observe the following changes in a->tokens:
Step a: a trigger first adds packet.v5_packet.wait_queue_token to a->tokens (via /dev/autofs), as follows:
manager_loop
-> sd_event_dispatch
-> source_dispatch
-> automount_dispatch_io
-> set_put(a->tokens, UINT_TO_PTR(packet.v5_packet.wait_queue_token));
Step b: a later trigger removes the tokens, so a->tokens becomes empty again (via /proc/1/mountinfo), as follows:
manager_loop
-> sd_event_dispatch
-> source_dispatch
-> manager_dispatch_signal_fd
-> manager_dispatch_sigchld
-> mount_sigchld_event
-> mount_set_state
-> mount_notify_automount
-> automount_update_mount
-> automount_send_ready
-> set_steal_first(tokens)
If we then run systemctl daemon-reexec, nothing goes wrong even though manager->dev_autofs_fd is -1: because a->tokens is empty, automount_send_ready() returns immediately.
But if, for some reason, step a runs and step b does not, a subsequent systemctl daemon-reexec will trigger the failure.
Based on this, we could reproduce it.
3. How to reproduce:
Construct a mount path exceeding 256 characters:
# mkdir -p /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81
# mkdir -p /tmp/test
# mount --bind /tmp/test /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81
# ls -l /proc/1/fd | grep mount
/proc/1/mountinfo still exists
# systemctl daemon-reload
# ls -l /proc/1/fd | grep mount
/proc/1/mountinfo will disappear
Ensure that the proc-sys-fs-binfmt_misc.automount is active
Ensure that proc-sys-fs-binfmt_misc.mount is inactive
# ls /proc/sys/fs/binfmt_misc
proc-sys-fs-binfmt_misc.mount will change from inactive to active
# umount /proc/sys/fs/binfmt_misc
# stat /proc/sys/fs/binfmt_misc/
It will stay stuck, just like issue https://github.com/systemd/systemd/issues/15221
Finally, execute the following command in another shell terminal:
# systemctl daemon-reexec
systemd will crash immediately
4. How to fix
It may be necessary to merge the following patches:
a. commit ba0d56f55f2073164799be714b5bd1aad94d059a ("mount: don't propagate errors from mount_setup_unit() further up")
-> prevents /proc/1/mountinfo from being affected when the mount path exceeds 256 characters.
It has been merged after 73.el7_8.5.
b. The following code snippet from commit fae03ed ("automount: rework propagation between automount and mount units"):
/* Don't propagate state changes from the mount if we are already down */
if (!IN_SET(a->state, AUTOMOUNT_WAITING, AUTOMOUNT_RUNNING))
return;
-> when the automount is down, do not propagate the state change.
This patch is also needed, thanks.
We have hit a problem where no systemctl command can be executed.
Errors similar to the following are reported:
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service'
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activation via systemd failed for unit 'polkit.service': Argument list too long
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service'
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activation via systemd failed for unit 'polkit.service': Argument list too long
I collected a coredump for analysis and found that the number of n_entries in m->units reached 131072.
(gdb) p m->units
$2 = (Hashmap *) 0x56087eba5290
(gdb) p *m->units
$3 = {b = {hash_ops = 0x56087e59b6e0 <string_hash_ops>, {indirect = {storage = 0x7f87a35fc010 "0\222\275\205\bV",
hash_key = "H\206\243\250\273$\033\275\224\213\207\025\326p\214\300", n_entries = 131072, n_buckets = 246723,
idx_lowest_entry = 0, _pad = "\000\000"}, direct = {
storage = "\020\300_\243\207\177\000\000H\206\243\250\273$\033\275\224\213\207\025\326p\214\300\000\000\002\000\303\303\003\000\000\000\000\000\000\000"}}, type = HASHMAP_TYPE_PLAIN, has_indirect = true, n_direct_entries = 0, from_pool = false}}
(gdb)
* n_entries = 131072 *
#define MANAGER_MAX_NAMES 131072 /* 128K */
I went on to parse the unit details and found that most of the units (more than 130,000) are mounts.
Use the following GDB command to traverse the linked list:
$ cat .gdbinit
define dump_mount_list
set $_node = (Unit *)$arg0
set $_num = 0
while ($_node)
printf "addr: %p, mount->id: %s, source_path: %s\n", $_node, $_node->id, $_node->source_path
set $_node = $_node->units_by_type_next
set $_num = $_num + 1
end
printf "num is %d\n", $_num
end
enum UnitType {
UNIT_SERVICE = 0,
UNIT_SOCKET,
UNIT_BUSNAME,
UNIT_TARGET,
UNIT_SNAPSHOT,
UNIT_DEVICE,
UNIT_MOUNT,
UNIT_AUTOMOUNT,
(gdb) p m->units_by_type[6]
$1 = (Unit *) 0x5608a73630f0
More than 130,000 mount points are printed:
addr: 0x5608a73630f0, mount->id: home-t4-pouch-containers-5b1dce60939b18d5661d9b6d498c65d08178121f7b95c1481920379acb45dcec-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a73829f0, mount->id: home-t4-pouch-containerd-state-io.containerd.runtime.v1.linux-default-48b9c8cefd5c953bbf3303e8b4ea7b04a777ccfc789e05e5adf5ebddb834b958-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a739c240, mount->id: home-t4-pouch-containers-48b9c8cefd5c953bbf3303e8b4ea7b04a777ccfc789e05e5adf5ebddb834b958-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a7390680, mount->id: home-t4-pouch-containerd-state-io.containerd.runtime.v1.linux-default-8356eb46281e7fbe2c5da86d1a62eb4f93658cb7e3a4c4c854b656921649e1a4-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a7377040, mount->id: home-t4-pouch-containers-8356eb46281e7fbe2c5da86d1a62eb4f93658cb7e3a4c4c854b656921649e1a4-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a7267b40, mount->id: home-t4-pouch-containerd-state-io.containerd.runtime.v1.linux-default-6b3ce6d5a5f2126b6c0df9ac3663f8a4e3fc553e8952aa7665f622f662f7f154-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a732e180, mount->id: home-t4-pouch-containers-6b3ce6d5a5f2126b6c0df9ac3663f8a4e3fc553e8952aa7665f622f662f7f154-rootfs.mount, source_path: /proc/self/mountinfo
……
After analyzing the code, I found the following possible bug:
static int mount_dispatch_io(sd_event_source *source, int fd, uint32_t revents, void *userdata) {
        ……
        r = mount_load_proc_self_mountinfo(m, true); /* adds the entries from /proc/self/mountinfo to m->units */
        if (r < 0) {
                /* Reset flags, just in case, for later calls */
                LIST_FOREACH(units_by_type, u, m->units_by_type[UNIT_MOUNT]) {
                        Mount *mount = MOUNT(u);
                        mount->is_mounted = mount->just_mounted = mount->just_changed = false;
                }
                return 0; /* returning here means m->units only ever grows, never shrinks */
        }
        manager_dispatch_load_queue(m);
        LIST_FOREACH(units_by_type, u, m->units_by_type[UNIT_MOUNT]) {
                … /* the code here cleans up stale entries in m->units */
        }
        …
}
I also constructed a test case to reproduce the bug.
A. Construct a path longer than 256 characters (so that mount_load_proc_self_mountinfo() returns an error code):
# mkdir -p /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81
# mkdir -p /tmp/test
# mount --bind /tmp/test /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81
B. Mount some directories, then unmount them:
# mkdir -p ./a1 ./b1
# mount --bind a1 b1
#
# mkdir -p ./a2 ./b2
# mount --bind a2 b2
#
# mkdir -p ./a3 ./b3
# mount --bind a3 b3
#
# umount b1
# umount b2
# umount b3
C. Finally, GDB analysis shows that these mount points are still in m->units:
Breakpoint 1, mount_dispatch_io (source=0x55a04d0f4910, fd=9, revents=10, userdata=0x55a04d0ec040) at src/core/mount.c:1711
1711 static int mount_dispatch_io(sd_event_source *source, int fd, uint32_t revents, void *userdata) {
(gdb) bt
#0 mount_dispatch_io (source=0x55a04d0f4910, fd=9, revents=10, userdata=0x55a04d0ec040) at src/core/mount.c:1711
#1 0x000055a04c7ab6f0 in source_dispatch (s=s@entry=0x55a04d0f4910) at src/libsystemd/sd-event/sd-event.c:2115
#2 0x000055a04c7ac78a in sd_event_dispatch (e=0x55a04d0ec5e0) at src/libsystemd/sd-event/sd-event.c:2472
#3 0x000055a04c7ac92f in sd_event_run (e=<optimized out>, timeout=<optimized out>) at src/libsystemd/sd-event/sd-event.c:2501
#4 0x000055a04c70c2c3 in manager_loop (m=0x55a04d0ec040) at src/core/manager.c:2274
#5 0x000055a04c7006b1 in main (argc=5, argv=0x7ffe1edfaae8) at src/core/main.c:1819
(gdb) frame 5
#5 0x000055a04c7006b1 in main (argc=5, argv=0x7ffe1edfaae8) at src/core/main.c:1819
1819 r = manager_loop(m);
(gdb) p m->units_by_type[6]
$1 = (Unit *) 0x55a04d108c80
(gdb) dump_mount_list 0x55a04d108c80
addr: 0x55a04d108c80, mount->id: mnt-work-issues-systemd_maxunits-b3.mount, source_path: /proc/self/mountinfo
addr: 0x55a04d15e120, mount->id: mnt-work-issues-systemd_maxunits-b2.mount, source_path: /proc/self/mountinfo
...
addr: 0x55a04d0ef580, mount->id: run-user-0.mount, source_path: /proc/self/mountinfo
addr: 0x55a04d17ca10, mount->id: mnt-work-issues-systemd_maxunits-b1.mount, source_path: /proc/self/mountinfo
....
Similar bugs:
https://access.redhat.com/solutions/4620671
systemd/systemd#15221
kubernetes/kubernetes#57345