systemd-rhel7's Introduction

systemd System and Service Manager

DETAILS:
        http://0pointer.de/blog/projects/systemd.html

WEB SITE:
        http://www.freedesktop.org/wiki/Software/systemd

GIT:
        git://anongit.freedesktop.org/systemd/systemd
        ssh://git.freedesktop.org/git/systemd/systemd

GITWEB:
        http://cgit.freedesktop.org/systemd/systemd

MAILING LIST:
        http://lists.freedesktop.org/mailman/listinfo/systemd-devel
        http://lists.freedesktop.org/mailman/listinfo/systemd-commits

IRC:
        #systemd on irc.freenode.org

BUG REPORTS:
        https://bugs.freedesktop.org/enter_bug.cgi?product=systemd

AUTHOR:
        Lennart Poettering
        Kay Sievers
        ...and many others

LICENSE:
        LGPLv2.1+ for all code
        - except sd-readahead.[ch] which is MIT
        - except src/shared/MurmurHash2.c which is Public Domain
        - except src/shared/siphash24.c which is CC0 Public Domain
        - except src/journal/lookup3.c which is Public Domain
        - except src/udev/* which is (currently still) GPLv2, GPLv2+

REQUIREMENTS:
        Linux kernel >= 3.7
        Linux kernel >= 3.8 for Smack support

        Kernel Config Options:
          CONFIG_DEVTMPFS
          CONFIG_CGROUPS (it is OK to disable all controllers)
          CONFIG_INOTIFY_USER
          CONFIG_SIGNALFD
          CONFIG_TIMERFD
          CONFIG_EPOLL
          CONFIG_NET
          CONFIG_SYSFS
          CONFIG_PROC_FS
          CONFIG_FHANDLE (libudev, mount and bind mount handling)

        udev will fail to work with the legacy sysfs layout:
          CONFIG_SYSFS_DEPRECATED=n

        Legacy hotplug slows down the system and confuses udev:
          CONFIG_UEVENT_HELPER_PATH=""

        Userspace firmware loading is not supported and should
        be disabled in the kernel:
          CONFIG_FW_LOADER_USER_HELPER=n

        Some udev rules and virtualization detection rely on it:
          CONFIG_DMIID

        Support for retrieving the serial numbers of some SCSI devices,
        to create additional symlinks in /dev/disk/ and /dev/tape/:
          CONFIG_BLK_DEV_BSG

        Required for PrivateNetwork and PrivateDevices in service units:
          CONFIG_NET_NS
          CONFIG_DEVPTS_MULTIPLE_INSTANCES
        Note that systemd-localed.service and other systemd units use
        PrivateNetwork and PrivateDevices so this is effectively required.

        Optional but strongly recommended:
          CONFIG_IPV6
          CONFIG_AUTOFS4_FS
          CONFIG_TMPFS_XATTR
          CONFIG_{TMPFS,EXT4,XFS,BTRFS_FS,...}_POSIX_ACL
          CONFIG_SECCOMP

        Required for CPUShares in resource control unit settings:
          CONFIG_CGROUP_SCHED
          CONFIG_FAIR_GROUP_SCHED

        Required for CPUQuota in resource control unit settings:
          CONFIG_CFS_BANDWIDTH

        For systemd-bootchart, several proc debug interfaces are required:
          CONFIG_SCHEDSTATS
          CONFIG_SCHED_DEBUG

        For UEFI systems:
          CONFIG_EFIVAR_FS
          CONFIG_EFI_PARTITION

        Note that kernel auditing is broken when used with systemd's
        container code. When using systemd in conjunction with
        containers, please make sure to either turn off auditing at
        runtime using the kernel command line option "audit=0", or
        turn it off at kernel compile time using:
          CONFIG_AUDIT=n
        If systemd is compiled with libseccomp support on
        architectures which do not use socketcall() and where seccomp
        is supported (this effectively means x86-64 and ARM, but
        excludes 32-bit x86!), then nspawn will now install a
        work-around seccomp filter that makes containers boot even
        with audit being enabled. This works correctly only on kernels
        3.14 and newer though. TL;DR: turn audit off, still.

        glibc >= 2.14
        libcap
        libmount >= 2.20 (from util-linux)
        libseccomp >= 1.0.0 (optional)
        libblkid >= 2.20 (from util-linux) (optional)
        libkmod >= 15 (optional)
        PAM >= 1.1.2 (optional)
        libcryptsetup (optional)
        libaudit (optional)
        libacl (optional)
        libselinux (optional)
        liblzma (optional)
        liblz4 >= 119 (optional)
        libgcrypt (optional)
        libqrencode (optional)
        libmicrohttpd (optional)
        libpython (optional)
        libidn (optional)
        gobject-introspection > 1.40.0 (optional)
        elfutils >= 158 (optional)
        make, gcc, and similar tools

        During runtime, you need the following additional
        dependencies:

        util-linux >= v2.19 (requires fsck -l, agetty -s),
                      v2.21 required for tests in test/
        dbus >= 1.4.0 (strictly speaking optional, but recommended)
        dracut (optional)
        PolicyKit (optional)

        When building from git, you need the following additional
        dependencies:

        docbook-xsl
        xsltproc
        automake
        autoconf
        libtool
        intltool
        gperf
        gtkdocize (optional)
        python (optional)
        python-lxml (optional, but required to build the indices)
        sphinx (optional)

        When systemd-hostnamed is used, it is strongly recommended to
        install nss-myhostname to ensure that, in a world of
        dynamically changing hostnames, the hostname stays resolvable
        under all circumstances. In fact, systemd-hostnamed will warn
        if nss-myhostname is not installed.

        To build HTML documentation for python-systemd using sphinx,
        please first install systemd (using 'make install'), and then
        invoke sphinx-build with 'make sphinx-<target>', with <target>
        being 'html' or 'latexpdf'. If using DESTDIR for installation,
        pass the same DESTDIR to the 'make sphinx-html' invocation.

USERS AND GROUPS:
        Default udev rules use the following standard system group
        names, which need to be resolvable by getgrnam() at any time,
        even in the very early boot stages, where no other databases
        and network are available:

        audio, cdrom, dialout, disk, input, kmem, lp, tape, tty, video
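        A minimal sketch of checking that these groups resolve through
        getgrnam(), which is the lookup the udev rules depend on. The
        helper count_missing_groups() is hypothetical and not part of
        systemd; it only uses the standard getgrnam() from <grp.h>.

```c
#include <assert.h>
#include <grp.h>
#include <stddef.h>
#include <stdio.h>

/* Group names used by the default udev rules (from the list above). */
static const char *const udev_groups[] = {
        "audio", "cdrom", "dialout", "disk", "input",
        "kmem", "lp", "tape", "tty", "video",
};

/* Hypothetical helper: returns how many of the given group names cannot be
 * resolved with getgrnam(), printing each missing one to stderr. */
static size_t count_missing_groups(const char *const *names, size_t n) {
        size_t missing = 0;

        assert(names || n == 0);
        for (size_t i = 0; i < n; i++) {
                if (!getgrnam(names[i])) {
                        fprintf(stderr, "group \"%s\" is not resolvable\n", names[i]);
                        missing++;
                }
        }
        return missing;
}
```

        Calling count_missing_groups(udev_groups, 10) early in boot
        should return 0 on a correctly configured system.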

        During runtime, the journal daemon requires the
        "systemd-journal" system group to exist. New journal files will
        be readable by this group (but not writable), which may be used
        to grant specific users read access. In addition, system
        groups "wheel" and "adm" will be given read-only access to
        journal files using systemd-tmpfiles.service.

        The journal gateway daemon requires the
        "systemd-journal-gateway" system user and group to
        exist. During execution this network facing service will drop
        privileges and assume this uid/gid for security reasons.

        Similarly, the NTP daemon requires the "systemd-timesync" system
        user and group to exist.

        Similarly, the network management daemon requires the
        "systemd-network" system user and group to exist.

        Similarly, the name resolution daemon requires the
        "systemd-resolve" system user and group to exist.

NSS:
        systemd ships with three NSS modules:

        nss-myhostname resolves the local hostname to locally
        configured IP addresses, as well as "localhost" to
        127.0.0.1/::1.

        nss-resolve enables DNS resolution via the DNS/LLMNR caching
        stub resolver "systemd-resolved".

        nss-mymachines enables resolution of all local containers
        registered with machined to their respective IP addresses.

        To make use of these NSS modules, please add them to the
        "hosts: " line in /etc/nsswitch.conf. The "resolve" module
        should replace the glibc "dns" module in this file.

        The three modules should be used in the following order:

                hosts: files mymachines resolve myhostname

WARNINGS:
        systemd will warn you during boot if /etc/mtab is not a
        symlink to /proc/mounts. Please ensure that /etc/mtab is a
        proper symlink.

        systemd will warn you during boot if /usr is on a different
        file system than /. While in systemd itself very little will
        break if /usr is on a separate partition, many of its
        dependencies very likely will break sooner or later in one
        form or another. For example, udev rules tend to refer to
        binaries in /usr, binaries that link to libraries in /usr or
        binaries that refer to data files in /usr. Since these
        breakages are not always directly visible, systemd will warn
        about this, since this kind of file system setup is not really
        supported anymore by the basic set of Linux OS components.

        systemd requires that the /run mount point exists. systemd also
        requires that /var/run is a symlink to /run.

        For more information on the separate /usr issue, consult
        http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken

        To run systemd under valgrind, compile with VALGRIND defined
        (e.g. ./configure CPPFLAGS='... -DVALGRIND=1'). Otherwise,
        false positives will be triggered by code which violates
        some rules but is actually safe.

ENGINEERING AND CONSULTING SERVICES:
        ENDOCODE <https://endocode.com/> offers professional
        engineering and consulting services for systemd. Please
        contact Chris Kühl for more information.

systemd-rhel7's People

Contributors

ahkok, crrodriguez, davidstrauss, davidz25, dbuch, falconindy, filbranden, grawity, gregkh, haraldh, holtmann, hreinecke, jengelh, kaisforza, kaysievers, keszybz, lnykryn, mbiebl, mfwitten, michaelolbrich, michich, msekletar, pfl, phomes, poettering, rfc1036, ronnychevalier, teg, zonque, zzam

systemd-rhel7's Issues

Fix potential NULL pointer dereference

Recently we merged #135 with the backport of 43b4e30. The upstream code now contains the following block:

suffix = strrchr(de->d_name, '.');
if (!STRPTR_IN_SET(suffix, ".wants", ".requires"))
        continue;

Because our RHEL-7 version lacks the STRPTR_IN_SET() macro, the code was rewritten as follows:

suffix = strrchr(de->d_name, '.');
if (!streq(suffix, ".wants") && !streq(suffix, ".requires"))
        continue;

However, this code has a bug. If de->d_name doesn't contain a '.', strrchr() returns NULL, and the streq() macro, which in turn calls strcmp(), passes that NULL as its first argument, causing a crash. We should fix this by replacing the calls to streq() with streq_ptr().
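A minimal sketch of the proposed fix. The streq_ptr() below is a local copy modeled on systemd's NULL-safe helper of the same name (two NULLs compare equal, NULL never equals a non-NULL string); is_dependency_dir() is a hypothetical wrapper that only exists to make the check testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define streq(a, b) (strcmp((a), (b)) == 0)

/* NULL-safe variant of streq(), modeled on systemd's streq_ptr(). */
static inline bool streq_ptr(const char *a, const char *b) {
        if (a && b)
                return streq(a, b);
        if (!a && !b)
                return true;
        return false;
}

/* Hypothetical helper: true if a directory entry name ends in ".wants" or
 * ".requires". strrchr() returns NULL when the name has no '.', which
 * streq_ptr() handles without calling strcmp(). */
static bool is_dependency_dir(const char *d_name) {
        const char *suffix;

        assert(d_name);
        suffix = strrchr(d_name, '.');
        return streq_ptr(suffix, ".wants") || streq_ptr(suffix, ".requires");
}
```

Inside the directory loop, the check would then read: if (!is_dependency_dir(de->d_name)) continue;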

journalctl -D does not show entries from some .journal files in the directory

Hopefully this is the right place for this; it wasn't received well at systemd/systemd#17841.

This issue seems to have begun after I upgraded from CentOS 7.8 (systemd-219-73.el7_8.9.x86_64) to 7.9 (systemd-219-78.el7_9.2.x86_64).

I use systemd-journal-upload on my clients and systemd-journal-remote on a central journal server. When I want to review logs from all clients, I use journalctl -D /var/log/journal/remote. I expect this command to show journal entries from all remote clients. After updating, only some remote clients' output appears.

By comparing ls -l /var/log/journal/remote/ against the files reported as "added" by the following command, I can see that all of the expected files are supposedly being included in the journalctl query:

$ sudo SYSTEMD_LOG_LEVEL=debug journalctl -D /var/log/journal/remote
Considering root directory '/var/log/journal/remote'.
Root directory /var/log/journal/remote added.
File /var/log/journal/remote/remote-10.0.0.1.journal added.
File /var/log/journal/remote/remote-10.0.0.2.journal added.
File /var/log/journal/remote/remote-10.0.0.3.journal added.
File /var/log/journal/remote/remote-10.0.0.4.journal added.
....

If I generate journal entries on 10.0.0.1 or 10.0.0.3, for example by running echo test | systemd-cat, I do not see those entries with journalctl -D /var/log/journal/remote. However, if I inspect the journal file directly, e.g. with journalctl --file=/var/log/journal/remote/remote-10.0.0.1.journal, I see the entries just fine.

Version information:
$ journalctl --version
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

$ yum list installed | grep systemd
systemd.x86_64 219-78.el7_9.2

$ head -1 /etc/*release
==> /etc/centos-release <==
CentOS Linux release 7.9.2009 (Core)

systemd crashes when manager->dev_autofs_fd is -1 and a->tokens is not empty

1. Coredump analysis
#0  0x00007f644f19e8c7 in kill () from /lib64/libc.so.6
#1  0x00005556566edcdd in crash (sig=6) at src/core/main.c:206
#2  <signal handler called>
#3  0x00007f644f19e5f7 in raise () from /lib64/libc.so.6
#4  0x00007f644f19fce8 in abort () from /lib64/libc.so.6
#5  0x0000555656756882 in log_assert_failed (text=text@entry=0x5556567fc545 "dev_autofs_fd >= 0",
    file=file@entry=0x5556567fc3b4 "src/core/automount.c", line=line@entry=370,
    func=func@entry=0x5556567fd0b4 <__PRETTY_FUNCTION__.17397> "open_ioctl_fd") at src/shared/log.c:754
#6  0x00005556567b064a in open_ioctl_fd (dev_autofs_fd=-1, where=<optimized out>, devid=<optimized out>) at src/core/automount.c:370
#7  0x00005556567b10f6 in automount_send_ready (a=a@entry=0x555656b79110, tokens=0x555656c8b560, status=status@entry=0)
    at src/core/automount.c:469
#8  0x00005556567b360e in automount_update_mount (a=0x555656b79110, old_state=old_state@entry=MOUNT_DEAD,
    state=state@entry=MOUNT_MOUNTED) at src/core/automount.c:509
#9  0x00005556567ac9e8 in mount_notify_automount (state=MOUNT_MOUNTED, old_state=MOUNT_DEAD, m=0x555656b77000) at src/core/mount.c:588
#10 mount_set_state (m=m@entry=0x555656b77000, state=MOUNT_MOUNTED) at src/core/mount.c:619
#11 0x00005556567ad068 in mount_coldplug (u=0x555656b77000, deferred_work=<optimized out>) at src/core/mount.c:671
#12 0x000055565679c589 in unit_coldplug (u=0x555656b77000, deferred_work=deferred_work@entry=0x555656d3e070) at src/core/unit.c:2886
#13 0x00005556566f031e in manager_coldplug (m=m@entry=0x555656ac5980) at src/core/manager.c:1125
#14 0x00005556566f4a7a in manager_startup (m=0x555656ac5980, serialization=0x555656ac5230, fds=<optimized out>)
    at src/core/manager.c:1288
#15 0x00005556566ea4e3 in main (argc=4, argv=0x7ffe78ac9848) at src/core/main.c:1798


(gdb) p *a
$11 = {meta = {manager = 0x555656ac5980, type = UNIT_AUTOMOUNT, load_state = UNIT_LOADED, merged_into = 0x0,
    id = 0x555656b29ce0 "proc-sys-fs-binfmt_misc.automount", instance = 0x0, names = 0x555656b79450, dependencies = {0x555656b78500,
      0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x555656b794f0, 0x0, 0x0, 0x0, 0x0, 0x555656b76980, 0x555656b784c0, 0x0, 0x555656b76710,
      0x0, 0x0, 0x0, 0x0, 0x555656b769f0, 0x555656b79530}, requires_mounts_for = 0x555656b76750,
    description = 0x555656b76eb0 "Arbitrary Executable File Formats File System Automount Point", documentation = 0x555656b76960,
    fragment_path = 0x555656b6e540 "/usr/lib/systemd/system/proc-sys-fs-binfmt_misc.automount", source_path = 0x0, dropin_paths = 0x0,
    fragment_mtime = 1595213181000000, source_mtime = 0, dropin_mtime = 0, job = 0x0, nop_job = 0x0, job_timeout = 0,
    job_timeout_action = EMERGENCY_ACTION_NONE, job_timeout_reboot_arg = 0x0, refs_by_target = 0x0, conditions = 0x555656b769c0,
    asserts = 0x0, condition_timestamp = {realtime = 1591608832758220, monotonic = 70060990954163}, assert_timestamp = {
      realtime = 1591608832758232, monotonic = 70060990954175}, inactive_exit_timestamp = {realtime = 1591608832758666,
      monotonic = 70060990954609}, active_enter_timestamp = {realtime = 1591608832758666, monotonic = 70060990954609},
    active_exit_timestamp = {realtime = 1591608832758141, monotonic = 70060990954084}, inactive_enter_timestamp = {
      realtime = 1591608832758141, monotonic = 70060990954084}, slice = {source = 0x0, target = 0x0, refs_by_target_next = 0x0,
      refs_by_target_prev = 0x0}, units_by_type_next = 0x0, units_by_type_prev = 0x0, has_requires_mounts_for_next = 0x0,
    has_requires_mounts_for_prev = 0x0, load_queue_next = 0x0, load_queue_prev = 0x0, dbus_queue_next = 0x0, dbus_queue_prev = 0x0,
    cleanup_queue_next = 0x0, cleanup_queue_prev = 0x0, gc_queue_next = 0x555656b78840, gc_queue_prev = 0x555656b796c0,
    cgroup_queue_next = 0x0, cgroup_queue_prev = 0x0, target_deps_queue_next = 0x0, target_deps_queue_prev = 0x0, pids = 0x0,
    sigchldgen = 0, gc_marker = 0, auto_stop_ratelimit = {interval = 10000000, begin = 0, burst = 16, num = 0}, deserialized_job = -1,
    load_error = 0, unit_file_state = _UNIT_FILE_STATE_INVALID, unit_file_preset = -1, cgroup_path = 0x0, cgroup_realized_mask = 0,
    cgroup_subtree_mask = 0, cgroup_members_mask = 0, on_failure_job_mode = JOB_REPLACE, stop_when_unneeded = false,
    default_dependencies = false, refuse_manual_start = false, refuse_manual_stop = false, allow_isolate = false,
    ignore_on_isolate = true, ignore_on_snapshot = false, condition_result = true, assert_result = true, transient = false,
    in_load_queue = false, in_dbus_queue = false, in_cleanup_queue = false, in_gc_queue = true, in_cgroup_queue = false,
    in_target_deps_queue = false, sent_dbus_new_signal = true, no_gc = false, in_audit = false, cgroup_realized = false,
    cgroup_members_mask_valid = true, cgroup_subtree_mask_valid = true}, state = AUTOMOUNT_DEAD,
  deserialized_state = AUTOMOUNT_RUNNING, where = 0x555656b76fd0 "/proc/sys/fs/binfmt_misc", timeout_idle_usec = 0, pipe_fd = 24,
  pipe_event_source = 0x0, directory_mode = 493, dev_id = 1048609, tokens = 0x555656c8b560, expire_tokens = 0x0,
  expire_event_source = 0x0, result = AUTOMOUNT_SUCCESS}

(gdb) p *a->tokens
$10 = {b = {hash_ops = 0x555656a4a6d0 <trivial_hash_ops>, {indirect = {storage = 0x3 <Address 0x3 out of bounds>,
        hash_key = '\000' <repeats 15 times>, n_entries = 0, n_buckets = 0, idx_lowest_entry = 4294967040, _pad = "\000\000"},
      direct = {storage = "\003", '\000' <repeats 32 times>, "\377\377\377\000\000"}}, type = HASHMAP_TYPE_SET, has_indirect = false,
    n_direct_entries = 1, from_pool = false}}

a->tokens is non-empty and dev_autofs_fd == -1, so the assertion fails:

 458 static int automount_send_ready(Automount *a, Set *tokens, int status) {
 459         _cleanup_close_ int ioctl_fd = -1;
 460         unsigned token;
 461         int r;
 462
 463         assert(a);
 464         assert(status <= 0);
 465
 466         if (set_isempty(tokens))
 467                 return 0;
 468
 469         ioctl_fd = open_ioctl_fd(UNIT(a)->manager->dev_autofs_fd, a->where, a->dev_id);

2. Analysis of how a->tokens changes

Precondition:

  • Ensure that the proc-sys-fs-binfmt_misc.automount service is active;
  • Ensure that proc-sys-fs-binfmt_misc.mount is inactive;
  • Execute command: ls /proc/sys/fs/binfmt_misc

We can observe the following changes in a->tokens:

Step a: first, an autofs request (via /dev/autofs) causes packet.v5_packet.wait_queue_token to be added to a->tokens:

 manager_loop
-> sd_event_dispatch
-> source_dispatch
-> automount_dispatch_io
-> set_put(a->tokens, UINT_TO_PTR(packet.v5_packet.wait_queue_token));

Step b: then a change in /proc/1/mountinfo triggers the removal of the tokens, and a->tokens becomes empty again:

manager_loop
-> sd_event_dispatch
-> source_dispatch
-> manager_dispatch_signal_fd
-> manager_dispatch_sigchld
-> mount_sigchld_event
-> mount_set_state
-> mount_notify_automount
-> automount_update_mount
-> automount_send_ready
-> set_steal_first(tokens)

If systemctl daemon-reexec is then executed, manager->dev_autofs_fd is -1 at this point as well, but because a->tokens is empty, automount_send_ready() returns immediately and nothing goes wrong.

But if, for some reason, step a is executed but step b is not, a subsequent systemctl daemon-reexec will reliably trigger the assertion failure.
Based on this, we were able to reproduce it.

3. How to reproduce:

Construct a mount path exceeding 256 characters:

# mkdir -p /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81

# mkdir -p /tmp/test

# mount --bind /tmp/test   /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81

#  ls -l /proc/1/fd | grep mount
/proc/1/mountinfo still exists

# systemctl daemon-reload
#  ls -l /proc/1/fd | grep mount

/proc/1/mountinfo will disappear

Ensure that the proc-sys-fs-binfmt_misc.automount is active
Ensure that proc-sys-fs-binfmt_misc.mount is inactive

# ls  /proc/sys/fs/binfmt_misc
proc-sys-fs-binfmt_misc.mount will change from inactive to active


# umount  /proc/sys/fs/binfmt_misc

# stat /proc/sys/fs/binfmt_misc/  
It hangs, just like in issue https://github.com/systemd/systemd/issues/15221

Finally, execute the following command in another shell terminal:

# systemctl daemon-reexec

systemd will crash immediately

4. How to fix

It may be necessary to merge the following patches:

a, commit ba0d56f55f2073164799be714b5bd1aad94d059a (“mount: don't propagate errors from mount_setup_unit() further up”)

-> prevent /proc/1/mountinfo from being affected when the mount path exceeds 256 characters;

It has been merged after 73.el7_8.5.

b, The following code snippet of commit fae03ed (“automount: rework propagation between automount and mount units”):

     /* Don't propagate state changes from the mount if we are already down */
     if (!IN_SET(a->state, AUTOMOUNT_WAITING, AUTOMOUNT_RUNNING))
             return;

->when the automount status is down, do not propagate the status change.

This patch is also needed. Thanks!
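The effect of the fae03ed guard on this crash path can be sketched with a hypothetical model (not systemd code; every Model* name is made up). It uses the values from the coredump above: state = AUTOMOUNT_DEAD during coldplug, a non-empty tokens set, and dev_autofs_fd == -1. The token Set is reduced to a counter, and the assert() on the fd is replaced by an -EBADF return so the outcome can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
        MODEL_AUTOMOUNT_DEAD,
        MODEL_AUTOMOUNT_WAITING,
        MODEL_AUTOMOUNT_RUNNING,
        MODEL_AUTOMOUNT_FAILED,
} ModelAutomountState;

typedef struct {
        ModelAutomountState state;
        size_t n_tokens;    /* stands in for a->tokens */
        int dev_autofs_fd;  /* stands in for manager->dev_autofs_fd */
} ModelAutomount;

/* Same shape as automount_send_ready(): an empty token set is a no-op,
 * otherwise a valid /dev/autofs fd is required. */
static int model_send_ready(ModelAutomount *a) {
        assert(a);
        if (a->n_tokens == 0)
                return 0;
        if (a->dev_autofs_fd < 0)
                return -EBADF;  /* the real code hits assert() here */
        a->n_tokens = 0;        /* pending requests acknowledged */
        return 1;
}

/* Called when the backing mount unit becomes mounted. With apply_guard
 * set, the state check quoted from fae03ed runs first. */
static int model_update_mount(ModelAutomount *a, bool apply_guard) {
        assert(a);
        if (apply_guard &&
            !(a->state == MODEL_AUTOMOUNT_WAITING || a->state == MODEL_AUTOMOUNT_RUNNING))
                return 0;       /* automount is down: don't propagate */
        return model_send_ready(a);
}
```

With the guard, the coldplug case (DEAD, one token, fd -1) returns early instead of reaching the fd check, while a running automount still acknowledges its tokens.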

Some cgroup attributes are not serialized and will be lost after systemctl daemon-reload

1. Suppose a service sets properties such as CPUQuota, MemoryLimit and TasksMax, for example:

[root@iZbp16y9caqsjb7sl49vyfZ ~]# cat /usr/lib/systemd/system/chronyd.service 
...
[Service]
Type=forking
PIDFile=/var/run/chrony/chronyd.pid
EnvironmentFile=-/etc/sysconfig/chronyd
ExecStart=/usr/sbin/chronyd $OPTIONS
ExecStartPost=/usr/libexec/chrony-helper update-daemon
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=full
CPUQuota=20%
MemoryLimit=100M
TasksMax=10
…

2. After the system has started, the correct TasksCurrent and CGroup mask values can be queried:

# systemctl show -p TasksCurrent chronyd
TasksCurrent=1


# systemd-analyze dump
…
-> Unit chronyd.service:
        Description: NTP client/server
        Instance: n/a
        Unit Load State: loaded
        Unit Active State: active
        Inactive Exit Timestamp: Sat 2020-09-12 17:43:22 CST
        Active Enter Timestamp: Sat 2020-09-12 17:43:22 CST
        Active Exit Timestamp: Sat 2020-09-12 17:43:21 CST
        Inactive Enter Timestamp: Sat 2020-09-12 17:43:21 CST
        May GC: no
        Need Daemon Reload: no
        Transient: no
        Slice: system.slice
        CGroup: /system.slice/chronyd.service
        CGroup realized: yes
        CGroup mask: 0x2b
        CGroup members mask: 0x0
        Name: chronyd.service
...
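To read the "CGroup mask: 0x2b" value above, here is a small decoder. It assumes the controller bit layout cpu=0x01, cpuacct=0x02, blkio=0x04, memory=0x08, devices=0x10, pids=0x20; the first five follow upstream systemd 219, and the pids bit for the RHEL 7 backport is an assumption. Under that assumption, 0x2b decodes to exactly the controllers needed by CPUQuota (cpu, cpuacct), MemoryLimit (memory) and TasksMax (pids).

```c
#include <assert.h>
#include <string.h>

/* Assumed controller bit layout (pids bit is an assumption). */
static const struct {
        unsigned bit;
        const char *name;
} controllers[] = {
        { 0x01, "cpu" },    { 0x02, "cpuacct" }, { 0x04, "blkio" },
        { 0x08, "memory" }, { 0x10, "devices" }, { 0x20, "pids" },
};

/* Writes the names of the controllers set in mask into buf (size len),
 * separated by spaces; buf becomes "" for a zero mask. */
static void decode_cgroup_mask(unsigned mask, char *buf, size_t len) {
        assert(buf && len > 0);
        buf[0] = '\0';
        for (size_t i = 0; i < sizeof(controllers) / sizeof(controllers[0]); i++) {
                if (!(mask & controllers[i].bit))
                        continue;
                if (buf[0] != '\0')
                        strncat(buf, " ", len - strlen(buf) - 1);
                strncat(buf, controllers[i].name, len - strlen(buf) - 1);
        }
}
```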

3. If systemctl daemon-reload is executed, these values are silently changed to invalid values:

# systemctl daemon-reload
# systemctl show -p TasksCurrent chronyd
TasksCurrent=18446744073709551615                        ----》 Changed to -1

# systemd-analyze dump
…
-> Unit chronyd.service:
        Description: NTP client/server
        Instance: n/a
        Unit Load State: loaded
        Unit Active State: active
        Inactive Exit Timestamp: Sat 2020-09-12 17:43:22 CST
        Active Enter Timestamp: Sat 2020-09-12 17:43:22 CST
        Active Exit Timestamp: Sat 2020-09-12 17:43:21 CST
        Inactive Enter Timestamp: Sat 2020-09-12 17:43:21 CST
        May GC: no
        Need Daemon Reload: no
        Transient: no
        Slice: system.slice
        CGroup: /system.slice/chronyd.service
        CGroup realized: yes
        CGroup mask: 0x0                                  ----》 Changed to 0
        CGroup members mask: 0x0
        Name: chronyd.service
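The value 18446744073709551615 is 2^64 - 1, i.e. (uint64_t) -1 or UINT64_MAX, the conventional "unknown" value for an unsigned 64-bit counter. A plausible reading of the two outputs, sketched below as a hypothetical model rather than systemd's actual code, is that once the realized mask drops to 0x0 the pids controller is no longer considered realized, so the task count is reported as unknown. MODEL_CGROUP_PIDS and model_tasks_current() are made-up names, and the 0x20 bit is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CGROUP_PIDS 0x20u  /* assumed pids controller bit */

/* Hypothetical model: the task count is only read when the pids controller
 * is marked as realized; otherwise "unknown" is reported as (uint64_t) -1. */
static uint64_t model_tasks_current(unsigned realized_mask, uint64_t kernel_count) {
        if (!(realized_mask & MODEL_CGROUP_PIDS))
                return (uint64_t) -1;
        return kernel_count;
}
```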

Linux boot sometimes hangs waiting for /sysroot on a RAID0 btrfs filesystem (10 × 1.92T SSDs)

Console output while it hangs:
A start job is running for dev-disk-by\x2dlabel-OS_T604.device (3min28s / no limit)
OS: rhel 7.8/7.9
Frequency: about 10%

We added "rd.retry=30 rd.timeout=90" to GRUB_CMDLINE_LINUX; the boot then fails with:

[ 102.916820] localhost systemd[1]: Job dev-disk-by\x2dlabel-OS_T640.device/start timed out.
[ 102.917137] localhost systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-OS_T640.device.
[ 102.917393] localhost systemd[1]: Dependency failed for File System Check on /dev/disk/by-label/OS_T640.
[ 102.917626] localhost systemd[1]: Dependency failed for /sysroot.
[ 102.917863] localhost systemd[1]: Dependency failed for Initrd Root File System.
[ 102.918103] localhost systemd[1]: Dependency failed for Reload Configuration from the Real Root.

However, 'mount -L OS_T640 /dir1' works in this dracut shell. Because the failure occurs in only about 10% of boots, it looks like a race condition in systemd or udev.

We have also attached these files:
/run/initramfs/rdsosreport.txt
/boot/config-5.4.83-1.1.el7.x86_64

systemctl daemon-reload fails with SEGV; the fatal pointer differs from the expected value by only 2 bits

systemd version the issue has been seen with

219

Used distribution

CentOS7

Linux kernel version used

4.19.48-006

CPU architectures issue was seen on

aarch64

Component

systemd

Expected behaviour you didn't see

systemctl daemon-reload reloads the daemon successfully without any exception, and the unit linked list is intact.

Unexpected behaviour you saw

systemctl daemon-reload caused segmentation fault in systemd.
The coredump suggests it crashes in

unit.c: unit_free(Unit *u)
        if (u->type != _UNIT_TYPE_INVALID)
                LIST_REMOVE(units_by_type, u->manager->units_by_type[u->type], u);

Here, the linked list is broken.
u->units_by_type_prev points to a wrong address, 0xaaaa87549290, whereas the correct one is 0xaaaad7549290.
Comparing 0xaaaa87549290 and 0xaaaad7549290, they differ in only 2 bits (XOR = 0x000050000000).
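The 2-bit claim can be checked directly: XOR-ing the two pointer values leaves only the differing bits, and counting them gives 2, namely bits 28 and 30. A minimal sketch using a portable bit count instead of a compiler builtin:

```c
#include <assert.h>
#include <stdint.h>

/* Bits that differ between two pointer values. */
static uint64_t pointer_diff_bits(uint64_t observed, uint64_t expected) {
        return observed ^ expected;
}

/* Portable population count: clears the lowest set bit until none remain. */
static int count_bits(uint64_t x) {
        int n = 0;

        while (x) {
                x &= x - 1;
                n++;
        }
        return n;
}
```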

I know systemd 219 dates from the last decade, and I'm sorry to ask for help this way, but I've struggled with this for weeks. I searched the entire commit history but found nothing related. I would be grateful for any suggestions. Thanks a lot.

Steps to reproduce the problem

Unknown

Additional program output to the terminal or log subsystem illustrating the issue

No response

Mount points continue to increase without decreasing, exhausting MANAGER_MAX_NAMES and causing DoS

We have encountered a problem: no systemctl command can be executed.
Some errors similar to the following are reported:

Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service'
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activation via systemd failed for unit 'polkit.service': Argument list too long
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service'
Jul 13 19:37:17 e99g07484.et2 dbus[2155]: [system] Activation via systemd failed for unit 'polkit.service': Argument list too long

I collected a coredump for analysis and found that n_entries of m->units had reached 131072.

(gdb) p m->units
$2 = (Hashmap *) 0x56087eba5290
(gdb) p *m->units
$3 = {b = {hash_ops = 0x56087e59b6e0 <string_hash_ops>, {indirect = {storage = 0x7f87a35fc010 "0\222\275\205\bV",
        hash_key = "H\206\243\250\273$\033\275\224\213\207\025\326p\214\300", n_entries = 131072, n_buckets = 246723,
        idx_lowest_entry = 0, _pad = "\000\000"}, direct = {
        storage = "\020\300_\243\207\177\000\000H\206\243\250\273$\033\275\224\213\207\025\326p\214\300\000\000\002\000\303\303\003\000\000\000\000\000\000\000"}}, type = HASHMAP_TYPE_PLAIN, has_indirect = true, n_direct_entries = 0, from_pool = false}}
(gdb)

n_entries = 131072, which is exactly MANAGER_MAX_NAMES:

#define MANAGER_MAX_NAMES 131072 /* 128K */

I then examined the unit details and found that most of the units (130,000+) are mounts.

Use the following GDB command to traverse the linked list:

$ cat .gdbinit
define dump_mount_list
set $_node = (Unit *)$arg0
set $_num = 0
while ($_node)
printf "addr: %p, mount->id: %s, source_path: %s\n", $_node, $_node->id, $_node->source_path
set $_node = $_node->units_by_type_next
set $_num = $_num + 1
end
printf "num is %d\n", $_num
end

enum UnitType {
        UNIT_SERVICE = 0,
        UNIT_SOCKET,
        UNIT_BUSNAME,
        UNIT_TARGET,
        UNIT_SNAPSHOT,
        UNIT_DEVICE,
        UNIT_MOUNT,         /* = 6 */
        UNIT_AUTOMOUNT,
        /* ... */
};

(gdb) p m->units_by_type[6]
$1 = (Unit *) 0x5608a73630f0

More than 130,000 mount units are printed:

addr: 0x5608a73630f0, mount->id: home-t4-pouch-containers-5b1dce60939b18d5661d9b6d498c65d08178121f7b95c1481920379acb45dcec-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a73829f0, mount->id: home-t4-pouch-containerd-state-io.containerd.runtime.v1.linux-default-48b9c8cefd5c953bbf3303e8b4ea7b04a777ccfc789e05e5adf5ebddb834b958-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a739c240, mount->id: home-t4-pouch-containers-48b9c8cefd5c953bbf3303e8b4ea7b04a777ccfc789e05e5adf5ebddb834b958-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a7390680, mount->id: home-t4-pouch-containerd-state-io.containerd.runtime.v1.linux-default-8356eb46281e7fbe2c5da86d1a62eb4f93658cb7e3a4c4c854b656921649e1a4-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a7377040, mount->id: home-t4-pouch-containers-8356eb46281e7fbe2c5da86d1a62eb4f93658cb7e3a4c4c854b656921649e1a4-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a7267b40, mount->id: home-t4-pouch-containerd-state-io.containerd.runtime.v1.linux-default-6b3ce6d5a5f2126b6c0df9ac3663f8a4e3fc553e8952aa7665f622f662f7f154-rootfs.mount, source_path: /proc/self/mountinfo
addr: 0x5608a732e180, mount->id: home-t4-pouch-containers-6b3ce6d5a5f2126b6c0df9ac3663f8a4e3fc553e8952aa7665f622f662f7f154-rootfs.mount, source_path: /proc/self/mountinfo
……

After analyzing the code, I found the following possible bug:

static int mount_dispatch_io(sd_event_source *source, int fd, uint32_t revents, void *userdata) {
……
        r = mount_load_proc_self_mountinfo(m, true);         /* Adds the data from /proc/self/mountinfo to m->units */
        if (r < 0) {
                /* Reset flags, just in case, for later calls */
                LIST_FOREACH(units_by_type, u, m->units_by_type[UNIT_MOUNT]) {
                        Mount *mount = MOUNT(u);

                        mount->is_mounted = mount->just_mounted = mount->just_changed = false;
                }

                return 0;             /* If we return here, entries in m->units are only ever added, never removed */
        }

        manager_dispatch_load_queue(m);

        LIST_FOREACH(units_by_type, u, m->units_by_type[UNIT_MOUNT]) {
…   /* this loop cleans up the stale entries in m->units */
        } 
…
}
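The control flow described above can be illustrated with a toy model. This is a hypothetical, heavily simplified sketch: ToyManager, ToyMount and the toy_* functions are invented for illustration and do not match systemd's real data structures. toy_dispatch_buggy() mirrors the reported shape (reset flags and return early on error), while toy_dispatch_fixed() shows just one possible way to avoid the leak, running the cleanup pass regardless of the parse result; it is not the upstream patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in systemd. */
#define MAX_UNITS 16
#define MAX_ID 64

typedef struct {
        char id[MAX_ID];
        bool in_use;       /* slot holds a loaded unit */
        bool is_mounted;   /* seen during the current mountinfo pass */
} ToyMount;

typedef struct {
        ToyMount units[MAX_UNITS];
} ToyManager;

/* Find a unit by id, or add it in a free slot; NULL if the table is full. */
static ToyMount *toy_find_or_add(ToyManager *m, const char *id) {
        ToyMount *free_slot = NULL;
        for (size_t i = 0; i < MAX_UNITS; i++) {
                if (m->units[i].in_use && strcmp(m->units[i].id, id) == 0)
                        return &m->units[i];
                if (!m->units[i].in_use && !free_slot)
                        free_slot = &m->units[i];
        }
        if (free_slot) {
                strcpy(free_slot->id, id);   /* caller guarantees strlen(id) < MAX_ID */
                free_slot->in_use = true;
                free_slot->is_mounted = false;
        }
        return free_slot;
}

/* Stand-in for mount_load_proc_self_mountinfo(): marks every listed mount,
 * skips ids that are too long, and reports the failure at the end. */
static int toy_load_mountinfo(ToyManager *m, const char *const *ids, size_t n) {
        int r = 0;
        for (size_t i = 0; i < n; i++) {
                if (strlen(ids[i]) >= MAX_ID) {
                        r = -1;
                        continue;
                }
                ToyMount *u = toy_find_or_add(m, ids[i]);
                if (u)
                        u->is_mounted = true;
        }
        return r;
}

/* Cleanup pass: drop units not seen in this pass, reset flags on the rest. */
static void toy_sweep(ToyManager *m) {
        for (size_t i = 0; i < MAX_UNITS; i++) {
                if (!m->units[i].in_use)
                        continue;
                if (m->units[i].is_mounted)
                        m->units[i].is_mounted = false;
                else
                        m->units[i].in_use = false;
        }
}

/* Reported shape: on error, flags are reset and the sweep is skipped. */
static void toy_dispatch_buggy(ToyManager *m, const char *const *ids, size_t n) {
        assert(m != NULL);
        if (toy_load_mountinfo(m, ids, n) < 0) {
                for (size_t i = 0; i < MAX_UNITS; i++)
                        m->units[i].is_mounted = false;
                return;
        }
        toy_sweep(m);
}

/* One possible fix: always run the sweep, whatever the parse result. */
static void toy_dispatch_fixed(ToyManager *m, const char *const *ids, size_t n) {
        assert(m != NULL);
        (void) toy_load_mountinfo(m, ids, n);
        toy_sweep(m);
}

/* Number of units currently loaded. */
static size_t toy_count(const ToyManager *m) {
        size_t c = 0;
        for (size_t i = 0; i < MAX_UNITS; i++)
                c += m->units[i].in_use;
        return c;
}
```

Feeding both versions the same two passes (b1, b2, b3 plus one over-long id, then only the over-long id) leaves three stale units in the buggy version and none in the fixed one.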

I also constructed a test case to reproduce the bug.

A. Construct a mount point path longer than 256 characters (so that mount_load_proc_self_mountinfo() returns an error code):

# mkdir -p /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81
# mkdir -p /tmp/test
# mount --bind /tmp/test   /run/kata-containers/shared/sandboxes/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74/f0ea3efdb417f442128830e86118cf216d1d236d6f970205a680972bcd062f74-a1bed3c11a474518-aaaaaa_xxxx_mix_xxxx_container_role_20200310112829109807.yyyy_container_role_20200310112829109807_15_81
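Why a single long path makes the whole pass fail is stated here only as an assumption: systemd derives an escaped unit name from each mount point path, and upstream systemd rejects unit names of 256 bytes or more (UNIT_NAME_MAX). The hypothetical helper approx_mount_unit_name_len() below approximates that escaping so the resulting name length can be estimated before mounting; its rules are a simplification and may not match this RHEL 7 build exactly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit: unit names of this many bytes or more are rejected. */
#define ASSUMED_UNIT_NAME_MAX 256

/* Hypothetical helper approximating the length of the mount unit name
 * derived from a path. Assumed, simplified escaping rules:
 *   - leading and trailing '/' are dropped, each remaining '/' becomes '-'
 *   - ASCII letters, digits, ':', '_' and '.' are copied as-is
 *   - every other byte (including '-' and backslash) becomes "\xNN" (4 bytes)
 * then ".mount" is appended. Not modeled: collapsing of repeated '/',
 * escaping of a leading '.', and the special case of the root path. */
static size_t approx_mount_unit_name_len(const char *path) {
        assert(path != NULL);
        size_t start = 0, end = strlen(path);
        while (start < end && path[start] == '/')
                start++;
        while (end > start && path[end - 1] == '/')
                end--;

        size_t len = 0;
        for (size_t i = start; i < end; i++) {
                unsigned char c = (unsigned char) path[i];
                if (c == '/' || (c < 128 && isalnum(c)) || c == ':' || c == '_' || c == '.')
                        len += 1;   /* '/' maps to '-', the others are copied */
                else
                        len += 4;   /* escaped as "\xNN" */
        }
        return len + strlen(".mount");
}
```

For the path used in step A, the helper returns a value well above ASSUMED_UNIT_NAME_MAX: the raw path is already longer than 256 characters, and each '-' grows to four bytes when escaped.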

B. Bind-mount some directories, then unmount them:

# mkdir -p ./a1 ./b1
# mount --bind a1 b1
# 
# mkdir -p ./a2 ./b2
# mount --bind a2 b2
# 
# mkdir -p ./a3 ./b3
# mount --bind a3 b3
# 
# umount b1
# umount b2
# umount b3
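After step B the kernel should no longer list b1, b2 or b3. A small C helper can confirm this by scanning /proc/self/mountinfo, whose fifth whitespace-separated field is the mount point (see proc(5)). This is a sketch: mountinfo_has_mount_point() is a hypothetical name, and octal escapes such as \040 (space) are not decoded, so paths containing spaces will not match.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if mount_point appears as the 5th field of any line in a
 * mountinfo-formatted stream. The stream is rewound before scanning. */
static bool mountinfo_has_mount_point(FILE *f, const char *mount_point) {
        assert(f != NULL && mount_point != NULL);
        char *line = NULL;
        size_t cap = 0;
        bool found = false;

        rewind(f);
        while (!found && getline(&line, &cap, f) != -1) {
                /* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
                char *saveptr = NULL;
                char *tok = strtok_r(line, " \n", &saveptr);
                for (int field = 1; tok && field < 5; field++)
                        tok = strtok_r(NULL, " \n", &saveptr);
                if (tok && strcmp(tok, mount_point) == 0)
                        found = true;
        }
        free(line);
        return found;
}
```

For example, open /proc/self/mountinfo with fopen(path, "r") and pass the absolute path of b1: after the umount it should report false, even though step C still finds the corresponding unit in m->units.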

C. Finally, inspecting the manager with GDB shows that the units for these unmounted mount points are still in m->units:

Breakpoint 1, mount_dispatch_io (source=0x55a04d0f4910, fd=9, revents=10, userdata=0x55a04d0ec040) at src/core/mount.c:1711
1711	static int mount_dispatch_io(sd_event_source *source, int fd, uint32_t revents, void *userdata) {
(gdb) bt
#0  mount_dispatch_io (source=0x55a04d0f4910, fd=9, revents=10, userdata=0x55a04d0ec040) at src/core/mount.c:1711
#1  0x000055a04c7ab6f0 in source_dispatch (s=s@entry=0x55a04d0f4910) at src/libsystemd/sd-event/sd-event.c:2115
#2  0x000055a04c7ac78a in sd_event_dispatch (e=0x55a04d0ec5e0) at src/libsystemd/sd-event/sd-event.c:2472
#3  0x000055a04c7ac92f in sd_event_run (e=<optimized out>, timeout=<optimized out>) at src/libsystemd/sd-event/sd-event.c:2501
#4  0x000055a04c70c2c3 in manager_loop (m=0x55a04d0ec040) at src/core/manager.c:2274
#5  0x000055a04c7006b1 in main (argc=5, argv=0x7ffe1edfaae8) at src/core/main.c:1819
(gdb) frame 5
#5  0x000055a04c7006b1 in main (argc=5, argv=0x7ffe1edfaae8) at src/core/main.c:1819
1819	                r = manager_loop(m);
(gdb) p  m->units_by_type[6]
$1 = (Unit *) 0x55a04d108c80
(gdb) dump_mount_list  0x55a04d108c80
addr: 0x55a04d108c80, mount->id: mnt-work-issues-systemd_maxunits-b3.mount, source_path: /proc/self/mountinfo
addr: 0x55a04d15e120, mount->id: mnt-work-issues-systemd_maxunits-b2.mount, source_path: /proc/self/mountinfo
...
addr: 0x55a04d0ef580, mount->id: run-user-0.mount, source_path: /proc/self/mountinfo
addr: 0x55a04d17ca10, mount->id: mnt-work-issues-systemd_maxunits-b1.mount, source_path: /proc/self/mountinfo
....

Similar bugs:
https://access.redhat.com/solutions/4620671
systemd/systemd#15221
kubernetes/kubernetes#57345
