Code Monkey home page Code Monkey logo

Comments (11)

Rid avatar Rid commented on September 2, 2024

Perhaps if IO is overloaded you could just discard the output, that would be better than a hard exit, for example:

def read_log_files(self, units):
    BUFSIZE = 8192
    for unit in units:
        if unit in self._log_file:
            new_text = b""
            while True:
                buf = os.read(self._log_file[unit], BUFSIZE)
                if not buf:
                    break
                new_text += buf
            text = self._log_hold[unit] + new_text
            if not text:
                continue
            lines = text.split(b"\n")
            if not text.endswith(b"\n"):
                self._log_hold[unit] = lines[-1]
                lines = lines[:-1]
            for line in lines:
                prefix = unit.encode("utf-8")
                content = prefix + b": " + line + b"\n"
                try:
                    os.write(1, content)
                    try:
                        os.fsync(1)
                    except Exception:
                        pass
                except BlockingIOError:
                    pass 

from docker-systemctl-replacement.

Rid avatar Rid commented on September 2, 2024

Sorry, looking at this a bit further it seems that systemctl might be contributing to the high IO. I applied the above patch so that it's no longer exiting, but where there are failing servicehelpers it seems to be producing excessive log outputs such as the below.

Unfortunately we do not have full control over the services inside the container as it's used much like a VPS (with strict security policies).

Should there be a back-off and then eventual failure for these servicehelpers? or can we rate-limit them?

strace: Process 3231129 attached
read(3, " [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15262' (uid=0 pid=386450 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15262' (uid=0 pid=386450 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15263' (uid=0 pid=386699 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15263' (u"..., 8192) = 8192
mmap(NULL, 75100160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a5106800000
munmap(0x7a510b000000, 75091968)        = 0
read(3, "us-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15275' (uid=0 pid=389750 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15276' (uid=0 pid=389982 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15276' (uid=0 pid=389982 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested "..., 8192) = 8192
mmap(NULL, 75108352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510f800000
munmap(0x7a5106800000, 75100160)        = 0
read(3, "return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15289' (uid=0 pid=393387 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15289' (uid=0 pid=393387 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15290' (uid=0 pid=393652 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timed"..., 8192) = 8192
mmap(NULL, 75116544, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510b000000
munmap(0x7a510f800000, 75108352)        = 0
read(3, "ed with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15302' (uid=0 pid=396348 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15303' (uid=0 pid=396611 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15303' (uid=0 pid=396611 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.f"..., 8192) = 8192
mmap(NULL, 75124736, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a5106800000
munmap(0x7a510b000000, 75116544)        = 0
read(3, "unch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15316' (uid=0 pid=399622 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15316' (uid=0 pid=399622 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15317' (uid=0 pid=399841 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating ser"..., 8192) = 8192
mmap(NULL, 75132928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510f800000
munmap(0x7a5106800000, 75124736)        = 0
read(3, "ate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15329' (uid=0 pid=402640 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15330' (uid=0 pid=402851 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15330' (uid=0 pid=402851 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system"..., 8192) = 8192
mmap(NULL, 75141120, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510b000000
munmap(0x7a510f800000, 75132928)        = 0
read(3, "reedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15343' (uid=0 pid=405936 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15343' (uid=0 pid=405936 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15344' (uid=0 pid=406154 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemo"..., 8192) = 8192
mmap(NULL, 75149312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a5106800000
munmap(0x7a510b000000, 75141120)        = 0
read(3, "d service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15356' (uid=0 pid=408942 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15357' (uid=0 pid=409205 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15357' (uid=0 pid=409205 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return c"..., 8192) = 8192

from docker-systemctl-replacement.

gdraheim avatar gdraheim commented on September 2, 2024

The proposed patch is correct anyway.

But what I see is a number of services that can not start in a container anyway. There are some "igno_" lists in the python-file that excludes some enabled units from being included in the init-list of modules. Do we need to extend that list?

Additionally, try to run the container as a command-shell and check what services are included in the init-list

systemctl3.py default-services

If you see anything that looks like coming from the original systemd then it should be kicked out.

from docker-systemctl-replacement.

Rid avatar Rid commented on September 2, 2024

Thanks, here's the output:

root@ubtvnc:/# systemctl default-services
appbox_dbus.service
cron.service
nginx.service
panel.service
php8.1-fpm.service
snapd.aa-prompt-listener.service
snapd.apparmor.service
snapd.autoimport.service
snapd.core-fixup.service
snapd.recovery-chooser-trigger.service
snapd.seeded.service
snapd.service
ssh.service
vnstat.service
cups.service
docker.service
fail2ban.service
plymouth.service
pulseaudio-enable-autospawn.service
rsync.service
unattended-upgrades.service
uuidd.service

Is snapd already included the in ignore list?

from docker-systemctl-replacement.

gdraheim avatar gdraheim commented on September 2, 2024

I never had containerized a snapd app. Probably kill the appbox_dbus service as well as I am suspicious that it runs addtional services. May be you can test a bit with adding patterns to the igno_-lists?

from docker-systemctl-replacement.

Rid avatar Rid commented on September 2, 2024

Snapd doesn't work in docker as it requires real systemd, so we can add

    igno_always = ["network*", "dbus*", "systemd-*", "kdump*", "kmod*", "snapd*"]

However some really big journal logs have been created for example:

-rw-r--r--+ 1 root systemd-journal  92M Mar 21 09:37 snapd.service.log

It seems now like it's using 100% CPU reading these log files as I can see it reading errors which have been resolved. Sorry for my ignorance, I'm not a systemd expert, but what's the reason we're reading the whole log file?

EDIT: Testing this on a few other containers with the issue, deleting the log files reduces the CPU usage from 100% to 0.7% even when services are failing, so this seems like the real issue regarding the resource usage.

from docker-systemctl-replacement.

Rid avatar Rid commented on September 2, 2024

Setting the buffersize to 10MB resolves the issue:

    def read_log_files(self, units):
        BUFSIZE=10485760

Are logs ever rotated? I know on normal systemd systemd-journald rotates the journal files

EDIT: Looks like it's not implemented (#41) are there any work arounds for this other than running a custom service to just rotate the log files (we would need to add jitter as we run many containers per node)?

from docker-systemctl-replacement.

gdraheim avatar gdraheim commented on September 2, 2024

Logs are not rotated, you are right, and the read-log-loop is about duplicating the log lines from each application service into the container log (docker log), so that messages from the applications become visible outside the container.

The current implementation runs the service processes with the log files attached, so it is not really possible to just rotate them as systemd does.

There has been just an idea floating around to punch holes in the log files aka sparse files. It may be possible to analyse systemd-journald taking over some idea. But I did not try so far. In reality it was good enough to fix the problems on the applications themselves that led to too many output lines.

from docker-systemctl-replacement.

Rid avatar Rid commented on September 2, 2024

Thanks @gdraheim, so I could use fallocate to in essence delete the logs from the offset (0) to the length? I could setup a service to do this once per day and it would suffice.

from docker-systemctl-replacement.

gdraheim avatar gdraheim commented on September 2, 2024

Well, that sounds like a reasonable workaround. I hope that one day, the systemctl initloop could throw away parts beyond a certain limit.

from docker-systemctl-replacement.

Rid avatar Rid commented on September 2, 2024

I am thinking to just extend the code in a fork to add a function to fallocate the log files if they're over 5MB (and truncate them to 5MB), I would add a last run variable to check if it has run during the current day, and if not schedule it for a random time.

it's probably too opinionated to be merged, but I can make a PR if you're interested.

from docker-systemctl-replacement.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.