Comments (11)
Perhaps if IO is overloaded you could just discard the output, that would be better than a hard exit, for example:
def read_log_files(self, units):
BUFSIZE = 8192
for unit in units:
if unit in self._log_file:
new_text = b""
while True:
buf = os.read(self._log_file[unit], BUFSIZE)
if not buf:
break
new_text += buf
text = self._log_hold[unit] + new_text
if not text:
continue
lines = text.split(b"\n")
if not text.endswith(b"\n"):
self._log_hold[unit] = lines[-1]
lines = lines[:-1]
for line in lines:
prefix = unit.encode("utf-8")
content = prefix + b": " + line + b"\n"
try:
os.write(1, content)
try:
os.fsync(1)
except Exception:
pass
except BlockingIOError:
pass
from docker-systemctl-replacement.
Sorry, looking at this a bit further it seems that systemctl might be contributing to the high IO. I applied the above patch so that it's no longer exiting, but where there are failing servicehelpers it seems to be producing excessive log outputs such as the below.
Unfortunately we do not have full control over the services inside the container as it's used much like a VPS (with strict security policies).
Should there be a back-off and then eventual failure for these servicehelpers? or can we rate-limit them?
strace: Process 3231129 attached
read(3, " [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15262' (uid=0 pid=386450 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15262' (uid=0 pid=386450 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15263' (uid=0 pid=386699 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15263' (u"..., 8192) = 8192
mmap(NULL, 75100160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a5106800000
munmap(0x7a510b000000, 75091968) = 0
read(3, "us-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15275' (uid=0 pid=389750 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15276' (uid=0 pid=389982 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15276' (uid=0 pid=389982 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested "..., 8192) = 8192
mmap(NULL, 75108352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510f800000
munmap(0x7a5106800000, 75100160) = 0
read(3, "return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15289' (uid=0 pid=393387 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15289' (uid=0 pid=393387 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15290' (uid=0 pid=393652 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timed"..., 8192) = 8192
mmap(NULL, 75116544, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510b000000
munmap(0x7a510f800000, 75108352) = 0
read(3, "ed with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15302' (uid=0 pid=396348 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15303' (uid=0 pid=396611 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15303' (uid=0 pid=396611 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.f"..., 8192) = 8192
mmap(NULL, 75124736, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a5106800000
munmap(0x7a510b000000, 75116544) = 0
read(3, "unch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15316' (uid=0 pid=399622 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15316' (uid=0 pid=399622 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15317' (uid=0 pid=399841 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating ser"..., 8192) = 8192
mmap(NULL, 75132928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510f800000
munmap(0x7a5106800000, 75124736) = 0
read(3, "ate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15329' (uid=0 pid=402640 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15330' (uid=0 pid=402851 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15330' (uid=0 pid=402851 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system"..., 8192) = 8192
mmap(NULL, 75141120, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a510b000000
munmap(0x7a510f800000, 75132928) = 0
read(3, "reedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15343' (uid=0 pid=405936 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15343' (uid=0 pid=405936 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15344' (uid=0 pid=406154 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemo"..., 8192) = 8192
mmap(NULL, 75149312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7a5106800000
munmap(0x7a510b000000, 75141120) = 0
read(3, "d service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15356' (uid=0 pid=408942 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15357' (uid=0 pid=409205 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return code 1\ndbus-daemon[1068]: [system] Activating service name='org.freedesktop.timedate1' requested by ':1.15357' (uid=0 pid=409205 comm=\"/usr/lib/snapd/snapd \") (using servicehelper)\ndbus-daemon[1068]: [system] Activated service 'org.freedesktop.timedate1' failed: Launch helper exited with unknown return c"..., 8192) = 8192
from docker-systemctl-replacement.
The proposed patch is correct anyway.
But what I see is a number of services that can not start in a container anyway. There are some "igno_" lists in the python-file that excludes some enabled units from being included in the init-list of modules. Do we need to extend that list?
Additionally, try to run the container as a command-shell and check what services are included in the init-list
systemctl3.py default-services
If you see anything that looks like coming from the original systemd then it should be kicked out.
from docker-systemctl-replacement.
Thanks, here's the output:
root@ubtvnc:/# systemctl default-services
appbox_dbus.service
cron.service
nginx.service
panel.service
php8.1-fpm.service
snapd.aa-prompt-listener.service
snapd.apparmor.service
snapd.autoimport.service
snapd.core-fixup.service
snapd.recovery-chooser-trigger.service
snapd.seeded.service
snapd.service
ssh.service
vnstat.service
cups.service
docker.service
fail2ban.service
plymouth.service
pulseaudio-enable-autospawn.service
rsync.service
unattended-upgrades.service
uuidd.service
Is snapd already included the in ignore list?
from docker-systemctl-replacement.
I never had containerized a snapd app. Probably kill the appbox_dbus service as well as I am suspicious that it runs addtional services. May be you can test a bit with adding patterns to the igno_-lists?
from docker-systemctl-replacement.
Snapd doesn't work in docker as it requires real systemd, so we can add
igno_always = ["network*", "dbus*", "systemd-*", "kdump*", "kmod*", "snapd*"]
However some really big journal logs have been created for example:
-rw-r--r--+ 1 root systemd-journal 92M Mar 21 09:37 snapd.service.log
It seems now like it's using 100% CPU reading these log files as I can see it reading errors which have been resolved. Sorry for my ignorance, I'm not a systemd expert, but what's the reason we're reading the whole log file?
EDIT: Testing this on a few other containers with the issue, deleting the log files reduces the CPU usage from 100% to 0.7% even when services are failing, so this seems like the real issue regarding the resource usage.
from docker-systemctl-replacement.
Setting the buffersize to 10MB resolves the issue:
def read_log_files(self, units):
BUFSIZE=10485760
Are logs ever rotated? I know on normal systemd systemd-journald rotates the journal files
EDIT: Looks like it's not implemented (#41) are there any work arounds for this other than running a custom service to just rotate the log files (we would need to add jitter as we run many containers per node)?
from docker-systemctl-replacement.
Logs are not rotated, you are right, and the read-log-loop is about duplicating the log lines from each application service into the container log (docker log), so that messages from the applications become visible outside the container.
The current implementation runs the service processes with the log files attached, so it is not really possible to just rotate them as systemd does.
There has been just an idea floating around to punch holes in the log files aka sparse files. It may be possible to analyse systemd-journald taking over some idea. But I did not try so far. In reality it was good enough to fix the problems on the applications themselves that led to too many output lines.
from docker-systemctl-replacement.
Thanks @gdraheim, so I could use fallocate
to in essence delete the logs from the offset (0) to the length? I could setup a service to do this once per day and it would suffice.
from docker-systemctl-replacement.
Well, that sounds like a reasonable workaround. I hope that one day, the systemctl initloop could throw away parts beyond a certain limit.
from docker-systemctl-replacement.
I am thinking to just extend the code in a fork to add a function to fallocate the log files if they're over 5MB (and truncate them to 5MB), I would add a last run variable to check if it has run during the current day, and if not schedule it for a random time.
it's probably too opinionated to be merged, but I can make a PR if you're interested.
from docker-systemctl-replacement.
Related Issues (20)
- Version tag 1.5.1706 HOT 1
- 'systemctl' not working HOT 16
- Starting a service during image build / docker file, hangs HOT 7
- Possible to work with ansible service_facts module? HOT 1
- systemctl3.py not working on Rocky Linux 8 HOT 1
- Misleading error message interaction between ExecStart= and ExecStartPost= HOT 3
- systemctl enable doesn't copy service file to right place. HOT 1
- When control-C on systemctl log -f xxx, it raise an ugly KeyboardInterrupt HOT 2
- systemctl disable not disabling service at container restart HOT 1
- can this script be used on ubuntu 22.04? HOT 1
- How is this suppoused to be ussed? HOT 1
- ERROR:systemctl: nextcloud.service: Failed to parse service type, ignoring: exec HOT 1
- Install to Centos6.8 HOT 1
- StartLimitIntervalSec=0 or how to Restart Forever HOT 1
- status command shows unit files ignored by SystemD HOT 2
- systemctl show does not respect multiple `-p/--property` switches HOT 1
- Can't understand how logs and tty are working
- Support for systemd-inhibit? HOT 1
- systemctl start virtstoraged.service Startup exception HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from docker-systemctl-replacement.