containers / conmon
An OCI container runtime monitor.
License: Apache License 2.0
It should also support more, but these log drivers should be fairly trivial to implement
Because runc creates a console socket with a single fd for both stdin and stdout, conmon currently cannot tell whether a character written to a tty starts a new line of stdout or is an echoed character of stdin. It always assumes the former, so running with a tty produces odd journald output like:
/ #
e
x
i
t
Where it should really be
/ # exit
when a user types exit into the console.
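One possible mitigation, sketched here under assumptions, is to buffer console bytes until a newline arrives and only then hand a complete line to the log driver. The names line_buf, line_buf_feed, line_sink and remember_last are hypothetical and not part of conmon; the sketch does not distinguish echoed stdin from stdout, it only coalesces per-character writes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_BUF_CAP 256 /* hypothetical cap; longer lines are flushed early */

/* Hypothetical accumulator: collects tty bytes until a newline arrives. */
struct line_buf {
	char data[LINE_BUF_CAP];
	size_t len;
};

/* Receives one complete, NUL-terminated line (without the '\n'). */
typedef void (*line_sink)(const char *line, size_t len, void *ctx);

/*
 * Feed a chunk read from the console fd. Calls the sink once per complete
 * line rather than once per read(), and returns the number of lines emitted.
 */
int line_buf_feed(struct line_buf *lb, const char *chunk, size_t n, line_sink sink, void *ctx)
{
	int emitted = 0;

	assert(lb != NULL && sink != NULL);
	for (size_t i = 0; i < n; i++) {
		int is_newline = (chunk[i] == '\n');

		if (is_newline || lb->len == LINE_BUF_CAP - 1) {
			lb->data[lb->len] = '\0';
			sink(lb->data, lb->len, ctx);
			emitted++;
			lb->len = 0;
		}
		if (!is_newline)
			lb->data[lb->len++] = chunk[i];
	}
	return emitted;
}

/* Example sink: copies the latest line into a buffer of LINE_BUF_CAP bytes. */
void remember_last(const char *line, size_t len, void *ctx)
{
	memcpy(ctx, line, len + 1); /* include the terminating NUL */
}
```

Feeding "/ # ", then "e", then "xit\n" calls the sink once, with the whole line, instead of once per keystroke.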
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When pasting a large amount of text into a container environment (usually while editing a file in vim but also to a /bin/bash and /bin/sh shell prompt) the session hangs and there is no way to cancel, quit, exit, etc.
Steps to reproduce the issue:
podman run --rm -it busybox /bin/sh
vi test
Describe the results you received:
Terminal hangs. I have to open a new session and stop the podman container; sometimes that doesn't stop the process, so I have to search for it and kill it by process ID.
Describe the results you expected:
All content pasted is now in file being edited in vim.
Additional information you deem important (e.g. issue happens only occasionally):
This happens when running PuTTY 0.74. It doesn't happen if running SSH from Windows 10 command line.
It also doesn't happen if I run the container using docker run --rm -it busybox /bin/sh instead of using podman.
Output of podman version:
Version: 1.9.3
RemoteAPI Version: 1
Go Version: go1.14.2
OS/Arch: linux/amd64
Output of podman info --debug:
debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.14.2
  podmanVersion: 1.9.3
host:
  arch: amd64
  buildahVersion: 1.14.9
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.18-1.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.18, commit: 6e8799f576f11f902cd8a8d8b45b2b2caf636a85'
  cpus: 12
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: coreos.
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.7.8-200.fc32.x86_64
  memFree: 6526885888
  memTotal: 67493203968
  ociRuntime:
    name: runc
    package: runc-1.0.0-144.dev.gite6555cc.fc32.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: fbdbaf85ecbc0e077f336c03062710435607dbf1
      spec: 1.0.1-dev
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.1-1.fc32.x86_64
    version: |-
      slirp4netns version 1.1.1
      commit: bbf27c5acd4356edb97fa639b4e15e0cd56a39d5
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 2
  swapFree: 0
  swapTotal: 0
  uptime: 168h 33m 41.65s (Approximately 7.00 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 19
    paused: 0
    running: 19
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.1.2-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.1.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /var/home/core/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 74
  runRoot: /run/user/1000/containers
  volumePath: /var/home/core/.local/share/containers/storage/volumes
Package info (e.g. output of rpm -q podman or apt list podman):
podman-1.9.3-1.fc32.x86_64
Additional environment details (AWS, VirtualBox, physical, etc.):
Fedora CoreOS
PuTTY 0.74
We'd like to implement a json file logger so managers have the ability to log in a json file format
The repository contains a CHANGELOG file, but the most recent recorded release is 1.0, and the latest release is 2.0.x - would be nice to have updates in there
on latest master
make static
builder for '/nix/store/8hk9h8rc1vp258haxrnhfhpa6nbhx6ly-e2fsprogs-1.45.5.drv' failed with exit code 2; last 10 log lines:
t_iexpand_full: expand inodes on a totally full filesystem: ok
t_uninit_bg_rm: remove uninit_bg: ok
r_move_itable_nostride: resize with flex_bg and stride value set: ok
r_move_itable_realloc: don't allocate inode table from in-use blocks: ok
r_bigalloc_big_expand: ext4 with bigalloc: ok
349 tests succeeded 1 tests failed
Tests failed: d_fallocate_blkmap
make[1]: *** [Makefile:395: test_post] Error 1
make[1]: Leaving directory '/build/e2fsprogs-1.45.5/tests'
make: *** [Makefile:419: check-recursive] Error 1
cannot build derivation '/nix/store/r3xg075n678skdqp0ic33gk92zsaj3p5-libarchive-3.4.3.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/s7hdrvklcm4qgpf41am0j0yyihjy7gp7-cmake-3.17.3.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/dayx2k511mwgljszcsd536xr0137r2fd-libfido2-1.4.0.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/lxaxnkbkzp5lka1b3y5kk1q5k9j9mw3z-libipt-2.0.1.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/a1ffd5ig4m4wjyamm5ifhf3d34wy3m29-gdb-9.2.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/a7hj4vb2s35b05jagy9zy6kz4n23sqhd-openssh-8.2p1.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/csim3fhx6f9padssqfkizcw7f43nsml5-git-2.27.0.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/cvp9mnlvgc4vgn6m0qkxpm5mpjv34b6p-python3.8-Cython-0.29.19.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/z5sl4v7pgwyx0kghsm1ycj044xdzpndl-conmon.drv': 1 dependencies couldn't be built
[1 built (1 failed), 0.0 MiB DL]
error: build of '/nix/store/z5sl4v7pgwyx0kghsm1ycj044xdzpndl-conmon.drv' failed
Hi,
the LICENSE file does not specify the copyright holder.
Line 189 in 8f7cb2a
thanks!
With respect to container logging and /dev/log, conmon is our proxy layer - it maintains the stderr/stdout/stdin fd's, allows console connections and attachments, and proxies those logs from the container's cgroup+namespaces up into the host's journald/syslog/whatever. This maintains appropriate metadata (i.e. systemd units and LogExtraFields) for messages generated from the conmon process.
Bind mounting /dev/log causes systemd to find the metadata from the machine.slice, which doesn't have the unit metadata on it. If we instead provide a simple proxy in conmon, which creates a "slproxy" dgram socket in the bundle folder, and forwards anything written to it to /dev/log on the host, then the origin of those packets will come from conmon, and the metadata in the journal will be correctly attributed to the unit.
Open to feedback on the PR. It's a proof of concept at this stage, but it seems to fit conmon's role as our "proxy to the host". I'd REALLY like to also have CONTAINER_ID, CONTAINER_NAME and CONTAINER_TAG in the journal, but the only way I can think to do that would be to have conmon read the dgram, parse the syslog format, and then rebroadcast to the journal directly, and I'm thinking that's beyond the scope of what conmon should do.
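The forwarding step can be sketched roughly as below. This is an illustration under assumptions, not the code from the PR: forward_one_datagram and SYSLOG_DGRAM_MAX are hypothetical names, in_fd stands for the "slproxy" datagram socket in the bundle folder, and out_fd stands for a datagram socket already connected to /dev/log on the host.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

#define SYSLOG_DGRAM_MAX 8192 /* assumed upper bound for one message */

/*
 * Relay a single datagram from in_fd to out_fd without parsing it.
 * Returns the number of bytes forwarded, or -1 on error.
 */
ssize_t forward_one_datagram(int in_fd, int out_fd)
{
	char buf[SYSLOG_DGRAM_MAX];
	ssize_t n;

	assert(in_fd >= 0 && out_fd >= 0);
	n = recv(in_fd, buf, sizeof(buf), 0);
	if (n < 0)
		return -1;
	/* The send originates from this process, so the receiver sees us as the sender. */
	if (send(out_fd, buf, (size_t)n, 0) != n)
		return -1;
	return n;
}
```

Because the payload is passed through untouched, adding fields such as CONTAINER_ID would need the parsing step described above, which this sketch deliberately leaves out.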
Often, users come with problems that happen in conmon and want to know what is happening. We should allow a --conmon-log-file option so conmon can write to a log file as well as syslog. That way, it will be easier to get logs from users on non-systemd distributions.
I built a custom binary of conmon from the latest conmon repo.
When using podman with this custom conmon binary and the above command line option, the changes are not reflected in conmon.
podman --version
podman version 2.1.1
Custom conmon binary:
conmon version 2.0.22-dev
Even if it's very rudimentary, there should be an install guide for conmon.
It would be nice if you could use $(PKG_CONFIG) instead of just hard-coding pkg-config.
sed -e 's/pkg-config/$(PKG_CONFIG)/g' -i Makefile
This way it picks up the path, flags and other local overrides, and it can still default to the previous behavior:
PKG_CONFIG ?= pkg-config
I have
$ rpm -q conmon podman
conmon-2.0.16-2.fc32.x86_64
podman-1.9.2-1.fc32.x86_64
(On a locally built f32 silverblue style system)
Seeing this after I exited one of my toolbox containers from conmon, it's looping infinitely using 100% of one core:
[pid 301357] poll([{fd=6, events=POLLIN}, {fd=10, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=16, events=POLLIN}, {fd=19, events=POLLIN}], 6, -1) = 2 ([{fd=6, revents=POLLIN}, {fd=10, revents=POLLNVAL}])
[pid 301357] read(6, "\2\0\0\0\0\0\0\0", 16) = 8
[pid 301357] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 301357] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 301357] poll([{fd=6, events=POLLIN}, {fd=10, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=16, events=POLLIN}, {fd=19, events=POLLIN}], 6, -1) = 2 ([{fd=6, revents=POLLIN}, {fd=10, revents=POLLNVAL}])
[pid 301357] read(6, "\2\0\0\0\0\0\0\0", 16) = 8
[pid 301357] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 301357] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 301357] poll([{fd=6, events=POLLIN}, {fd=10, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=16, events=POLLIN}, {fd=19, events=POLLIN}], 6, -1) = 2 ([{fd=6, revents=POLLIN}, {fd=10, revents=POLLNVAL}])
[pid 301357] read(6, "\2\0\0\0\0\0\0\0", 16) = 8
[pid 301357] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 301357] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 301357] poll([{fd=6, events=POLLIN}, {fd=10, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLIN}, {fd=16, events=POLLIN}, {fd=19, events=POLLIN}], 6, -1) = 2 ([{fd=6, revents=POLLIN}, {fd=10, revents=POLLNVAL}])
Stack trace looks like:
(gdb) t a a bt
Thread 2 (Thread 0x7f4512522700 (LWP 301359)):
#0 0x00007f4512878b6f in poll () from target:/lib64/libc.so.6
#1 0x00007f4512a55ace in g_main_context_iterate.constprop () from target:/lib64/libglib-2.0.so.0
#2 0x00007f4512a55c03 in g_main_context_iteration () from target:/lib64/libglib-2.0.so.0
#3 0x00007f4512a55c51 in glib_worker_main () from target:/lib64/libglib-2.0.so.0
#4 0x00007f4512a7f812 in g_thread_proxy () from target:/lib64/libglib-2.0.so.0
#5 0x00007f45126f0432 in start_thread () from target:/lib64/libpthread.so.0
#6 0x00007f45128839d3 in clone () from target:/lib64/libc.so.6
Thread 1 (Thread 0x7f45125237c0 (LWP 301357)):
#0 0x00007f451287461f in write () from target:/lib64/libc.so.6
#1 0x00007f4512a9f5fa in g_wakeup_signal () from target:/lib64/libglib-2.0.so.0
#2 0x00007f4512a515c4 in block_source () from target:/lib64/libglib-2.0.so.0
#3 0x00007f4512a558b8 in g_main_context_dispatch () from target:/lib64/libglib-2.0.so.0
#4 0x00007f4512a55b38 in g_main_context_iterate.constprop () from target:/lib64/libglib-2.0.so.0
#5 0x00007f4512a55e53 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#6 0x00000000004049d3 in main ()
(gdb)
I think this must be something like incorrect file descriptor management; something like conmon managing to add GLib's own internal eventfd to the mainloop set.
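The POLLNVAL in the strace is what poll() reports for a descriptor that is no longer open, and it is reported immediately on every call, which would explain the spin. A minimal sketch with raw poll() follows; the function names are hypothetical, and GLib's main loop manages its own poll set, so this only illustrates the mechanism and is not a fix for conmon.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Returns the revents poll() reports for a descriptor that was closed
 * while still listed in the poll set, or -1 on setup failure.
 */
short revents_for_closed_fd(void)
{
	int fds[2];
	struct pollfd pfd;

	if (pipe(fds) < 0)
		return -1;
	close(fds[0]);
	close(fds[1]);

	pfd.fd = fds[0]; /* stale descriptor number */
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* A 1 s timeout is used here; with a stale fd poll() returns at once. */
	if (poll(&pfd, 1, 1000) < 0)
		return -1;
	return pfd.revents;
}

/* Disable entries flagged POLLNVAL; poll() ignores negative descriptors. */
int prune_invalid(struct pollfd *fds, nfds_t n)
{
	int pruned = 0;

	assert(fds != NULL);
	for (nfds_t i = 0; i < n; i++) {
		if (fds[i].revents & POLLNVAL) {
			fds[i].fd = -1;
			fds[i].revents = 0;
			pruned++;
		}
	}
	return pruned;
}
```

A loop that never removes such an entry wakes up on every iteration, which matches the repeated poll() results shown in the trace.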
Currently conmon just truncates the log file when the maximum size is reached, which means we lose all of the data.
I think we should allow for a backup file. When log-size-max is reached, we rename the existing file to a .1 version. A user reading the full log would then read the .1 file first and the main log file second. Once the log has filled up, we would always have at least log-size-max of data available. If .1 already existed, we would lose that data.
Since the user asked for log-size-max, we should probably do the rename at log-size-max/2 so that both files together stay within the limit they requested.
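A rough sketch of the rename step, assuming the caller reopens the log file afterwards and passes log-size-max/2 as the threshold. rotate_log_if_needed and BACKUP_PATH_CAP are hypothetical names, not existing conmon code.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

#define BACKUP_PATH_CAP 4096 /* assumed maximum path length */

/*
 * Rename path to path.1 once it has grown to at least threshold bytes.
 * Returns 1 if a rotation happened, 0 if none was needed, -1 on error.
 */
int rotate_log_if_needed(const char *path, long long threshold)
{
	char backup[BACKUP_PATH_CAP];
	struct stat st;
	int written;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return -1;
	if ((long long)st.st_size < threshold)
		return 0;

	written = snprintf(backup, sizeof(backup), "%s.1", path);
	if (written < 0 || (size_t)written >= sizeof(backup))
		return -1;

	/* rename() replaces an existing .1 file, so the older backup is dropped. */
	if (rename(path, backup) < 0)
		return -1;
	return 1;
}
```

After a successful rotation the original path no longer exists, so the writer has to create and open a fresh log file before the next write.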
Hello!
I have setup a single-machine Kubernetes cluster. I am using CRI-O as a container runtime and podman as a replacement for Docker. As my network-plugin I use flannel. When I check the logs using journalctl I get the following error over and over again:
Jun 19 19:02:51 masternode conmon[7524]: E0619 17:02:51.479533 1 iptables.go:115] Failed to ensure iptables rules: Error checking rule existence: failed to check rule existence: running [/sbin/iptables -t filter -C FORWARD -s 10.244.0.0/16 -j ACCEPT --wait]: exit status 3: iptables v1.8.3 (legacy): can't initialize iptables table `filter': Permission denied (you must be root)
Jun 19 19:02:51 masternode conmon[7524]: Perhaps iptables or your kernel needs to be upgraded.
A quick internet search didn't produce any results. When I try to execute the command manually I get the following error.
$ sudo /sbin/iptables -t filter -C FORWARD -s 10.244.0.0/16 -j ACCEPT --wait
iptables: Bad rule (does a matching rule exist in that chain?).
However, if I check my firewall with iptables -L, I have the FORWARD Chain with rules. So what seems to be the problem?
$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
f2b-sshd tcp -- anywhere anywhere multiport dports ssh
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain FORWARD (policy ACCEPT)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP all -- !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-PROXY-CANARY (0 references)
target prot opt source destination
Chain KUBE-EXTERNAL-SERVICES (1 references)
target prot opt source destination
Chain KUBE-SERVICES (3 references)
target prot opt source destination
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- anywhere anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
Chain f2b-sshd (1 references)
target prot opt source destination
REJECT all -- ns3133419.ip-51-75-131.eu anywhere reject-with icmp-port-unreachable
RETURN all -- anywhere anywhere
$
kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:41:22Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.4", GitCommit:"c96aede7b5205121079932896c4ad89bb93260af", GitTreeState:"clean", BuildDate:"2020-06-17T11:33:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
conmon --version
conmon version 2.0.6
commit: 2721f230f94894671f141762bd0d1af2fb263239
$ crio --version
crio version
Version: 1.18.1
GitCommit: 5cbf694c34f8d1af19eb873e39057663a4830635
GitTreeState: clean
BuildDate: 2020-05-25T19:01:44Z
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
$ uname -a
Linux masternode 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
podman --version
podman version 1.6.4
I'm trying to use podman-compose to develop a compose file for an application, and I'm having issues when I try to start single containers after podman-compose up failed. The error looks like this:
$ sudo podman-compose start web
using podman version: podman version 1.8.2
podman start myapp_web_1
Error: unable to start container "myapp_web_1": cannot listen on the TCP port: listen tcp4 :3000: bind: address already in use
125
And when I look, port 3000 is held by a conmon process that keeps running for the previous version of the same container, which exited long ago.
I believe conmon should not keep ports open for stopped containers.
Version information (all running on Fedora 32 silverblue):
The conmon package does not contain any binary. Only docs:
root@main-vps:~# dpkg -L conmon
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/conmon
/usr/share/doc/conmon/changelog.gz
/usr/share/doc/conmon/copyright
root@main-vps:~# apt show conmon
Package: conmon
Version: 2.0.11~8
Priority: optional
Section: devel
Maintainer: Lokesh Mandvekar <[email protected]>
Installed-Size: 10.2 kB
Homepage: https://github.com/containers/conmon.git
Download-Size: 3,776 B
APT-Manual-Installed: no
APT-Sources: http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/testing/xUbuntu_18.04 Packages
Description: OCI container runtime monitor
The package 2.0.11-2 contained in the stable repository has the same issue. Who can I talk to to have this fixed?
I use Fedora.
Under conmon local repo (as of commit 89b2478), I ran:
$ make test
go test -tags "" github.com/containers/conmon/runner/conmon_test/
Running Suite: Conmon Suite
...
conmon
/home/zyu/go-workspace/src/github.com/containers/conmon/runner/conmon_test/conmon_test.go:36
ctr logs
/home/zyu/go-workspace/src/github.com/containers/conmon/runner/conmon_test/conmon_test.go:83
log driver as journald should pass [It]
/home/zyu/go-workspace/src/github.com/containers/conmon/runner/conmon_test/conmon_test.go:117
Expected
<string>: [conmon:e] Include journald in compilation path to log to systemd journal
to be empty
I am doing a build inside a Docker container and hit the following errors at build time. The same build works fine with v2.0.14.
COMMAND:
git clone https://github.com/containers/conmon \
&& cd conmon \
&& make \
&& make podman
ERROR:
mkdir -p bin
cc -std=c99 -Os -Wall -Wextra -Werror -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -DVERSION="2.0.15-dev" -DGIT_COMMIT=""89b2478b507c6f285cd97ae8e55c85b9cafe6e81"" -D USE_JOURNALD=0 -o src/conmon.o -c src/conmon.c
In file included from src/conmon.c:8:0:
src/conmon.c: In function ‘main’:
src/utils.h:47:45: error: implicit declaration of function ‘strerror’ [-Werror=implicit-function-declaration]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:53:4: note: in expansion of macro ‘pexit’
pexit("start-pipe read failed");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:53:4: note: in expansion of macro ‘pexit’
pexit("start-pipe read failed");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:53:4: note: in expansion of macro ‘pexit’
pexit("start-pipe read failed");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:63:3: note: in expansion of macro ‘pexit’
pexit("Failed to open /dev/null");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:63:3: note: in expansion of macro ‘pexit’
pexit("Failed to open /dev/null");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:67:3: note: in expansion of macro ‘pexit’
pexit("Failed to open /dev/null");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:67:3: note: in expansion of macro ‘pexit’
pexit("Failed to open /dev/null");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:74:3: note: in expansion of macro ‘pexit’
pexit("Failed to fork the create command");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:74:3: note: in expansion of macro ‘pexit’
pexit("Failed to fork the create command");
^~~~~
src/conmon.c:80:59: error: implicit declaration of function ‘strlen’ [-Werror=implicit-function-declaration]
if (!g_file_set_contents(opt_conmon_pid_file, content, strlen(content), &err)) {
^~~~~~
src/conmon.c:80:59: error: incompatible implicit declaration of built-in function ‘strlen’ [-Werror]
src/conmon.c:80:59: note: include ‘<string.h>’ or provide a declaration of ‘strlen’
In file included from src/conmon.c:8:0:
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:94:4: note: in expansion of macro ‘pexit’
pexit("--attach specified but _OCI_ATTACHPIPE was not");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:94:4: note: in expansion of macro ‘pexit’
pexit("--attach specified but _OCI_ATTACHPIPE was not");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:103:3: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdin");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:103:3: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdin");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:105:3: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdout");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:105:3: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdout");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:107:3: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stderr");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:107:3: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stderr");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:118:3: note: in expansion of macro ‘pexit’
pexit("Failed to set as subreaper");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:118:3: note: in expansion of macro ‘pexit’
pexit("Failed to set as subreaper");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:142:5: note: in expansion of macro ‘pexit’
pexit("Failed to create !terminal stdin pipe");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:142:5: note: in expansion of macro ‘pexit’
pexit("Failed to create !terminal stdin pipe");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:152:4: note: in expansion of macro ‘pexit’
pexit("Failed to create !terminal stdout pipe");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:152:4: note: in expansion of macro ‘pexit’
pexit("Failed to create !terminal stdout pipe");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:166:3: note: in expansion of macro ‘pexit’
pexit("Failed to create stderr pipe");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:166:3: note: in expansion of macro ‘pexit’
pexit("Failed to create stderr pipe");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:190:3: note: in expansion of macro ‘pexit’
pexit("Failed to block signals");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:190:3: note: in expansion of macro ‘pexit’
pexit("Failed to block signals");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:202:3: note: in expansion of macro ‘pexit’
pexit("Failed to fork the create command");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:202:3: note: in expansion of macro ‘pexit’
pexit("Failed to fork the create command");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:205:4: note: in expansion of macro ‘pexit’
pexit("Failed to set PDEATHSIG");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:205:4: note: in expansion of macro ‘pexit’
pexit("Failed to set PDEATHSIG");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:207:4: note: in expansion of macro ‘pexit’
pexit("Failed to unblock signals");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:207:4: note: in expansion of macro ‘pexit’
pexit("Failed to unblock signals");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:212:4: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdin");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:212:4: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdin");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:219:4: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdout");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:219:4: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stdout");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:226:4: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stderr");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:226:4: note: in expansion of macro ‘pexit’
pexit("Failed to dup over stderr");
^~~~~
src/utils.h:55:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: " fmt " %s\n", ##__VA_ARGS__, strerror(errno));
^
src/conmon.c:237:5: note: in expansion of macro ‘pexitf’
pexitf("Invalid LISTEN_PID %.10s", listenpid);
^~~~~~
src/utils.h:57:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : " fmt ": %s\n", log_cid, ##__VA_ARGS__, strerror(errno));
^
src/conmon.c:237:5: note: in expansion of macro ‘pexitf’
pexitf("Invalid LISTEN_PID %.10s", listenpid);
^~~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:241:6: note: in expansion of macro ‘pexit’
pexit("Failed to g_strdup_sprintf pid");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:241:6: note: in expansion of macro ‘pexit’
pexit("Failed to g_strdup_sprintf pid");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:243:6: note: in expansion of macro ‘pexit’
pexit("Failed to setenv LISTEN_PID");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:243:6: note: in expansion of macro ‘pexit’
pexit("Failed to setenv LISTEN_PID");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:257:6: note: in expansion of macro ‘pexit’
pexit("start-pipe read failed");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:257:6: note: in expansion of macro ‘pexit’
pexit("start-pipe read failed");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:269:3: note: in expansion of macro ‘pexit’
pexit("Failed to register the signal handler");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:269:3: note: in expansion of macro ‘pexit’
pexit("Failed to register the signal handler");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:273:3: note: in expansion of macro ‘pexit’
pexit("Failed to unblock signals");
^~~~~
src/utils.h:49:20: error: format ‘%s’ expects argument of type ‘char *’, but argument 5 has type ‘int’ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:273:3: note: in expansion of macro ‘pexit’
pexit("Failed to unblock signals");
^~~~~
src/utils.h:47:19: error: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:286:3: note: in expansion of macro ‘pexit’
pexit("Failed to set handler for SIGCHLD");
^~~~~
src/utils.h:49:20: error: format โ%sโ expects argument of type โchar *โ, but argument 5 has type โintโ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:286:3: note: in expansion of macro โpexitโ
pexit("Failed to set handler for SIGCHLD");
^~~~~
src/utils.h:55:19: error: format โ%sโ expects argument of type โchar *โ, but argument 4 has type โintโ [-Werror=format=]
fprintf(stderr, "[conmon:e]: " fmt " %s\n", ##VA_ARGS, strerror(errno));
^
src/conmon.c:319:4: note: in expansion of macro โpexitfโ
pexitf("Failed to wait for runtime %s
", opt_exec ? "exec" : "create");
^~~~~~
src/utils.h:57:20: error: format โ%sโ expects argument of type โchar *โ, but argument 5 has type โintโ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : " fmt ": %s\n", log_cid, ##VA_ARGS, strerror(errno));
^
src/conmon.c:319:4: note: in expansion of macro โpexitfโ
pexitf("Failed to wait for runtime %s
", opt_exec ? "exec" : "create");
^~~~~~
src/utils.h:47:19: error: format โ%sโ expects argument of type โchar *โ, but argument 4 has type โintโ [-Werror=format=]
fprintf(stderr, "[conmon:e]: %s %s\n", s, strerror(errno));
^
src/conmon.c:474:3: note: in expansion of macro โpexitโ
pexit("Failed to remove symlink for attach socket directory");
^~~~~
src/utils.h:49:20: error: format โ%sโ expects argument of type โchar โ, but argument 5 has type โintโ [-Werror=format=]
syslog(LOG_ERR, "conmon %.20s : %s %s\n", log_cid, s, strerror(errno));
^
src/conmon.c:474:3: note: in expansion of macro โpexitโ
pexit("Failed to remove symlink for attach socket directory");
^~~~~
cc1: all warnings being treated as errors
Makefile:71: recipe for target 'src/conmon.o' failed
make: *** [src/conmon.o] Error 1
The command '/bin/sh -c mkdir -p $GOPATH && mkdir -p /run/systemd/journal/socket && git clone https://go.googlesource.com/go $GOPATH && cd $GOPATH && cd src && ./all.bash && cd /tmp && git clone https://github.com/containers/conmon && cd conmon && make && make podman && cp /usr/local/libexec/podman/conmon /usr/local/bin/ && git clone https://github.com/containernetworking/plugins.git $GOPATH/src/github.com/containernetworking/plugins && cd $GOPATH/src/github.com/containernetworking/plugins && ./build_linux.sh && mkdir -p /usr/libexec/cni && cp bin/ /usr/libexec/cni && mkdir -p /etc/cni/net.d && curl -qsSL https://raw.githubusercontent.com/containers/libpod/master/cni/87-podman-bridge.conflist | tee /etc/cni/net.d/99-loopback.conf && mkdir -p /etc/containers && curl https://raw.githubusercontent.com/projectatomic/registries/master/registries.fedora -o /etc/containers/registries.conf && curl https://raw.githubusercontent.com/containers/skopeo/master/default-policy.json -o /etc/containers/policy.json && git clone https://github.com/containers/libpod/ $GOPATH/src/github.com/containers/libpod && cd $GOPATH/src/github.com/containers/libpod && make && make install' returned a non-zero code: 2
cross post of: cri-o/cri-o#1778
The code opens the files cgroup.event_control and memory.oom_control, which are specific to cgroupv1.
Commenting out the only call to setup_oom_handling() seems to be enough to make podman/conmon work on a machine booted with systemd.unified_cgroup_hierarchy=1.
So it would be great to update the code to detect and handle cgroupv2 properly (particularly under the unified hierarchy, where cgroupv2 is mounted directly under /sys/fs/cgroup and cgroupv1 is not available at all).
Apparently cgroupv2 can do inotify, so maybe doing inotify on memory.events and looking for the oom or oom_kill counters would do the same as the current code? I also found references claiming cgroupv2 can do eventfd as well (though I haven't really found any references to it in the cgroupv2 documentation in the kernel tree).
Opening this issue to track progress of this feature.
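A minimal sketch of the counter check suggested above, assuming the contents of memory.events have already been read into a NUL-terminated buffer (for example after an inotify notification, which is not shown). The helper name memory_events_counter is hypothetical and not part of conmon; it only relies on the flat "key value" line format of the file.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not conmon's API): return the value of `key` in a
 * flat "key value\n" buffer such as the contents of memory.events,
 * or -1 if the key is absent or its value is malformed. */
long memory_events_counter(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *line = buf;

	while (line && *line) {
		/* Match the whole key: it must be followed by a space so that
		 * "oom" does not match the "oom_kill" line. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			char *end;
			long v = strtol(line + klen + 1, &end, 10);
			if (end == line + klen + 1 || v < 0)
				return -1;
			return v;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller could remember the last value of oom_kill and treat any increase as an OOM event, which would play the role that the eventfd notification plays on cgroupv1.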
containers/podman#7196 (comment)
[root@sz-test ~]# podman run --rm alpine echo 1 | cat -A
1$
[root@sz-test ~]# podman run --rm --tty alpine echo 1 | cat -A
1^M$
[root@sz-test ~]#
The problem seems to be here
https://github.com/containers/conmon/blob/master/src/ctr_logging.c#L355-L360
@rhatdan PTAL
I now maintain an Ansible role for conmon installation (https://github.com/alvistack/ansible-role-conmon) that downloads the static binary release archive and extracts it for the following OS:
Could we also introduce this as an alternative installation method, e.g. under https://github.com/containers/conmon/blob/master/install.md#conmon-installation-instructions?
continuing conversation here
I'm using podman on Fedora 31 as a normal unprivileged user. I've created a pod that contains a fairly restricted elasticsearch, postgresql, and redis instance for development, and I keep running into a weird issue. When I run my unit tests against the services they complete once, but then my system gets incredibly sluggish. top reveals 3 conmon processes all running at 100% CPU.
At this point there are no connections between anything local on my system, and the three services don't interact with each other. I've confirmed there are no connections with ss. Eventually something on my system kills them, the pods, and usually my terminal with it.
This seems like a bug in conmon getting stuck in some kind of spin lock or infinite loop, but I don't know enough about the project to attempt to diagnose it. Are there logs or any additional information that I can provide to help diagnose this? Is this even the correct place to report it?
The packaged version of conmon I'm currently using is: conmon-2.0.2-1.fc31.x86_64
I am currently working on a CRI-O issue and hit:
option parsing failed: Unknown option -n
DEBU[2019-07-09 13:28:16.371889320+02:00] received signal [...]
It took me a while to realize that the option-parsing error came from conmon, so I think a log prefix would be helpful.
Similar to runc (https://github.com/opencontainers/runc/releases/tag/v1.0.0-rc10) and cri-o (https://github.com/cri-o/cri-o/releases/tag/v1.17.0), which now provide pre-compiled statically linked binaries for download, could we also have that for conmon, to simplify cross-OS deployment?
I am running:
walters@quicksilver ~> rpm-ostree status -b
State: idle
AutomaticUpdates: disabled
BootedDeployment:
● ostree://fedora/31/x86_64/silverblue
Version: 31.12 (2019-11-19T12:28:06Z)
BaseCommit: 18f6f9c7926abe8964fd3383354d5d6e77caf82f4b41751308f748735e9558b1
RemovedBasePackages: pulseaudio-module-bluetooth 13.0-1.fc31
LayeredPackages: clevis-dracut firefox-wayland fish git-evtag gnome-tweak-tool krb5-workstation libvirt-client libvirt-daemon-driver-qemu opensc pcsc-lite-ccid pulseaudio-module-bluetooth-freeworld qemu-kvm strace tilix tmux virt-manager xsel
ykclient ykpers
walters@quicksilver ~> rpm -q podman conmon
podman-1.6.2-2.fc31.x86_64
conmon-2.0.2-1.fc31.x86_64
walters@quicksilver ~>
I am using https://github.com/cgwalters/coretoolbox and tried to run https://pypi.org/project/diffoscope/ on some disk images which seemed to allocate a ton of memory only to get OOM killed.
However, after that I had a few conmon processes in a tight infinite loop chewing up CPU:
4694 walters 20 0 77908 1336 1100 R 99.5 0.0 11:55.36 conmon
7224 walters 20 0 77908 1168 932 R 99.0 0.0 11:45.48 conmon
383375 walters 20 0 77908 1472 1236 R 99.0 0.0 11:37.95 conmon
$ strace -p 4964
openat(AT_FDCWD, "/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/dbus\\x2d:1.4\\x2dcom.gexperts.Tilix.slice/dbus-:[email protected]/memory.events", O_RDONLY|O_CLOEXEC) = 11
fstat(11, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(11, "low 0\nhigh 0\nmax 0\noom 0\noom_kil"..., 4096) = 36
close(11) = 0
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
poll([{fd=6, events=POLLIN}, {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=14, events=POLLIN}, {fd=17, events=POLLIN}], 5, -1) = 2 ([{fd=6, revents=POLLIN}, {fd=17, revents=POLLIN}])
read(6, "\2\0\0\0\0\0\0\0", 16) = 8
write(6, "\1\0\0\0\0\0\0\0", 8) = 8
read(17, 0x7fffe4b515b0, 0) = -1 EINVAL (Invalid argument)
write(2, "[conmon:w]: failed to read event"..., 51) = 51
Repeating indefinitely.
Today conmon performs a byte capture of the stdout/stderr pipes, recording data as a series of JSON documents in a log file. All bytes from both file descriptors are captured without discrimination. We also see that conmon tries to write what it reads from both pipes to disk as fast as possible (but sequentially).
There is a natural back-pressure presented by conmon to the processes in the container writing to stdout/stderr that results from the rate at which the disk can accept write()s. It is usually the case that all conmon processes serving containers write to the same disk, which means it is possible for "noisy" containers to take up much of the available disk bandwidth, leading to unwanted impacts of one container on another.
Further, the rate at which containers write to disk is independent of the rate at which log file readers (or scrapers; e.g. fluentd, fluent-bit, rsyslog, syslog-ng, filebeat, etc.) can read. Because log file rotation is performed independently of the readers and conmon on most platforms, it is possible for conmon to write more data than a reader can take in before a log file is deleted behind the reader's back without it ever seeing it.
These two problems suggest that log behavior policies applied by conmon to the log stream would give administrators the chance to solve them.
There are three policy behaviors we can begin to consider, each applied to stdout/stderr independently:
back-pressure
drop
back-pressure, drop bytes read over the limit for the remainder of the interval, accepting them again at the start of the next interval
ignore
This will allow an administrator to apply a policy to all conmon processes. Since conmon processes don't "know" about each other, coordination of the setting of that policy would have to be managed externally to conmon.
It is possible that conmon processes could periodically poll for configuration changes, or accept some sort of signal to re-evaluate the change.
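To make the "drop" behavior more concrete, here is a rough sketch of a per-stream byte budget, assuming a fixed limit per interval and a monotonic timestamp supplied by the caller. The struct, its fields, and limiter_accept are all hypothetical names for illustration; nothing like this is claimed to exist in conmon.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-stream limiter state; none of these names exist in conmon. */
struct log_limiter {
	uint64_t interval_ns;  /* length of one accounting interval */
	size_t limit_bytes;    /* bytes accepted per interval */
	uint64_t window_start; /* start time of the current interval */
	size_t used_bytes;     /* bytes accepted so far in this interval */
};

/* "drop" policy: return how many of the `len` bytes read at time `now_ns`
 * should be written to the log; the caller discards the remainder.
 * `now_ns` is assumed to come from a monotonic clock. */
size_t limiter_accept(struct log_limiter *l, uint64_t now_ns, size_t len)
{
	/* A new interval resets the budget. */
	if (now_ns - l->window_start >= l->interval_ns) {
		l->window_start = now_ns;
		l->used_bytes = 0;
	}

	size_t room = l->limit_bytes - l->used_bytes;
	size_t take = len < room ? len : room;

	l->used_bytes += take;
	return take;
}
```

A back-pressure variant could use the same accounting but stop reading from the pipe until the next interval instead of discarding the excess bytes.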
Could you upload v2.0.20 binary as in v2.0.19?
https://github.com/containers/conmon/releases
Also it would be good if the binaries are signed with GPG.
Get conmon packaged on fedora
We hit this on conmon 2.0.10.
We've been experiencing an issue on a new CRI-O based Kubernetes cluster where, occasionally, a node would have every user-land process (except init) killed with SIGKILL. This was most likely to happen during a rolling restart as we deployed new code. Our first thought was some sort of memory exhaustion issue, but this was happening on nodes with lots of free memory, and the kernel logs didn't indicate any OOM-killing activity (or anything else suspicious). The only heavily-contended resource was CPU (rolling restarts use a lot of CPU).
Our next best guess was a bug in some process in the root PID namespace accidentally killing everything. (I noted to a coworker that the behavior was a lot like running kill -9 -1 as root.) We don't run much on the host besides the CRI and kubelet, and we don't have any pods with hostPID set, so we guessed it must be one of those two.
Given that every process gets killed whenever the bug occurs, this made nailing the cause down a little challenging. Eventually we shipped a SystemTap module to watch for SIGKILLs (using detached "flight recorder" mode to avoid a killable user-space component) and were able to catch conmon red-handed:
Script:
probe signal.send {
if (sig == 9 || pid_name == "auditd") {
printf("[%ld] %s(%d) sent %s to %s(%d)\n", gettimeofday_ms(), execname(), pid(), sig_name, pid_name, sig_pid)
}
}
Output:
[1582945630705] conmon(13331) sent SIGKILL to kthreadd(2)
[1582945630705] conmon(13331) sent SIGKILL to kworker/0:0H(4)
[1582945630705] conmon(13331) sent SIGKILL to mm_percpu_wq(6)
[1582945630705] conmon(13331) sent SIGKILL to ksoftirqd/0(7)
[1582945630705] conmon(13331) sent SIGKILL to rcu_sched(8)
... 250 omitted ...
[1582945630710] conmon(13331) sent SIGKILL to kubelet(13496)
[1582945630710] conmon(13331) sent SIGKILL to conntrack(13513)
[1582945630710] conmon(13331) sent SIGKILL to iptables-legacy(13514)
[1582945630710] conmon(13331) sent SIGKILL to kube-proxy(13515)
In the logs we can see that one conmon process sent (or tried to send) a SIGKILL to every process and kernel thread in numerical order in the span of about 5ms.
I wasn't able to find any code in conmon that loops over PIDs trying to kill everything, so I gathered that this must have been the result of an errant kill(-1, SIGKILL). Per the manpage:
If pid equals -1, then sig is sent to every process for which the calling process has permission to send signals, except for process 1 (init)
When I went looking, I was dismayed to discover that -1 is the value conmon uses for invalid PIDs:
Line 923 in 7a830be
So any call to kill(container_pid, SIGKILL) has the potential to kill a lot more than it bargained for (0 might be a better choice, so the target is conmon itself).
Most callsites of kill explicitly guard against a negative pid value. There's two exceptions:
Lines 1746 to 1754 in 7a830be
There's two ways to get kill(-1, SIGKILL): if process_group is 1, then the first branch will do it, or if container_pid is -1, then the second branch will (getpgid(-1) returns -1). The former seems pretty unlikely to me (you'd need a pretty weird bug in the container runtime), so I'll focus on the latter. (Also, all pidfiles had reasonable PIDs when I checked on the host post-mortem.)
To trigger that case, you need timed_out to be true and container_pid to be -1. timed_out means timeout_cb has to fire, which only happens in the glib event loop. There's only one call to the event loop after the timer is set, so it has to happen here:
Lines 1724 to 1725 in 7a830be
We already established that the container runtime isn't messing with us by writing -1 to the pidfile, so the only place container_pid can get set to -1 is in container_exit_cb:
Line 923 in 7a830be
That, in turn, can only be called from check_child_processes, which can be called from either the event loop SIGUSR1 (fake SIGCHLD) handler, or right before the event loop is entered:
Line 1710 in 7a830be
container_exit_cb and timeout_cb both exit the event loop; we know from above that timeout_cb had to happen in an event loop call, so that out-of-event-loop call must be the place where container_pid gets set to -1 (i.e. the container process already exited before this point).
The dead container process means there should be a pending SIGUSR1 going into the event loop. However, if the timeout is also pending going into the event loop, I believe the order of callbacks is undefined. (Reading the source, the most-recently-added event source is checked first, which is the timeout.)
In our case, it looks like the offending call was a periodic Kubernetes exec health check with a 1 second timeout. If that process (which is normally fast) completes before the initial check_child_processes call, and conmon is CPU starved for long enough that the timeout fires before entering g_main_loop_run, then it appears we can get to line 1746 with both container_pid == -1 and timed_out true, at which point conmon kills everything else on the host.
I haven't added enough debug logging to 100% confirm this is what's happening, but after carefully reading the code I'm pretty sure this is a plausible explanation, and is consistent with the state of the hosts.
PR forthcoming!
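As an illustration of the kind of guard the analysis above calls for (not the contents of the forthcoming PR, which are not shown here), the target passed to kill can be computed by a small helper that never yields -1. The function name pick_kill_target is hypothetical; the variable names mirror the ones used in the description.

```c
#include <sys/types.h>

/* Hypothetical helper, not conmon's code: choose the argument for
 * kill(2), or return 0 when there is nothing safe to signal. */
pid_t pick_kill_target(pid_t container_pid, pid_t process_group)
{
	/* -1 marks an exited or invalid container, and 0 would signal the
	 * caller's own process group, so require a real pid. */
	if (container_pid <= 0)
		return 0;

	/* -process_group == -1 would again mean "every process", so only
	 * signal a group whose id is greater than 1. */
	if (process_group > 1)
		return -process_group;

	return container_pid;
}
```

With this shape, the timeout path would only call kill(target, SIGKILL) when the returned target is nonzero, so a container that already exited (container_pid == -1) results in no signal being sent at all.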
This is a ticket originating from the discussion starting here:
containers/podman#3126 (comment)
At the request of @haircommander, I am creating this ticket to track the work required to get journald support built into the podman binaries that are currently being packaged for the ubuntu/ppa users.
As pointed out here, it would seem that the libsystemd-journal package needs to be installed on whichever machine is doing the build, so my incredibly ignorant take is that this should be a simple fix:
Add the libsystemd-journal package to the default packages that are installed on the build machine that creates the deb file that is pushed to the ubuntu PPAs.
If my assumption is correct, I'd be happy to get a PR in. If somebody could point me to the file that drives the build environment config, I can probably figure out how to get an apt install libsystemd-journal or similar run before the make common call is issued...
Starting with the release of conmon 2.0.3 container restoring is broken. I saw it first in CRIU's CI:
https://travis-ci.org/checkpoint-restore/criu/jobs/612285703
Going back to 2.0.2 fixes it. As the CRIU CI setup uses the Ubuntu ppa it would be good to go back to 2.0.2 in the ppa.
The upcoming Podman v1.6.0 ships a new generate-systemd for Pods command. The restart policy of the unit files won't work without a newer conmon.
Is the pause static binary still necessary for proper operation of conmon, specifically in the context of podman?
Thank you.
hello, there's this issue: google/gvisor#2233
The issue is around cri-o and gVisor (runsc) containers using conmon. In the logs attached to the mentioned issue, some process seems to close the stderr pipe before the container kernel receives the SIGKILL, which results in an unclean exit of the container. cri-o has other runtimes (runc and kata-containers) registered, which work fine for this case.
I opened this issue here for conmon devs to consider whether this could be overcome from within conmon using some existing command-line arguments, or to consider adding one.
greetings.
/kind bug
Description
Piping input to podman run hangs indefinitely. Ctrl-C doesn't help.
$ cat dbdump | podman run -i --rm postgres pg_restore
Killing doesn't work immediately.
$ podman stop dff
Error: timed out waiting for file /run/user/1000/libpod/tmp/exits/dff4ea513af0e6d49cbb866fa6352e67e8682a041bea71ccf12d229189707ccb: internal libpod error
The container, however, quits.
Running cat dbdump | pg_restore from inside the container completes immediately. dbdump size is 300k.
Output of podman version:
Version: 1.6.2
RemoteAPI Version: 1
Go Version: go1.13.1
OS/Arch: linux/amd64
Output of podman info --debug:
debug:
compiler: gc
git commit: ""
go version: go1.13.1
podman version: 1.6.2
host:
BuildahVersion: 1.11.3
CgroupVersion: v2
Conmon:
package: conmon-2.0.2-1.fc31.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.2, commit: 186a550ba0866ce799d74006dab97969a2107979'
Distribution:
distribution: fedora
version: "31"
IDMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
MemFree: 3069919232
MemTotal: 8039636992
OCIRuntime:
name: crun
package: crun-0.10.6-1.fc31.x86_64
path: /usr/bin/crun
version: |-
crun version 0.10.6
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
SwapFree: 8196714496
SwapTotal: 8196714496
arch: amd64
cpus: 4
eventlogger: journald
hostname: blackred
kernel: 5.3.15-300.fc31.x86_64
os: linux
rootless: true
slirp4netns:
Executable: /usr/bin/slirp4netns
Package: slirp4netns-0.4.0-20.1.dev.gitbbd6f25.fc31.x86_64
Version: |-
slirp4netns version 0.4.0-beta.3+dev
commit: bbd6f25c70d5db2a1cd3bfb0416a8db99a75ed7e
uptime: 56m 4.69s
registries:
blocked: null
insecure: null
search:
- docker.io
- registry.fedoraproject.org
- registry.access.redhat.com
- registry.centos.org
- quay.io
store:
ConfigFile: /home/anatoli/.config/containers/storage.conf
ContainerStore:
number: 11
GraphDriverName: overlay
GraphOptions:
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: fuse-overlayfs-0.7.2-2.fc31.x86_64
Version: |-
fusermount3 version: 3.6.2
fuse-overlayfs: version 0.7.2
FUSE library version 3.6.2
using FUSE kernel interface version 7.29
GraphRoot: /home/anatoli/.local/share/containers/storage
GraphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "false"
ImageStore:
number: 11
RunRoot: /run/user/1000
VolumePath: /home/anatoli/.local/share/containers/storage/volumes
Package info (e.g. output of rpm -q podman or apt list podman):
podman-1.6.2-2.fc31.x86_64
Add a lot of logic and checks to log-path.
Issues:
--log-path "" segfaults
--log-path : segfaults
--log-path k8s-file: writes to a file called k8s-file (instead of asking for a path)
--log-path k8s-file writes to a file called k8s-file (instead of asking for a path)
--log-path :journald turns on journald logging, instead of writing to a file called "journald"
PR prepared to resolve all of the above w/ proper unit tests.
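For readers who want a feel for the checks involved, here is a rough sketch of a driver:path parser that behaves the way the list above expects. It is not the code from the prepared PR; the enum, the function name parse_log_path, and the choice to treat an unknown prefix as part of a plain file path are all assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical driver identifiers for this sketch only. */
enum log_driver { LOGDRV_FILE, LOGDRV_JOURNALD };

/* Parse "[driver:]path". Returns 0 on success and -1 when the spec is
 * empty or a file driver has no path. On success *path points into
 * `spec`, or is NULL for journald. */
int parse_log_path(const char *spec, enum log_driver *driver, const char **path)
{
	if (spec == NULL || *spec == '\0')
		return -1;

	const char *colon = strchr(spec, ':');
	size_t dlen = colon ? (size_t)(colon - spec) : strlen(spec);
	const char *rest = colon ? colon + 1 : "";

	if (dlen == strlen("journald") && strncmp(spec, "journald", dlen) == 0) {
		*driver = LOGDRV_JOURNALD;
		*path = NULL;
		return 0;
	}

	*driver = LOGDRV_FILE;
	if (dlen == strlen("k8s-file") && strncmp(spec, "k8s-file", dlen) == 0)
		*path = rest; /* explicit driver: the path must follow the colon */
	else if (dlen == 0)
		*path = rest; /* ":name" means a plain file called "name" */
	else
		*path = spec; /* no known driver prefix: the whole spec is the path */

	/* An empty path is an error instead of a crash. */
	return **path != '\0' ? 0 : -1;
}
```

Under these assumptions, the first four inputs from the list are rejected with an error, and :journald yields the file driver with the path "journald".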
It seems that at least check_child_processes() can be called as part of a signal handler. This function may in turn call printf-family functions, which are not on the async-signal-safe list at http://man7.org/linux/man-pages/man7/signal-safety.7.html. Is this considered normal practice?
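One common way to avoid this class of problem is the self-pipe pattern: the handler only calls write(2), which is on the async-signal-safe list, and all real work (including logging) happens later in normal context. The sketch below is a generic illustration with hypothetical names (self_pipe_install, self_pipe_next); it does not describe how conmon is or should be structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wake_fds[2] = { -1, -1 };

/* Handler restricted to async-signal-safe calls: write(2) only. */
static void on_signal(int signo)
{
	int saved_errno = errno;
	unsigned char byte = (unsigned char)signo;
	ssize_t ignored = write(wake_fds[1], &byte, 1);

	(void)ignored;
	errno = saved_errno;
}

/* Create the non-blocking pipe and install the handler; 0 on success. */
int self_pipe_install(int signo)
{
	struct sigaction sa;

	if (pipe(wake_fds) < 0)
		return -1;
	for (int i = 0; i < 2; i++) {
		int flags = fcntl(wake_fds[i], F_GETFL);
		if (flags < 0 || fcntl(wake_fds[i], F_SETFL, flags | O_NONBLOCK) < 0)
			return -1;
	}
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return sigaction(signo, &sa, NULL);
}

/* Called from normal context (e.g. a main loop polling wake_fds[0]):
 * returns the next queued signal number, or -1 if none is pending.
 * Calling printf-family functions is fine at this point. */
int self_pipe_next(void)
{
	unsigned char byte;
	ssize_t n = read(wake_fds[0], &byte, 1);

	return n == 1 ? (int)byte : -1;
}
```

The read end of the pipe can be added to whatever poll or event loop the program already runs, so child-process checks are triggered by the signal but executed outside the handler.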
conmon could use a man page. While it's not really intended to be called by the end user, other documentation often refers to it, as discussed here
If this fork succeeds
Line 1496 in f6d23b5
and any of the following code before the waitpid call fails, shouldn't the child process (create_pid) be stopped correctly and waited for with waitpid? I think my issue in containers/podman#3696 is a side effect of an injected fd not being cleaned up correctly when
Line 1595 in f6d23b5
setup_terminal_control_fifo calls exit when it fails. By then the fd has already been inherited by the child process, and because the child process is not cleaned up when exit is called, the socket stays open and a subsequent podman start fails because the port is already in use.
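The cleanup being asked for could look roughly like the helper below, called on error paths between the fork and the normal waitpid. This is only a sketch: the name stop_and_reap is hypothetical, and SIGKILL is used for brevity where a real implementation might prefer a gentler signal first.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical cleanup helper for an error path after fork():
 * signal the child and wait for it, so that it cannot outlive the
 * parent while still holding inherited file descriptors.
 * Returns 0 once the child has been reaped, -1 on error. */
int stop_and_reap(pid_t child)
{
	int status;

	/* Never pass 0 or a negative pid to kill(2). */
	if (child <= 0)
		return -1;

	/* ESRCH means the child is already gone; it still needs reaping. */
	if (kill(child, SIGKILL) < 0 && errno != ESRCH)
		return -1;

	/* Retry if a signal interrupts the wait. */
	while (waitpid(child, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}
```

An error path would then call stop_and_reap(create_pid) before exiting, instead of calling exit directly and leaving the child running with the inherited socket.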
kubernetes/kubectl#810, cri-o/cri-o#3175
Following on from the issues above, piping to a cri-o container using either kubectl or crictl leads to a variable delay (in the realm of 2-35 seconds) before handing back to the calling terminal.
To reproduce, start some temporary pod with eg.
$ kubectl run --restart=Never temp --image=alpine -- tail -f /dev/null
and pipe something to the container with kubectl, e.g.
$ echo date | DEBUG=1 kubectl exec -i temp -- ash
I0210 18:38:43.848015 14658 log.go:172] (0xc000498160) (0xc000674820) Create stream
I0210 18:38:43.848287 14658 log.go:172] (0xc000498160) (0xc000674820) Stream added, broadcasting: 1
I0210 18:38:43.880065 14658 log.go:172] (0xc000498160) Reply frame received for 1
I0210 18:38:43.880167 14658 log.go:172] (0xc000498160) (0xc0006e8000) Create stream
I0210 18:38:43.880190 14658 log.go:172] (0xc000498160) (0xc0006e8000) Stream added, broadcasting: 3
I0210 18:38:43.906711 14658 log.go:172] (0xc000498160) Reply frame received for 3
I0210 18:38:43.906829 14658 log.go:172] (0xc000498160) (0xc0006e2000) Create stream
I0210 18:38:43.906901 14658 log.go:172] (0xc000498160) (0xc0006e2000) Stream added, broadcasting: 5
I0210 18:38:43.940385 14658 log.go:172] (0xc000498160) Reply frame received for 5
I0210 18:38:43.940448 14658 log.go:172] (0xc000498160) (0xc0006e80a0) Create stream
I0210 18:38:43.940469 14658 log.go:172] (0xc000498160) (0xc0006e80a0) Stream added, broadcasting: 7
I0210 18:38:43.972047 14658 log.go:172] (0xc000498160) Reply frame received for 7
I0210 18:38:43.972393 14658 log.go:172] (0xc0006e8000) (3) Writing data frame
I0210 18:38:43.972596 14658 log.go:172] (0xc0006e8000) (3) Writing data frame
I0210 18:38:44.177922 14658 log.go:172] (0xc000498160) Data frame received for 5
I0210 18:38:44.177988 14658 log.go:172] (0xc0006e2000) (5) Data frame handling
I0210 18:38:44.178021 14658 log.go:172] (0xc0006e2000) (5) Data frame sent
Mon Feb 10 18:38:44 UTC 2020
I0210 18:38:59.743240 14658 log.go:172] (0xc000498160) (0xc0006e8000) Stream removed, broadcasting: 3
I0210 18:38:59.743327 14658 log.go:172] (0xc000498160) Data frame received for 1
I0210 18:38:59.743350 14658 log.go:172] (0xc000674820) (1) Data frame handling
I0210 18:38:59.743368 14658 log.go:172] (0xc000674820) (1) Data frame sent
I0210 18:38:59.743543 14658 log.go:172] (0xc000498160) (0xc000674820) Stream removed, broadcasting: 1
I0210 18:38:59.743641 14658 log.go:172] (0xc000498160) (0xc0006e2000) Stream removed, broadcasting: 5
I0210 18:38:59.743670 14658 log.go:172] (0xc000498160) (0xc0006e80a0) Stream removed, broadcasting: 7
I0210 18:38:59.743776 14658 log.go:172] (0xc000498160) Go away received
I0210 18:38:59.744136 14658 log.go:172] (0xc000498160) (0xc000674820) Stream removed, broadcasting: 1
I0210 18:38:59.744161 14658 log.go:172] (0xc000498160) (0xc0006e8000) Stream removed, broadcasting: 3
I0210 18:38:59.744178 14658 log.go:172] (0xc000498160) (0xc0006e2000) Stream removed, broadcasting: 5
I0210 18:38:59.744224 14658 log.go:172] (0xc000498160) (0xc0006e80a0) Stream removed, broadcasting: 7
Observe that the invocation of the date command on the container completes almost immediately, but the kubectl command doesn't complete until (in this case) 15 seconds later. The same behaviour can be observed by using crictl on the node, e.g.
# echo date | DEBUG=1 crictl exec -i $(crictl ps --name temp -q) ash
2020/02/10 18:45:26 (0xc00028a000) (0xc00025c140) Create stream
2020/02/10 18:45:26 (0xc00028a000) (0xc00025c140) Stream added, broadcasting: 1
2020/02/10 18:45:26 (0xc00028a000) Reply frame received for 1
2020/02/10 18:45:26 (0xc00028a000) (0xc0001994a0) Create stream
2020/02/10 18:45:26 (0xc00028a000) (0xc0001994a0) Stream added, broadcasting: 3
2020/02/10 18:45:26 (0xc00028a000) Reply frame received for 3
2020/02/10 18:45:26 (0xc00028a000) (0xc00039c000) Create stream
2020/02/10 18:45:26 (0xc00028a000) (0xc00039c000) Stream added, broadcasting: 5
2020/02/10 18:45:26 (0xc00028a000) Reply frame received for 5
2020/02/10 18:45:26 (0xc00028a000) (0xc00025c1e0) Create stream
2020/02/10 18:45:26 (0xc00028a000) (0xc00025c1e0) Stream added, broadcasting: 7
2020/02/10 18:45:26 (0xc00028a000) Reply frame received for 7
2020/02/10 18:45:26 (0xc0001994a0) (3) Writing data frame
2020/02/10 18:45:26 (0xc0001994a0) (3) Writing data frame
2020/02/10 18:45:26 (0xc00028a000) Data frame received for 5
2020/02/10 18:45:26 (0xc00039c000) (5) Data frame handling
2020/02/10 18:45:26 (0xc00039c000) (5) Data frame sent
Mon Feb 10 18:45:26 UTC 2020
2020/02/10 18:45:51 (0xc00028a000) Data frame received for 1
2020/02/10 18:45:51 (0xc00025c140) (1) Data frame handling
2020/02/10 18:45:51 (0xc00025c140) (1) Data frame sent
2020/02/10 18:45:51 (0xc00028a000) (0xc0001994a0) Stream removed, broadcasting: 3
2020/02/10 18:45:51 (0xc00028a000) (0xc00025c140) Stream removed, broadcasting: 1
2020/02/10 18:45:51 (0xc00028a000) (0xc00039c000) Stream removed, broadcasting: 5
2020/02/10 18:45:51 (0xc00028a000) (0xc00025c1e0) Stream removed, broadcasting: 7
2020/02/10 18:45:51 (0xc00028a000) (0xc00025c140) Stream removed, broadcasting: 1
2020/02/10 18:45:51 (0xc00028a000) (0xc0001994a0) Stream removed, broadcasting: 3
2020/02/10 18:45:51 (0xc00028a000) (0xc00039c000) Stream removed, broadcasting: 5
2020/02/10 18:45:51 (0xc00028a000) (0xc00025c1e0) Stream removed, broadcasting: 7
The node is Ubuntu 19.10 and versions of kubelet, cri-o and conmon are current at the time of writing.
# kubelet --version
Kubernetes v1.17.2
# crio --version
crio version 1.15.3-dev
commit: unknown
# conmon --version
conmon version 2.0.3
commit: unknown
From the CRI-O toolchain's point of view, the following packages already ship with the Nix package manager (https://nixos.org/nix/) for static build support:
Since conmon already has static build support in the Podman style (https://github.com/containers/conmon/blob/master/Makefile#L57-L65), we should also be able to simplify it with a nix-based solution, in order to get rid of the "chicken or the egg" issue.
This could also unify the static build process on nix, at least for crun >> conmon >> crio; afterwards we may also promote it for skopeo >> buildah >> podman, which is also partly ready.
Shall we also go for a similar solution?
Firstly, I noticed that this section of conmon was recently refactored in 59c2817. I haven't tried to replicate this issue with that version but suspect that those changes would also fix this error. I'm submitting this issue primarily as an advisory and to help anyone else who might run into this problem.
Environment:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)
$ podman version
Version: 1.6.4
RemoteAPI Version: 1
Go Version: go1.12.12
OS/Arch: linux/amd64
$ conmon --version
conmon version 2.0.15
commit: 372b4a12f1c2df4f70c280d41173b60acd3f1260
One of our users has been reporting that attempting to run anything with podman fails with the following error: Error: container create failed (no logs from conmon): EOF. All other users on the system have had no issues. Debugging suggested that the error was triggered by a combination of the length of the user's UID and temp directory path.
We were able to reproduce the issue by creating a new test user with the same length of UID and temp directory path:
username=test
uid=12345
TMPDIR=/var/tmp/test
Here is the strace output leading up to the error:
open("/var/tmp/test/conmon-term.2U29S0", O_RDWR|O_CREAT|O_EXCL, 0600) = 7
close(7) = 0
write(2, "[conmon:i]: addr{sun_family=AF_UNIX, sun_path=/var/tmp/test/conmon-term.2U29S0}\n", 80) = 80
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 7
fchmod(7, 0700) = 0
unlink("/var/tmp/test/conmon-term.2U29S0") = 0
bind(7, {sa_family=AF_UNIX, sun_path="/var/tmp/test/conmon-term.2U29S0"}, 110) = 0
listen(7, 128) = 0
pipe2([8, 9], O_CLOEXEC) = 0
unlink("/var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414") = 0
symlink("/home/test/.local/share/containers/storage/overlay-containers/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/testdata", "/var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414") = 0
write(2, "[conmon:i]: attach sock path: /var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/attach\n", 144) = 144
write(2, "[conmon:i]: addr{sun_family=AF_UNIX, sun_path=/var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/}\n", 155) = 155
socket(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 10
fchmod(10, 0700) = 0
unlink("/var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/") = -1 ENOTDIR (Not a directory)
write(2, "[conmon:e]: Failed to remove existing attach socket: /var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/ Not a directory\n", 177) = 177
exit_group(1) = ?
+++ exited with 1 +++
It appears that the check that's failing is the one at https://github.com/containers/conmon/blob/v2.0.15/src/conn_sock.c#L105, which expects unlink to succeed or return ENOENT, but is instead getting ENOTDIR because the value of attach_sock_path is truncated at 108 characters and includes a trailing / under these conditions.
Specifically, the value of attach_sock_path:
/var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/attach
is truncated to
/var/tmp/test/run-12345/libpod/tmp/socket/4dd4e6a3714e25e5fc0ceb68a599cb38db7c0a49063813f631929bf2a4545414/
when copied into attach_addr.sun_path, which includes a trailing / and triggers this error.
Adding or removing a single character from the user's UID or the temp directory path causes attach_addr.sun_path to be truncated without a trailing / and avoids the error:
open("/var/tmp/test/conmon-term.83X8S0", O_RDWR|O_CREAT|O_EXCL, 0600) = 7
close(7) = 0
write(2, "[conmon:i]: addr{sun_family=AF_UNIX, sun_path=/var/tmp/test/conmon-term.83X8S0}\n", 79) = 79
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 7
fchmod(7, 0700) = 0
unlink("/var/tmp/test/conmon-term.83X8S0") = 0
bind(7, {sa_family=AF_UNIX, sun_path="/var/tmp/test/conmon-term.83X8S0"}, 110) = 0
listen(7, 128) = 0
pipe2([8, 9], O_CLOEXEC) = 0
unlink("/var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8") = -1 ENOENT (No such file or directory)
symlink("/home/test/.local/share/containers/storage/overlay-containers/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8/userdata", "/var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8") = 0
write(2, "[conmon:i]: attach sock path: /var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8/attach\n", 143) = 143
write(2, "[conmon:i]: addr{sun_family=AF_UNIX, sun_path=/var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8/a}\n", 155) = 155
write(2, "[conmon:i]: addr{sun_family=AF_UNIX, sun_path=/var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8/a}\n", 155) = 155
socket(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 10
fchmod(10, 0700) = 0
unlink("/var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8/a") = -1 ENOENT (No such file or directory)
bind(10, {sa_family=AF_UNIX, sun_path="/var/tmp/test/run-1234/libpod/tmp/socket/5c22d46123ccf393d113759a7eefdd412472e9aadd961f068cda6e027692c2b8/a"}, 110) = 0
listen(10, 10) = 0
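The truncation described in this report can be reproduced outside conmon. The sketch below is not conmon's code: fill_sockaddr_un and ends_with_slash are hypothetical helpers, and the bounded strncpy is an assumption about how the path ends up in sun_path. It also assumes Linux, where sun_path is 108 bytes, leaving 107 characters plus the terminating NUL.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper, not conmon code: fill addr with a bounded copy of
 * path. Returns 1 if the path had to be truncated, 0 if it fit. */
int fill_sockaddr_un(struct sockaddr_un *addr, const char *path)
{
	size_t max = sizeof(addr->sun_path) - 1; /* leave room for the NUL */

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strncpy(addr->sun_path, path, max); /* silently drops anything past max */
	return strlen(path) > max;
}

/* Returns 1 if the stored path ends in '/', which is the case where
 * unlink() fails with ENOTDIR in the first trace above. */
int ends_with_slash(const struct sockaddr_un *addr)
{
	size_t n = strlen(addr->sun_path);

	return n > 0 && addr->sun_path[n - 1] == '/';
}
```

With the 113-character attach path from the failing trace, the copy stops right after the trailing /; with the run-1234 path it stops one character later, after the a. A caller that checks the return value could refuse to bind to a truncated path instead of failing later in unlink.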
I'm looking for a timeout feature for conmon. Specifically, I want conmon to kill the container if it doesn't exit before a timeout period elapses.
Currently, conmon has a mostly undocumented command line option, --timeout. It is not clear what this feature actually does, but I suspect that it just exits conmon after a period of time. It is not obvious to me that it actually kills the container at the timeout.
I'm not sure why it is useful to exit conmon at a timeout without killing the container. But would conmon consider a patch to kill the container at the timeout (either as the default behavior or behind another CLI option)?
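To make the requested behavior concrete, here is a minimal POSIX sketch of "kill the child if it has not exited by the deadline". It is only an illustration of the idea, not conmon's implementation and not a proposed patch: wait_or_kill is a hypothetical name, and the polling loop is an assumption chosen for brevity (conmon itself is built around a GLib main loop).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not part of conmon: wait up to timeout_sec seconds
 * for child pid to exit. If it is still running at the deadline, send
 * SIGKILL and reap it. Returns the waitpid() status, or -1 on error. */
int wait_or_kill(pid_t pid, unsigned timeout_sec)
{
	struct timespec poll_interval = { 0, 50 * 1000 * 1000 }; /* 50 ms */
	time_t deadline = time(NULL) + (time_t)timeout_sec;
	int status;

	for (;;) {
		pid_t r = waitpid(pid, &status, WNOHANG);
		if (r == pid)
			return status; /* child exited on its own */
		if (r < 0)
			return -1;
		if (time(NULL) >= deadline)
			break; /* timeout elapsed, child still running */
		nanosleep(&poll_interval, NULL);
	}

	kill(pid, SIGKILL);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

The caller can tell the two outcomes apart with WIFEXITED(status) and WIFSIGNALED(status). A real monitor would more likely ask the OCI runtime to kill the container than signal a PID directly.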
The run for the static binary seems to succeed, but we have no binary being uploaded:
https://cirrus-ci.com/task/6040168039710720
cc @hswong3i
Hey there, do we need static binary builds of conmon? We currently have some dependencies in:
> ldd bin/conmon
linux-vdso.so.1 (0x00007fffafde9000)
libglib-2.0.so.0 => /usr/lib64/libglib-2.0.so.0 (0x00007f6325932000)
libsystemd.so.0 => /usr/lib64/libsystemd.so.0 (0x00007f6325876000)
libc.so.6 => /lib64/libc.so.6 (0x00007f63256b1000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f6325621000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6325600000)
librt.so.1 => /lib64/librt.so.1 (0x00007f63255f5000)
liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007f63255bc000)
liblz4.so.1 => /usr/lib64/liblz4.so.1 (0x00007f632559c000)
libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007f6325595000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f6325569000)
libgcrypt.so.20 => /usr/lib64/libgcrypt.so.20 (0x00007f6325449000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6325a76000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6325444000)
libgpg-error.so.0 => /usr/lib64/libgpg-error.so.0 (0x00007f632541e000)
Feel free to close if this is out of scope.
I am running into an error when building version 2.0.17 from source. The error didn't occur with version 2.0.13. I'm hesitant to call this a regression as it is far more likely that I did something wrong. Both builds ran inside the same container image, golang:1.13.8-alpine3.11.
Thank you!
Step 8/27 : RUN git clone --branch v$CONMON_VERSION https://github.com/containers/conmon $GOPATH/src/github.com/containers/conmon && cd $GOPATH/src/github.com/containers/conmon && make
---> Running in 7d8bf8ff93c2
Cloning into '/go/src/github.com/containers/conmon'...
mkdir -p bin
cc -std=c99 -Os -Wall -Wextra -Werror -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -DVERSION=\"2.0.17\" -DGIT_COMMIT=\""41877362fc4685d55e0473d2e4a1cbe5e1debee0"\" -o src/conmon.o -c src/conmon.c
cc -std=c99 -Os -Wall -Wextra -Werror -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -DVERSION=\"2.0.17\" -DGIT_COMMIT=\""41877362fc4685d55e0473d2e4a1cbe5e1debee0"\" -o src/cmsg.o -c src/cmsg.c
cc -std=c99 -Os -Wall -Wextra -Werror -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -DVERSION=\"2.0.17\" -DGIT_COMMIT=\""41877362fc4685d55e0473d2e4a1cbe5e1debee0"\" -o src/ctr_logging.o -c src/ctr_logging.c
cc -std=c99 -Os -Wall -Wextra -Werror -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -DVERSION=\"2.0.17\" -DGIT_COMMIT=\""41877362fc4685d55e0473d2e4a1cbe5e1debee0"\" -o src/utils.o -c src/utils.c
cc -std=c99 -Os -Wall -Wextra -Werror -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -DVERSION=\"2.0.17\" -DGIT_COMMIT=\""41877362fc4685d55e0473d2e4a1cbe5e1debee0"\" -o src/cli.o -c src/cli.c
In file included from src/cli.c:1:
src/cli.h:33:8: error: unknown type name 'int64_t'
33 | extern int64_t opt_log_size_max;
| ^~~~~~~
In file included from src/cli.c:3:
src/ctr_logging.h:10:49: error: unknown type name 'int64_t'
10 | void configure_log_drivers(gchar **log_drivers, int64_t log_size_max_, char *cuuid_, char *name_, char *tag);
| ^~~~~~~
src/cli.c:40:1: error: unknown type name 'int64_t'
40 | int64_t opt_log_size_max = -1;
| ^~~~~~~
src/cli.c: In function 'process_cli':
src/cli.c:164:2: error: implicit declaration of function 'configure_log_drivers' [-Werror=implicit-function-declaration]
164 | configure_log_drivers(opt_log_path, opt_log_size_max, opt_cid, opt_name, opt_log_tag);
| ^~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [Makefile:71: src/cli.o] Error 1
The command '/bin/sh -c git clone --branch v$CONMON_VERSION https://github.com/containers/conmon $GOPATH/src/github.com/containers/conmon && cd $GOPATH/src/github.com/containers/conmon && make' returned a non-zero code: 2
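The "unknown type name 'int64_t'" errors above are what a C compiler reports when int64_t is used without <stdint.h> (or <inttypes.h>) having been included, directly or through another header. Whether that is the cause here, and how upstream addressed it, is not stated in this report. The sketch below only shows the general pattern; example_log_size_max and format_log_size_max are hypothetical stand-ins, not names from conmon.

```c
#include <inttypes.h> /* PRId64 format macro */
#include <stdint.h>   /* int64_t; omitting this can yield "unknown type name" */
#include <stdio.h>

/* Hypothetical stand-in for a variable like the one declared in src/cli.h. */
int64_t example_log_size_max = -1;

/* Format the value into buf; returns the number of characters written. */
int format_log_size_max(char *buf, size_t len)
{
	return snprintf(buf, len, "%" PRId64, example_log_size_max);
}
```

Including <stdint.h> in the header that declares the variable keeps it from depending on include order or on what a particular libc's headers happen to pull in.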
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
After multiple exec calls on a container, conmon processes spawn that take 100% CPU for 5 minutes.
Steps to reproduce the issue:
Create a container: podman run --rm --name test -d docker.io/nginx
Start the service for the REST API: podman system service tcp:0.0.0.0:8090 -t0
Exec into the container multiple times, e.g. for i in {1..20}; do podman --remote --url tcp://127.0.0.1:8090 exec test ls; done
(This may need to be run 2 or 3 times.)
Describe the results you received:
After that, some conmon processes spawn that take ~100% CPU.
Describe the results you expected:
Normal CPU usage :-)
Additional information you deem important (e.g. issue happens only occasionally):
It seems to be irrelevant which container and which command are used, but it only happens when the exec is run about 10 times or more.
Output of podman version:
Version: 2.2.1
API Version: 2.1.0
Go Version: go1.14.7
Built: Thu Dec 10 13:26:48 2020
OS/Arch: linux/amd64
Output of podman info --debug:
host:
arch: amd64
buildahVersion: 1.18.0
cgroupManager: systemd
cgroupVersion: v1
conmon:
package: conmon-2.0.21-1.el8.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.21, commit: f619ab8ef5f69bd40bb75ed64f3e1dace1815c22-dirty'
cpus: 4
distribution:
distribution: '"centos"'
version: "8"
eventLogger: journald
hostname: localhost.localdomain
idMappings:
gidmap: null
uidmap: null
kernel: 4.18.0-240.1.1.el8_3.x86_64
linkmode: dynamic
memFree: 7358492672
memTotal: 8145018880
ociRuntime:
name: runc
package: runc-1.0.0-145.rc91.git24a3cf8.el8.x86_64
path: /usr/bin/runc
version: 'runc version spec: 1.0.2-dev'
os: linux
remoteSocket:
path: /run/podman/podman.sock
rootless: false
slirp4netns:
executable: ""
package: ""
version: ""
swapFree: 5368705024
swapTotal: 5368705024
uptime: 24m 24.79s
registries:
search:
- registry.fedoraproject.org
- registry.access.redhat.com
- registry.centos.org
- docker.io
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 4
paused: 0
running: 1
stopped: 3
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev,metacopy=on
graphRoot: /var/lib/containers/storage
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
imageStore:
number: 1
runRoot: /var/run/containers/storage
volumePath: /var/lib/containers/storage/volumes
version:
APIVersion: 2.1.0
Built: 1607624808
BuiltTime: Thu Dec 10 13:26:48 2020
GitCommit: ""
GoVersion: go1.14.7
OsArch: linux/amd64
Version: 2.2.1
Package info (e.g. output of rpm -q podman or apt list podman):
podman-2.2.1-1.el8.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
Tested in a VirtualBox vm