project-flotta / flotta-device-worker
License: Apache License 2.0
Steps to reproduce:
/etc/yggdrasild/device-config.json
Actual behavior:
The control plane deletes the CR instance of the workload and the device's heartbeat no longer carries the workload information.
Expected behavior:
The control plane keeps the CR and, during the heartbeat reconciliation loop, it resets the workload configuration in the device so that it triggers the deployment again.
In the current implementation, the device retries registration with the server every 10 seconds whenever an attempt fails.
We'd like to change that behavior to retry with an exponential backoff (10s, 20s, 40s, …) that is capped at 1 minute.
When I run this workload
apiVersion: management.project-flotta.io/v1alpha1
kind: EdgeWorkload
metadata:
  name: log
spec:
  deviceSelector:
    matchLabels:
      app: webcam
  type: pod
  pod:
    spec:
      containers:
        - name: log
          image: docker.io/eloycoto/logexample
it fails due to this error shown in the device's log:
Jul 28 11:38:59 fedora yggdrasild[837]: [yggdrasild] 2022/07/28 11:38:59 /usr/libexec/yggdrasil/device-worker: failed to delete chain 'log' from table 'edge' for workload 'log': running [/usr/sbin/nft list chain inet edge log]: exit status 1: Error: syntax error, unexpected log, expecting string
Jul 28 11:38:59 fedora yggdrasild[837]: [yggdrasild] 2022/07/28 11:38:59 /usr/libexec/yggdrasil/device-worker: list chain inet edge log
Jul 28 11:38:59 fedora yggdrasild[837]: [yggdrasild] 2022/07/28 11:38:59 /usr/libexec/yggdrasil/device-worker: ^^^
The proposed solution is to prefix the chain name with wl- so that it cannot collide with nft keywords, while still meeting the maximum name length requirements of both nft and Kubernetes.
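A minimal sketch of the proposed naming rule follows. The function chainName and the maxLen parameter are hypothetical, and the value 63 used in main is only an illustrative limit, not a verified nft or Kubernetes constant.

```go
package main

import "fmt"

// chainPrefix keeps workload chain names from matching nft keywords such as "log".
const chainPrefix = "wl-"

// chainName builds the nft chain name for a workload and rejects names that
// would exceed maxLen once the prefix is added. Both the helper and the limit
// passed by the caller are assumptions made for this sketch.
func chainName(workload string, maxLen int) (string, error) {
	name := chainPrefix + workload
	if len(name) > maxLen {
		return "", fmt.Errorf("chain name %q exceeds %d characters", name, maxLen)
	}
	return name, nil
}

func main() {
	name, err := chainName("log", 63) // 63 is an illustrative limit only
	fmt.Println(name, err)
}
```

Every nft invocation (create, list, add rule, delete) would need to go through the same helper; otherwise one code path keeps using the unprefixed name.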
Removing the workload does not remove the service link in the flotta systemd directory, even though the service file has been removed:
[flotta@fedora user]$ ls -la /var/home/flotta/.config/systemd/user/mylog.service
ls: cannot access '/var/home/flotta/.config/systemd/user/mylog.service': No such file or directory
[flotta@fedora user]$ ls -la default.target.wants/
total 0
drwxr-xr-x. 2 flotta flotta 27 Aug 2 15:21 .
drwxr-xr-x. 4 flotta flotta 62 Aug 2 16:12 ..
lrwxrwxrwx. 1 flotta flotta 51 Aug 2 15:21 mylog.service -> /var/home/flotta/.config/systemd/user/mylog.service
Steps to reproduce:
apiVersion: management.project-flotta.io/v1alpha1
kind: EdgeWorkload
metadata:
  name: mount
  annotations:
    podman/run.oci.keep_original_groups: "1"
spec:
  deviceSelector:
    matchLabels:
      app: mount
  type: pod
  pod:
    spec:
      containers:
        - image: docker.io/eloycoto/logexample
          name: fedora
          volumeMounts:
            - mountPath: /home/flotta/
              name: home
          securityContext:
            seLinuxOptions:
              type: 'spc_t'
      restartPolicy: Always
      volumes:
        - name: home
          hostPath:
            path: /var/home/flotta
            type: File
Label the device with app=mount so that the workload will run on the device.
su -l flotta -s /bin/bash
podman exec -it mount-fedora bash
Remove the mount.service link found in /home/flotta/.config/systemd/user/default.target.wants/:
[root@mount /]# ls -la /home/flotta/.config/systemd/user/default.target.wants
total 0
drwxr-xr-x. 2 root root 27 Aug 4 22:23 .
drwxr-xr-x. 4 root root 121 Aug 4 22:23 ..
lrwxrwxrwx. 1 root root 51 Aug 4 22:23 mount.service -> /var/home/flotta/.config/systemd/user/mount.service
[root@mount /]# rm /home/flotta/.config/systemd/user/default.target.wants/mount.service
rm: remove symbolic link '/home/flotta/.config/systemd/user/default.target.wants/mount.service'? y
[root@mount /]#
[flotta@fedora user]$ ls -la default.target.wants/
total 0
drwxr-xr-x. 2 flotta flotta 6 Aug 4 18:11 .
drwxr-xr-x. 4 flotta flotta 83 Aug 4 18:12 ..
Aug 04 18:12:09 fedora yggdrasild[841]: [yggdrasild] 2022/08/04 18:12:09 /usr/libexec/yggdrasil/device-worker: workload not found: mount. Removing. DeviceID: 4233c45699b644b79107306e74bccbc5;
Aug 04 18:12:20 fedora yggdrasild[841]: [yggdrasild] 2022/08/04 18:12:20 /usr/libexec/yggdrasil/device-worker: workload mount removed. DeviceID: 4233c45699b644b79107306e74bccbc5;
Note: As a side effect, the edgeworkload is removed from the control plane as well as from the device.
Aug 08 19:34:41 43ea4fa72936 yggdrasild[172]: [yggdrasild] 2022/08/08 19:34:41 /builddir/build/BUILD/yggdrasil-0.2.99-0.86.git.3eb009b/cmd/yggd/worker.go:114: /usr/libexec/yggdrasil/device-worker: Heartbeat send: Sending data: message_id:"00e3704a-b31f-49bf-a50f-f81fc2896a7e" content:"{\"events\":[{\"message\":\"failed to add rule tcp dport 8885 ct state new,established counter accept for workload nginx: running [/usr/sbin/nft add rule inet edge nginx tcp dport 8885 ct state new,established counter accept]: exit status 1: Error: No such file or directory; did you mean chain ‘wl-nginx’ in table inet ‘edge’?\\nadd rule inet edge nginx tcp dport 8885 ct state new,established counter accept\\n ^^^^^\\n\",\"reason\":\"Failed\",\"type\":\"warn\"}],\"status\":\"up\",\"upgrade\":{\"current_commit_ID\":\"unknown\"},\"version\":\"95295\",\"workloads\":null}" directive:"heartbeat"; Device ID: acc8996f-3959-4a93-9ed7-ae1be8a98ca7
Aug 08 19:34:41 43ea4fa72936 yggdrasild[172]: [yggdrasild] 2022/08/08 19:34:41 /builddir/build/BUILD/yggdrasil-0.2.99-0.86.git.3eb009b/internal/transport/http.go:113: posting HTTP request body: {"type":"data","message_id":"00e3704a-b31f-49bf-a50f-f81fc2896a7e","response_to":"","version":1,"sent":"2022-08-08T19:34:41.141049018Z","directive":"heartbeat","metadata":null,"content":{"events":[{"message":"failed to add rule tcp dport 8885 ct state new,established counter accept for workload nginx: running [/usr/sbin/nft add rule inet edge nginx tcp dport 8885 ct state new,established counter accept]: exit status 1: Error: No such file or directory; did you mean chain ‘wl-nginx’ in table inet ‘edge’?\nadd rule inet edge nginx tcp dport 8885 ct state new,established counter accept\n ^^^^^\n","reason":"Failed","type":"warn"}],"status":"up","upgrade":{"current_commit_ID":"unknown"},"version":"95295","workloads":null}}
$ top -b -n 1 -H -p `pgrep device-worker`
top - 14:55:44 up 7 days, 23:55, 2 users, load average: 1.50, 2.38, 2.49
Threads: 19 total, 1 running, 18 sleeping, 0 stopped, 0 zombie
%Cpu(s): 15.0 us, 2.2 sy, 0.0 ni, 82.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 31851.1 total, 662.1 free, 14655.7 used, 16533.4 buff/cache
MiB Swap: 8192.0 total, 7672.5 free, 519.5 used. 15723.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
418551 root 20 0 755408 58840 34732 S 33.3 0.2 0:51.23 device-worker
418546 root 20 0 755408 58840 34732 S 26.7 0.2 0:52.78 device-worker
418490 root 20 0 755408 58840 34732 S 20.0 0.2 0:49.80 device-worker
418499 root 20 0 755408 58840 34732 S 20.0 0.2 0:51.04 device-worker
418484 root 20 0 755408 58840 34732 S 13.3 0.2 0:51.16 device-worker
418498 root 20 0 755408 58840 34732 S 13.3 0.2 0:51.49 device-worker
418623 root 20 0 755408 58840 34732 S 13.3 0.2 0:47.45 device-worker
418481 root 20 0 755408 58840 34732 R 6.7 0.2 0:52.38 device-worker
418485 root 20 0 755408 58840 34732 S 6.7 0.2 0:51.55 device-worker
418491 root 20 0 755408 58840 34732 S 6.7 0.2 0:51.48 device-worker
418492 root 20 0 755408 58840 34732 S 6.7 0.2 0:51.66 device-worker
418493 root 20 0 755408 58840 34732 S 6.7 0.2 0:52.55 device-worker
418497 root 20 0 755408 58840 34732 S 6.7 0.2 0:20.73 device-worker
418483 root 20 0 755408 58840 34732 S 0.0 0.2 0:00.36 device-worker
418494 root 20 0 755408 58840 34732 S 0.0 0.2 0:00.01 device-worker
418495 root 20 0 755408 58840 34732 S 0.0 0.2 0:49.99 device-worker
418496 root 20 0 755408 58840 34732 S 0.0 0.2 0:00.00 device-worker
418544 root 20 0 755408 58840 34732 S 0.0 0.2 0:00.00 device-worker
418545 root 20 0 755408 58840 34732 S 0.0 0.2 0:35.64 device-worker
I will provide more information when I have it.
At the moment, ansible must be installed to successfully run the unit tests of the ansible package, because they rely on real invocations of the ansible-playbook command (via the go-ansible library).
Unit tests shouldn't depend on external resources.
We need to mock them or to present an API that encapsulates the actual call to the external service.
I created a new Fedora 36 VM.
[gloria@f36 ~]$ podman version
-bash: podman: command not found
I copied a freshly generated install-agent-dnf.sh from the operator to the VM.
I added set -x at the beginning of the script to trace the command sequence.
I ran ./install-agent-dnf.sh, telling it to use the testing repo (-t true):
[gloria@f36 ~]$ sudo ./install-agent-dnf.sh -t true -i 192.168.1.27
+ set -e
+ FLOTTA_PORT=8043
+ getopts i:p:t:h option
+ case "${option}" in
+ TESTING_REPO=0
+ getopts i:p:t:h option
+ case "${option}" in
+ FLOTTA_API_IP=192.168.1.27
+ getopts i:p:t:h option
+ [[ -z 192.168.1.27 ]]
+ TESTING_SUFFIX=
+ [[ -n 0 ]]
+ TESTING_SUFFIX=-testing
++ grep '^VERSION_ID' /etc/os-release
++ cut -d= -f2
+ VERSION=36
+ curl -s https://copr.fedorainfracloud.org/coprs/project-flotta/flotta-testing/repo/fedora-36/project-flotta-flotta-fedora-36.repo -o /etc/yum.repos.d/project-flotta.repo
+ dnf clean all
0 files removed
+ dnf --best -y install podman node_exporter yggdrasil flotta-agent
Fedora 36 - x86_64 8.4 MB/s | 81 MB 00:09
Fedora 36 openh264 (From Cisco) - x86_64 1.5 kB/s | 2.5 kB 00:01
Fedora Modular 36 - x86_64 1.4 MB/s | 2.4 MB 00:01
Fedora 36 - x86_64 - Updates 3.2 MB/s | 25 MB 00:07
Fedora Modular 36 - x86_64 - Updates 1.5 MB/s | 2.5 MB 00:01
Copr repo for flotta-testing owned by project-flotta 42 kB/s | 43 kB 00:01
Error:
Problem: conflicting requests
- nothing provides podman >= 4:4.2.0 needed by flotta-agent-0.2.0-3.fc36.x86_64
(try to add '--skip-broken' to skip uninstallable packages)