flotta-device-worker's People

Contributors

ahmadateya, bardielle, eloycoto, gabriel-farache, gciavarrini, jakub-dzon, jordigilh, machacekondra, masayag, pkliczewski, tupyy, ydayagi

flotta-device-worker's Issues

Removing the workload from the device-config.json will also remove the workload in the control plane with 1 device

Steps to reproduce:

  • Deploy a workload
  • Label one device to match the workload's selector
  • SSH into the device, and as root remove the workload field in the /etc/yggdrasild/device-config.json
  • The agent will remove the running pod and related service files

Actual behavior:
The control plane deletes the CR instance of the workload and the device's heartbeat no longer carries the workload information.

Expected behavior:
The control plane keeps the CR and, during the heartbeat reconciliation loop, it resets the workload configuration in the device so that it triggers the deployment again.
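
The expected reconciliation can be sketched as a comparison between the workloads the control plane wants on a device and the workloads the heartbeat reports. This is a minimal illustration, not the operator's actual code: the function name `missingWorkloads` and the use of plain workload names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// missingWorkloads returns the names that the control plane expects on a
// device (desired) but that the device's heartbeat did not report.
// Hypothetical helper: the real operator works on CRs, not bare names.
func missingWorkloads(desired, reported []string) []string {
	seen := make(map[string]struct{}, len(reported))
	for _, name := range reported {
		seen[name] = struct{}{}
	}
	var missing []string
	for _, name := range desired {
		if _, ok := seen[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	desired := []string{"log", "nginx"} // workloads whose selector matches the device
	reported := []string{"nginx"}       // workloads listed in the heartbeat
	// Anything printed here would be re-sent to the device, not deleted.
	fmt.Println(missingWorkloads(desired, reported)) // prints [log]
}
```

Under this model, a workload missing from the heartbeat is treated as drift on the device, so the CR stays in place and the configuration is pushed again.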

Unable to start pod named `log` due to conflict in keyword with NFT

When I run this workload

apiVersion: management.project-flotta.io/v1alpha1
kind: EdgeWorkload
metadata:
  name: log
spec:
  deviceSelector:
    matchLabels:
      app: webcam
  type: pod
  pod:
    spec:
      containers:
        - name: log
          image: docker.io/eloycoto/logexample

it fails due to this error shown in the device's log:

Jul 28 11:38:59 fedora yggdrasild[837]: [yggdrasild] 2022/07/28 11:38:59 /usr/libexec/yggdrasil/device-worker: failed to delete chain 'log' from table 'edge' for workload 'log': running [/usr/sbin/nft list chain inet edge log]: exit status 1: Error: syntax error, unexpected log, expecting string
Jul 28 11:38:59 fedora yggdrasild[837]: [yggdrasild] 2022/07/28 11:38:59 /usr/libexec/yggdrasil/device-worker: list chain inet edge log
Jul 28 11:38:59 fedora yggdrasild[837]: [yggdrasild] 2022/07/28 11:38:59 /usr/libexec/yggdrasil/device-worker:                      ^^^

The proposed solution is to prefix the chain name with wl- so that it cannot collide with nft keywords, while still meeting the maximum-length requirements of nft and Kubernetes.
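
A minimal sketch of the prefixing idea follows. The helper names `chainName` and `listChainArgs` are hypothetical and not taken from the device-worker source. The sketch does not enforce any length limit, because the exact limits are not stated in this report.

```go
package main

import "fmt"

// chainPrefix keeps workload names such as "log" from being parsed as
// nft keywords when they are used as chain names.
const chainPrefix = "wl-"

// chainName maps a workload name to the nft chain name (hypothetical helper).
func chainName(workload string) string {
	return chainPrefix + workload
}

// listChainArgs builds the arguments for "nft list chain inet <table> <chain>",
// mirroring the command shown in the log above.
func listChainArgs(table, workload string) []string {
	return []string{"list", "chain", "inet", table, chainName(workload)}
}

func main() {
	fmt.Println(listChainArgs("edge", "log")) // prints [list chain inet edge wl-log]
}
```

Routing every nft invocation (add chain, add rule, list, delete) through the same helper keeps the prefixed name consistent across commands.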

Removing the workload does not disable the service

Removing the workload does not remove the service link in the flotta systemd directory, even though the service file has been removed:

[flotta@fedora user]$ ls -la /var/home/flotta/.config/systemd/user/mylog.service
ls: cannot access '/var/home/flotta/.config/systemd/user/mylog.service': No such file or directory
[flotta@fedora user]$ ls -la default.target.wants/
total 0
drwxr-xr-x. 2 flotta flotta 27 Aug  2 15:21 .
drwxr-xr-x. 4 flotta flotta 62 Aug  2 16:12 ..
lrwxrwxrwx. 1 flotta flotta 51 Aug  2 15:21 mylog.service -> /var/home/flotta/.config/systemd/user/mylog.service
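
One possible cleanup, sketched below under stated assumptions, is to delete any symlink in default.target.wants whose target no longer exists. The function `removeDanglingLinks` and the `demo` setup are hypothetical. The actual fix might instead disable the unit through systemd before removing the service file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeDanglingLinks deletes every symlink in wantsDir whose target no
// longer exists and returns the names it removed.
func removeDanglingLinks(wantsDir string) ([]string, error) {
	entries, err := os.ReadDir(wantsDir)
	if err != nil {
		return nil, err
	}
	var removed []string
	for _, e := range entries {
		if e.Type()&os.ModeSymlink == 0 {
			continue // only symlinks are of interest
		}
		link := filepath.Join(wantsDir, e.Name())
		// os.Stat follows the link, so it fails when the target is gone.
		if _, err := os.Stat(link); err == nil {
			continue
		} else if !os.IsNotExist(err) {
			return removed, err
		}
		if err := os.Remove(link); err != nil {
			return removed, err
		}
		removed = append(removed, e.Name())
	}
	return removed, nil
}

// demo recreates the situation from the listing in a temporary directory:
// one valid link and one link to a service file that was already deleted.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "wants-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	wants := filepath.Join(root, "default.target.wants")
	if err := os.Mkdir(wants, 0o755); err != nil {
		return nil, err
	}
	keep := filepath.Join(root, "keep.service")
	if err := os.WriteFile(keep, []byte("[Unit]\n"), 0o644); err != nil {
		return nil, err
	}
	if err := os.Symlink(keep, filepath.Join(wants, "keep.service")); err != nil {
		return nil, err
	}
	gone := filepath.Join(root, "mylog.service") // never created
	if err := os.Symlink(gone, filepath.Join(wants, "mylog.service")); err != nil {
		return nil, err
	}
	return removeDanglingLinks(wants)
}

func main() {
	removed, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(removed) // prints [mylog.service]
}
```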

Workloads can modify systemd user services by means of mounting the device's file system and escaping SELinux.

Steps to reproduce:

  • Run this workload:
apiVersion: management.project-flotta.io/v1alpha1
kind: EdgeWorkload
metadata:
  name: mount
  annotations:
    podman/run.oci.keep_original_groups: "1"
spec:
  deviceSelector:
    matchLabels:
      app: mount
  type: pod
  pod:
    spec:
      containers:
      - image: docker.io/eloycoto/logexample
        name: fedora
        volumeMounts:
        - mountPath: /home/flotta/
          name: home
        securityContext:
          seLinuxOptions:
            type: 'spc_t'
      restartPolicy: Always      
      volumes:
      - name: home
        hostPath:
          path: /var/home/flotta
          type: File
  • Label the device with app=mount so that the workload runs on it
  • SSH to the device and then su into the flotta user: su -l flotta -s /bin/bash
  • Run a shell inside the container that runs the workload: podman exec -it mount-fedora bash
  • Remove the soft link mount.service found in /home/flotta/.config/systemd/user/default.target.wants/
[root@mount /]# ls -la /home/flotta/.config/systemd/user/default.target.wants
total 0
drwxr-xr-x. 2 root root  27 Aug  4 22:23 .
drwxr-xr-x. 4 root root 121 Aug  4 22:23 ..
lrwxrwxrwx. 1 root root  51 Aug  4 22:23 mount.service -> /var/home/flotta/.config/systemd/user/mount.service
[root@mount /]# rm /home/flotta/.config/systemd/user/default.target.wants/mount.service 
rm: remove symbolic link '/home/flotta/.config/systemd/user/default.target.wants/mount.service'? y
[root@mount /]#
  • Exit the container and check that the file has been deleted:
[flotta@fedora user]$ ls -la default.target.wants/
total 0
drwxr-xr-x. 2 flotta flotta  6 Aug  4 18:11 .
drwxr-xr-x. 4 flotta flotta 83 Aug  4 18:12 ..
  • Wait until the agent deletes the workload:
Aug 04 18:12:09 fedora yggdrasild[841]: [yggdrasild] 2022/08/04 18:12:09 /usr/libexec/yggdrasil/device-worker: workload not found: mount. Removing. DeviceID: 4233c45699b644b79107306e74bccbc5;
Aug 04 18:12:20 fedora yggdrasild[841]: [yggdrasild] 2022/08/04 18:12:20 /usr/libexec/yggdrasil/device-worker: workload mount removed. DeviceID: 4233c45699b644b79107306e74bccbc5;

Note: As a side effect, the edgeworkload is removed from the control plane as well as from the device.

Integration test fails due to an error while running nft command

    Aug 08 19:34:41 43ea4fa72936 yggdrasild[172]: [yggdrasild] 2022/08/08 19:34:41 /builddir/build/BUILD/yggdrasil-0.2.99-0.86.git.3eb009b/cmd/yggd/worker.go:114: /usr/libexec/yggdrasil/device-worker: Heartbeat send: Sending data: message_id:"00e3704a-b31f-49bf-a50f-f81fc2896a7e" content:"{\"events\":[{\"message\":\"failed to add rule tcp dport 8885 ct state new,established counter accept for workload nginx: running [/usr/sbin/nft add rule inet edge nginx tcp dport 8885 ct state new,established counter accept]: exit status 1: Error: No such file or directory; did you mean chain ‘wl-nginx’ in table inet ‘edge’?\\nadd rule inet edge nginx tcp dport 8885 ct state new,established counter accept\\n                   ^^^^^\\n\",\"reason\":\"Failed\",\"type\":\"warn\"}],\"status\":\"up\",\"upgrade\":{\"current_commit_ID\":\"unknown\"},\"version\":\"95295\",\"workloads\":null}" directive:"heartbeat"; Device ID: acc8996f-3959-4a93-9ed7-ae1be8a98ca7
    Aug 08 19:34:41 43ea4fa72936 yggdrasild[172]: [yggdrasild] 2022/08/08 19:34:41 /builddir/build/BUILD/yggdrasil-0.2.99-0.86.git.3eb009b/internal/transport/http.go:113: posting HTTP request body: {"type":"data","message_id":"00e3704a-b31f-49bf-a50f-f81fc2896a7e","response_to":"","version":1,"sent":"2022-08-08T19:34:41.141049018Z","directive":"heartbeat","metadata":null,"content":{"events":[{"message":"failed to add rule tcp dport 8885 ct state new,established counter accept for workload nginx: running [/usr/sbin/nft add rule inet edge nginx tcp dport 8885 ct state new,established counter accept]: exit status 1: Error: No such file or directory; did you mean chain ‘wl-nginx’ in table inet ‘edge’?\nadd rule inet edge nginx tcp dport 8885 ct state new,established counter accept\n                   ^^^^^\n","reason":"Failed","type":"warn"}],"status":"up","upgrade":{"current_commit_ID":"unknown"},"version":"95295","workloads":null}}

Device-worker puts a heavy load on the CPU

$ top -b -n 1 -H -p `pgrep device-worker`
top - 14:55:44 up 7 days, 23:55,  2 users,  load average: 1.50, 2.38, 2.49
Threads:  19 total,   1 running,  18 sleeping,   0 stopped,   0 zombie
%Cpu(s): 15.0 us,  2.2 sy,  0.0 ni, 82.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31851.1 total,    662.1 free,  14655.7 used,  16533.4 buff/cache
MiB Swap:   8192.0 total,   7672.5 free,    519.5 used.  15723.9 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 418551 root      20   0  755408  58840  34732 S  33.3   0.2   0:51.23 device-worker
 418546 root      20   0  755408  58840  34732 S  26.7   0.2   0:52.78 device-worker
 418490 root      20   0  755408  58840  34732 S  20.0   0.2   0:49.80 device-worker
 418499 root      20   0  755408  58840  34732 S  20.0   0.2   0:51.04 device-worker
 418484 root      20   0  755408  58840  34732 S  13.3   0.2   0:51.16 device-worker
 418498 root      20   0  755408  58840  34732 S  13.3   0.2   0:51.49 device-worker
 418623 root      20   0  755408  58840  34732 S  13.3   0.2   0:47.45 device-worker
 418481 root      20   0  755408  58840  34732 R   6.7   0.2   0:52.38 device-worker
 418485 root      20   0  755408  58840  34732 S   6.7   0.2   0:51.55 device-worker
 418491 root      20   0  755408  58840  34732 S   6.7   0.2   0:51.48 device-worker
 418492 root      20   0  755408  58840  34732 S   6.7   0.2   0:51.66 device-worker
 418493 root      20   0  755408  58840  34732 S   6.7   0.2   0:52.55 device-worker
 418497 root      20   0  755408  58840  34732 S   6.7   0.2   0:20.73 device-worker
 418483 root      20   0  755408  58840  34732 S   0.0   0.2   0:00.36 device-worker
 418494 root      20   0  755408  58840  34732 S   0.0   0.2   0:00.01 device-worker
 418495 root      20   0  755408  58840  34732 S   0.0   0.2   0:49.99 device-worker
 418496 root      20   0  755408  58840  34732 S   0.0   0.2   0:00.00 device-worker
 418544 root      20   0  755408  58840  34732 S   0.0   0.2   0:00.00 device-worker
 418545 root      20   0  755408  58840  34732 S   0.0   0.2   0:35.64 device-worker

I will provide more information when I have it.

Mock ansible invocation in unit test

At the moment, ansible must be installed to run the unit tests of the ansible package, because they rely on real invocations of the ansible-playbook command (via the go-ansible library).

Unit tests shouldn't depend on external resources.

We need to mock these invocations, or introduce an API that encapsulates the actual call to the external service.
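
One way to encapsulate the call is a small interface with a real implementation and a fake for tests. The sketch below is illustrative only: `PlaybookRunner`, `execRunner`, `fakeRunner` and `applyPlaybook` are hypothetical names. The real implementation here shells out through os/exec and does not reproduce the go-ansible API.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// PlaybookRunner is a hypothetical seam between the package logic and the
// external ansible-playbook command.
type PlaybookRunner interface {
	Run(ctx context.Context, playbookPath string) error
}

// execRunner invokes the real binary; only production code would use it.
type execRunner struct{}

func (execRunner) Run(ctx context.Context, playbookPath string) error {
	return exec.CommandContext(ctx, "ansible-playbook", playbookPath).Run()
}

// fakeRunner records calls and returns a preset error, so unit tests do not
// need ansible installed.
type fakeRunner struct {
	calls []string
	err   error
}

func (f *fakeRunner) Run(_ context.Context, playbookPath string) error {
	f.calls = append(f.calls, playbookPath)
	return f.err
}

// Compile-time checks that both types satisfy the interface.
var (
	_ PlaybookRunner = execRunner{}
	_ PlaybookRunner = (*fakeRunner)(nil)
)

// applyPlaybook stands in for the code under test.
func applyPlaybook(ctx context.Context, r PlaybookRunner, path string) error {
	if err := r.Run(ctx, path); err != nil {
		return fmt.Errorf("playbook %s failed: %w", path, err)
	}
	return nil
}

// demo exercises applyPlaybook with the fake, as a unit test would.
func demo() ([]string, error) {
	fake := &fakeRunner{}
	err := applyPlaybook(context.Background(), fake, "site.yml")
	return fake.calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err) // prints [site.yml] <nil>
}
```

With this split, tests of the package logic depend only on the interface, and the real command invocation can be covered separately by integration tests.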

install-agent-dnf.sh script can't install podman >= 4:4.2.0

I created a new Fedora36 VM.

[gloria@f36 ~]$ podman version 
-bash: podman: command not found

I copied a freshly generated install-agent-dnf.sh from the operator to the VM.
I added set -x at the beginning of the script to trace the command sequence.

I ran ./install-agent-dnf.sh with the testing repo enabled (-t true):

[gloria@f36 ~]$ sudo ./install-agent-dnf.sh -t true -i 192.168.1.27
+ set -e
+ FLOTTA_PORT=8043
+ getopts i:p:t:h option
+ case "${option}" in
+ TESTING_REPO=0
+ getopts i:p:t:h option
+ case "${option}" in
+ FLOTTA_API_IP=192.168.1.27
+ getopts i:p:t:h option
+ [[ -z 192.168.1.27 ]]
+ TESTING_SUFFIX=
+ [[ -n 0 ]]
+ TESTING_SUFFIX=-testing
++ grep '^VERSION_ID' /etc/os-release
++ cut -d= -f2
+ VERSION=36
+ curl -s https://copr.fedorainfracloud.org/coprs/project-flotta/flotta-testing/repo/fedora-36/project-flotta-flotta-fedora-36.repo -o /etc/yum.repos.d/project-flotta.repo
+ dnf clean all
0 files removed
+ dnf --best -y install podman node_exporter yggdrasil flotta-agent
Fedora 36 - x86_64                                                                                                 8.4 MB/s |  81 MB     00:09    
Fedora 36 openh264 (From Cisco) - x86_64                                                                           1.5 kB/s | 2.5 kB     00:01    
Fedora Modular 36 - x86_64                                                                                         1.4 MB/s | 2.4 MB     00:01    
Fedora 36 - x86_64 - Updates                                                                                       3.2 MB/s |  25 MB     00:07    
Fedora Modular 36 - x86_64 - Updates                                                                               1.5 MB/s | 2.5 MB     00:01    
Copr repo for flotta-testing owned by project-flotta                                                                42 kB/s |  43 kB     00:01    
Error: 
 Problem: conflicting requests
  - nothing provides podman >= 4:4.2.0 needed by flotta-agent-0.2.0-3.fc36.x86_64
(try to add '--skip-broken' to skip uninstallable packages)
