
system-manager's Introduction

NuvlaEdge


This repository contains the NuvlaEdge source code, a microservice-based agent for Nuvla.io. NuvlaEdge consists of the following services:

  • Agent: Main NuvlaEdge component that implements the Nuvla protocol, gathers system configuration and statistics and runs jobs from Nuvla.
  • System Manager: NuvlaEdge watchdog component. Monitors the different microservices and heals them if they fail.
  • Peripherals: NuvlaEdge add-ons that allow the detection of different types of devices:
    • Network
    • Bluetooth
    • USB
    • Modbus
    • GPU

For installation instructions, read the online documentation.

Project tools

The project uses poetry for project and dependency management, and tox for test execution and results reporting.

Running unit tests

Before running the unit tests with tox, you need to generate a requirements file from the per-component dependency lists in poetry's project definition file.

For that run the following wrapper script:

./generate-requiremenents.sh

Then run the unit tests with:

tox

Copyright

Copyright © 2024, SixSq SA

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

system-manager's People

Contributors

ignacio-penas, konstan, mebster, schaubl

system-manager's Issues

[FEATURE] add Nuvla tagging mechanism from user labels

Is your feature request related to a problem? Please describe.
Nuvla placement policies shall be based on Nuvla resource tags derived from Docker labels. The system-manager should therefore parse all Docker labels assigned to the NuvlaBox deployment and automatically push them into Nuvla, as an easier, complementary tagging mechanism.

Describe the solution you'd like
Simply add a procedure in the system manager, at start-up, that looks for all Docker labels that are relevant to NuvlaBox
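
A minimal sketch of the label-to-tag step described above, assuming the relevant labels share a common prefix. The prefix `nuvlabox.` and the function name `labels_to_tags` are hypothetical; the issue does not say which labels count as relevant or how tags should be formatted.

```python
from typing import Dict, List

# Hypothetical prefix: the issue does not define which labels are "relevant".
LABEL_PREFIX = "nuvlabox."


def labels_to_tags(labels: Dict[str, str], prefix: str = LABEL_PREFIX) -> List[str]:
    """Turn Docker labels that start with `prefix` into sorted tag strings."""
    tags = []
    for key, value in labels.items():
        if not key.startswith(prefix):
            continue  # Not a label aimed at NuvlaBox; ignore it.
        short_key = key[len(prefix):]
        # A label without a value becomes a bare tag, otherwise "key=value".
        tags.append(f"{short_key}={value}" if value else short_key)
    return sorted(tags)
```

The input would be the labels dictionary of the deployment's containers; pushing the resulting tags to Nuvla is left out because the issue does not describe that call.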

Speed up and refactor CI/CD workflows

  • Make use of build-push-actions, qemu, and buildx to build images in parallel and accelerate the build process.
  • Use cache in docker base images, python (for unittests) and intermediate builds
  • Create an independent workflow for development (nuvladev) images
    - [ ] #53 descoped

[FEATURE] Implement memory control

Allow the system manager to use its memory monitoring to enforce memory limits, preventing memory overloads that trigger out-of-memory (OOM) errors at the host level.
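
One possible shape for such a check, as a sketch only: given per-container memory usage and the host total, flag containers above a configurable share of host memory. The function name, the 50% default, and the idea of a per-container fraction are assumptions; the issue does not define the policy or what action follows.

```python
from typing import Dict, List


def containers_over_limit(
    usage_bytes: Dict[str, int],
    total_bytes: int,
    max_fraction: float = 0.5,  # Hypothetical default, not from the issue.
) -> List[str]:
    """Return the names of containers using more than `max_fraction` of host memory."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    threshold = total_bytes * max_fraction
    return sorted(name for name, used in usage_bytes.items() if used > threshold)
```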

design and implement NB API

Feature set:

  • set node labels
  • reboot
  • execute command? (ssh nuvla credential vs system-manager bind mount .ssh/authorized_keys)
  • start data_gateway routers
  • publish local dashboard temporarily

[FEATURE] on startup failure report which system requirement is not met

Is your feature request related to a problem? Please describe.

In case minimum requirements for installation are not met, the system manager prints the following and bails out.

system-manager_1   | ERROR - run.py/run/requirements_check - System does not meet the minimum requirements!
system-manager_1   | Cannot continue...

The actual requirement that is not met is not specified in the error message.

Describe the solution you'd like

The requirements that are not met are logged.

Additional context

system-manager_1   | ERROR - run.py/run/requirements_check - System does not meet the minimum requirements!
system-manager_1   | Cannot continue...
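
A sketch of how the check could collect each unmet requirement so that it can be logged, rather than only reporting overall failure. The names `MIN_REQUIREMENTS` and `unmet_requirements` are hypothetical, and the 1024 MB memory minimum is taken from the status-notes example in another issue on this page; the real check in `run.py` may look different.

```python
from typing import Dict, List

# Hypothetical table of minimums; only the memory figure appears on this page.
MIN_REQUIREMENTS: Dict[str, float] = {"memory_mb": 1024.0}


def unmet_requirements(
    measured: Dict[str, float],
    minimums: Dict[str, float] = MIN_REQUIREMENTS,
) -> List[str]:
    """Return one human-readable message per requirement that is not met."""
    failures = []
    for name, minimum in minimums.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: could not be measured (minimum {minimum})")
        elif value < minimum:
            failures.append(f"{name}: found {value}, minimum {minimum}")
    return failures
```

An empty list means every requirement is met; otherwise each entry can be written to the log before bailing out.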

add docker security benchmark as a requirement

docker run -it --net host --pid host --userns host --cap-add audit_control \
    -e DOCKER_CONTENT_TRUST=$DOCKER_CONTENT_TRUST \
    -v /var/lib:/var/lib \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/lib/systemd:/usr/lib/systemd \
    -v /etc:/etc --label docker_bench_security \
    docker/docker-bench-security

JSONDecodeError exception in the log

Sometimes I see this exception on my RPi4:

Traceback (most recent call last):
  File "/opt/nuvlabox/./run.py", line 103, in <module>
    self_sup.write_container_stats_table_html()
  File "/opt/nuvlabox/system_manager/Supervise.py", line 189, in write_container_stats_table_html
    container_stats = json.load(cstats)
  File "/usr/local/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
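
"Expecting value: line 1 column 1 (char 0)" is what `json` raises for empty input, among other cases, so one guess is that the stats file is read while it is empty or still being written. That cause is an assumption, not something the report confirms. A defensive sketch, with a hypothetical function name, that returns `None` instead of crashing:

```python
import json
from typing import Any, Optional


def parse_container_stats(raw: str) -> Optional[Any]:
    """Parse the stats text, returning None when it is empty or not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Empty or partially written content: let the caller skip this cycle.
        return None
```

The caller would pass the file contents and skip writing the table for that round when the result is `None`.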

[FEATURE] periodically remove zombie containers

Is your feature request related to a problem? Please describe.
Containers that are started as part of the data gateway might not be cleaned up properly if the corresponding peripheral is removed while the network is unstable.

Describe the solution you'd like
When getting all containers, check for those labelled as data-gateway specific and double check whether respective peripherals (matching nuvla resource id) are still in the system
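
The check described above could be sketched as follows, working on plain dictionaries so it does not depend on any Docker client. The label names `nuvlabox.data-gateway` and `nuvlabox.peripheral.id` are hypothetical placeholders; the issue does not name the actual labels.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical label names, used only for illustration.
GATEWAY_LABEL = "nuvlabox.data-gateway"
PERIPHERAL_LABEL = "nuvlabox.peripheral.id"


def find_zombies(containers: Iterable[Dict], known_peripheral_ids: Set[str]) -> List[str]:
    """Return names of data-gateway containers whose peripheral no longer exists."""
    zombies = []
    for container in containers:
        labels = container.get("labels", {})
        if GATEWAY_LABEL not in labels:
            continue  # Not a data-gateway container.
        if labels.get(PERIPHERAL_LABEL) not in known_peripheral_ids:
            zombies.append(container["name"])
    return zombies
```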

[BUG] catch Docker exceptions for when NB containers are down

Describe the bug

system-manager_1   | Traceback (most recent call last):
system-manager_1   |   File "./app.py", line 127, in <module>
system-manager_1   |     supervisor.build_content()
system-manager_1   |   File "/opt/nuvlabox/system_manager/Supervise.py", line 89, in build_content
system-manager_1   |     self._get_stats_table_html())
system-manager_1   |   File "/opt/nuvlabox/system_manager/Supervise.py", line 127, in _get_stats_table_html
system-manager_1   |     cpu_system = float(container_stats["cpu_stats"]["system_cpu_usage"])
system-manager_1   | KeyError: 'system_cpu_usage'

To Reproduce

Deploy a NB with a restarting container (like vpn-client)

Expected behavior
Simply do not account for that container.
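
A sketch of the expected behavior, assuming the stats of a restarting container simply lack the `system_cpu_usage` key, as the traceback suggests. The helper names are hypothetical; only the `cpu_stats` and `system_cpu_usage` keys come from the traceback.

```python
from typing import Dict, List, Optional


def system_cpu_usage(container_stats: Dict) -> Optional[float]:
    """Return the system CPU usage, or None when the key is missing."""
    value = container_stats.get("cpu_stats", {}).get("system_cpu_usage")
    return None if value is None else float(value)


def usable_stats(all_stats: List[Dict]) -> List[Dict]:
    """Keep only the containers whose stats include system CPU usage."""
    return [stats for stats in all_stats if system_cpu_usage(stats) is not None]
```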

improve operational status assessment

At the moment the operational-status is set randomly, which can produce a wrong status.

Instead, improve the mechanism so that different checks in the system contribute to the final assessment.
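
One way to combine several checks into a single status, as a sketch under stated assumptions: each check reports passed, failed, or not run, and any failure wins. The status names `OPERATIONAL`, `DEGRADED` and `UNKNOWN`, the function name, and the precedence rule are all assumptions; the issue does not specify them.

```python
from typing import Dict, Optional


def assess_operational_status(checks: Dict[str, Optional[bool]]) -> str:
    """Combine named check results: True = passed, False = failed, None = not run."""
    results = list(checks.values())
    if any(result is False for result in results):
        return "DEGRADED"  # A single failed check degrades the status.
    if not results or any(result is None for result in results):
        return "UNKNOWN"  # Nothing failed, but not everything could be checked.
    return "OPERATIONAL"
```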

[BUG] status-notes fail the spec

Describe the bug
The api-server expects status-notes to be a vector without blank strings.
When the notes span multiple lines, some empty lines are sent to the api-server, which returns a spec validation failure.
Perhaps we should relax the spec to accept a vector of strings, even empty ones, to be less error-prone.

To Reproduce
When we have a status-notes like this one:
cat /var/lib/docker/volumes/nuvlabox_nuvlabox-db/_data/.status-notes

System does not meet the minimum requirements!
	* Your device only provides 977.61 MBs of memory. MIN REQUIREMENTS: 1024 MBs

Cannot connect nuvlabox_agent_1 to Data Gateway network

Steps to reproduce the behavior:

  1. Deploy with '...'
  2. Go to '....'
  3. Click on '....'
  4. See error

Expected behavior
No spec issue
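
As an alternative to relaxing the spec, the sender could drop blank lines before building the vector. A minimal sketch with a hypothetical function name; note that it also strips leading and trailing whitespace from each note, which is a choice the issue does not ask for.

```python
from typing import List


def clean_status_notes(raw: str) -> List[str]:
    """Split the notes file content into lines, dropping blank ones."""
    return [line.strip() for line in raw.splitlines() if line.strip()]
```

Applied to the `.status-notes` content shown in this issue, this yields three non-empty strings.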
