
riotkit-org / infracheck


Incredibly elastic and lightweight health check endpoint to cover ANY CASE, including infrastructure as well as applications

Home Page: https://infracheck.docs.riotkit.org

License: Apache License 2.0

Languages: Python 92.80%, Shell 5.59%, Dockerfile 1.61%
Topics: health-check, infrastructure, infrastructure-monitoring, hardware-failures, uptime, uptime-monitor, iwa-ait, zsp, anarchism, anarchosyndicalism

infracheck's Introduction

InfraCheck


A health check system designed to be easy to extend, because it does not enforce a programming language. A single health check unit (from here on called just a 'check') can be written even in Bash.

Read more in the documentation at: https://infracheck.docs.riotkit.org/en/latest/

Running with Docker Compose

See a working example in the ./example directory.

Standalone installation and running

From sources:

# from this directory
rkd :install

infracheck --help

From PIP:

pip install infracheck

infracheck --help

External dependencies

  • whois command-line tool (apt-get install whois)
  • sshpass (apt-get install sshpass)
  • openssl

infracheck's People

Contributors

blackandred, dependabot[bot]


infracheck's Issues

[Core] Quiet hours for a health check

Given I expect a health check to fail at a given time, e.g. increased load average at night while backups are being packaged
Then I expect that the health check will not raise an alert during those hours
{
    "quietHours": [
        {"from": 7, "to": 10},
        {"from": 4, "to": 6}
    ]
}
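A minimal sketch of how such a configuration could be evaluated. This is not the project's implementation: the function names are hypothetical, and the sketch assumes that "from" is inclusive, "to" is exclusive, hours are 0-23 in local time, and a window whose "from" is greater than its "to" wraps around midnight.

```python
from datetime import datetime
from typing import Iterable, Mapping, Optional


def in_quiet_hours(quiet_hours: Iterable[Mapping[str, int]], hour: int) -> bool:
    """Return True when `hour` (0-23) falls inside any configured window.

    Assumption: "from" is inclusive and "to" is exclusive.
    """
    for window in quiet_hours:
        start, end = window["from"], window["to"]
        if start <= end:
            if start <= hour < end:
                return True
        elif hour >= start or hour < end:
            # Window wraps around midnight, e.g. from 22 to 4.
            return True
    return False


def effective_status(
    status: bool,
    quiet_hours: Iterable[Mapping[str, int]],
    now: Optional[datetime] = None,
) -> bool:
    """Report a failing check as passing while quiet hours are active."""
    now = now or datetime.now()
    return status or in_quiet_hours(quiet_hours, now.hour)
```

With the configuration above, a failure at 08:00 would be suppressed, while a failure at 12:00 would still raise an alert.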

Kubernetes: Security - which pods can access API server

A check that scans all pods and their service accounts and identifies whether they can access API server resources (which would leak access to the API server if a Kubernetes node were compromised).

There could be two variants:

  • Alert if a given cluster node has an API key
  • Alert if given pods have API keys

Invalid disk space recognition

"disk-space": {
            "ident": "disk-space=True",
            "output": "There is 5.7GB disk space at '/', nothing to worry about, defined minimum is 6GB\n",
            "status": true
        }
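In the output above, 5.7GB of free space is reported as fine although the defined minimum is 6GB, so the status should have been false. The cause is not stated in the report. The sketch below only illustrates the expected comparison, done on numeric byte counts; the function names and the use of GiB as the unit are assumptions, not the project's code.

```python
import shutil
from typing import Tuple

GIB = 1024 ** 3


def has_enough_space(free_bytes: int, min_gb: float) -> bool:
    """Compare numbers, not formatted strings such as '5.7GB'."""
    return free_bytes >= min_gb * GIB


def check_disk_space(path: str, min_gb: float) -> Tuple[bool, str]:
    """Return (status, message) for the filesystem containing `path`."""
    free = shutil.disk_usage(path).free
    ok = has_enough_space(free, min_gb)
    verdict = "nothing to worry about" if ok else "disk space is running low"
    message = (
        f"There is {free / GIB:.1f}GB disk space at '{path}', {verdict}, "
        f"defined minimum is {min_gb}GB"
    )
    return ok, message
```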

New check: TLS identity verification

Verify that the certificate served for curl -H 'Host: example.org' https://1.2.3.4 matches the certificate served for curl https://example.org.
Reason: detection of a manipulated man-in-the-middle proxy

Example:

{
    "type": "tls-integrity",
    "input": {
        "src_url": "1.2.3.4",
        "hostname": "example.org"
    }
}
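One way to sketch this comparison is with Python's standard ssl module: fetch the certificate once from the fixed address (sending the hostname as SNI) and once through normal DNS resolution, then compare SHA-256 fingerprints. The function names are hypothetical, and comparing leaf-certificate fingerprints is an assumption about what "matches" means. A setup that legitimately serves different certificates from different nodes would fail this comparison.

```python
import hashlib
import socket
import ssl
from typing import Callable, Tuple


def fetch_cert_der(address: str, hostname: str, port: int = 443,
                   timeout: float = 10.0) -> bytes:
    """Connect to `address`, send `hostname` as SNI, return the leaf cert (DER)."""
    context = ssl.create_default_context()
    # Validation is disabled on purpose: the goal is to read whatever
    # certificate is presented so that both copies can be compared.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((address, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert(binary_form=True)


def fingerprint(der: bytes) -> str:
    return hashlib.sha256(der).hexdigest()


def tls_integrity(src_address: str, hostname: str,
                  fetch: Callable[[str, str], bytes] = fetch_cert_der
                  ) -> Tuple[bool, str]:
    """Compare the cert seen at `src_address` with the one seen via DNS."""
    direct = fingerprint(fetch(src_address, hostname))
    via_dns = fingerprint(fetch(hostname, hostname))
    if direct == via_dns:
        return True, f"Certificates for {hostname} match"
    return False, (
        f"Certificate mismatch for {hostname}: "
        f"{src_address} serves {direct}, DNS path serves {via_dns}"
    )
```

The `fetch` parameter exists only so the comparison can be exercised without network access.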

Implement check: docker-container-log

Implement a check that searches docker container logs. It should allow limiting the search to the last X lines or to entries since a given time (possibly expressed in seconds), and specifying whether the regexp is expected to be present in the logs or not.
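A rough sketch of the requested behavior, assuming the logs are read through the docker CLI with its --tail and --since options. The function and parameter names are hypothetical and this is not the check shipped in the repository.

```python
import re
import subprocess
from typing import List, Optional, Tuple


def build_logs_command(container: str, tail: Optional[int] = None,
                       since_seconds: Optional[int] = None) -> List[str]:
    """Build a `docker logs` invocation limited by line count and/or age."""
    cmd = ["docker", "logs"]
    if tail is not None:
        cmd += ["--tail", str(tail)]
    if since_seconds is not None:
        cmd += ["--since", f"{since_seconds}s"]
    cmd.append(container)
    return cmd


def evaluate_logs(logs: str, pattern: str, should_match: bool) -> Tuple[bool, str]:
    """Pass when the pattern's presence agrees with `should_match`."""
    found = re.search(pattern, logs, re.MULTILINE) is not None
    state = "found" if found else "not found"
    return found == should_match, f"Pattern '{pattern}' {state} in logs"


def check_container_logs(container: str, pattern: str, should_match: bool = True,
                         tail: Optional[int] = None,
                         since_seconds: Optional[int] = None) -> Tuple[bool, str]:
    result = subprocess.run(
        build_logs_command(container, tail, since_seconds),
        capture_output=True, text=True,
    )
    # Search both streams, since a container may log to either of them.
    return evaluate_logs(result.stdout + result.stderr, pattern, should_match)
```

For example, `check_container_logs("web", r"Traceback", should_match=False, tail=200)` would fail when a traceback appears in the last 200 lines of a container named `web` (a made-up name).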

New check: SMTP

A check that verifies whether the SMTP credentials and parameters are still valid.

  1. A check is proposed to be written in Python 3.7+ using smtplib
    https://www.tutorialspoint.com/python/python_sending_email.htm
    https://docs.python.org/3/library/smtplib.html
    https://stackoverflow.com/a/12555214/6782994
    https://www.authsmtp.com/python/index.html

  2. SMTP check should NOT send any e-mails

  3. SMTP check should verify that connecting to a remote SMTP server using selected credentials, encryption and authentication method works

  4. A unit test should be written to cover at least the basic cases (a test that can be used as an example: https://github.com/riotkit-org/infracheck/blob/master/tests/functional_test_docker_container_log_check.py)

  5. A check file that can be used as an example and a starting point for writing a new check: https://github.com/riotkit-org/infracheck/blob/master/infracheck/checks/docker-container-log

  6. It should be possible to provide the credentials (host, port, username, password) via environment variables

  7. The script should return its status as the exit code: 0 on success, 1 or higher on failure

  8. The script should describe the status of the check, e.g. "SMTP connection successful" or "Cannot authorize". Ideally, exceptions thrown by smtplib are caught and translated into meaningful messages

  9. Documentation should be written in the check file as a comment. The check should also be added to the Sphinx documentation at: https://raw.githubusercontent.com/riotkit-org/infracheck/master/docs/source/reference.rst (see the includes at the bottom of the file)
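The requirements above could be sketched roughly as follows with smtplib. This is an illustration under assumptions, not a finished check: the environment variable names (SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASSWORD, SMTP_ENCRYPTION), the encryption values and the `smtp_factory` parameter are all made up for the example. No message is sent: the sketch only connects, optionally upgrades to TLS, and logs in.

```python
import os
import smtplib
from typing import Callable, Optional, Tuple


def check_smtp(host: str, port: int, username: str, password: str,
               encryption: str = "starttls", timeout: float = 10.0,
               smtp_factory: Optional[Callable] = None) -> Tuple[bool, str]:
    """Connect and authenticate without sending any e-mail."""
    if smtp_factory is None:
        smtp_factory = smtplib.SMTP_SSL if encryption == "ssl" else smtplib.SMTP
    try:
        with smtp_factory(host, port, timeout=timeout) as client:
            client.ehlo()
            if encryption == "starttls":
                client.starttls()
                client.ehlo()
            if username:
                client.login(username, password)
        return True, "SMTP connection successful"
    except smtplib.SMTPAuthenticationError:
        return False, "Cannot authorize: the server rejected the credentials"
    except smtplib.SMTPException as exc:
        return False, f"SMTP error: {exc}"
    except OSError as exc:
        return False, f"Cannot connect to {host}:{port}: {exc}"


def main() -> int:
    """Read settings from (hypothetical) environment variables."""
    ok, message = check_smtp(
        host=os.environ.get("SMTP_HOST", "localhost"),
        port=int(os.environ.get("SMTP_PORT", "587")),
        username=os.environ.get("SMTP_USER", ""),
        password=os.environ.get("SMTP_PASSWORD", ""),
        encryption=os.environ.get("SMTP_ENCRYPTION", "starttls"),
    )
    print(message)
    return 0 if ok else 1
```

A check script built on this would end with `raise SystemExit(main())`, so that the exit code is 0 on success and 1 on failure.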

Kubernetes: DaemonSet support

DaemonSet mode should collect results from all infracheck instances running on multiple Kubernetes nodes and join them into a single results page.

The current implementation (which we can call "Deployment" mode, in Kubernetes terms) shows results from a single node.
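The joining step could look roughly like the sketch below. It assumes each instance returns a mapping of check name to a result containing a boolean "status" (as in the disk-space output quoted earlier on this page). The "node/check" key format, the "checks" and "global_status" keys, and the function name are hypothetical. How the per-node results are collected is left out.

```python
from typing import Any, Dict


def merge_node_results(per_node: Dict[str, Dict[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Join per-node check results into one page keyed by 'node/check'."""
    merged: Dict[str, Dict[str, Any]] = {}
    for node, checks in sorted(per_node.items()):
        for name, result in checks.items():
            # Copy the result and record which node produced it.
            merged[f"{node}/{name}"] = dict(result, node=node)
    return {
        "checks": merged,
        # The page as a whole passes only if every check on every node passes.
        "global_status": all(r["status"] for r in merged.values()),
    }
```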

New check: InfluxDB query

Make it possible to query an InfluxDB database. A good health check could verify whether a given host is reporting metrics to InfluxDB at all. An example query for Telegraf could also be provided in the documentation.

New check: ArgoCD applications health

When any application has a synchronization issue, it should be reported with a proper message.
There should be a selector that allows specifying which applications to monitor.
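A small sketch of the evaluation step, assuming the applications are already available as parsed Argo CD Application resources with the name under metadata.name and the sync state under status.sync.status. Treating the selector as a list of glob patterns on the application name is an assumption made for the example, as is the function name. Fetching the resources from the cluster is not shown.

```python
from fnmatch import fnmatch
from typing import Any, Dict, Iterable, List


def find_out_of_sync(applications: Iterable[Dict[str, Any]],
                     name_patterns: Iterable[str] = ("*",)) -> List[str]:
    """Return one message per selected application that is not 'Synced'."""
    patterns = list(name_patterns)
    problems = []
    for app in applications:
        name = app["metadata"]["name"]
        if not any(fnmatch(name, pattern) for pattern in patterns):
            continue  # Not selected for monitoring.
        sync = app.get("status", {}).get("sync", {}).get("status", "Unknown")
        if sync != "Synced":
            problems.append(f"Application '{name}' sync status is {sync}")
    return problems
```

The check would pass when the returned list is empty and otherwise print the messages and fail.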

[Core] Optionally specify a cache lifetime per check

Not every check must be run every time. Some things, like domain expiration or TLS expiration, need to be checked only e.g. once a day. Others, like "is the service working", should be checked as often as possible.

Goal: It should be possible to specify that some checks run rarely, by keeping their results in the cache for a longer time

Given I configure a check with the setting "results_cache_time" = 3600
And I run checks every 120s
When I run the check once then its result gets cached
And I try to run the check again after 120s
Then the check will not run again until 3600s = 1h have passed
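The scenario above can be sketched as a small in-memory cache. Only the setting name "results_cache_time" comes from the issue. The class, its method and the injectable clock are hypothetical, and a real implementation might persist results differently.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ResultCache:
    """Re-run a check only after its cached result is older than its lifetime."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def run(self, name: str, check: Callable[[], Any],
            results_cache_time: float = 0) -> Any:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now - cached[0] < results_cache_time:
            return cached[1]  # Still fresh: skip executing the check.
        result = check()
        self._entries[name] = (now, result)
        return result
```

With results_cache_time = 3600 and a scheduler firing every 120s, the check body executes on the first run and then again once 3600s have passed. With the default of 0 it executes on every run.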
