system_monitor

Erlang telemetry collector

system_monitor is a BEAM VM monitoring and introspection application that helps troubleshoot live systems. It collects various information about Erlang and Elixir processes and applications.

Unlike observer, system_monitor does not require connecting to the monitored system via the Erlang distribution protocol, so it can be used to monitor systems with very tight access restrictions. It can comfortably monitor systems with millions of processes.

By default, the data is stored in a Postgres database and visualized using Grafana. Ready-to-use Docker images of Postgres with the necessary schema and of Grafana with the dashboards are provided. See the documentation.

Features

Process top

Information about the top N Erlang processes that consume the most resources (such as reductions or memory) or have the longest message queues is presented on the process top dashboard.

Historical data can be accessed via the standard Grafana time picker. The status panel can display important information about the node state. Pids of the processes on that dashboard are clickable links that lead to the process history dashboard.
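As a rough illustration of what a process top involves, the sketch below ranks live processes by a single erlang:process_info/2 item. It is not system_monitor's implementation: the module name top_sketch and the function top/2 are hypothetical, and it ranks by the current cumulative value, whereas a real collector would more likely compare successive samples for counters such as reductions.

```erlang
-module(top_sketch).
-export([top/2]).

%% Return up to N {Pid, Value} pairs with the largest Value for Key.
%% Key is any item accepted by erlang:process_info/2 that yields a
%% number, e.g. reductions, memory or message_queue_len.
top(Key, N) when is_atom(Key), is_integer(N), N >= 0 ->
    Samples = [{Value, Pid}
               || Pid <- erlang:processes(),
                  %% process_info/2 returns undefined for a process that
                  %% has already exited; the pattern skips those.
                  {_, Value} <- [erlang:process_info(Pid, Key)]],
    Sorted = lists:reverse(lists:sort(Samples)),
    [{Pid, Value} || {Value, Pid} <- lists:sublist(Sorted, N)].
```

For example, top_sketch:top(message_queue_len, 10) would list the ten processes with the longest mailboxes at the moment of the call. Walking every process like this is expensive on a node with millions of processes, so it should be treated as a teaching aid only.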

Process history

The process history dashboard displays time series data about a particular Erlang process. Note that some data points may be missing if the process did not consume enough resources to appear in the process top.

Application top

The application top dashboard shows various information aggregated per OTP application.

Usage example

To integrate system_monitor into your system, add it as a dependency and include it in the release applications. Add the following lines to rebar.config:

{deps,
 [ {system_monitor, {git, "https://github.com/k32/system_monitor", {tag, "3.0.2"}}}
 ]}.

{relx,
 [ {release, {my_release, "1.0.0"},
    [kernel, sasl, ..., system_monitor]}
 ]}.

Or to mix.exs for Elixir:

defp deps() do
    [
        {:system_monitor, github: "k32/system_monitor", tag: "3.0.2"}
    ]
end

To enable export to Postgres:

application:load(system_monitor),
application:set_env(system_monitor, callback_mod, system_monitor_pg).
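If the release reads its application environment from a sys.config file, the same setting can presumably be given statically instead of calling application:set_env/3 at runtime. The fragment below is a minimal sketch that uses only the callback_mod key and the system_monitor_pg value mentioned above; any other options your deployment needs are not shown.

```erlang
%% sys.config (fragment)
[
 {system_monitor,
  [ {callback_mod, system_monitor_pg}
  ]}
].
```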

Custom node status

system_monitor can export arbitrary node status information that is deemed important for the operator. This is done by defining a callback function that returns an HTML-formatted string (or iolist):

-module(foo).

-export([node_status/0]).

node_status() ->
  ["my node type<br/>",
   case healthy() of
     true  -> "<font color=#0f0>UP</font><br/>";
     false -> "<mark>DEGRADED</mark><br/>"
   end,
   io_lib:format("very important value=~p", [very_important_value()])
  ].

This callback then needs to be added to the system_monitor application environment:

application:set_env(system_monitor, node_status_fun, {?MODULE, node_status})
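Before registering the callback, it can be useful to confirm that it returns valid iodata. The helper below is a hypothetical sketch: the module node_status_check is not part of system_monitor, and it assumes the callback is invoked with no arguments, as the node_status/0 example suggests.

```erlang
-module(node_status_check).
-export([render/1]).

%% Call a {Module, Function} status callback with no arguments and
%% flatten the result to a binary. Fails with badarg if the callback
%% returns something that is not iodata.
render({Mod, Fun}) when is_atom(Mod), is_atom(Fun) ->
    iolist_to_binary(Mod:Fun()).
```

With the foo module from above compiled and loaded, node_status_check:render({foo, node_status}) would return the HTML as a single binary that can be inspected in the shell.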

More information about configurable options and the defaults is found here.

Preconfigured monitors

  • check_process_count: logs if the process count passes a certain threshold
  • suspect_procs: logs if it detects processes with suspiciously high memory usage
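To give an idea of what a check like check_process_count might look at, the sketch below compares the current process count with the VM's process limit. The module process_count_sketch, the Ratio parameter and the log message are all hypothetical; the thresholds and log format that system_monitor actually uses are not described here.

```erlang
-module(process_count_sketch).
-export([check/1]).

%% Ratio is a hypothetical threshold: the fraction (0 < Ratio =< 1) of
%% the VM process limit above which a warning is logged.
check(Ratio) when is_number(Ratio), Ratio > 0, Ratio =< 1 ->
    Count = erlang:system_info(process_count),
    Limit = erlang:system_info(process_limit),
    case Count > Limit * Ratio of
        true ->
            logger:warning("process count ~p exceeds ~p% of limit ~p",
                           [Count, round(Ratio * 100), Limit]),
            {alarm, Count, Limit};
        false ->
            {ok, Count, Limit}
    end.
```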

system_monitor_pg tolerates Postgres being temporarily down by storing the stats in its own internal buffer. This buffer is built as a sliding window, which stops the state from growing too big when Postgres is down for too long. On top of this, system_monitor_pg has a built-in load shedding mechanism that protects it once its message queue length grows beyond a certain level.
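The two ideas in this paragraph can be sketched in a few lines. This is an illustration under assumptions, not the code in system_monitor_pg: the module buffer_sketch, its function names and the limits passed to it are hypothetical.

```erlang
-module(buffer_sketch).
-export([new/1, push/2, to_list/1, overloaded/1]).

%% A bounded FIFO buffer represented as {MaxSize, CurrentSize, Queue}.
new(MaxSize) when is_integer(MaxSize), MaxSize > 0 ->
    {MaxSize, 0, queue:new()}.

%% Append an item. Once the buffer is full, the oldest item is dropped
%% first, so the window slides forward and the state stays bounded.
push(Item, {Max, Size, Q}) when Size < Max ->
    {Max, Size + 1, queue:in(Item, Q)};
push(Item, {Max, Size, Q}) ->
    {Max, Size, queue:in(Item, queue:drop(Q))}.

%% Buffered items, oldest first.
to_list({_Max, _Size, Q}) ->
    queue:to_list(Q).

%% Load shedding check: true when the calling process has more than
%% MaxLen messages waiting in its mailbox.
overloaded(MaxLen) when is_integer(MaxLen), MaxLen >= 0 ->
    {message_queue_len, Len} = erlang:process_info(self(), message_queue_len),
    Len > MaxLen.
```

A process using this sketch would call overloaded/1 before handling a new batch of stats and discard the batch when it returns true, and would call push/2 for every batch it could not yet write to the database.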

Release History

See our changelog.

License

Copyright © 2020 Klarna Bank AB
Copyright © 2021-2022 k32
