grafana-dashboard's Introduction

Tarantool Grafana dashboard

Dashboard for Tarantool application and database server monitoring, based on the grafonnet library.

Our pages on Grafana Official & community built dashboards: InfluxDB version, Prometheus version, InfluxDB TDG version, Prometheus TDG version.

Refer to the dashboard documentation page for prerequisites and an installation guide.

Installation

  1. Open Grafana import menu.

    Grafana import menu

  2. To import a specific dashboard, choose one of the following options:

  3. Set dashboard name, folder and uid (if needed).

    Dashboard import variables

  4. Choose datasource and datasource variables on the dashboard.

    Dashboard datasource variables

Monitoring cluster

For a guide on setting up your monitoring stack, refer to the documentation page.

Example app

This repository provides a preconfigured monitoring cluster with an example Tarantool app and a load generator for local dashboard development and tests.

docker-compose up -d

will start 6 containers: Tarantool App, Tarantool Load Generator, Telegraf, InfluxDB, Prometheus and Grafana. Together they form a cluster with two fully operational metrics datasources (InfluxDB and Prometheus) that extract metrics from the Tarantool App example project. We recommend using the exact versions we use in the experimental cluster (e.g. Grafana v8.1.5). After start, the Grafana UI will be available at localhost:3000. You can also interact with Prometheus at localhost:9090 and InfluxDB at localhost:8086.

Monitoring local app

If you want to monitor a Tarantool cluster deployed on your local host, you can use a monitoring cluster similar to the example app one.

Configure Telegraf/Prometheus to monitor your own app in example_cluster/telegraf/telegraf.localapp.conf and example_cluster/prometheus/prometheus.localapp.yml. Use host.docker.internal as your machine host in the configuration, set the cluster instance ports as targets, and set the correct metrics HTTP path. See more setup tips in the documentation.

Start the cluster with

docker-compose -f docker-compose.localapp.yml -p localapp-monitoring up -d

After start, Grafana UI will be available at localhost:3000. You can also interact with Prometheus at localhost:9090 and InfluxDB at localhost:8086.

Manual build

Go 1.14 or later is required to install build and test dependencies. Run

make build-deps

to install dependencies that are required to build dashboards.

Run

make test-deps

to install build dependencies and dependencies that are required to run tests locally.

To build a custom dashboard, run the make build command with your specific configuration.

make CONFIG=config.yml OUTPUT=mydashboard.json build

See the example config config.yml in the repository for detailed info about supported options.

You can run tests with

make run-tests

Compiled dashboard test files can be updated with

make update-tests

It also formats all source files with jsonnetfmt.

Adding your panels

If you're interested in building grafonnet dashboards or custom panels, we suggest starting with our grafonnet tutorial: in English, in Russian.

You can add your own custom panels to the bottom of the template dashboard.

  1. Add tarantool/grafana-dashboard as a dependency in your project with jsonnet-bundler. Run

    jb init

    to initialize jsonnet-bundler and add this repo to jsonnetfile.json as a dependency:

    {
      "version": 1,
      "dependencies": [
        {
          "source": {
            "git": {
              "remote": "https://github.com/tarantool/grafana-dashboard"
            }
          },
          "version": "master"
        }
      ],
      "legacyImports": true
    }

    Run

    jb install

    to install dependencies. The grafonnet library will also be installed as a transitive dependency.

  2. Load a configuration, same as in "Manual build" section. (You can build it as a dictionary in code instead of parsing a YAML file.)

    # my_dashboard.jsonnet
    local config = import 'grafana-dashboard/dashboard/build/config.libsonnet';
    local raw_cfg = importstr 'config.yml';
    
    local cfg = config.prepare(std.parseYaml(raw_cfg));
  3. Import the main template.

    # my_dashboard.jsonnet
    local dashboard = import 'grafana-dashboard/dashboard/build/dashboard.libsonnet';
  4. To add your custom panels to a dashboard template, you must create panel objects.

    A row panel can be created by using the following script:

    # my_dashboard.jsonnet
    local common = import 'grafana-dashboard/dashboard/panels/common.libsonnet';
    
    local my_row = common.row('My custom metrics');

    A panel with metrics data consists of a visualization base (graph, table, stat, etc.) and one or more datasource queries called "targets". To build a simple graph, you can use the common.default_graph util.

    # vendor/grafana-dashboard/dashboard/panels/common.libsonnet
    
    default_graph( # graph panel shortcut
      cfg, # Dashboard configuration
      title, # The title of the graph panel
      description, # (optional) The description of the panel
      format, # (default 'none') Unit of the Y axes
      min, # (optional) Min of the Y axes
      max, # (optional) Max of the Y axes
      labelY1, # (optional) Label of the left Y axis
      decimals, # (default null) Override automatic decimal precision for legend and tooltip
      decimalsY1, # (default null) Override automatic decimal precision for the left Y axis
      legend_avg, # (default true) Show average in legend
      legend_max, # (default true) Show max in legend
      panel_height, # (default 8) Panel height in grid units
      panel_width, # (default 8) Panel width in grid units, max is 24
    )

    Panel size is set in grid units. Grafana uses a square grid where the dashboard width is 24 units. For example, a row is 24 x 1 units and a new Grafana panel is 12 x 9 units.

    If you want to build a non-graph panel or a graph panel with a more complicated configuration, use grafonnet templates. You must set the size of each panel before adding it to our dashboard template: for each grafonnet panel, add { gridPos: { w: width, h: height } } to it. For example,

    local grafana = import 'grafonnet/grafana.libsonnet';
    
    local my_graph = grafana.graphPanel.new(
      title='My custom panel',
      points=true,
    ) { gridPos: { w: 6, h: 4 } };

    To build a target, you should use common utils.

    # vendor/grafana-dashboard/dashboard/panels/common.libsonnet
    
    target( # plain "select metric" shortcut
      cfg, # Dashboard configuration
      metric_name, # Target metric name to select
      additional_filters, # (optional) Query additional filter conditions. The structure is { prometheus: filters, influxdb: filters }, filters have the same format as in cfg
      legend, # (optional) Target result legend. The structure is { prometheus: legend_str, influxdb: legend_str }
      group_tags, # (InfluxDB only, optional). Target result group rules. All tags used in legend are expected to be here too
      converter, # (InfluxDB only, default 'mean') InfluxDB metrics converter (aggregation, selector, etc.)
      rate, # (default false) Whether to transform the metrics as rate
    ),

    To build more complex targets, use the prometheus and influxdb templates of the grafonnet library.

    To add a target to a panel, call addTarget(target).

    To summarise, you can build a simple 'select metric' panel with

    local common = import 'grafana-dashboard/dashboard/panels/common.libsonnet';
    local variable = import 'grafana-dashboard/dashboard/variable.libsonnet';
    
    local my_custom_component_memory_graph = common.default_graph(
      cfg,
      title='My custom component memory',
      description=|||
        My custom component used memory.
        Shows mean value.
      |||,
      format='bytes',
      panel_width=12,
      panel_height=6,
    ).addTarget(common.target(cfg, 'my_component_memory'));

    and a simple rps panel with

    local common = import 'grafana-dashboard/dashboard/panels/common.libsonnet';
    local variable = import 'grafana-dashboard/dashboard/variable.libsonnet';
    
    local my_custom_component_rps_graph = common.default_graph(
      cfg,
      title='My custom component load',
      description=|||
        My custom component processes requests
        and collects info on process to summary collector
        'my_component_load_metric'.
      |||,
      labelY1='requests per second',
      panel_width=18,
      panel_height=6,
    ).addTarget(common.target(cfg, 'my_component_load_metric_count', rate=true));

    For more panel tips and examples, please examine this template dashboard source code and test cases.

    To add your custom panels, call addPanel(panel) or addPanels(panel_array) in dashboard template:

    # my_dashboard.jsonnet
    local dashboard = import 'grafana-dashboard/dashboard/build/dashboard.libsonnet';
    
    ...
    
    local my_dashboard_template = dashboard.addPanels([
      my_row, my_custom_component_memory_graph, my_custom_component_rps_graph
    ]);

    Finally, call build() to compute panels positions and build a resulting dashboard:

    # my_dashboard.jsonnet
    ...
    my_dashboard_template.build()

    Do not put ; at the end of your script, so that the resulting dashboard is returned as output.

  5. To save the resulting dashboard to the output.json file, use

    jsonnet -J ./vendor/ my_dashboard.jsonnet -o ./output.json

    and to copy the output to the clipboard instead, omit the -o option so that the dashboard is written to stdout:

    jsonnet -J ./vendor/ my_dashboard.jsonnet | xclip -selection clipboard
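
The grid units described in step 4 can be illustrated with a little arithmetic. The following Python sketch places panels left to right on a 24-unit-wide grid and wraps to a new line when a panel does not fit. It is only an illustration of how widths and heights add up: the `layout` function and its wrapping rule are assumptions made for this example, not the algorithm that build() uses.

```python
GRID_WIDTH = 24  # Grafana dashboard width in grid units


def layout(panels):
    """Place (width, height) panels left to right, wrapping at GRID_WIDTH.

    Returns a list of gridPos-style dicts. Illustrative packing rule only
    (hypothetical helper, not part of grafana-dashboard).
    """
    positions = []
    x = y = line_height = 0
    for w, h in panels:
        if w > GRID_WIDTH:
            raise ValueError(f"panel width {w} exceeds grid width {GRID_WIDTH}")
        if x + w > GRID_WIDTH:
            # No room left on the current line: move below its tallest panel.
            x, y, line_height = 0, y + line_height, 0
        positions.append({"x": x, "y": y, "w": w, "h": h})
        x += w
        line_height = max(line_height, h)
    return positions


# A row (24 x 1) followed by the 12 x 6 and 18 x 6 graphs from the examples.
result = layout([(24, 1), (12, 6), (18, 6)])
for pos in result:
    print(pos)
```

With these sizes the 18-unit graph does not fit next to the 12-unit one (12 + 18 > 24), so it starts a new line. Choosing widths that sum to 24, such as 12 and 12, keeps two graphs side by side.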

Contacts

If you have questions, please ask them on StackOverflow or contact us in Telegram:

grafana-dashboard's People

Contributors

artembo, differentialorange, nickvolynkin, oleg-jukovec, opomuc, vasiliy-t, yngvar-antonsson

grafana-dashboard's Issues

Alert rules documentation

Since #59 we have a set of Prometheus alert rules, but it is just an example yml file with some comments. It would be more convenient for customers who are interested in "how to set up alerts for a Tarantool cluster" to have a documentation page describing what you should monitor and how. Of course, we may base it on #59 results.

Overall load panels are red on low load

If the load is low, overall load panels show values in red. I think it's not a good idea to scare the user like that, since this may be expected (for example, when an HTTP handle is only used to control the state of an app).

Separate load from example app

The example app is luatest code that both creates the cluster and generates the load so that graphs are non-empty. It would be better if we had an example cluster and a load generator separately. For example, the cluster could be bootstrapped with cartridge-cli (#34).

Add customization guide

It is hard for beginners to understand how to build their own dashboard. The source code is a bit complicated, and there is no convenient way to add custom panels to an existing dashboard: they need to copy and paste an entire dashboard. This is also inconvenient if one wants to add their own panels to the tail of a dashboard and get all source dashboard updates automatically.

We need to add a guide on building custom panels (maybe in the form of an article) and support adding custom panels to an already existing dashboard in code.

Bootstrap example cluster with cartridge-cli

Luatest is not a tool created for bootstrapping a configured cluster (with both roles and some clusterwide config), but it can do it while cartridge-cli can't. We should consider moving to cartridge-cli bootstrap in the example cluster when the CLI is able to do this.

Health check

There is no health check in either the dashboard or the default Cartridge/metrics tools. A health check is required for overview panels.

Proposition: "if instance is sending any metrics, it is alive; otherwise, instance is not running (or something else is unhealthy)".
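
A minimal Python sketch of this proposition, for illustration only. It assumes we already know, per instance, the timestamp of the last metrics sample received; the function name, the instance aliases and the `max_silence` threshold are hypothetical and would have to be tuned to the actual collection interval.

```python
import time


def instance_health(last_seen, now=None, max_silence=60.0):
    """Classify instances by how recently they sent any metrics.

    last_seen: dict mapping instance alias -> unix timestamp of its latest
    metrics sample. max_silence (seconds) is an assumed threshold.
    """
    now = time.time() if now is None else now
    return {
        alias: "alive" if now - ts <= max_silence else "not running"
        for alias, ts in last_seen.items()
    }


# Hypothetical aliases and timestamps: one fresh sample, one stale sample.
status = instance_health({"router": 995.0, "storage_1": 900.0}, now=1000.0)
print(status)
```

Note that "not running" here also covers any other reason for silence, such as a broken metrics endpoint or a network problem, exactly as the proposition states.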

Build dashboard with pre-defined data source

It's impossible to build JSON with a fixed datasource, since all dashboards choose the target type based on the template variable value:

if datasource == '${DS_PROMETHEUS}' then
  prometheus.target(
    expr=std.format('rate(%s{job=~"%s"}[%s])',
                    [metric_name, job, rate_time_range]),
    legendFormat='{{alias}}',
  )
else if datasource == '${DS_INFLUXDB}' then
  influxdb.target(
    policy=policy,
    measurement=measurement,
    group_tags=['label_pairs_alias'],
    alias='$tag_label_pairs_alias',
  ).where('metric_name', '=', metric_name)
  .selectField('value').addConverter('mean').addConverter('non_negative_derivative', ['1s']),

If someone decides to build a compiled dashboard with a datasource named myinflux, it won't work.

"example" project is not helpful for beginners

There is an example (https://github.com/tarantool/grafana-dashboard/tree/master/example) in our repo. It's not trivial to understand how it works for a person who runs Grafana and Prometheus for the first time.

First of all, the "project" application doesn't expose any metrics.

The user needs to manually add the following to init.lua:

local metrics = require('cartridge.roles.metrics')
metrics.set_export({
    {
        path = '/metrics/prometheus',
        format = 'prometheus',
    }
})

Secondly, it's not obvious how to install and run Prometheus and Grafana.
I think a simple readme.md file is missing, one that says:

brew install prometheus
brew install grafana
brew services start grafana # grafana will be available on :3000
prometheus --config.file="prometheus/prometheus.yml" # prometheus will be available on :9090
# Here we need to verify "Status" -> "Targets" page.

Also, it's better to change the target URLs from "example_project:" to "localhost:", because the user will usually run the application with cartridge start, which starts applications on localhost.

Please add a replicasets.yml file to the test project. It will save the user some extra actions: all they need to do is call cartridge replicasets setup.

Finally, it would be great to add a short description of the steps to import the dashboard from grafana.com. It's easier if the user understands what the "job" option means. That wasn't so in my case, and I spent some time figuring out what was wrong.

Example cluster doesn't start

example_project_1  | .///cluster/integration/bootstrap_test.lua:110: attempt to index field 'cluster' (a nil value)
example_project_1  | stack traceback:
example_project_1  | 	/app/.rocks/share/tarantool/luatest/capture.lua:139: in function '__index'
example_project_1  | 	.///cluster/integration/bootstrap_test.lua:110: in main chunk
example_project_1  | 	[C]: in function 'require'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/loader.lua:43: in function 'load_tests'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/runner.lua:25: in function </app/.rocks/share/tarantool/luatest/runner.lua:14>
example_project_1  | 	[C]: in function 'xpcall'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/utils.lua:32: in function 'load_tests'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/runner.lua:56: in function </app/.rocks/share/tarantool/luatest/runner.lua:40>
example_project_1  | 	[C]: in function 'xpcall'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/runner.lua:40: in function 'fn'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/sandboxed_runner.lua:14: in function 'run'
example_project_1  | 	/app/.rocks/share/tarantool/luatest/cli_entrypoint.lua:4: in function </app/.rocks/share/tarantool/luatest/cli_entrypoint.lua:3>
example_project_1  | 	....rocks/share/tarantool/rocks/luatest/0.5.0-1/bin/luatest:3: in main chunk
grafana-dashboard_example_project_1 exited with code 255

Monitor replication

Replication status and replication lag are important information about cluster state. There are some replication metrics in the Tarantool default metrics, but they may not be sufficient. We should study replication to find out which metrics are needed to monitor it and then (maybe after reworking the current replication metrics) add them as panels to the dashboard.

Publish alerts on Prometheus.io

There is a rumor that Prometheus.io has a section with user configurations which we can use to publish our alert rules. We should check this and publish them somewhere if it's possible.

Add CHANGELOG.md

Add CHANGELOG.md and describe all previously added features in it.

CPU time panels format

The s (seconds) format of the CPU time panels is a bit confusing. It may be more convenient for users to inspect a panel with a percentage format.

Add configurable InfluxDB policy

The default InfluxDB policy is used on all panels now. This should be reworked into a configurable policy (preferably configurable on import, like measurement).

Fix Prometheus rps computation

If the Prometheus scrape_interval is equal to 1m or more, rate() computation over a 1-minute range vector will fail, because the range rarely contains the two samples that rate() needs:

rate(tnt_stats_op_total{job=~\"[[job]]\",operation=\"upsert\"}[1m])

The scrape interval is equal to 1m by default, so we should increase the default range to at least 2m ("at the very minimum it should be two times the scrape interval" -- https://www.metricfire.com/blog/understanding-the-prometheus-rate-function/?GAID=231802381.1611061565&GAID=231802381.1611061565) and make it configurable.
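
The "two times the scrape interval" rule can be sketched in Python as follows. This is a hedged illustration, not the repository's implementation: the helper names, the supported duration suffixes and the job name used in the example call are assumptions; the expression template is the one quoted in the "Build dashboard with pre-defined data source" issue.

```python
def parse_seconds(duration):
    """Convert a simple duration such as '15s', '1m' or '2h' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(duration[:-1]) * units[duration[-1]]


def min_rate_range(scrape_interval, factor=2):
    """Smallest rate() range: `factor` times the scrape interval."""
    seconds = parse_seconds(scrape_interval) * factor
    # Prefer whole minutes when possible, otherwise fall back to seconds.
    return f"{seconds // 60}m" if seconds % 60 == 0 else f"{seconds}s"


def rate_expr(metric_name, job, scrape_interval):
    """Build a rate() query whose range covers at least two scrapes."""
    return 'rate(%s{job=~"%s"}[%s])' % (
        metric_name, job, min_rate_range(scrape_interval))


# 'tarantool_app' is a hypothetical job name used only for this example.
expr = rate_expr("tnt_stats_op_total", "tarantool_app", "1m")
print(expr)
```

Deriving the range from the scrape interval keeps the query valid when the interval changes, which is one possible way to make the range configurable.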

Publish to Grafana with CI/CD

It would be convenient to publish new versions of the dashboard to the Grafana Official & community built dashboards page with some sort of CI/CD, if it is possible.

Add some illustrations and descriptions

The dashboard should be described somewhere. Right now you need to compile the dashboard or read through the code to see what it consists of. The description should be illustrated with screenshots.

Add cluster issues panel

The cluster issues metric is an important metric in terms of cluster overview. For example, the original "issues" button is placed at the top of the Cartridge UI. I think it should be added as part of the cluster overview panels.

Set alias on example cluster

Since moving to the metrics role (#10), the alias should be set through env, but now it's missing.

Default dashboard doesn't work as it should

Provide Graphite example

The metrics module supports Graphite. We should provide an example with a Graphite monitoring stack, similar to the Prometheus and InfluxDB ones.

Add network memory panels

HTTP is not the only way of interacting with Tarantool instances. Binary protocol connections (iproto) are also popular. You can monitor them with the group of tnt_net_ metrics. We should decide which tnt_net_ metrics are most helpful and then add them as panels to the dashboard.

Autoscreenshooting tool

Making screenshots of a new dashboard each time is an annoying and monotonous job. Maybe there are tools to do this with a script.

Release on Grafana Dashboards with collaborative account

There is currently no way to share edit and publish access to a dashboard in Grafana Official and community built dashboards. We need to create some kind of collaborative account so that our team can edit and upload new versions of the dashboard.

Add Lua memory panel

Lua memory is one of the most important metrics in the Tarantool default metrics. At least, it is the one that should be covered by an alert because of the strict 2 GB limit per instance.

[2pt] Create GitHub Actions workflows to translate documentation

The documentation translation process should be automated. We need to implement a set of tasks at each stage of tech writer's workflow to maintain 100% translation for each repository containing documentation.

  • Tech writer creates PR with a documentation update (rst sources).
    • (push-translation on pull_request) GitHub workflow builds po files from rst and sends them to the branch of the same name in Crowdin
  • Pull translations back
    • (pull-translation on workflow_dispatch) When translation of the corresponding changes is ready, tech writer manually triggers GitHub workflow for the translated branch (PR branch)
  • Tech writer or someone responsible merges PR
    • (upload-translation on push to master) GitHub workflow uploads merged translations from po files into Crowdin master (or main) branch
