stat's Introduction

DEPRECATED

This project is deprecated and no longer maintained. Please use metrics instead.

stat - the status module for Tarantool 1.7+

Forked from https://github.com/dedok/tarantool-stat

Installation

  1. Add the Tarantool repository for yum or apt
  2. Install the packages:
$ sudo [yum|apt-get] install tarantool tarantool-stat

Getting Started

Prerequisites

  • Tarantool 1.7+ with header files (tarantool && tarantool-dev)

API Documentation

st = stat.stat()

Returns various metrics about the Tarantool instance.

Args:

  • include_vinyl_count - include space:count for spaces that use the vinyl engine; the default value is true
  • only_numbers - include only numeric metrics; the default value is false

Returns:

  • a table on success
  • error(reason) on error
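
Because stat() reports failure by raising an error, a caller that must not crash may want to wrap the call. The following is a minimal sketch and not part of the module: safe_stat is a hypothetical helper, and passing include_vinyl_count in the same options table as only_numbers is an assumption based on the argument list above.

```lua
-- Hypothetical helper (not part of the stat module): call a stat()-style
-- function and turn a raised error into a (nil, reason) return pair.
local function safe_stat(stat_fn, opts)
    local ok, result = pcall(stat_fn, opts)
    if not ok then
        return nil, tostring(result)
    end
    return result
end

-- Inside Tarantool with the module installed, usage might look like:
--   local st, err = safe_stat(require('stat').stat,
--                             { only_numbers = true, include_vinyl_count = false })
--   if st == nil then print('stat failed: ' .. err) end
```

Taking the stat function as a parameter keeps the helper testable outside Tarantool with a stub.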

Example

  tarantool> require('json').encode(require('stat').stat({ only_numbers = true }))
{
  //
  // Number of config options
  "cfg.listen":"3301",
  "cfg.current_time":1512430376,
  "cfg.hostname":"some_hostname",
  "cfg.read_only":false,
  
  //
  // Lag if instance is readonly 
  "replication.replica.<instance_id>.lsn":0,
  
  // Lag if instance is not readonly 
  "replication.master.<instance_id>.lsn":0,
  
  "info.pid":16930,
  "info.uuid":"b2441213-5663-4be0-8c0e-1c887f0d7c7b",
  
  // Current last sequence number
  "info.lsn":0,
  
  // Uptime of this instance
  "info.uptime":20,
  
  // Corresponds to replication.downstream.vclock - the instance’s vector clock 
  "info.vclock.<instance_id>":12,
  
  // The time difference between the local time at the instance, recorded when the event was received, 
  // and the local time at another master recorded when the event was written to the write ahead log on that master
  "info.replication.<instance_id>.lag":0.00001,
  
  //
  // Slab information
  // https://tarantool.org/doc/1.7/book/box/box_slab.html?highlight=slab%20info#lua-function.box.slab.info
  "slab.arena_size":5415928,
  "slab.arena_used":3062008,
  "slab.arena_used_ratio":56.5,
  "slab.items_size":1221832,
  "slab.items_used":735480,
  "slab.items_used_ratio":60.19,
  "slab.quota_size":104857600,
  "slab.quota_used":33554432,
  "slab.quota_used_ratio":32,
  
  //
  // Current memory size used by Lua
  "runtime.used":29360128,
  // Heap size of the Lua garbage collector
  "runtime.lua":1247700,
  
  //
  // Number of various operations since Tarantool started
  "stat.op.delete.total":0,
  "stat.op.delete.rps":0,
  "stat.op.error.total":0,
  "stat.op.error.rps":0,
  "stat.op.insert.total":0,
  "stat.op.insert.rps":0,
  "stat.op.eval.total":0,
  "stat.op.eval.rps":0,
  "stat.op.auth.total":0,
  "stat.op.auth.rps":0,
  "stat.op.update.total":0,
  "stat.op.update.rps":0,
  "stat.op.replace.total":0,
  "stat.op.replace.rps":0,
  "stat.op.call.total":0,
  "stat.op.call.rps":0,
  "stat.op.upsert.total":0,
  "stat.op.upsert.rps":0,
  "stat.op.select.total":1,
  "stat.op.select.rps":0,
  
  //
  // Information about spaces
  // Number of rows in space if memtx
  "space.<space_name>.len":2,
  // Number of rows in space if vinyl
  "space.<space_name>.count":2,
  // Size of space in bytes
  "space.<space_name>.bsize":20,
  // Size of space indexes in bytes
  "space.<space_name>.index_bsize":10,
  // Total size of space
  "space.<space_name>.total_bsize":10,
  
  //
  // Information about Fibers
  // Number of fibers
  "fiber.count":3,
  // Total fiber memory allocated
  "fiber.memalloc":177040,
  // Total fiber memory used
  "fiber.memused":0,
  "fiber.csw":102,
  
  //
  // Perf information
  "statperf.stats_t":5.6982040405273e-05,
  "statperf.fibers":8.0000000000011e-06,
  "statperf.fibers_t":7.8678131103516e-06,
  "statperf.spaces_t":0.00010204315185547,
  "statperf.spaces":0.000103,
  "statperf.stats":5.4999999999999e-05,
  
  //
  // Number of packets passed via network interface
  "stat.net.received.total":0,
  "stat.net.received.rps":0,
  // Same, but sent via network interface
  "stat.net.sent.total":0,
  "stat.net.sent.rps":0,
  // Network requests
  "stat.net.requests.current": 0,
  "stat.net.requests.rps": 0,
  "stat.net.requests.total": 0,
  // Network connections
  "stat.net.connections.current": 0,
  "stat.net.connections.rps": 0,
  "stat.net.connections.total": 0,
}

$ tnt_check.sh

Finds all tarantool@* instances and checks the following parameters on each:

  • box.slab.info().arena_used_ratio
  • box.info().status
  • replication status

Returns:

  • empty result on success
  • error(reason) on error
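
The checks listed above can be pictured as a small predicate over tables shaped like the results of box.slab.info() and box.info. The sketch below is an illustration only and is not taken from tnt_check.sh: the function name, the threshold argument, and the rule that an upstream must be in the follow state are all assumptions.

```lua
-- Hypothetical sketch of the checks; not the logic of tnt_check.sh itself.
-- slab: a table shaped like box.slab.info(); info: a table shaped like box.info.
local function check_instance(slab, info, max_arena_ratio)
    local problems = {}

    -- arena_used_ratio may be a number or a string such as "56.5%".
    local ratio = slab.arena_used_ratio
    if type(ratio) == 'string' then
        ratio = tonumber((ratio:gsub('%%', '')))
    end
    if ratio == nil or ratio > max_arena_ratio then
        problems[#problems + 1] = 'arena_used_ratio is ' .. tostring(slab.arena_used_ratio)
    end

    if info.status ~= 'running' then
        problems[#problems + 1] = 'status is ' .. tostring(info.status)
    end

    -- Assumed rule: every upstream that exists should be in the "follow" state.
    for id, peer in pairs(info.replication or {}) do
        local upstream = peer.upstream
        if upstream ~= nil and upstream.status ~= 'follow' then
            problems[#problems + 1] = 'replica ' .. tostring(id) ..
                ' upstream is ' .. tostring(upstream.status)
        end
    end

    table.sort(problems)
    return problems
end
```

An empty list corresponds to the "empty result on success" case; any entry would be reported as an error.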

$ tnt_collectd.sh

Finds all tarantool@* instances, calls require('stat').stat() on each, and generates collectd PUTVAL statements.

Returns:

  • list of metrics on success
  • error(reason) on error

Example

$ tnt_collectd.sh

PUTVAL "host/tarantool/tarantool-slab_arena_size" interval=1 N:7483488
PUTVAL "host/tarantool/tarantool-space_chat_bsize" interval=1 N:38774
PUTVAL "host/tarantool/tarantool-statperf_fibers_t" interval=1 N:0.039196491241455
PUTVAL "host/tarantool/tarantool-stat_net_received" interval=1 N:125297500
PUTVAL "host/tarantool/tarantool-space_question_bsize" interval=1 N:7670
PUTVAL "host/tarantool/tarantool-stat_op_update" interval=1 N:23988
...
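
The mapping from a flat metrics table to PUTVAL lines can be sketched in a few lines of Lua. This is an assumption-laden illustration, not the contents of tnt_collectd.sh: the function name to_putval is hypothetical, and replacing dots with underscores only mirrors the sample output above (the real script evidently applies further rules, since the sample shows stat_net_received without a .total suffix).

```lua
-- Hypothetical sketch: format numeric metrics as collectd PUTVAL lines.
-- metrics: flat table such as the one returned by stat(); host: collectd host name.
local function to_putval(metrics, host, interval)
    local lines = {}
    for name, value in pairs(metrics) do
        -- Non-numeric values (hostname, uuid, ...) are skipped.
        if type(value) == 'number' then
            local ident = name:gsub('%.', '_')
            lines[#lines + 1] = string.format(
                'PUTVAL "%s/tarantool/tarantool-%s" interval=%d N:%s',
                host, ident, interval, tostring(value))
        end
    end
    -- pairs() has no defined order, so sort for stable output.
    table.sort(lines)
    return lines
end

for _, line in ipairs(to_putval({ ['slab.arena_size'] = 7483488 }, 'host', 1)) do
    print(line)
end
```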

stat's People

Contributors

alexopryshko asverdlov bofm dedok ilmarkov kostja rtsisyk sharonovd totktonada vanyarock01

Forkers

bofm dsamirov

stat's Issues

Add detailed replication statistics

Replication statistics are not clear enough. Keys should use the instance's host:port instead of numeric indexes.

For example:

info.replication.localhost:3301.lag: 0.00013041496276855

instead of

info.replication.1.lag: 0.00013041496276855
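
One way the proposed key could be built is from the upstream peer of each replication entry. The sketch below is only an illustration of the request: replication_lag_key is a hypothetical helper, and it assumes each entry is shaped like an element of box.info.replication, with the address in upstream.peer.

```lua
-- Hypothetical helper: build the metric key proposed in this issue.
-- peer_info: one entry shaped like an element of box.info.replication.
local function replication_lag_key(peer_info)
    local peer = peer_info.upstream and peer_info.upstream.peer
    local label
    if type(peer) == 'string' and peer ~= '' then
        -- Drop an optional "user:password@" prefix so credentials never reach the key.
        label = peer:gsub('^.*@', '')
    else
        -- No upstream address available; fall back to the numeric id.
        label = tostring(peer_info.id)
    end
    return 'info.replication.' .. label .. '.lag'
end

print(replication_lag_key({ id = 1, upstream = { peer = 'localhost:3301' } }))
```

Note that a host such as 10.0.0.1:3301 puts extra dots into the key, so consumers that split keys on dots would need to account for that.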

Table overflow in util.gethostname()

Test case:

Tarantool 1.9.0-4-g195d4462d
type 'help' for interactive help
tarantool> box.cfg{}
2018-03-12 17:04:06.332 [1] main/101/interactive I> systemd: NOTIFY_SOCKET variable is empty, skipping
2018-03-12 17:04:06.332 [1] main/101/interactive C> Tarantool 1.9.0-4-g195d4462d
2018-03-12 17:04:06.332 [1] main/101/interactive C> log level 5
2018-03-12 17:04:06.332 [1] main/101/interactive I> mapping 268435456 bytes for memtx tuple arena...
2018-03-12 17:04:06.332 [1] main/101/interactive I> mapping 134217728 bytes for vinyl tuple arena...
2018-03-12 17:04:06.334 [1] main/101/interactive I> initializing an empty data directory
2018-03-12 17:04:06.336 [1] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2018-03-12 17:04:06.344 [1] snapshot/101/main I> done
2018-03-12 17:04:06.345 [1] main/101/interactive I> ready to accept requests
2018-03-12 17:04:06.345 [1] main/104/checkpoint_daemon I> started
2018-03-12 17:04:06.345 [1] main/104/checkpoint_daemon I> scheduled the next snapshot at Mon Mar 12 18:38:09 2018
---
...

tarantool> for i=1,99999 do require('stat').stat() end
---
- error: table overflow
...

tarantool> require('stat').stat()
---
- error: table overflow
...

tarantool> collectgarbage('collect')
---
- 0
...

tarantool> require('stat').stat()
---
- error: table overflow
...

The error is most likely caused by the require('stat/util').gethostname() function.

t:3301> require('stat').stat()
---
- error: table overflow
...


t:3301> require('stat/util').gethostname()
---
- error: table overflow
...


t:3301> require('stat/util').gethostname = function() return 'hollywood' end
---
...

t:3301> require('stat').stat()
---
- stat.op.insert.rps: 0
  stat.op.error.rps: 0
  info.memory.data: 4408565616
  ...
  <it works>

For some reason, in our environment this table overflow error starts occurring after the Tarantool instance has been running for a few days.

Possible workarounds:

  • rewrite util.gethostname()
  • call /bin/hostname in a subprocess
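
The second workaround could look roughly like the sketch below, which uses only the standard io.popen call. The function names are hypothetical, the command is a parameter so that a different path can be supplied, and the result is cached so the subprocess runs at most once. Keep in mind that io.popen blocks while the command runs, which is an additional reason to cache.

```lua
-- Hypothetical replacement for util.gethostname(): read the host name
-- from a subprocess instead of the failing code path.
local function read_hostname(cmd)
    local pipe = io.popen(cmd or '/bin/hostname')
    if pipe == nil then
        return nil, 'cannot start subprocess'
    end
    local out = pipe:read('*l')
    pipe:close()
    if out == nil or out == '' then
        return nil, 'empty hostname'
    end
    return out
end

-- Cache the first successful result so later calls spawn nothing.
local cached_hostname
local function gethostname(cmd)
    if cached_hostname == nil then
        cached_hostname = read_hostname(cmd)
    end
    return cached_hostname
end
```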

Crash if a replica is down

Stat crashes if one of the replicas is down:

Unhandled error: ...ntool/testA_A_3/tnt/.rocks/share/tarantool/stat/init.lua:72: attempt to index field 'vclock' (a nil value)
stack traceback:
    ...ool/testA_A_3/tnt/.rocks/share/tarantool/http/server.lua:695: in function 'process_client'
    ...ool/testA_A_3/tnt/.rocks/share/tarantool/http/server.lua:1139: in function <...ool/testA_A_3/tnt/.rocks/share/tarantool/http/server.lua:1138>
    [C]: in function 'pcall'
    builtin/socket.lua:997: in function <builtin/socket.lua:995>

From the instance console:

localhost:3302> box.info.replication
---
...
  5:
    id: 5
    uuid: bdf6d429-7b71-4108-bf6a-d224a5e801b1
    lsn: 0
    upstream:
      peer: XXXXXXX
      lag: 0.00026226043701172
      status: disconnected
      idle: 3264.8741423339
      message: connect, called on fd 34, aka 10.199.200.193:14269
    downstream:
      status: stopped
      message: unexpected EOF when reading from socket, called on fd 33, aka 10.199.200.193:3302
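
The trace suggests that init.lua reads a vclock field from a replication entry that lacks one, as in the stopped downstream in the console output above. A defensive accessor could look like the sketch below. The helper name and the exact field path are assumptions, since line 72 of init.lua is not shown in this report.

```lua
-- Hypothetical guard: return the downstream vclock component for
-- instance_id, or nil when the replica is down and no vclock is reported.
local function downstream_lsn(peer_info, instance_id)
    local downstream = peer_info.downstream
    if type(downstream) ~= 'table' or type(downstream.vclock) ~= 'table' then
        return nil
    end
    return downstream.vclock[instance_id]
end

-- A stopped downstream, shaped like the console output in this report.
local stopped = { downstream = { status = 'stopped', message = 'unexpected EOF' } }
print(downstream_lsn(stopped, 5))
```

With a guard of this kind the metric for a disconnected replica would simply be omitted instead of aborting the whole stat() call.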

feature - replication status

Please add replication status information.

Example

"replication.replica.<instance_id>.status": "follow",

or

"replication.master.<instance_id>.status": "follow",

apt installation fail

% cat /etc/apt/sources.list.d/tarantool_1_10.list
deb http://download.tarantool.org/tarantool/1.10/ubuntu/ bionic main
deb-src http://download.tarantool.org/tarantool/1.10/ubuntu/ bionic main
% sudo apt install tarantool-stat
E: Unable to locate package tarantool-stat

Update README.MD

Information about how this module can be used with statsd or collectd should be added to the README.md.

This is important because the module can work with Grafana and similar tools. It would also be good to have some documentation (a mini how-to) on integrating the module with Grafana, with a good explanation of the metrics.

include_vinyl_count should be false by default

The include_vinyl_count option should be false by default because it hurts performance and causes latency spikes when spaces contain a lot of data.

Moreover, people who have been using the stat package for monitoring may run into latency problems after upgrading from an old version to a recent one.

Add version

This module should have its own version, starting from 0.1.0.

RPM

This package should be delivered via RPM and DEB.
