Bell.js is a real-time anomaly (outlier) detection system for periodic time series, built to monitor a large number of metrics. It collects metrics from clients such as statsd and analyzes them with the 3-sigma rule; once enough anomalies are found within a short time, it alerts us via SMS, HipChat, etc.
At Eleme, we use it to monitor our website and RPC interfaces, including API call frequency, API response time (time cost per call), and exception counts. Our services send these statistics to statsd; statsd aggregates them every 10 seconds and broadcasts the results to its backends, including bell. Bell analyzes the current stats against historical data, calculates the trend, and alerts us if the trend behaves anomalously.
You may also want to see https://github.com/eleme/noise, which only does anomaly detection but is easier and faster.
- nodejs (>= 0.12) or iojs (>=1.1) (generator feature required)
- beanstalkd (https://github.com/kr/beanstalkd) (we are using version 1.9)
- ssdb (https://github.com/ideawu/ssdb) (we are using version 1.6.8.8)
Install bell.js via npm:
$ npm install bell.js -g
After installation, a command named bell is available. To start a service:
$ bell <service-name> -c <path-to-config-file>
For example, to start the analyzer: bell analyzer -c configs.toml
Here is a simple quickstart for use with statsd; make sure statsd is ready to work.
- First, generate a sample config file via bell -s.
- Open the sample config file (written in TOML) and edit it.
- Start ssdb and beanstalkd.
- Start the bell services (analyzer, listener, webapp, alerter, cleaner).
- Add 'bell.js/clients/statsd' to statsd's backends and start statsd.
The default config file is at config/configs.toml.
Bell has 5 "services". They are started with different entry points and run in separate processes:
- listener: Receives incoming stats from clients (like statsd) over TCP, packs them into jobs, and sends them to the job queue.
- analyzer(s): Get jobs from the queue and analyze the current datapoint via the 3-sigma rule, then store the analysis result and all statistics in ssdb. Bell is scalable: we can start multiple analyzer instances to process lots of metrics.
- webapp: Visualizes metrics and analysis results on the web; default port: 8989.
- alerter: Alerts once enough anomalies have been detected.
- cleaner: Checks at a regular interval when each metric last hit bell; if a metric's age exceeds the threshold, the cleaner removes it.
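To make the analyzer step more concrete, here is a minimal sketch of a 3-sigma check in plain JavaScript. It is not Bell's actual analyzer code: the function names (`mean`, `stddev`, `threeSigmaScore`, `isAnomaly`) and the `history` array are hypothetical, and the sketch ignores periodicity and storage entirely. It only shows the basic idea that a datapoint is suspicious when it lies more than three standard deviations from the mean of past values.

```javascript
// Minimal 3-sigma sketch; NOT Bell's real analyzer implementation.
// `history` is a hypothetical array of past values for a single metric.

function mean(values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) sum += values[i];
  return sum / values.length;
}

// Population standard deviation of `values` around `avg`.
function stddev(values, avg) {
  var squares = 0;
  for (var i = 0; i < values.length; i++) {
    var d = values[i] - avg;
    squares += d * d;
  }
  return Math.sqrt(squares / values.length);
}

// Returns |value - mean| / (3 * std).
// A score above 1 means the value lies outside the 3-sigma band.
function threeSigmaScore(history, value) {
  var avg = mean(history);
  var std = stddev(history, avg);
  if (std === 0) return value === avg ? 0 : Infinity;
  return Math.abs(value - avg) / (3 * std);
}

function isAnomaly(history, value) {
  return threeSigmaScore(history, value) > 1;
}
```

For example, with `history = [10, 12, 11, 9, 10, 11, 10, 9]`, a new value of 11 stays inside the band, while a value of 20 falls well outside it.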
We use statsd as bell's client; just add 'bell.js/clients/statsd' to the backends in the statsd config:
{
  backends: ['bell.js/clients/statsd']
}
It is also very simple to implement a custom bell client via clients/client.js:
var bell = require('bell.js');
var client = bell.createClient({port: 8889});
// send datapoints every 10 seconds
setInterval(function() {
  // each datapoint is [metricName, [timestamp, value]]
  var datapoints = [['foo', [1412762335, 3.14]], ['bar', [1412762335, 314]]];
  client.send(datapoints);
}, 10 * 1000);
Bell comes with a built-in alerter, console.js. It is just a sample, but you can write one entirely on your own. Here is a brief guide:
- An alerter is a nodejs module which should export a function init: init(configs, alerter, log)
- To make an alerter work, add it to alerter.modules in configs.toml:
  [alerter]
  modules = ["./path/to/myalerter.js"]
- A nodejs event is available on the second parameter, alerter, of the function init:
  Event 'anomaly detected'
  - Parameters: event, an array like: [[metricName, [timestamp, metricValue, AnalyzationResult]], trend]
  - Emitted when an anomaly was detected.
There's a demo SMS alerter, alerters/example-sms.js; it alerts when the trend goes beyond 1 or -1.
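As an illustration of the alerter interface described above, here is a minimal sketch of a custom alerter module. Only the `init(configs, alerter, log)` signature, the 'anomaly detected' event name, and the event layout come from this README; the file name `myalerter.js`, the `formatAlert` helper, the in-memory `sent` array, and the `Math.abs(trend) > 1` threshold are assumptions made for the example. It also assumes timestamps are Unix seconds, and it leaves `configs` and `log` unused because their shape is not described here.

```javascript
// myalerter.js: a hypothetical alerter module (a sketch, not shipped with Bell).
var sent = []; // stands in for a real notification channel (SMS, HipChat, ...)

// Turn one 'anomaly detected' event into a readable message.
// Event layout (from the README):
//   [[metricName, [timestamp, metricValue, AnalyzationResult]], trend]
function formatAlert(event) {
  var datapoint = event[0];
  var trend = event[1];
  var name = datapoint[0];
  var timestamp = datapoint[1][0]; // assumed to be Unix seconds
  var value = datapoint[1][1];
  return name + ' is anomalous: value=' + value + ' trend=' + trend +
    ' at ' + new Date(timestamp * 1000).toISOString();
}

// Entry point with the documented signature.
// `configs` and `log` are unused in this sketch.
function init(configs, alerter, log) {
  alerter.on('anomaly detected', function(event) {
    var trend = event[1];
    // Assumed threshold: only notify when the trend goes beyond 1 or -1.
    if (Math.abs(trend) > 1) {
      sent.push(formatAlert(event));
    }
  });
}

module.exports = { init: init };
```

A real alerter would replace the `sent.push(...)` line with a call to whatever notification service is in use.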
Generally, we run all bell services on one machine, but analyzers may require more CPUs to make processing faster. To run separate analyzers on another host, configure ssdb.*, beanstalkd.*, analyzer.* and alerter.* accordingly.
- Anomalies detection algorithm
- Eliminate periodicity
- Anomalous Severity Trending
- Data flow and storage schema
- Listener Net Protocol
- Analyzers scalability?
  The more metrics there are, the more analyzers should be running. If analysis cannot keep up with the incoming datapoints, we should increase the number of analyzer instances; this is the preferred solution. Another option is to reduce analyzer.filter.offset, which makes IO faster. Beanstats is a simple console tool to watch a single beanstalk tube and show how fast jobs are going in and out of the queue.
- SSDB disk usage is too large.
  Set the item compression to yes in ssdb.conf, or run compact in ssdb-cli.
- "Too many open files" in my ssdb log.
  You need to raise your Linux max open files limit to at least 10k.
MIT Copyright (c) 2014 - 2015 Eleme, Inc.