
morgoth's Introduction


Morgoth is a framework for flexible anomaly detection algorithms, packaged for use with Kapacitor.

Morgoth provides a framework for implementing the smaller pieces of an anomaly detection problem.

The basic framework is that Morgoth maintains a dictionary of normal behaviors and compares new windows of data to the normal dictionary. If the new window of data is not found in the dictionary then it is considered anomalous.

Morgoth uses algorithms, called fingerprinters, to compare windows of data and determine whether they are similar. The Lossy Counting Algorithm (LCA) is used to maintain the dictionary of normal windows. The LCA is a space-efficient algorithm that can account for drift in the normal dictionary; more on the LCA below.

Morgoth uses a consensus model where each fingerprinter votes on whether it thinks the current window is anomalous. If the percentage of anomalous votes is greater than a consensus threshold, the window is considered anomalous.
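
The consensus step can be sketched as follows. This is a minimal illustration of the idea; the names and types are assumptions for this example, not Morgoth's actual code.

```go
package main

import "fmt"

// isAnomalous implements a simple consensus vote: each fingerprinter
// reports whether it considers the window anomalous, and the window is
// flagged when the fraction of anomalous votes exceeds the threshold.
func isAnomalous(votes []bool, consensus float64) bool {
	anomalous := 0
	for _, v := range votes {
		if v {
			anomalous++
		}
	}
	return float64(anomalous)/float64(len(votes)) > consensus
}

func main() {
	// Two of three fingerprinters vote anomalous; with a 0.5 threshold
	// the window is flagged as anomalous.
	fmt.Println(isAnomalous([]bool{true, true, false}, 0.5))
}
```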

Getting started

Install

Morgoth can be installed via go:

go get github.com/nathanielc/morgoth/cmd/morgoth

Configuring

Morgoth can run as either a child process of Kapacitor or as a standalone daemon that listens on a socket.

Child Process

Morgoth is a UDF for Kapacitor. Add this configuration to Kapacitor in order to enable using Morgoth.

[udf]
  [udf.functions]
    [udf.functions.morgoth]
      prog = "/path/to/bin/morgoth"
      timeout = "10s"

Restart Kapacitor and you are ready to start using Morgoth within Kapacitor.

Socket

To use Morgoth as a socket UDF, start the morgoth process with the -socket option.

   morgoth -socket /path/to/morgoth/socket

Next you will need to configure Kapacitor to use the morgoth socket.

[udf]
  [udf.functions]
    [udf.functions.morgoth]
      socket = "/path/to/morgoth/socket"
      timeout = "10s"

Restart Kapacitor and you are ready to start using Morgoth within Kapacitor.

TICKscript

Here is an example TICKscript for detecting anomalies in cpu data from Telegraf.

stream
    |from()
        .measurement('cpu')
        .where(lambda: "cpu" == 'cpu-total')
        .groupBy(*)
    |window()
        .period(1m)
        .every(1m)
    @morgoth()
        // track the 'usage_idle' field
        .field('usage_idle')
        // label output data as anomalous using the 'anomalous' boolean field.
        .anomalousField('anomalous')
        .errorTolerance(0.01)
        // The window is anomalous if it occurs less than 5% of the time.
        .minSupport(0.05)
        // Use the sigma fingerprinter
        .sigma(3.0)
        // Multiple fingerprinters can be defined...
    |alert()
        // Trigger a critical alert when the window is marked as anomalous.
        .crit(lambda: "anomalous")

Fingerprinters

A fingerprinter is a method that can determine if a window of data is similar to a previous window of data. In effect the fingerprinters take fingerprints of the incoming data and can compare fingerprints of new data to see if they match. These fingerprinting algorithms provide the core of Morgoth as they are the means by which Morgoth determines if a new window of data is new or something already observed.

An example fingerprinting algorithm is the sigma algorithm, which computes the mean and standard deviation of a window and stores them as the fingerprint for the window. When a new window arrives, its fingerprint (mean, stddev) is compared to previous fingerprints. If the windows are too far apart, they are not considered a match.
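
The sigma idea can be sketched like this. The types, the `matches` method, and the simple mean-distance comparison are illustrative assumptions, not Morgoth's exact implementation.

```go
package main

import (
	"fmt"
	"math"
)

// sigmaFingerprint stores the mean and standard deviation of a window,
// as described above.
type sigmaFingerprint struct {
	mean, stddev float64
}

// fingerprint computes the (mean, stddev) fingerprint of a window.
func fingerprint(window []float64) sigmaFingerprint {
	var sum float64
	for _, v := range window {
		sum += v
	}
	mean := sum / float64(len(window))
	var sq float64
	for _, v := range window {
		d := v - mean
		sq += d * d
	}
	return sigmaFingerprint{mean: mean, stddev: math.Sqrt(sq / float64(len(window)))}
}

// matches reports whether another window's mean lies within `sigmas`
// standard deviations of this fingerprint's mean.
func (f sigmaFingerprint) matches(other sigmaFingerprint, sigmas float64) bool {
	return math.Abs(f.mean-other.mean) <= sigmas*f.stddev
}

func main() {
	normal := fingerprint([]float64{10, 11, 9, 10, 10})
	spike := fingerprint([]float64{30, 31, 29, 30, 30})
	// The spike's mean is far outside 3 sigma of the normal window.
	fmt.Println(normal.matches(spike, 3.0))
}
```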

By defining several fingerprinting algorithms Morgoth can decide whether new data is anomalous or normal.

Lossy Counting Algorithm

The LCA counts frequent items in a stream of data. It is lossy because, to conserve space, it drops less frequent items. The result is that the algorithm finds frequent items but may lose track of less frequent ones. More on the specific mathematical properties of the algorithm can be found below.

There are two parameters to the algorithm: error tolerance (e) and minimum support (m). First, e is in the range [0, 1] and is an error bound, interpreted as a percentage. For example, given e = 0.01 (1%), items occurring in less than 1% of the data set can be dropped. Decreasing e requires more space but keeps track of less frequent items; increasing e requires less space but loses track of less frequent items. Second, m is in the range [0, 1] and is a minimum support such that items considered frequent have at least m frequency. For example, if m = 0.05 (5%), then an item with support below 5% is not considered frequent (frequent items being the "normal" ones). The minimum support thus becomes the threshold at which items are considered anomalous.

Notice that m > e; this reduces the number of false positives. For example, say we set e = 5% and m = 5%. If a normal behavior X has a true frequency of 6%, then based on variations in the true frequency, X might fall below 5% for a small interval and be dropped. This would cause X's frequency to be underestimated, which would cause it to be flagged as an anomaly, triggering a false positive. By setting e < m we create a buffer that helps mitigate false positives.
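
A minimal sketch of the Lossy Counting Algorithm with these two parameters follows. The structure and names are illustrative; this is the textbook bucket-based formulation, not Morgoth's actual implementation.

```go
package main

import "fmt"

// lossyCounter is a minimal Lossy Counting sketch over a stream of
// item IDs. e is the error tolerance; items whose count plus maximum
// undercount falls at or below the current bucket number are pruned at
// each bucket boundary.
type lossyCounter struct {
	e      float64
	n      int            // items seen so far
	width  int            // bucket width, 1/e
	counts map[string]int // observed counts
	errs   map[string]int // maximum possible undercount per item
}

func newLossyCounter(e float64) *lossyCounter {
	return &lossyCounter{
		e:      e,
		width:  int(1.0 / e),
		counts: make(map[string]int),
		errs:   make(map[string]int),
	}
}

func (l *lossyCounter) Add(item string) {
	l.n++
	bucket := (l.n-1)/l.width + 1
	if _, ok := l.counts[item]; !ok {
		l.errs[item] = bucket - 1
	}
	l.counts[item]++
	// At the end of each bucket, drop infrequent items to save space.
	if l.n%l.width == 0 {
		for k, c := range l.counts {
			if c+l.errs[k] <= bucket {
				delete(l.counts, k)
				delete(l.errs, k)
			}
		}
	}
}

// Support reports whether item has an estimated frequency of at
// least m, using the standard (m - e)*N threshold.
func (l *lossyCounter) Support(item string, m float64) bool {
	return float64(l.counts[item]) >= (m-l.e)*float64(l.n)
}

func main() {
	lc := newLossyCounter(0.01)
	for i := 0; i < 97; i++ {
		lc.Add("normal")
	}
	for i := 0; i < 3; i++ {
		lc.Add("rare")
	}
	// "normal" clears the (m-e)*N threshold; "rare" does not.
	fmt.Println(lc.Support("normal", 0.05), lc.Support("rare", 0.05))
}
```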

Properties

The Lossy Counting algorithm has three properties:

  1. there are no false negatives,
  2. false positives are guaranteed to have a frequency of at least (m - e)*N,
  3. the frequency of an item can be underestimated by at most e*N,

where N is the number of items encountered.

The space requirements for the algorithm are at most (1 / e) * log(e*N). It has also been shown that if the items with low frequency are uniformly random, then the space requirements are no more than 7 / e. This means that as Morgoth continues to process windows of data, its memory usage grows with the log of the number of windows and can reach a stable upper bound.
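
As a rough illustration of these bounds (the natural logarithm is used here as an assumption; the original analysis fixes its own log base):

```go
package main

import (
	"fmt"
	"math"
)

// spaceBound computes the worst-case number of tracked entries for the
// Lossy Counting Algorithm, (1/e) * log(e*N).
func spaceBound(e float64, n int) float64 {
	return (1 / e) * math.Log(e*float64(n))
}

func main() {
	e := 0.01
	// Even after a million windows the bound grows only logarithmically.
	fmt.Printf("worst case: ~%.0f entries\n", spaceBound(e, 1000000))
	// Under the uniformly random assumption the bound is a constant 7/e.
	fmt.Printf("random case: %.0f entries\n", 7/e)
}
```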

Metrics

Morgoth exposes metrics about each detector and fingerprinter. The metrics are exposed as a Prometheus /metrics endpoint over HTTP. By default the metrics HTTP endpoint binds to :6767.

NOTE: Using the metrics HTTP endpoint only makes sense if you are running Morgoth in socket mode, as otherwise each new process would collide on the bind port.

Metrics will have some or all of these labels:

  • task - the Kapacitor task ID.
  • node - the ID of the morgoth node within the Kapacitor task.
  • group - the Kapacitor group ID.
  • fingerprinter - the unique name for the specific fingerprinter, i.e. sigma-0.

The most useful metric for debugging why Morgoth is not behaving as expected is likely the morgoth_unique_fingerprints gauge. It reports the number of unique fingerprints each fingerprinter is tracking. This is useful because if the number is large or growing with each new window, it is likely that the fingerprinter is erroneously marking every window as anomalous. By providing visibility into each fingerprinter, Morgoth can be tuned as needed.

Using Kapacitor's scraping service you can scrape the Morgoth UDF process for these metrics and consume them within Kapacitor. See this tutorial for more information.

morgoth's People

Contributors

abourget, bbigras, charl, errows, mre, nathanielc

morgoth's Issues

Task stops with error: failed to register metrics

When using Morgoth version 0.3.0 or above, Kapacitor tasks stop running with the error:

Stopped task: cpu_alert morgoth3: failed to register metrics for group: "cpu=cpu-total,host=host01": window count metric: duplicate metrics collector registration attempted

Kapacitor is version 1.3.2 and Influx is version 1.3.3. Morgoth seems to work fine up to commit 63efe5f; anything after that fails with the above error.

The TICKscript used for testing is:

var measurement = 'cpu'
var groups = [*]
var whereFilter = lambda: "cpu" == 'cpu-total'
var window = 1m
var field = 'usage_idle'

var scoreField = 'anomalyScore'
var minSupport = 0.05
var errorTolerance = 0.01
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

stream
  // Select the data we want
  |from()
      .database('telegraf')
      .retentionPolicy('autogen')
      .measurement(measurement)
      .groupBy(groups)
      .where(whereFilter)
  // Window the data for a certain amount of time
  |window()
     .period(window)
     .every(window)
     .align()
  // Send each window to Morgoth
  @morgoth()
     .field(field)
     .scoreField(scoreField)
     .minSupport(minSupport)
     .errorTolerance(errorTolerance)
     .consensus(consensus)
     // Configure a single Sigma fingerprinter
     .sigma(sigmas)
  // Morgoth returns any anomalous windows
  |alert()
     .details('')
     .crit(lambda: TRUE)
     .log('/tmp/cpu_alert.log')

Unable to bring up kapacitor after adding morgoth as kapacitor udf

Hi,
I am unable to bring up the Kapacitor service after adding Morgoth as a UDF. Below is the output of /var/log/kapacitor/kapacitor.log:

[run] 2017/02/17 12:04:14 I! Kapacitor starting, version 1.2.0, branch master, commit 5408057e5a3493d3b5bd38d5d535ea45b587f8ff
[run] 2017/02/17 12:04:14 I! Go version go1.7.4
[srv] 2017/02/17 12:04:14 I! Kapacitor hostname: localhost
[srv] 2017/02/17 12:04:14 I! ClusterID: 8433611d-9a3d-480a-993b-4e226288b192 ServerID: 4082d455-aec0-4245-9e0e-e6f94e2a5696
[task_master:main] 2017/02/17 12:04:14 I! opened
[httpd] 2017/02/17 12:04:14 I! Closed HTTP service
[task_master:main] 2017/02/17 12:04:14 I! closed
[run] 2017/02/17 12:04:14 E! open server: open service *udf.Service: failed to load process info for "morgoth": fork/exec /root/projects/bin/morgoth: permission denied

I changed the binary permissions to match kapacitor and influxd as well, but still get the same error.
Any help is appreciated .
Thanks !

how long to wait for Morgoth to detect an anomaly

Hello

sorry , but a newbie

I wonder how long it takes for Morgoth to start detecting an anomaly, assuming a run every 1 minute and a scoreField of 0.98.

Am I wrong to deduce that I will need to wait at least 51 minutes to detect one anomaly (as it needs to occur less than 2% of the time)?

thanks for feedback
Philippe

duplicate metrics collector registration attempted

I'm using 0.3.1.
It would be nice to be able to run /opt/morgoth/morgoth --version and have it show the version in its output.

However:

ts=2018-06-07T23:40:00.323Z lvl=error msg="failed to stop task with out error" service=kapacitor task_master=main task=jsandlin_dns err="morgoth3: failed to register metrics for group: \"domain=qa.ebay.com,host=tm2li92-01,record_type=NS,server=10.254.244.249\": window count metric: duplicate metrics collector registration attempted"
ts=2018-06-07T23:40:00.323Z lvl=error msg="task finished with error" service=task_store err="morgoth3: failed to register metrics for group: \"domain=qa.ebay.com,host=tm2li92-01,record_type=NS,server=10.254.244.249\": window count metric: duplicate metrics collector registration attempted" task=jsandlin_dns
ts=2018-06-07T23:40:00.323Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=9aec6456-1f76-4d33-8c81-a45344a8e937 parent=stream child=stream0 collected=72 emitted=72
ts=2018-06-07T23:40:00.323Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=9aec6456-1f76-4d33-8c81-a45344a8e937 parent=stream0 child=from1 collected=72 emitted=72
ts=2018-06-07T23:40:00.323Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=9aec6456-1f76-4d33-8c81-a45344a8e937 parent=from1 child=window2 collected=24 emitted=24
ts=2018-06-07T23:40:00.323Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=9aec6456-1f76-4d33-8c81-a45344a8e937 parent=window2 child=morgoth3 collected=2 emitted=2
ts=2018-06-07T23:40:00.323Z lvl=error msg="failed to stop task with out error" service=kapacitor task_master=main task=9aec6456-1f76-4d33-8c81-a45344a8e937 err="morgoth3: failed to register metrics for group: \"host=tm2li92-01,url=ecgs.qa.ebay.com\": fingerprinter 1: unique fingerprints metric: duplicate metrics collector registration attempted"
ts=2018-06-07T23:40:00.323Z lvl=error msg="task finished with error" service=task_store err="morgoth3: failed to register metrics for group: \"host=tm2li92-01,url=ecgs.qa.ebay.com\": fingerprinter 1: unique fingerprints metric: duplicate metrics collector registration attempted" task=9aec6456-1f76-4d33-8c81-a45344a8e937
ts=2018-06-07T23:40:03.326Z lvl=debug msg="linking subscription for cluster" service=influxdb cluster=default cluster=default
ts=2018-06-07T23:40:04.627Z lvl=error msg="received error message" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 err="field \"usage_idle\" is not a float or int"
ts=2018-06-07T23:40:04.627Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="2018/06/07 23:40:04 E! read error: field \"usage_idle\" is not a float or int"
ts=2018-06-07T23:40:04.627Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="2018/06/07 23:40:04 I! Stopping"
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="panic: close of closed channel"
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text=
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="goroutine 1 [running]:"
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="panic(0x6cdbc0, 0xc420144130)"
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="\t/home/travis/.gimme/versions/go1.7.linux.amd64/src/runtime/panic.go:500 +0x1a1"
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text=main.(*Handler).Stop(0xc4200a2f20)
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="\t/home/travis/gopath/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:498 +0x33"
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text=main.main()
ts=2018-06-07T23:40:04.629Z lvl=info msg="UDF log" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 text="\t/home/travis/gopath/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:149 +0x605"
ts=2018-06-07T23:40:07.419Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=cpu_morgoth parent=morgoth3 child=alert4 collected=0 emitted=0
ts=2018-06-07T23:40:07.419Z lvl=error msg="node failed" service=kapacitor task_master=main task=cpu_morgoth node=morgoth3 err="write error: write |1: broken pipe"
ts=2018-06-07T23:40:07.419Z lvl=debug msg="task finished" service=task_store task=cpu_morgoth
ts=2018-06-07T23:40:07.419Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=cpu_morgoth parent=stream child=stream0 collected=1757 emitted=1755
ts=2018-06-07T23:40:07.419Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=cpu_morgoth parent=stream0 child=from1 collected=1757 emitted=1757
ts=2018-06-07T23:40:07.419Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=cpu_morgoth parent=from1 child=window2 collected=270 emitted=270
ts=2018-06-07T23:40:07.419Z lvl=debug msg="closing edge" service=kapacitor task_master=main task=cpu_morgoth parent=window2 child=morgoth3 collected=2 emitted=2
ts=2018-06-07T23:40:07.419Z lvl=error msg="failed to stop task with out error" service=kapacitor task_master=main task=cpu_morgoth err="morgoth3: write error: write |1: broken pipe"
ts=2018-06-07T23:40:07.420Z lvl=error msg="task finished with error" service=task_store err="morgoth3: write error: write |1: broken pipe" task=cpu_morgoth

Debugging Morgoth in Kapacitor

Setting the -log-level debug argument in Kapacitor does not work and just gives an error; also, if I have my alerts log at info level, nothing comes out. Is there some way to see what is going on internally in Morgoth?

var cpu = stream
    |from()
        .measurement('win_cpu')
        .groupBy('host','domain')
    |window()
        .period(5m)
        .every(5m)
        .align()
    |default()
        .field('% Processor Time', 0.0)
  @morgoth()
    .field('% Processor Time')
    .scoreField('anomalyScore')
    .sigma(3.5)

cpu
    |alert()
        .info(lambda: 'anomalyScore' > 0.0)
        .warn(lambda: 'anomalyScore' > 1.0)
        .crit(lambda: 'anomalyScore' > 3.0)
        .log('/tmp/cpu_morgoth.log')

undefined: client.ClientConfig

I just tried to build with a fresh GOPATH.

go build
# github.com/nvcook42/morgoth/engine/influxdb
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\client.go:9: undefined: client.ClientConfig
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:40: cannot use "list continuous queries" (type string) as type client.Query in argument to self.client.Query
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:47: series.GetPoints undefined (type *influxdb.Result has no field or method GetPoints)
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:72: cannot use q (type string) as type client.Query in argument to self.client.Query
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:87: undefined: client.Series
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:96: self.client.WriteSeriesWithTimePrecision undefined (type *client.Client has no field or method WriteSeriesWithTimePrecision)
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:96: undefined: client.Series
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:96: undefined: client.Second
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:103: undefined: client.Series
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:112: self.client.WriteSeriesWithTimePrecision undefined (type *client.Client has no field or method WriteSeriesWithTimePrecision)
d:\morgoth\GOPATH\src\github.com\nvcook42\morgoth\engine\influxdb\engine.go:112: too many errors

go version go1.4.1 windows/amd64

Test twitter anomaly detection with Morgoth

twitter recently announced "Practical and robust anomaly detection in a time series" (see blog post here: https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series)
The code is on GitHub:
https://github.com/twitter/AnomalyDetection
It is written in R, so it's not really real-time capable yet. That said, the tool looks quite powerful. Here's a list of things it can detect:
https://anomaly.io/anomaly-detection-twitter-r/

Would it make sense to port this algorithm to Go and run it in combination with Morgoth?

Non-CGO_ENABLED builds desirable

I'm trying to use the GitHub releases, but it seems the Linux build for amd64 has dependencies. I wanted to use it in Alpine alongside the standard Alpine-based Kapacitor image.

/ # ldd /usr/bin/morgoth
    /lib64/ld-linux-x86-64.so.2 (0x558cd1924000)
    libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x558cd1924000)
    libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x558cd1924000)

Error initializing detector ... no such file or directory

I didn't use Vagrant (since I already have Grafana and InfluxDB), but I ran the commands from bootstrap.sh.

My config looks like this (just to be sure: mydatabase is a database in InfluxDB and myserie is an InfluxDB series that I can query with SELECT * FROM myserie):

---
engine:
  influxdb:
    host: localhost
    port: 8086
    user: myuser
    password: mypass
    database: mydatabase

schedule:
  rotations:
    - {period: 2m, resolution: 2s}
    - {period: 4m, resolution: 4s}
    - {period: 8m, resolution: 8s}
    - {period: 24m, resolution: 24s}
  delay: 15s

metrics:
  - pattern: myserie.*
    detectors:
      - mgof:
          min: 0
          max: 800
    notifiers:
      - riemann: {}

fittings:
  - rest:
      port: 7000
  - graphite: {}

logging:
    level: INFO
bbigras@ubuntunew:/gopath/src/github.com/nvcook42/morgoth$ ./run
E0129 10:06:16.327578   19947 config.go:58] Error initializing detector open /gopath/src/github.com/nvcook42/morgoth/meta/.a387fa3b36b07261e8f60802fea21053e6054621: no such file or directory
E0129 10:06:16.331366   19947 config.go:58] Error initializing detector open /gopath/src/github.com/nvcook42/morgoth/meta/.df602f5168cb9bb029364667726bfbd3d2d6cc17: no such file or directory
E0129 10:06:16.333158   19947 config.go:58] Error initializing detector open /gopath/src/github.com/nvcook42/morgoth/meta/.b89270bafd8b2c741930d51d04bb9359dbf97f26: no such file or directory
E0129 10:06:16.334486   19947 config.go:58] Error initializing detector open /gopath/src/github.com/nvcook42/morgoth/meta/.d82d32f5923504dd11b831c17cb851a81fcd2c83: no such file or directory
E0129 10:06:16.338714   19947 config.go:73] Error getting configured notifier: dial tcp 127.0.0.1:5555: connection refused
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x1 pc=0x81006f5]

goroutine 20 [running]:
github.com/nvcook42/morgoth/metric.(*ManagerStruct).NewMetric(0x18776b10, 0x18792140, 0x35)
        /gopath/src/github.com/nvcook42/morgoth/metric/manager.go:52 +0x165
github.com/nvcook42/morgoth/app.(*writerProxy).Insert(0x187770e0, 0xcc5c4362, 0xe, 0x0, 0x8540c80, 0x18792140, 0x35, 0x0, 0x0)
        /gopath/src/github.com/nvcook42/morgoth/app/writer_proxy.go:18 +0x4f
github.com/nvcook42/morgoth/fitting/graphite.(*GraphiteFitting).handleConnection(0x187869c0, 0xb75c9638, 0x1871c7a0)
        /gopath/src/github.com/nvcook42/morgoth/fitting/graphite/graphite.go:70 +0x726
created by github.com/nvcook42/morgoth/fitting/graphite.(*GraphiteFitting).Start
        /gopath/src/github.com/nvcook42/morgoth/fitting/graphite/graphite.go:39 +0x2fc

goroutine 1 [semacquire]:
sync.(*WaitGroup).Wait(0x187869e0)
        /usr/local/go/src/sync/waitgroup.go:132 +0x13e
github.com/nvcook42/morgoth/app.(*App).Run(0x18779b00, 0x0, 0x0)
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:114 +0xb2b
main.main()
        /gopath/src/github.com/nvcook42/morgoth/morgoth.go:24 +0x155

goroutine 5 [chan receive]:
goroutine 7 [syscall]:
os/signal.loop()
        /usr/local/go/src/os/signal/signal_unix.go:21 +0x21
created by os/signal.init·1
        /usr/local/go/src/os/signal/signal_unix.go:27 +0x34

goroutine 8 [chan receive]:
github.com/nvcook42/morgoth/app.(*App).signalHandler(0x18779b00)
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:137 +0xe8
created by github.com/nvcook42/morgoth/app.(*App).Run
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:72 +0x124

goroutine 13 [IO wait]:
net.(*pollDesc).Wait(0x18735678, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:84 +0x42
net.(*pollDesc).WaitRead(0x18735678, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:89 +0x40
net.(*netFD).accept(0x18735640, 0x0, 0xb75c7ec8, 0x187771a8)
        /usr/local/go/src/net/fd_unix.go:419 +0x34f
net.(*TCPListener).AcceptTCP(0x1871c790, 0x18754480, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:234 +0x48
net.(*TCPListener).Accept(0x1871c790, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:244 +0x48
net/http.(*Server).Serve(0x18735780, 0xb75caa80, 0x1871c790, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:1728 +0x7b
net/http.Serve(0xb75caa80, 0x1871c790, 0xb75caaa0, 0x1872cfdc, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:1606 +0x8c
github.com/nvcook42/morgoth/fitting/rest.(*RESTFitting).Start(0x1872cfc0, 0xb75c9498, 0x18779b00)
        /gopath/src/github.com/nvcook42/morgoth/fitting/rest/rest.go:48 +0x772
github.com/nvcook42/morgoth/app.func·001(0xb75ca908, 0x1872cfc0, 0x187869e0)
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:104 +0x12f
created by github.com/nvcook42/morgoth/app.(*App).Run
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:105 +0x8ee

goroutine 11 [IO wait]:
net.(*pollDesc).Wait(0x187352f8, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:84 +0x42
net.(*pollDesc).WaitRead(0x187352f8, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:89 +0x40
net.(*netFD).Read(0x187352c0, 0x18788000, 0x1000, 0x1000, 0x0, 0xb75c7ec8, 0x18776efc)
        /usr/local/go/src/net/fd_unix.go:242 +0x2f0
net.(*conn).Read(0x1871c578, 0x18788000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:121 +0xba
net/http.noteEOFReader.Read(0xb75ca700, 0x1871c578, 0x1872cbcc, 0x18788000, 0x1000, 0x1000, 0x18735180, 0x0, 0x0)
        /usr/local/go/src/net/http/transport.go:1270 +0x5c
net/http.(*noteEOFReader).Read(0x18776bf0, 0x18788000, 0x1000, 0x1000, 0x18786060, 0x0, 0x0)
        <autogenerated>:125 +0x9f
bufio.(*Reader).fill(0x18732c90)
        /usr/local/go/src/bufio/bufio.go:97 +0x15c
bufio.(*Reader).Peek(0x18732c90, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/bufio/bufio.go:132 +0xd2
net/http.(*persistConn).readLoop(0x1872cba0)
        /usr/local/go/src/net/http/transport.go:842 +0x87
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:660 +0xa45

goroutine 12 [select]:
net/http.(*persistConn).writeLoop(0x1872cba0)
        /usr/local/go/src/net/http/transport.go:945 +0x31a
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:661 +0xa5a

goroutine 14 [IO wait]:
net.(*pollDesc).Wait(0x187356f8, 0x72, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:84 +0x42
net.(*pollDesc).WaitRead(0x187356f8, 0x0, 0x0)
        /usr/local/go/src/net/fd_poll_runtime.go:89 +0x40
net.(*netFD).accept(0x187356c0, 0x0, 0xb75c7ec8, 0x187771e8)
        /usr/local/go/src/net/fd_unix.go:419 +0x34f
net.(*TCPListener).AcceptTCP(0x1871c798, 0x18718ee8, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:234 +0x48
net.(*TCPListener).Accept(0x1871c798, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock_posix.go:244 +0x48
github.com/nvcook42/morgoth/fitting/graphite.(*GraphiteFitting).Start(0x187869c0, 0xb75c9498, 0x18779b00)
        /gopath/src/github.com/nvcook42/morgoth/fitting/graphite/graphite.go:35 +0x2b2
github.com/nvcook42/morgoth/app.func·001(0xb75ca928, 0x187869c0, 0x187869e0)
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:104 +0x12f
created by github.com/nvcook42/morgoth/app.(*App).Run
        /gopath/src/github.com/nvcook42/morgoth/app/app.go:105 +0x8ee

goroutine 15 [sleep]:
github.com/nvcook42/morgoth/schedule.func·001(0xf08eb000, 0x1b, 0x77359400, 0x0, 0xcc5c43d0, 0xe, 0x0, 0x855cbe0, 0xf08eb000, 0x1b)
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:50 +0x3c3
created by github.com/nvcook42/morgoth/schedule.(*Schedule).Start
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:58 +0x325

goroutine 16 [sleep]:
github.com/nvcook42/morgoth/schedule.func·001(0xe11d6000, 0x37, 0xee6b2800, 0x0, 0xcc5c43d0, 0xe, 0x0, 0x855cbe0, 0xe11d6000, 0x37)
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:50 +0x3c3
created by github.com/nvcook42/morgoth/schedule.(*Schedule).Start
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:58 +0x325

goroutine 18 [sleep]:
github.com/nvcook42/morgoth/schedule.func·001(0xc23ac000, 0x6f, 0xdcd65000, 0x1, 0xcc5c44c0, 0xe, 0x0, 0x855cbe0, 0xc23ac000, 0x6f)
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:50 +0x3c3
created by github.com/nvcook42/morgoth/schedule.(*Schedule).Start
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:58 +0x325

goroutine 19 [sleep]:
github.com/nvcook42/morgoth/schedule.func·001(0x46b04000, 0x14f, 0x9682f000, 0x5, 0xcc5c44c0, 0xe, 0x0, 0x855cbe0, 0x46b04000, 0x14f)
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:50 +0x3c3
created by github.com/nvcook42/morgoth/schedule.(*Schedule).Start
        /gopath/src/github.com/nvcook42/morgoth/schedule/schedule.go:58 +0x325
exit status 2
godep: go exit status 1

how to deal with "spikey" data

Hi, I have some data that is very spikey (I am sure there is a statistical term for this, maybe "not normal").
[screenshot of the spikey data]

If I use the example from the README, I get non-stop alerts, like:
[screenshot of the alerts]

I have tweaked the two parameters errorTolerance and minSupport, but I either get a lot of alerts or no alerts. Here is an example of my Morgoth Kapacitor tick.
I am collecting my metrics every 10 seconds; I used a 15 minute window to make sure I am getting enough data.

dbrp "statsd"."autogen"

stream
    |from()
        .measurement('load_avg_five')
        .groupBy('host')
    |window()
        .period(15m)
        .every(1m)
    @morgoth()
        .field('value')
        .anomalousField('anomalous')
        .errorTolerance(0.01)
        .minSupport(0.05)
        .sigma(3.0)
    |alert()
        .message('{{ .Level}}: {{ .Name }}/{{ index .Tags "host" }} anomalous')
        .crit(lambda: "anomalous")
        .log('/tmp/malerts.log')
        .sensu()
        .slack()

I would like to get no alerts unless i put a lot of load on the system.
Thanks !
rob

duplicate metrics collector registration attempted

Created issue to track the coming fix. More info:
Error msg:
Error: morgoth3: failed to register metrics for group: "cid=2,cpu=cpu-total,host=p0-c2-xyz.com,region=us-west-2,role=proxy": window count metric: duplicate metrics collector registration attempted’

Consensus -1 not working, -1.0 seems OK

Hello

it seems that defining consensus as -1 (in order to average fingerprints) is not working:

got "invalid TICKscript: line 134 char 5: unexpected arg to consensus, got INT expected DOUBLE"

when defining the task.

Using -1.0 seems to be OK, but does it then really provide the average?
If so, a correction to the documentation is needed.

thanks for feedback
Philippe

error when trying to start Kapacitor with and without Morgoth

Installed go (v1.92), Kapacitor (v1.4), and InfluxDB (v1.3.8); getting this error on Kapacitor startup.
ERROR:
s=2018-03-28T09:49:59.141-04:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="invalid character '<' looking for beginning of value"

ts=2018-03-28T09:49:59.675-04:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="invalid character '<' looking for beginning of value"

ts=2018-03-28T09:50:00.703-04:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="invalid character '<' looking for beginning of value"

ts=2018-03-28T09:50:02.077-04:00 lvl=error msg="failed to connect to InfluxDB, retrying..." service=influxdb cluster=default err="invalid character '<' looking for beginning of value"

error: duplicate metrics collector registration attempted

Hello
Do you have a fix for this issue?
ref: #56

ts=2018-06-21T16:34:38.198+03:00 lvl=error msg="task finished with error" service=task_store err="morgoth3: failed to register metrics for group: "cpu=cpu-total,host=KLMETZOOD1": window count metric: duplicate metrics collector registration attempted" task=cpu_usage_idle

Thanks.

Detects every data point as anomalous.

Hi,
I am consuming system metrics data from Kafka and inserting it into InfluxDB, where the Morgoth script runs to detect anomalous system behaviour based on the metrics. The problem is that every metric Morgoth receives is logged as anomalous. I have attached a screenshot of my dataset and the anomalous data set, as well as my TICK script.
Any help is appreciated!
Thanks in advance!

from my influxdb:
select VALUE,anomalyScore from anomaly_cpu where NODE_NAME='PROC-1' AND METRIC='system.cpu.idle' AND time >= 1487919960000000000 and time <=1487920500000000000;
name: anomaly_cpu
time VALUE anomalyScore


1487919960000000000 87.2257319 0.95
1487920020000000000 87.2394801 0.9523809523809523
1487920080000000000 87.2379489 0.9545454545454546
1487920140000000000 87.2407403 0.9565217391304348
1487920200000000000 87.2488526 0.9583333333333334
1487920260000000000 87.2469828 0.96
1487920320000000000 87.2715053 0.9615384615384616
1487920380000000000 87.245502 0.962962962962963
1487920440000000000 87.2551249 0.9642857142857143
1487920500000000000 87.2479187 0.9655172413793104

select VALUE from system where NODE_NAME='PROC-1' AND METRIC='system.cpu.idle' AND time >= 1487919960000000000 and time <=1487920500000000000;
name: system
time VALUE


1487919960000000000 87.2257319
1487920020000000000 87.2394801
1487920080000000000 87.2379489
1487920140000000000 87.2407403
1487920200000000000 87.2488526
1487920260000000000 87.2469828
1487920320000000000 87.2715053
1487920380000000000 87.245502
1487920440000000000 87.2551249
1487920500000000000 87.2479187

cpu_alert_tick.docx
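Incidentally, the anomalyScore column above rises exactly along k/(k+1): 19/20 = 0.95 up to 28/29 ≈ 0.9655. That is the pattern a running ratio of anomalous windows to total windows would produce if every window were being flagged, which matches the reported behaviour (this is an inference from the numbers, not a statement of Morgoth's scoring formula):

```python
# Hypothetical reconstruction: score = anomalous_windows / total_windows,
# with every window flagged. Reproduces the ten rows above.
scores = [k / (k + 1) for k in range(19, 29)]
print(scores[0], scores[-1])  # first and last scores in the table
```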

Installation Issue

Hi,

I'm seeing an error while installing Morgoth. Any idea?

go get github.com/nathanielc/morgoth/cmd/morgoth

Error:
# github.com/nathanielc/morgoth/cmd/morgoth
../go_workspace/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:43: cannot use h (type *Handler) as type agent.Handler in assignment: *Handler does not implement agent.Handler (missing Snapshot method)
../go_workspace/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:90: cannot use h (type *Handler) as type agent.Handler in assignment: *Handler does not implement agent.Handler (missing Snapshot method)

Add support for elasticsearch data engine

Currently Morgoth can process numeric data via its MongoDB and InfluxDB data engines. Let's add an Elasticsearch backend so we can detect anomalies in logs as well.

How to detect anomaly for Kafka rate of messages being processed

Hi Nathaniel,

I need to detect anomalies in the rate of Kafka messages produced to our kafka-topic-* measurements using Morgoth.
Here is the TICK script;
can you verify and check whether this should give correct anomaly data?

Also, I am not able to save the topic name (which is the name of the measurement) into InfluxDB (kafka-morgoth-alert).

Also, can you explain what the values for minSupport and errorTolerance should be?

var groups = 'host'
var field = 'produced'

var scoreField = 'anomalyScore'
var minSupport = 0.05
var errorTolerance = 0.01
var consensus = 0.05
var sigmas = 3.5

var last_day_mean = batch
    |query('SELECT * FROM "sensu"."default"./kafka-topic-lst_plugin.*/')
        .groupBy(groups)
        .period(1d)
        .every(10m)
        .align()
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        .sigma(sigmas)
    |alert()
        .details('Kafka Message Produced Is Anomalous')
        .crit(lambda: "anomalyScore" > 0.98)
        .log('/tmp/kafka-morgoth-alert.log')
    |influxDBOut()
        .database('sensu')
        .retentionPolicy('default')
        .measurement('kafka-morgoth-alert')
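On the minSupport/errorTolerance question: Morgoth's normal dictionary is maintained with the Lossy Counting Algorithm over fingerprinted windows. As a rough illustration of what the two parameters mean (a minimal sketch of the generic LCA, not Morgoth's exact implementation): errorTolerance bounds the per-item counting error, and minSupport is the fraction of windows a fingerprint must account for to be kept as "normal":

```python
def lossy_count(stream, error_tolerance=0.01, min_support=0.05):
    """Generic Lossy Counting: return items whose estimated frequency
    is at least (min_support - error_tolerance) of the stream."""
    bucket_width = int(1 / error_tolerance)  # items per bucket
    counts, deltas = {}, {}
    n = 0
    for item in stream:
        n += 1
        bucket = (n - 1) // bucket_width + 1
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1  # max possible undercount
        if n % bucket_width == 0:  # end of bucket: prune rare items
            for k in list(counts):
                if counts[k] + deltas[k] <= bucket:
                    del counts[k]
                    del deltas[k]
    return {k for k, c in counts.items() if c >= (min_support - error_tolerance) * n}

# A stream where window type 'a' dominates, 'b' is borderline,
# and 'c'..'g' are one-offs:
normal = lossy_count(['a'] * 90 + ['b'] * 5 + list('cdefg'), 0.01, 0.05)
```

In Morgoth the "items" are window fingerprints, so a window shape seen in at least roughly minSupport of recent windows counts as normal; anything rarer is a candidate anomaly. Smaller errorTolerance means more memory but tighter counts.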

Showing summary of the anomaly in alert message

Hello,

I have created a tick script to identify anomaly in the number of requests hitting our servers. The script works fine.

The input to Morgoth is the number of requests grouped by one minute for the last one hour.

In the alert message, I would also like to add the number of requests the server received in this window. This is something I am having trouble figuring out how to do.


  batch
        | query('''
           SELECT count(timetaken) as count
           FROM "telegraf".two_months.responsetimes
           WHERE "responsecode" != '10018' AND "responsecode" != '10097' AND "responsecode" != '10181' AND "responsecode" != '10256' AND "responsecode" != '10285' AND "merchant" != 'Sevasys'
        ''')
     .period(window)
     .groupBy(time(1m),  'qcinstance', 'txntype', 'merchant')
     .every(1m)
     .align()
     .fill(0)
    @morgoth()
     .field(field)
     .scoreField(scoreField)
     .anomalousField('anomalous')
     .minSupport(minSupport)
     .errorTolerance(errorTolerance)
     .sigma(sigmas)
  // Morgoth returns any anomalous windows

  | eval(lambda: strReplace("txntype", ' ','%20', -1), lambda: strReplace("merchant", ' ', '%20', -1), lambda: int(unixNano(now())/1000000), lambda: int((unixNano(now())-two_hours)/1000000))
        .as('txntype2', 'merchant2', 'now', 'two_hours_ago')
        .keep('anomalous', 'txntype2', 'txntype', 'merchant2', 'merchant', 'now', 'two_hours_ago', 'count')
  |alert()
     .message(message)
     .details('')
     .crit(lambda: "anomalous")
     .slack()
     .channel('#softwarealerts')

In the alert message, I want to add the sum of the count.

I have tried doing following steps

  • do sum after the data passes through morgoth

   @morgoth()
     .field(field)
     .scoreField(scoreField)
     .anomalousField('anomalous')
     .minSupport(minSupport)
     .errorTolerance(errorTolerance)
     .sigma(sigmas)
  |sum('count')
  // Morgoth returns any anomalous windows

But after the above step, the anomalous field is missing from the output of sum.

  • Use a separate batch query that does the summation and join it with the original series
  batch
        | query('''
           SELECT count(timetaken) as sum
           FROM "telegraf".two_months.responsetimes
           WHERE "responsecode" != '10018' AND "responsecode" != '10097' AND "responsecode" != '10181' AND "responsecode" != '10256' AND "responsecode" != '10285' AND "merchant" != 'Sevasys'
        ''')
     .period(window)
     .groupBy( 'qcinstance', 'txntype', 'merchant')
     .every(1m)
     .align()
     .fill(0)

However, the result of the above query is not joining with the original series, so the sum value is not available at the time of alerting.

Can you please suggest a way to do this?

How do I quickly create an anomaly to see this framework alerts?

I have Telegraf and InfluxDB up and running along with Morgoth with the following config. I don't see any anomalies even when the CPU spikes to 100%. Alertmanager reports no anomalies. Is there a quick way to create an anomaly using your example docs?

schedules:

  • query: SELECT value FROM "day".cpu_usage_idle GROUP BY *
    period: 30s
    delay: 10s
    tags:
    ret: day
  • query: SELECT value FROM "week".cpu_usage_idle GROUP BY *
    period: 5m
    delay: 1m
    tags:
    ret: week
  • query: SELECT value FROM "month".cpu_usage_idle GROUP BY *
    period: 1h
    delay: 1m
    tags:
    ret: month

mappings:

  • name: cpu_.*
    tags:
    ret: month
    detector:
    fingerprints:
    - sigma:
    deviations: 10
    alerts:
  • query: SELECT COUNT(start) FROM anomaly GROUP BY *
    message: Too many anomalies detected
    threshold: 1
    period: 1m
    group_by_interval: 2m
    notifiers:
    • log:
      file: alerts.log

issue with kapacitor tick script and template

Hello,

While trying to "template" a working TICK script for Kapacitor, I encountered an issue using variables in the SELECT clause of a batch node:

here is some part of the script:

var measurement string
var groupBy string
var indicator string
var offset duration
var period duration
var every duration
var shift = offset
.....
var prev_period = batch
|query('SELECT mean(indicator), stddev(indicator) from .measurement(measurement)')
.groupBy('devaddr')
.offset(offset)
.period(period)
.every(every)
.align()
|shift(shift)

When trying to "load" the template, I got:

error reloading associated task xxxxxxxxxx: failed to parse InfluxQL query: found ., expected identifier at line 1, char 49

The same error occurs with .db(db).rp(rp), with db and rp defined as var string, and the same without the ".".

Any hints?
Thanks in advance
(Kapacitor 1.4)

regards
Philippe
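TICKscript does not interpolate vars inside a quoted string, so mean(indicator) and .measurement(measurement) are sent to InfluxDB literally, which is what produces the "found ., expected identifier" parse error. A sketch of the usual workaround, building the query text with TICKscript string concatenation (the "mydb"."autogen" database and retention policy names here are placeholders):

```
var measurement string
var indicator string
var offset duration
var period duration
var every duration

var prev_period = batch
    // Concatenate the variables into the InfluxQL text instead of
    // naming them inside the string literal.
    |query('SELECT mean("' + indicator + '") AS mean, stddev("' + indicator + '") AS stddev FROM "mydb"."autogen"."' + measurement + '"')
        .groupBy('devaddr')
        .offset(offset)
        .period(period)
        .every(every)
        .align()
```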

Alerts Not Getting Processed

Hi @nathanielc,
I have configured Morgoth for anomaly detection as per the Morgoth documentation, but unfortunately alerts are not being processed.
[ec2-user@ip-172-31-14-178 ~]$ kapacitor show cpu_alert

ID: cpu_alert
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 16 May 17 00:48 UTC
Modified: 16 May 17 01:05 UTC
LastEnabled: 16 May 17 01:05 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
// The measurement to analyze
var measurement = 'cpu'

// Optional group by dimensions
var groups = [*]

// Optional where filter
var whereFilter = lambda: TRUE

// The amount of data to window at once
var window = 1m

// The field to process
var field = 'usage_idle'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.0

stream
    // Select the data we want
    |from()
        .measurement(measurement)
        .groupBy(groups)
        .where(whereFilter)
    // Window the data for a certain amount of time
    |window()
        .period(window)
        .every(window)
        .align()
    // Send each window to Morgoth
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |alert()
        .details('')
        .crit(lambda: TRUE)
        .log('/tmp/cpu_alert.log')

DOT:
digraph cpu_alert {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="46"];

from1 [avg_exec_time_ns="5.916µs" ];
from1 -> window2 [processed="46"];

window2 [avg_exec_time_ns="3.191µs" ];
window2 -> morgoth3 [processed="8"];

morgoth3 [avg_exec_time_ns="0s" ];
morgoth3 -> alert4 [processed="0"];

alert4 [alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];
}

The morgoth binary is in

/usr/local/bin

with permissions kapacitor:kapacitor. Please find the Kapacitor.conf file below:

[udf]
# Configuration for UDFs (User Defined Functions)
[udf.functions]
    [udf.functions.morgoth]
       prog = "/usr/local/bin/morgoth"
       timeout = "10s"

Please advise
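As a side note on the .sigma(sigmas) line in the script above: per the README's description, the Sigma fingerprinter summarizes a window by its mean and standard deviation and treats a new window as matching when it lies within sigmas standard deviations. A minimal sketch of that idea (an illustration of the concept, not Morgoth's actual code):

```python
from statistics import mean, stdev

def sigma_match(stored, window, sigmas=3.0):
    """Return True if `window`'s mean lies within `sigmas` standard
    deviations of the stored window's mean (an approximation of the
    Sigma fingerprinter's similarity test)."""
    m, s = mean(stored), stdev(stored)
    if s == 0:
        return mean(window) == m
    return abs(mean(window) - m) <= sigmas * s

baseline = [87.2, 87.3, 87.25, 87.22, 87.28]
similar = sigma_match(baseline, [87.24, 87.26, 87.23])  # steady idle CPU
spike = sigma_match(baseline, [40.0, 35.0, 30.0])       # load spike
```

With steady data like the cpu usage_idle stream above, the stored deviation is tiny, so even a modest shift in the window mean falls outside 3 sigma and votes anomalous.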

TestNotify and TestSpecificChiSqInc fails

bbigras@ubuntunew:/gopath/src/github.com/nvcook42/morgoth$ ./test
Testing all packages...
ok      github.com/nvcook42/morgoth/app 0.127s
?       github.com/nvcook42/morgoth/app/types   [no test files]
ok      github.com/nvcook42/morgoth/config      0.100s
ok      github.com/nvcook42/morgoth/config/dynamic_type 0.050s
ok      github.com/nvcook42/morgoth/config/types        0.057s
ok      github.com/nvcook42/morgoth/defaults    0.071s
ok      github.com/nvcook42/morgoth/detector    0.093s
ok      github.com/nvcook42/morgoth/detector/kstest     0.076s
?       github.com/nvcook42/morgoth/detector/metadata   [no test files]
ok      github.com/nvcook42/morgoth/detector/mgof       0.107s
?       github.com/nvcook42/morgoth/detector/test       [no test files]
ok      github.com/nvcook42/morgoth/detector/tukey      0.088s
ok      github.com/nvcook42/morgoth/engine      0.092s
?       github.com/nvcook42/morgoth/engine/generator    [no test files]
ok      github.com/nvcook42/morgoth/engine/influxdb     0.053s
ok      github.com/nvcook42/morgoth/fitting     0.069s
ok      github.com/nvcook42/morgoth/fitting/graphite    0.083s
ok      github.com/nvcook42/morgoth/fitting/rest        1.223s
ok      github.com/nvcook42/morgoth/metric      0.063s
?       github.com/nvcook42/morgoth/metric/set  [no test files]
?       github.com/nvcook42/morgoth/metric/types        [no test files]
?       github.com/nvcook42/morgoth/notifier    [no test files]
--- FAIL: TestNotify (0.02s)
        Location:       riemann_test.go:25
        Error:          Expected nil, but got: &net.OpError{Op:"dial", Net:"tcp", Addr:(*net.TCPAddr)(0x18637100), Err:0x6f}

FAIL
FAIL    github.com/nvcook42/morgoth/notifier/riemann    0.116s
ok      github.com/nvcook42/morgoth/registery   0.072s
ok      github.com/nvcook42/morgoth/schedule    0.080s
Mine was 2.493580
Theirs was 2.438378
--- FAIL: TestSpecificChiSqInc (0.00s)
        Location:       chisq_test.go:14
        Error:          Not equal: 79.66881012367774 (expected)
                                != 79.66881012367767 (actual)

FAIL
FAIL    github.com/nvcook42/morgoth/stat        4.963s
?       github.com/nvcook42/morgoth/stat/fn     [no test files]
ok      github.com/nvcook42/morgoth/tests/benchmarks    0.039s

Morgoth closing a closed channel [Go]

I was trying the CPU monitoring example, but Morgoth suddenly stops working. I am using the latest InfluxDB and the latest Morgoth binary, which I compiled with Go 1.8.3. I got only the logs below from the kapacitor.log file. Does anyone have an idea why Morgoth is closing a closed channel?

2017/08/24 09:52:00 I!P 2017/08/24 09:52:00 I! Stopping
2017/08/24 09:52:00 I!P panic: close of closed channel
2017/08/24 09:52:00 I!P
2017/08/24 09:52:00 I!P goroutine 1 [running]:
2017/08/24 09:52:00 I!P main.(*Handler).Stop(0xc42008ca50)
2017/08/24 09:52:00 I!P    /prog/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:498 +0x33
2017/08/24 09:52:00 I!P main.main()
2017/08/24 09:52:00 I!P    /prog/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:149 +0x728

No data points processed

Hi,
I am using Morgoth for anomaly detection. Kapacitor (running on port 9096) and InfluxDB are on different servers. After enabling the task I notice that Kapacitor is not streaming data.
The related info:

kapacitor version
Kapacitor 1.2.0 (git: master 5408057e5a3493d3b5bd38d5d535ea45b587f8ff)

kapacitor.conf file:

 hostname = "localhost"
data_dir = "/var/lib/kapacitor"
skip-config-overrides = false
default-retention-policy = ""
[http]
  bind-address = ":9096"
  auth-enabled = false
  log-enabled = true
  write-tracing = false
  pprof-enabled = false
  https-enabled = false
  https-certificate = "/etc/ssl/kapacitor.pem"
[config-override]  
 enabled = true
[logging]
   file = "/var/log/kapacitor/kapacitor.log"
    level = "INFO"
[replay]
  dir = "/var/lib/kapacitor/replay"
[task]
  dir = "/var/lib/kapacitor/tasks"
 snapshot-interval = "60s"
[storage]
 boltdb = "/var/lib/kapacitor/kapacitor.db"
[deadman]
 global = false
  threshold = 0.0
  interval = "10s"
  id = "node 'NODE_NAME' in task '{{ .TaskName }}'"
  message = "{{ .ID }} is {{ if eq .Level \"OK\" }}alive{{ else }}dead{{ end }}: {{ index .Fields \"collected\" | printf \"%0.3f\" }} points/INTERVAL."
[[influxdb]]
  enabled = true
  default = true
  name = "server1"
  urls = ["http://172.16.23.20:8086"]
  username = ""
  password = ""
  timeout = 0
  insecure-skip-verify = false
  startup-timeout = "5m"
  disable-subscriptions = false
  subscription-protocol = "http"
 subscriptions-sync-interval = "1m0s"
  kapacitor-hostname = ""
  http-port = 0
  udp-bind = ""
  udp-buffer = 1000
 udp-read-buffer = 0 

The kapacitor.log

[root@localhost Morgoth_Tick_scripts]# tail -f /var/log/kapacitor/kapacitor.log
[task_master:main] 2017/03/22 17:17:20 I! Started task: cpu_idle
[cpu_idle:morgoth3] 2017/03/22 17:17:20 I!P 2017/03/22 17:17:20 I! Starting agent using STDIN/STDOUT
[httpd] ::1 - - [22/Mar/2017:17:17:20 +0530] "PATCH /kapacitor/v1/tasks/cpu_idle HTTP/1.1" 200 1094 "-" "KapacitorClient" 4e59cb20-0ef5-11e7-8052-000000000000 62470
[httpd] ::1 - - [22/Mar/2017:17:17:27 +0530] "GET /kapacitor/v1/tasks?dot-view=attributes&fields=link&limit=100&offset=0&pattern=cpu_idle&replay-id=&script-format=formatted HTTP/1.1" 200 123 "-" "KapacitorClient" 52b842c7-0ef5-11e7-8053-000000000000 1324
[httpd] ::1 - - [22/Mar/2017:17:17:27 +0530] "PATCH /kapacitor/v1/tasks/cpu_idle HTTP/1.1" 200 1103 "-" "KapacitorClient" 52b8aa97-0ef5-11e7-8054-000000000000 42493
[httpd] ::1 - - [22/Mar/2017:17:17:36 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 584398f4-0ef5-11e7-8055-000000000000 10828
[httpd] ::1 - - [22/Mar/2017:17:17:38 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 5959083b-0ef5-11e7-8056-000000000000 11890
[httpd] ::1 - - [22/Mar/2017:17:17:39 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 59bd8eac-0ef5-11e7-8057-000000000000 16112
[httpd] ::1 - - [22/Mar/2017:17:17:51 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 60cf6482-0ef5-11e7-8059-000000000000 11066
[httpd] ::1 - - [22/Mar/2017:17:54:09 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 732ca24d-0efa-11e7-805a-000000000000 11702
[httpd] ::1 - - [22/Mar/2017:17:57:57 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" fb09e4b9-0efa-11e7-805b-000000000000 10574
[httpd] ::1 - - [22/Mar/2017:17:59:28 +0530] "GET /kapacitor/v1/tasks/cpu_idle?dot-view=attributes&replay-id=&script-format=formatted HTTP/1.1" 200 1103 "-" "KapacitorClient" 313fd8ae-0efb-11e7-805c-000000000000 11217

task output:

kapacitor -url http://localhost:9096 show cpu_idle
ID: cpu_idle
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 22 Mar 17 16:14 IST
Modified: 22 Mar 17 17:17 IST
LastEnabled: 22 Mar 17 17:17 IST
Databases Retention Policies: ["morgoth"."autogen"]
TICKscript:
// The measurement to analyze
var measurement = 'system'

// Optional group by dimensions
var groups = [*]

// Optional where filter
var whereFilter = lambda: "METRIC" == 'system.cpu.idle'

// The amount of data to window at once
var window = 10m

// The field to process
var field = 'VALUE'

// The name for the anomaly score field
var scoreField = 'anomalyScore'

// The minimum support
var minSupport = 0.05

// The error tolerance
var errorTolerance = 0.01

// var errorTolerance = 0.005

// The consensus
var consensus = 0.5

// Number of sigmas allowed for normal window deviation
var sigmas = 3.3

stream
    // Select the data we want
    |from()
        .database('morgoth')
        .measurement(measurement)
        .groupBy(groups)
        .where(whereFilter)
    // Window the data for a certain amount of time
    |window()
        .period(window)
        .every(window)
        .align()
    // Send each window to Morgoth
    @morgoth()
        .field(field)
        .scoreField(scoreField)
        .minSupport(minSupport)
        .errorTolerance(errorTolerance)
        .consensus(consensus)
        // Configure a single Sigma fingerprinter
        .sigma(sigmas)
    // Morgoth returns any anomalous windows
    |influxDBOut()
        .database('morgoth')
        .retentionPolicy('autogen')
        .measurement('cpu_idle_anomaly')

DOT:
digraph cpu_idle {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="0"];

from1 [avg_exec_time_ns="0s" ];
from1 -> window2 [processed="0"];

window2 [avg_exec_time_ns="0s" ];
window2 -> morgoth3 [processed="0"];

morgoth3 [avg_exec_time_ns="0s" ];
morgoth3 -> influxdb_out4 [processed="0"];

influxdb_out4 [avg_exec_time_ns="0s" points_written="0" write_errors="0" ];
}

kapacitor stats

ClusterID:                    59ac060a-2a62-4acd-9308-4af668fc42d2
ServerID:                     e3d490e8-3035-4823-884c-ec973bf81e8b
Host:                         localhost
Tasks:                        15
Enabled Tasks:                15
Subscriptions:                3
Version:                      1.2.0
 kapacitor -url http://localhost:9096 stats ingress
Database   Retention Policy Measurement Points Received
_kapacitor autogen          edges                 59505
_kapacitor autogen          ingress                4642
_kapacitor autogen          kapacitor               970
_kapacitor autogen          nodes                 57565
_kapacitor autogen          runtime                 970 

influxdb subscriptions:

Connected to http://localhost:8086 version 1.2.0
InfluxDB shell version: 1.2.0
 show subscriptions
name: _internal
retention_policy name                                           mode destinations

monitor          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
monitor          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]


name: load_testing
retention_policy name                                           mode destinations

autogen          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
autogen          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]


name: morgoth
retention_policy name                                           mode destinations

autogen          kapacitor-80f85136-b89b-4a20-ab98-7b1476707e38 ANY  [http://localhost:9092]
autogen          kapacitor-59ac060a-2a62-4acd-9308-4af668fc42d2 ANY  [http://localhost:9096]

command for defining and enabling task

kapacitor -url http://localhost:9096 define  cpu_idle -type stream  -dbrp morgoth.autogen -tick ./cpu_morgoth.tick
kapacitor -url http://localhost:9096 enable cpu_idle

influxdb data

select * from system where METRIC='system.cpu.idle' limit 10;
name: system
time                CUSTOMER DETAIL1 DETAIL2 GROUP_NAME IP_ADDRESS   LOCATION METRIC          NODE_NAME PRODUCT_NAME VALUE      VENDOR_NAME

1489567587790458512 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6139237 Teledna
1489567647790523682 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6620568 Teledna
1489567707790599926 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6098692 Teledna
1489567767790326233 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.674531  Teledna
1489567827790470782 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6243747 Teledna
1489567887790607264 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6723692 Teledna
1489567947790470804 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.5994077 Teledna
1489568007790460163 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6786397 Teledna
1489568067790455453 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.6035798 Teledna
1489568127790466002 teledna  cpu     idle    Testing    172.16.23.28 Banglore system.cpu.idle SIGTRAN   smsc         99.584484  Teledna

panic: assignment to entry in nil map

panic: assignment to entry in nil map

goroutine 9 [running]:
panic(0x569c20, 0xc82046a390)
/usr/lib/go-1.6/src/runtime/panic.go:481 +0x3e6
main.(*Handler).EndBatch(0xc820064120, 0xc82044e800, 0x0, 0x0)
/home/administrator/go/src/github.com/nathanielc/morgoth/cmd/morgoth/main.go:340 +0x356
github.com/nathanielc/morgoth/vendor/github.com/influxdata/kapacitor/udf/agent.(*Agent).readLoop(0xc820052150, 0x0, 0x0)
/home/administrator/go/src/github.com/nathanielc/morgoth/vendor/github.com/influxdata/kapacitor/udf/agent/agent.go:209 +0x4fe
github.com/nathanielc/morgoth/vendor/github.com/influxdata/kapacitor/udf/agent.(*Agent).Start.func1(0xc820052150)
/home/administrator/go/src/github.com/nathanielc/morgoth/vendor/github.com/influxdata/kapacitor/udf/agent/agent.go:89 +0x6b
created by github.com/nathanielc/morgoth/vendor/github.com/influxdata/kapacitor/udf/agent.(*Agent).Start
/home/administrator/go/src/github.com/nathanielc/morgoth/vendor/github.com/influxdata/kapacitor/udf/agent/agent.go:98 +0x1bb


ID: dev-morgoth
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 29 Jun 17 22:44 UTC
Modified: 10 Jul 17 23:33 UTC
LastEnabled: 10 Jul 17 23:33 UTC
Databases Retention Policies: ["stats"."autogen"]
TICKscript:
stream
|from()
.measurement('traf')
.groupBy(*)
|window()
.periodCount(60)
.everyCount(60)
@morgoth()
.field('traffic_out')
.scoreField('anomalyScore')
.sigma(3.5)
|alert()
.details('error')
.crit(lambda: "anomalyScore" > 0.9)
.log('/tmp/dev-morgoth-alerts.log')

Support for influxDB 0.8.x

Is InfluxDB 0.8.x supported?

As per the README, Morgoth also integrates with Graphite and MongoDB. Is there any reference/example for integrating Graphite or MongoDB?
