
elastalert's Introduction

ElastAlert is no longer maintained. Please use ElastAlert2 instead.


ElastAlert - Read the Docs.

Easy & Flexible Alerting With Elasticsearch

ElastAlert is a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in Elasticsearch.

ElastAlert works with all versions of Elasticsearch.

At Yelp, we use Elasticsearch, Logstash and Kibana for managing our ever increasing amount of data and logs. Kibana is great for visualizing and querying data, but we quickly realized that it needed a companion tool for alerting on inconsistencies in our data. Out of this need, ElastAlert was created.

If you have data being written into Elasticsearch in near real time and want to be alerted when that data matches certain patterns, ElastAlert is the tool for you. If you can see it in Kibana, ElastAlert can alert on it.

Overview

We designed ElastAlert to be reliable, highly modular, and easy to set up and configure.

It works by combining Elasticsearch with two types of components, rule types and alerts. Elasticsearch is periodically queried and the data is passed to the rule type, which determines when a match is found. When a match occurs, it is given to one or more alerts, which take action based on the match.

This is configured by a set of rules, each of which defines a query, a rule type, and a set of alerts.
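To make this concrete, here is an illustrative sketch of a minimal rule file. The field names follow the documented rule schema, but the index, filter, and email values are placeholders you would adapt to your environment:

```yaml
# Minimal illustrative rule: send an email when 50 or more matching
# events occur within 4 hours.
name: Example frequency rule
type: frequency
index: logstash-*
num_events: 50
timeframe:
  hours: 4
filter:
- query:
    query_string:
      query: "level: ERROR"
alert:
- email
email:
- "alerts@example.com"
```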

Several rule types with common monitoring paradigms are included with ElastAlert:

  • Match where there are at least X events in Y time (frequency type)
  • Match when the rate of events increases or decreases (spike type)
  • Match when there are fewer than X events in Y time (flatline type)
  • Match when a certain field matches a blacklist/whitelist (blacklist and whitelist types)
  • Match on any event matching a given filter (any type)
  • Match when a field has two different values within some time (change type)
  • Match when a never-before-seen term appears in a field (new_term type)
  • Match when the number of unique values for a field is above or below a threshold (cardinality type)

Currently, we have built-in support for the following alert types:

  • Email
  • JIRA
  • OpsGenie
  • Commands
  • HipChat
  • MS Teams
  • Slack
  • Telegram
  • GoogleChat
  • AWS SNS
  • VictorOps
  • PagerDuty
  • PagerTree
  • Exotel
  • Twilio
  • Gitter
  • Line Notify
  • Zabbix

Additional rule types and alerts can be easily imported or written.

In addition to this basic usage, there are many other features that make alerts more useful:

  • Alerts link to Kibana dashboards
  • Aggregate counts for arbitrary fields
  • Combine alerts into periodic reports
  • Separate alerts by using a unique key field
  • Intercept and enhance match data

To get started, check out Running ElastAlert For The First Time in the documentation.

Running ElastAlert

You can either install the latest released version of ElastAlert using pip:

pip install elastalert

or you can clone the ElastAlert repository for the most recent changes:

git clone https://github.com/Yelp/elastalert.git

Install the module:

pip install "setuptools>=11.3"

python setup.py install

The following invocation can be used to run ElastAlert after installing:

$ elastalert [--debug] [--verbose] [--start <timestamp>] [--end <timestamp>] [--rule <filename.yaml>] [--config <filename.yaml>]

--debug will print additional information to the screen and will suppress alerts, printing the alert body instead. Not compatible with --verbose.

--verbose will print additional information without suppressing alerts. Not compatible with --debug.

--start will begin querying at the given timestamp. By default, ElastAlert will begin querying from the present. Timestamp format is YYYY-MM-DDTHH:MM:SS[-/+HH:MM] (note the T between date and hour). Eg: --start 2014-09-26T12:00:00 (UTC) or --start 2014-10-01T07:30:00-05:00

--end will cause ElastAlert to stop querying at the given timestamp. By default, ElastAlert will continue to query indefinitely.

--rule will allow you to run only one rule. It must still be in the rules folder. Eg: --rule this_rule.yaml

--config allows you to specify the location of the configuration file. By default, it will look for config.yaml in the current directory.
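The --start/--end timestamp format above is plain ISO 8601, so the standard library can produce and parse valid values. A small sketch (the example timestamps are taken from the usage text above):

```python
from datetime import datetime

# Parse the two example --start forms: a naive UTC timestamp and one
# with an explicit -05:00 offset.
naive = datetime.strptime("2014-09-26T12:00:00", "%Y-%m-%dT%H:%M:%S")
aware = datetime.fromisoformat("2014-10-01T07:30:00-05:00")  # Python 3.7+

print(naive.isoformat())                  # 2014-09-26T12:00:00
print(aware.utcoffset().total_seconds())  # -18000.0 (five hours behind UTC)
```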

Third Party Tools And Extras

Kibana plugin

img Available at the ElastAlert Kibana plugin repository.

Docker

A Dockerized version of ElastAlert, including a REST API, is built from master and published to bitsensor/elastalert:latest.

git clone https://github.com/bitsensor/elastalert.git; cd elastalert
docker run -d -p 3030:3030 \
    -v `pwd`/config/elastalert.yaml:/opt/elastalert/config.yaml \
    -v `pwd`/config/config.json:/opt/elastalert-server/config/config.json \
    -v `pwd`/rules:/opt/elastalert/rules \
    -v `pwd`/rule_templates:/opt/elastalert/rule_templates \
    --net="host" \
    --name elastalert bitsensor/elastalert:latest

Documentation

Read the documentation at Read the Docs.

To build an HTML version of the docs locally:

pip install sphinx_rtd_theme sphinx
cd docs
make html

View it in a browser at build/html/index.html.

Configuration

See config.yaml.example for details on configuration.

Example rules

Examples of different types of rules can be found in example_rules/.

  • example_spike.yaml is an example of the "spike" rule type, which allows you to alert when the rate of events, averaged over a time period, increases by a given factor. This example will send an email alert when there are 3 times more events matching a filter occurring within the last 2 hours than the number of events in the previous 2 hours.

  • example_frequency.yaml is an example of the "frequency" rule type, which will alert when there are a given number of events occurring within a time period. This example will send an email when 50 documents matching a given filter occur within a 4 hour timeframe.

  • example_change.yaml is an example of the "change" rule type, which will alert when a certain field in two documents changes. In this example, the alert email is sent when two documents with the same 'username' field but a different value of the 'country_name' field occur within 24 hours of each other.

  • example_new_term.yaml is an example of the "new term" rule type, which alerts when a new value appears in a field or fields. In this example, an email is sent when a new value of ("username", "computer") is encountered in example login logs.

Frequently Asked Questions

My rule is not getting any hits?

So you've managed to set up ElastAlert, write a rule, and run it, but nothing happens, or it says 0 query hits. First of all, we recommend using the command elastalert-test-rule rule.yaml to debug. It will show you how many documents match your filters for the last 24 hours (or more, see --help), and then shows you if any alerts would have fired. If you have a filter in your rule, remove it and try again. This will show you if the index is correct and that you have at least some documents. If you have a filter in Kibana and want to recreate it in ElastAlert, you probably want to use a query string. Your filter will look like

filter:
- query:
    query_string:
      query: "foo: bar AND baz: abc*"

If you receive an error that Elasticsearch is unable to parse it, it's likely the YAML is not spaced correctly, and the filter is not in the right format. If you are using other types of filters, like term, a common pitfall is not realizing that you may need to use the analyzed token. This is the default if you are using Logstash. For example,

filter:
- term:
    foo: "Test Document"

will not match even if the original value for foo was exactly "Test Document". Instead, you want to use foo.raw. If you are still having trouble troubleshooting why your documents do not match, try running ElastAlert with --es_debug_trace /path/to/file.log. This will log the queries made to Elasticsearch in full so that you can see exactly what is happening.
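Based on the example above, the working filter would target the non-analyzed sub-field. This assumes Logstash's default template, which exposes a .raw multi-field; check your mapping if you use a custom template:

```yaml
filter:
- term:
    foo.raw: "Test Document"
```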

I got hits, why didn't I get an alert?

If you got logs that say X query hits, 0 matches, 0 alerts sent, the reason depends on the rule type. If type: any, a match will occur for every hit. If you are using type: frequency, num_events must occur within timeframe of each other for a match to occur. Different rules apply for different rule types.

If you see X matches, 0 alerts sent, this may occur for several reasons. If you set aggregation, the alert will not be sent until after that time has elapsed. If you have gotten an alert for this same rule before, that rule may be silenced for a period of time. The default is one minute between alerts. If a rule is silenced, you will see Ignoring match for silenced rule in the logs.

If you see X alerts sent but didn't receive any alert, it's probably related to the alert configuration. If you are using the --debug flag, you will not receive any alerts; instead, the alert text will be written to the console. Use --verbose to achieve the same effect without suppressing alerts. If you are using the email alert, make sure you have it configured for an SMTP server. By default, it will connect to localhost on port 25. It will also use the word "elastalert" as the "From:" address. Some SMTP servers will reject this because it does not have a domain, while others will add their own domain automatically. See the email section in the documentation for how to configure this.

Why did I only get one alert when I expected to get several?

There is a setting called realert which is the minimum time between two alerts for the same rule. Any alert that occurs within this time will simply be dropped. The default value for this is one minute. If you want to receive an alert for every single match, even if they occur right after each other, use

realert:
  minutes: 0

You can of course set it higher as well.

How can I prevent duplicate alerts?

By setting realert, you will prevent the same rule from alerting twice within a given period of time.

realert:
  days: 1

You can also prevent duplicates based on a certain field by using query_key. For example, to prevent multiple alerts for the same user, you might use

realert:
  hours: 8
query_key: user

Note that this will also affect the way many rule types work. If you are using type: frequency, for example, num_events for a single value of query_key must occur before an alert will be sent. You can also use a compound of multiple fields for this key. For example, if you only wanted to receive one alert for a specific error and hostname combination, you could use

query_key: [error, hostname]

Internally, this works by creating a new field for each document called field1,field2 with a value of value1,value2 and using that as the query_key.
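The compound-key behavior described above can be sketched in a few lines. This is an illustration of the idea, not ElastAlert's actual implementation:

```python
# Collapse multiple query_key fields into one comma-joined value,
# mirroring the "field1,field2 -> value1,value2" behavior.
def compound_key(match, keys):
    return ",".join(str(match[k]) for k in keys)

match = {"error": "TimeoutError", "hostname": "web-01"}
print(compound_key(match, ["error", "hostname"]))  # TimeoutError,web-01
```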

The data for when an alert will fire again is stored in Elasticsearch in the elastalert_status index, with a _type of silence and also cached in memory.

How can I change what's in the alert?

You can use the field alert_text to add custom text to an alert. By setting alert_text_type: alert_text_only, it will be the entirety of the alert. You can also add different fields from the alert by using Python style string formatting and alert_text_args. For example

alert_text: "Something happened with {0} at {1}"
alert_text_type: alert_text_only
alert_text_args: ["username", "@timestamp"]
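The substitution shown above is standard Python positional str.format formatting. In ElastAlert, the argument values are looked up from the matched document; in this sketch they are hard-coded for illustration:

```python
# Equivalent of the alert_text / alert_text_args rule settings above,
# with the field values ("alice", a timestamp) supplied directly.
alert_text = "Something happened with {0} at {1}"
alert_text_args = ["alice", "2014-09-26T12:00:00"]
body = alert_text.format(*alert_text_args)
print(body)  # Something happened with alice at 2014-09-26T12:00:00
```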

You can also limit the alert to only containing certain fields from the document by using include.

include: ["ip_address", "hostname", "status"]

My alert only contains data for one event, how can I see more?

If you are using type: frequency, you can set the option attach_related: true and every document will be included in the alert. An alternative, which works for every type, is top_count_keys. This will show the top counts for each value for certain fields. For example, if you have

top_count_keys: ["ip_address", "status"]

and 10 documents matched your alert, it may contain something like

ip_address:
127.0.0.1: 7
10.0.0.1: 2
192.168.0.1: 1

status:
200: 9
500: 1
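The summary above is just per-field value counts over the matched documents. An illustrative recreation with the standard library:

```python
from collections import Counter

# Ten matched documents, distributed as in the example output above.
docs = (
    [{"ip_address": "127.0.0.1", "status": 200}] * 7
    + [{"ip_address": "10.0.0.1", "status": 200}] * 2
    + [{"ip_address": "192.168.0.1", "status": 500}]
)
ip_counts = Counter(d["ip_address"] for d in docs)
status_counts = Counter(d["status"] for d in docs)
print(ip_counts.most_common())      # [('127.0.0.1', 7), ('10.0.0.1', 2), ('192.168.0.1', 1)]
print(status_counts.most_common())  # [(200, 9), (500, 1)]
```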

How can I make the alert come at a certain time?

The aggregation feature will take every alert that has occurred over a period of time and send them together in one alert. You can use cron-style syntax to send all alerts that have occurred since the last scheduled run by using

aggregation:
  schedule: '2 4 * * mon,fri'

I have lots of documents and it's really slow, how can I speed it up?

There are several ways to potentially speed up queries. If you are using index: logstash-*, Elasticsearch will query all shards, even those that cannot contain data within the correct time range. Instead, you can use Python time format strings and set use_strftime_index:

index: logstash-%Y.%m
use_strftime_index: true
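The index pattern is expanded with strftime for each query window, so only the relevant time-based indices are searched. For example:

```python
from datetime import datetime

# "logstash-%Y.%m" expands to the monthly index for a given timestamp.
print(datetime(2014, 9, 26).strftime("logstash-%Y.%m"))  # logstash-2014.09
```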

Another thing you could change is buffer_time. By default, ElastAlert will query large overlapping windows in order to ensure that it does not miss any events, even if they are indexed in real time. In config.yaml, you can adjust buffer_time to a smaller number to only query the most recent few minutes.

buffer_time:
  minutes: 5

By default, ElastAlert will download every document in full before processing them. Instead, you can have ElastAlert simply get a count of the number of documents that have occurred between each query. To do this, set use_count_query: true. This cannot be used with query_key, because ElastAlert will not know the contents of each document, just the total number of them. It also reduces the precision of alerts, because all events that occur between queries will be rounded to a single timestamp.

If you are using query_key (a single key, not multiple keys) you can use use_terms_query. This will make ElastAlert perform a terms aggregation to get the counts for each value of a certain field. Both use_terms_query and use_count_query also require doc_type to be set to the _type of the documents. They may not be compatible with all rule types.
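A sketch of what that configuration could look like (the doc_type and query_key values are placeholders for your own mapping):

```yaml
# Hypothetical example: count events per value of query_key with a
# terms aggregation instead of downloading full documents.
use_terms_query: true
doc_type: logs
query_key: user
```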

Can I perform aggregations?

The only aggregation supported currently is a terms aggregation, by setting use_terms_query.

I'm not using @timestamp, what do I do?

You can use timestamp_field to change which field ElastAlert will use as the timestamp. You can use timestamp_type to change it between ISO 8601 and unix timestamps. You must have some kind of timestamp for ElastAlert to work. If your events are not in real time, you can use query_delay and buffer_time to adjust when ElastAlert will look for documents.

I'm using flatline but I don't see any alerts

When using type: flatline, ElastAlert must see at least one document before it will alert you that it has stopped seeing them.

How can I get a "resolve" event?

ElastAlert does not currently support stateful alerts or resolve events.

Can I set a warning threshold?

Currently, the only way to set a warning threshold is by creating a second rule with a lower threshold.

License

ElastAlert is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Read the documentation at Read the Docs.

Questions? Drop by #elastalert on Freenode IRC.

elastalert's People

Contributors

0xd012, a3dho3yn, ahsanali, alvarolmedo, andrelouiscaron, armiiller, avanishp, bean5, bitsofinfo, brianmpollack, danielpops, dylanjf, fiunchinho, jaguasch, jeffashton, johnsusek, jraby, ltagliamonte, matsgoran, micahhausler, mircopolo, msmerc, multani, muravitskiy, ndevox, pdscopes, qmando, ropes, sherifeldeeb, snuids


elastalert's Issues

Send aggregated alerts at set time

Aggregations currently only send the alert at the time of the first match plus the aggregation time. You should be able to set it to alert at a given time: daily, weekly, etc.

Reduce the number of dependencies (make them optional when possible)

In order to increase the modularity of ElastAlert, it would be good to make optional every dependency that is not strictly mandatory. I am mostly referring to alerters here: for example, I would like to be able to install ElastAlert without needing to install python-jira.

From what I understand, that would mean:

is_silenced causes error on new index

create_index only creates a mapping for the silence type for rule_name. When is_silenced tries to query, it sorts by "until", which doesn't yet exist in the index. The mapping should be specified for every type.

Option to buffer internal state of rules to disk

The current state of each rule is currently stored in memory, which it loses whenever it restarts.

It would be nice to allow the rules to write state to elastic search or to disk so it can gracefully resume whenever it restarts.

FFT-based Spike Rule

Our current spike rule insists traffic should not change in a 3 hour window.

Add a spike rule that predicts traffic should change in a somewhat sinusoidal shape over the course of a day.

ElastAlert crashes if value of query_key is not a string

ElastAlert attempts to create the 'key' for silenced alerts by concatenating the value of match[query_key] with the rule name. This causes an error if the value is not a string, for example, a list.

key = '.' + match[rule['query_key']]

TypeError: cannot concatenate 'str' and 'list' objects

Is it possible to have an expression in change rule

Hi Quentin

In the rule example - https://github.com/Yelp/elastalert/blob/master/example_rules/example_change.yaml

You watch for country_name to change. Is it possible to have an expression? I have a use case where I want to send an alert if the value has increased by 500, i.e. at 10:00 AM I see 100, and then at 10:02 I see a reading of 600; in that case I need to throw an alert.

PS: Apologies if this is not the right place to ask a question; I couldn't find any other forum.

Running with --debug option queries ES for silenced rules

This caused some query errors, because there was no mapping for until

We should modify is_silenced to always return false if elastalert is running in debug mode. We should also add the mapping for until to the metadata index in the create index script.

Duplicate events counted with buffer_time

When ElastAlert is restarted, it will ignore buffer_time on the first run and set starttime to the endtime of the last query run. However, on the 2nd query, it will again use buffer time, which can cause the query window to overlap the previous queries and data to be counted twice.

Kibana dashboards with or

An or in the filters doesn't work with generate_kibana_link. This could be done by changing the mandate field to "either" when parsing an or clause.

Search Jira for existing ticket before opening a new one

ElastAlert will create multiple tickets for a continuing alert. There should be a feature to have it bump existing tickets instead of duplicating them.
Something like:

  1. ElastAlert trigger fires
  2. ElastAlert searches JIRA for query_key & creation date < 30 days
  3. No result => create a JIRA
  4. Yes => Append a "Related web link" (Kibana link when the incident happened) to the existing JIRA

Pipeline alerts and allow them to share data

The alerts should have some way of communicating with each other. This could be done by creating a pipeline in the order defined in the rule file, where each alerter can pass a dictionary of data to the next. This would allow, for example, a JIRA alert to create a ticket and then an email alert to contain a link to that ticket.
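A hypothetical sketch of that pipeline idea: alerters run in the order defined by the rule and share a context dict. All class and key names here are invented for illustration and do not exist in ElastAlert:

```python
# Invented alerter classes demonstrating a shared-context pipeline.
class JiraAlerter:
    def alert(self, matches, context):
        # Pretend to create a ticket and record its ID for later alerters.
        context["jira_ticket"] = "PROJ-123"

class EmailAlerter:
    def alert(self, matches, context):
        ticket = context.get("jira_ticket", "no ticket")
        context["email_body"] = "Alert fired, see %s" % ticket

def run_pipeline(alerters, matches):
    context = {}
    for alerter in alerters:
        alerter.alert(matches, context)
    return context

result = run_pipeline([JiraAlerter(), EmailAlerter()], matches=[])
print(result["email_body"])  # Alert fired, see PROJ-123
```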

Allow Jira tickets to be linked in Emails

If an alert sends both an email and creates a ticket, it should be able to link to the ticket in the email. In general, there should be a mechanism for alerts to communicate in some way with each other.

Add a custom action alerter

It would be useful to have a custom action alerter which would send the data to a configurable program or script. This would make for easier prototyping, by allowing people not to write custom python alerters.

Support Kibana 4

The kibana template only works for Kibana 3. We should be able to do roughly the same for the Kibana 4.

Use threading to avoid blocking on ES queries

Rewrite the rules loop to use Python threading facilities, allowing rules to be run in parallel in multicore environments. Inspect the performance implications of this and correct for any lock contention or global state issues that may arise.

use_count_query produces unexpected results when --start is provided

The query window size (i.e., resolution) will equal the run_every setting. When you run ElastAlert with --start provided, it essentially overrides the run_every setting and queries a big chunk of time at once. This manifests as different results when use_count_query/use_terms_query is used, because the query time period is divided into run_every-sized chunks and one buffer_time-sized chunk.

There also isn't documentation about the resolution of use_count_query.

Allow mapping in config between num_events/threshold and values of query_key

It would be nice to be able to map alert thresholds with different values of query_key. This way, one rule can cover multiple things which only differ in a threshold value. An example of what this could look like:

type: frequency
query_key: error_type
num_events: 
  timeouterror: 50
  valueerror: 5
  someothererror: 15
  default: 10

AttributeError in top_count_keys

top_count_keys passes timestamp strings to get_top_counts, which expects datetime objects. The exception is thrown in format_index.

Traceback (most recent call last):
File "run_elastalert.py", line 6, in
client.start()
File "/nail/srv/_versions/elastalert-201503151132-324bf6daa4-master/virtualenv_run/lib/python2.6/site-packages/elastalert/elastalert.py", line 535, in start
num_matches = self.run_rule(rule, endtime, starttime)
File "/nail/srv/_versions/elastalert-201503151132-324bf6daa4-master/virtualenv_run/lib/python2.6/site-packages/elastalert/elastalert.py", line 415, in run_rule
self.alert([match], rule)
File "/nail/srv/_versions/elastalert-201503151132-324bf6daa4-master/virtualenv_run/lib/python2.6/site-packages/elastalert/elastalert.py", line 681, in alert
counts = self.get_top_counts(rule, start, end, keys, rule.get('top_count_number'), qk)
File "/nail/srv/_versions/elastalert-201503151132-324bf6daa4-master/virtualenv_run/lib/python2.6/site-packages/elastalert/elastalert.py", line 960, in get_top_counts
index = self.get_index(rule, starttime, endtime)
File "/nail/srv/_versions/elastalert-201503151132-324bf6daa4-master/virtualenv_run/lib/python2.6/site-packages/elastalert/elastalert.py", line 97, in get_index
return format_index(index, starttime, endtime)
File "/nail/srv/_versions/elastalert-201503151132-324bf6daa4-master/virtualenv_run/lib/python2.6/site-packages/elastalert/util.py", line 129, in format_index
start -= start.utcoffset()
AttributeError: 'str' object has no attribute 'utcoffset'

terms_size is undocumented

terms_size, the maximum number of results returned when using use_terms_query, is undocumented and has a horrible default of just 5

bosun.org

Hi!

Have you considered integration with https://bosun.org/?

It looks like Bosun supports setting up alerts against Elasticsearch too.

Add hostname to elastalert_status

If multiple instances are writing documents to elastalert_status, there needs to be some way to determine which machine they came from. Hostname would be useful to have in these logs.

Compound query_key

It would be useful to have a compound query_key with an arbitrary number of fields

Silence cache not updated sometimes with query_key

Alert with a query key are usually silenced by the rule['name']+key. If the key is missing, or manually added, it can be silenced with just the rule name. However, when that stash expires and an alert is sent, the silence cache will be updated with the key being name+key, and the expired cache remains.
