Code Monkey home page Code Monkey logo

zabbix_integration's Introduction

Zabbix Integration Cookbook

Overview

This cookbook integrates all systems with Zabbix monitoring using Cluster Chef.

It has two kinds of recipes:

1) Recipes for integrated monitoring of common aspects of components like logs, ports, daemons, &c.

2) Recipes for standalone monitors of distributed systems like databases that Zabbix can’t talk to directly.

Usage

Aspect Monitoring

Components following the Metachef cookbook’s approach announce themselves along with their aspects. A typical example might be

# in webapp/recipes/backend.rb
announce(:webapp, :backend, {
  :logs    => { :unicorn => '/var/www/webapp/shared/logs' },
  :ports   => { :master  => 8080, :slave1 => 8081, :slave2 => 8082 },
  :daemons => { :unicorn => 'webapp_backend' }
  })

# in webapp/recipes/frontend.rb
announce(:webapp, :frontend, {
  :logs    => { :nginx => '/var/www/webapp/shared/logs' },
  :ports   => { :http  => 80 },
  :daemons => { :nginx => 'nginx' }
})

# in webapp/recipes/worker.rb
announce(:webapp, :worker, {
  :logs    => { :resque => '/var/www/webapp/shared/logs' },
  :ports   => { :admin  => 9000 },
  :daemons => { :web => 'resque_web', :worker1 => 'resque_worker_1, :worker2 -> 'resque_worker_2' }
})

These recipes may run on the same or on different machines. Each announcement decalres all the aspects in a way that allows them to be monitored or processed in a fashion decoupled from the ‘webapp’ cookbook itself.

This cookbook sets up Zabbix items and triggers to monitor and alert on each aspect a component may declare.

The following recipes are available:

zabbix_integration::logs

This recipe monitors log files and directories, ensuring that the files within do not grow too large.

The following attributes define the default maximum file size and alert priority for a bloated log file.

default[:zabbix_integration][:logs][:file_size][:max_size] = 200
default[:zabbix_integration][:logs][:file_size][:priority] = :warning

Declaring a log file and overriding these options occur in the same place.

Here’s a component declaring a log directory. All files immediately within the directory will be monitored with the default settings:

announce(:webapp, :worker, { :logs => "/var/www/webapp/logs" })

Here’s a component declaring a specific log file:

announce(:webapp, :worker, {
  :logs => {
    :access => "/var/www/webapp/webapp.access.log",
    :error  => "/var/www/webapp/webapp.error.log"
  }})

There are two reasons for individually specifying log files in the same directory:

1) You want to provide different handling for each file (see customization below)

2) Not all the other files in the directory are logs.

Here’s a component customizing the maximum log file size and the alert priority:

announce(:webapp, :worker, {
  :logs => {
    # high throughput access log
    :access => {
      :path     => "/var/www/webapp/webapp.access.log",

:max_size => 1024, # in MB

  :priority => :high # or: not_classified information warning average high disaster
},
# low throughput error log
:error => {
  :path     => "/var/www/webapp/webapp.error.log",

:max_size => 200, :priority => :warning

}}})

A log aspect can be declared and monitoring can be selectively disabled:

announce(:webapp, :worker, {
  :logs => {
    :access => {
      :path    => "/var/www/webapp/webapp.access.log",

:monitor => false

}}})

zabbix_integration::ports

This recipe monitors ports along with their availability and performance. Failures to connect to a port will cause an availability alert and poor response time from a port will cause a response time alert.

The following attributes define the default maximum failures and the default maximum average response time over chosen intervals with chosen alert priority:

default[:zabbix_integration][:ports][:availability][:window]   = 900
default[:zabbix_integration][:ports][:availability][:failures] = 5
default[:zabbix_integration][:ports][:availability][:priority] = :high

default[:zabbix_integration][:ports][:response_time][:window]   = 900
default[:zabbix_integration][:ports][:response_time][:average]  = 3.0
default[:zabbix_integration][:ports][:response_time][:priority] = :warning

Declaring a port and overriding these options occur in the same place.

Here’s a simple component with some ports:

announce(:webapp, :frontend, :ports => { :app => 8080, :admin => 8081 })

Here we dig in, specifying a different tolerance for failures and performance:

announce(:webapp, :frontend, {
  :ports => {
    :app => {
      :port 	       => 8080,

:failures => 2, :response_time => 1.0

},
:admin => {
  :port 	       => 8081,

:failures => 20, :repsonse_time => 5.0

}}})

Both the maximum number of failures and the average response time are measured over the last 15 minutes (900 s). This timeframe as well as the priority can be configured on a per-port basis:

announce(:webapp, :frontend, {
  :ports => {
    :app => {
      :port 	       	       	=> 8080

:failures => 2, :availability_window => 900, :availability_priority => :high,

:response_time => 1.0, :response_time_window => 60, :response_time_priority => :high

},
...
}}})

All port-checks are defined as “simple” Zabbix checks so they will be initiated by the server to the node – the node is not just looping back on itself to measure a local port’s performance.

A port aspect can be announced and have its monitoring disabled:

announce(:webapp, :frontend, {
  :ports => {
    :app => {
      :port    => 8080,

:monitor => false

}}})

zabbix_integration::daemons

This recipe monitors running daemons. If a daemon process is not found to be running, an alert will be sent.

The following attributes define the default priority for alerts triggered when daemons are found to be not running:

default[:zabbix_integration][:daemons][:running][:priority] = :average

Declaring a daemon and overriding this default option occur in the same place.

Here’s a simple component with a daemon:

announce(:webapp, :frontend, :daemons => { :unicorn => 'unicorn' })

If no process naemd ‘unicorn’ can be found on the node then this will cause an alert.

Some command line programs, typically those that are compiled, are their own ‘process name’: ‘nginx’, ‘sort’, ‘sshd’, &c. Interpreted scripts (‘/usr/bin/my-ruby-script’) also work this way.

For some daemons, the process name may not be useful or may not be enough to distinguish the daemon. In this case we can pass another option

announce(:webapp, :frontend, {
  :daemons => {
    :php_cgi => {
      :name => 'php-cgi',

:cmd => ‘webapp.ini’

}}})

Where the actual command line of the process is filtered using ‘webapp.ini’ (it’s not clear what the limits to this filtering syntax is – best not get too crazy).

Sometimes its sufficient to identify a process run by a user:

announce(:webapp, :frontend, {
  :daemons => {
    :php_cgi => {
      :name => 'php-cgi',

:user => ‘www-data’

}}})

It’s also possible to list multiple processes with specific counts:

announce(:webapp, :frontend, {
  :daemons => {
    :unicorn_master => {
      :name   => '',

:cmd => ‘unicorn master’, :number => 1

},
:unicorn_worker => {
  :name   => '',

:cmd => ‘unicorn worker’, :number => 4

}}})

A daemon aspect can be announced and have its monitoring disabled:

announce(:webapp, :frontend, {
  :daemons => {
    :nginx => {
      :name    => 'nginx',

:monitor => false

}}})

Standalone Monitors

Zabbix likes to monitor systems via the Zabbix agent that is installed on each monitored machine. This works well for systems that are limited to a single node and that can speak Zabbix’s protocol or be defined via UserParameters but fails horribly when trying to monitor something like a distributed database.

Happily Zabbix allows for measurements to be directly sent to it via some long-running script.

This cookbook defines several such Ruby scripts. Each:

  • provides performance metrics about a (possibly distributed) component

  • without having to be deployed on the same nodes that run the component

  • submits this data to a Zabbix server via a persistent FIFO set up on the node.

The recipes which set up standalone monitors include:

  • zabbix_integration::elasticsearch_monitor to monitor an ElasticSearch cluster

  • zabbix_integration::flume_monitor to monitor a Flume cluster

  • zabbix_integration::hbase_monitor to monitor an HBase cluster

  • zabbix_integration::redis_monitor to monitor Redis nodes

  • zabbix_integration::mongodb_monitor to monitor MongoDB nodes

zabbix_integration's People

Watchers

James Chang avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.