nagios-plugin-mongodb's Introduction

Nagios-MongoDB

Overview

This is a simple Nagios check script to monitor your MongoDB server(s).

Authors

Main Author

Mike Zupan mike -(at)- zcentric.com

Contributors

  • Frank Brandewiede <brande -(at)- travel-iq.com> <brande -(at)- bfiw.de> <brande -(at)- novolab.de>
  • Sam Perman <sam -(at)- brightcove.com>
  • Shlomo Priymak <shlomoid -(at)- gmail.com>
  • @jhoff909 on github
  • Dag Stockstad <dag.stockstad -(at)- gmail.com>

Installation

In your Nagios plugins directory run

git clone git://github.com/mzupan/nagios-plugin-mongodb.git

Then use pip to ensure you have all pre-requisites.

pip install -r requirements

Usage

Install in Nagios

Edit your commands.cfg and add the following


define command {
    command_name    check_mongodb
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$
}

define command {
    command_name    check_mongodb_database
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$
}

define command {
    command_name    check_mongodb_collection
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -d $ARG5$ -c $ARG6$
}

define command {
    command_name    check_mongodb_replicaset
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -r $ARG5$
}

define command {
    command_name    check_mongodb_query
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $HOSTADDRESS$ -A $ARG1$ -P $ARG2$ -W $ARG3$ -C $ARG4$ -q $ARG5$
}

(Add -D to the command if you want perfdata in the output.) You can then reference these commands in your service definitions. The examples below are from my services.cfg.

Check Connection

This will check each host that is listed in the Mongo Servers group. It issues a warning if connecting to the server takes over 2 seconds and a critical error if it takes over 4 seconds.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Check
    check_command           check_mongodb!connect!27017!2!4
}
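
The warning/critical comparison behind a check like this can be sketched in Python as follows. This is a minimal illustration, not the plugin's actual code: `evaluate_thresholds` is a hypothetical helper, the use of `>=` at the boundary is an assumption, and the return values follow the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Hypothetical helper, not taken from check_mongodb.py.
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_thresholds(value, warning, critical):
    """Return a Nagios exit code for a metric where higher is worse."""
    # Assumption: a value equal to a threshold already triggers it.
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# A connection that took 3 seconds, checked with -W 2 -C 4.
print(evaluate_thresholds(3.0, 2, 4))
```

With the thresholds from the service definition above, a 3 second connection lands between the two limits and is reported as a warning.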

Check Percentage of Open Connections

This check reports the percentage of the Mongo server's connection pool that is in use. In the following example it sends a warning if the connection pool is 70% used and a critical error if it is 80% used.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Free Connections
    check_command           check_mongodb!connections!27017!70!80
}
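
As a rough sketch of the arithmetic, the used percentage can be derived from the `current` and `available` connection counters that MongoDB's serverStatus reports. The function name and the sample numbers below are illustrative assumptions, not the plugin's own code.

```python
def connection_usage_percent(current, available):
    """Percentage of the connection pool in use.

    current and available correspond to the 'connections' counters in
    serverStatus; the pool size is taken to be their sum.
    """
    total = current + available
    if total == 0:
        return 0.0
    return 100.0 * current / total


# Illustrative numbers: 700 connections open, 300 still available.
print(connection_usage_percent(700, 300))
```

With 700 of 1000 connections in use the result reaches the 70% warning threshold from the example service.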

Check Replication Lag

This check measures the replication lag of Mongo servers. It sends a warning if the lag is over 15 seconds and a critical error if it's over 30 seconds. Note that the check uses 'optime' from rs.status(), which trails real time because heartbeat requests between servers only occur every few seconds. The check may therefore show an apparent lag of under 10 seconds when there really isn't any. Use larger thresholds for reliable monitoring.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag
    check_command           check_mongodb!replication_lag!27017!15!30
}
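
The lag calculation can be sketched as the difference between the primary's and the checked member's `optimeDate`. This is a simplified sketch under stated assumptions: the member dictionaries mimic the `members` array of rs.status(), the host names are made up, and `replication_lag_seconds` is a hypothetical name rather than the plugin's function.

```python
from datetime import datetime


def replication_lag_seconds(members, host_name):
    """Lag of host_name behind the primary, in whole seconds."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    node = next(m for m in members if m["name"] == host_name)
    lag = abs(primary["optimeDate"] - node["optimeDate"])
    return int(lag.total_seconds())


# Hand-written sample shaped like rs.status()["members"].
members = [
    {"name": "db1.example.com:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2013, 1, 31, 1, 40, 51)},
    {"name": "db2.example.com:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2013, 1, 31, 1, 40, 39)},
]
print(replication_lag_seconds(members, "db2.example.com:27017"))
```

Here the secondary's last applied operation is 12 seconds older than the primary's, which stays below the 15 second warning threshold.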

Check Replication Lag Percentage

This check measures replication lag as a percentage. It sends a warning if the lag is over 50 percent and a critical error if it's over 75 percent. The check gets the oplog timeDiff from the primary and compares it to the replication lag. When the value reaches 100 percent, a full resync is needed.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag Percentage
    check_command           check_mongodb!replication_lag_percent!27017!50!75
}
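
The percentage is the replication lag expressed as a share of the time window the primary's oplog covers. A minimal sketch of that ratio, with a hypothetical function name and made-up numbers:

```python
def replication_lag_percent(lag_seconds, oplog_window_seconds):
    """Replication lag as a percentage of the primary's oplog window."""
    if oplog_window_seconds <= 0:
        raise ValueError("oplog window must be positive")
    return 100.0 * lag_seconds / oplog_window_seconds


# Illustrative numbers: the oplog spans one hour and the
# secondary is 30 minutes behind.
print(replication_lag_percent(1800, 3600))
```

At 100 percent the secondary has fallen behind by the whole oplog window, so the operations it still needs are no longer available on the primary.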

Check Memory Usage

This check measures the memory usage of the Mongo server. In my example the Mongo servers have 32 GB of memory, so I trigger a warning if Mongo uses over 20 GB of RAM and a critical error if it uses over 28 GB.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Memory Usage
    check_command           check_mongodb!memory!27017!20!28
}

Check Mapped Memory Usage

This is a test that will check the mapped memory usage of Mongo server.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Mapped Memory Usage
    check_command           check_mongodb!memory_mapped!27017!20!28
}

Check Lock Time Percentage

This check measures the lock time percentage of the Mongo server. In my example I want a warning if the lock time is above 5% and an error if it's above 10%. Sustained lock time generally means your database is overloaded.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Lock Percentage
    check_command           check_mongodb!lock!27017!5!10
}

Check Average Flush Time

This check measures the average flush time of the Mongo server. In my example I want a warning if the average flush time is above 100ms and an error if it's above 200ms. A high average flush time means your database is write bound.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Flush Average
    check_command           check_mongodb!flushing!27017!100!200
}

Check Last Flush Time

This check measures the last flush time of the Mongo server. In my example I want a warning if the last flush time is above 200ms and an error if it's above 400ms. A high flush time means your server might need faster disks or that it's time to shard.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Last Flush Time
    check_command           check_mongodb!last_flush_time!27017!200!400
}

Check status of mongodb replicaset

This check reports the state of nodes within a replica set. It sends a warning for states 0, 3 and 5, a critical error for states 4, 6 and 8, and OK for states 1, 2 and 7.

Note the two trailing 0s. The check doesn't compare against them, but the values must be present for the check to work.


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB state
      check_command           check_mongodb!replset_state!27017!0!0
}
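
The state-to-status mapping described above can be sketched as a simple lookup. The state groups come from the description in this section; the function name and the choice to return UNKNOWN (3) for any other state are assumptions made for this illustration, not the plugin's behaviour.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# State groups as listed in the README text.
WARNING_STATES = {0, 3, 5}
CRITICAL_STATES = {4, 6, 8}
OK_STATES = {1, 2, 7}


def replset_state_status(state):
    """Map a replica set member state number to a Nagios exit code."""
    if state in CRITICAL_STATES:
        return CRITICAL
    if state in WARNING_STATES:
        return WARNING
    if state in OK_STATES:
        return OK
    # Assumption: anything not listed is reported as UNKNOWN.
    return UNKNOWN


# State 7 is an arbiter, state 3 is a recovering member.
print(replset_state_status(7), replset_state_status(3))
```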

Check status of index miss ratio

This check reports the index miss ratio. If the ratio is high, you should consider adding indexes. I want a warning if the ratio is above .005 and an error if it's above .01.


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Index Miss Ratio
      check_command           check_mongodb!index_miss_ratio!27017!.005!.01
}

Check number of databases and number of collections

These checks count the number of databases and the number of collections. This is useful, for example, when your application "leaks" databases or collections. Set the warning and critical levels to fit your application.


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Number of databases
      check_command           check_mongodb!databases!27017!300!500
}

define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Number of collections
      check_command           check_mongodb!collections!27017!300!500
}

Check size of a database

This will check the size of a database. This is useful for keeping track of growth of a particular database. Replace your-database with the name of your database


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Database size your-database
      check_command           check_mongodb_database!database_size!27017!300!500!your-database
}

Check index size of a database

This will check the index size of a database. Overlarge indexes eat up memory and indicate a need for compaction. Replace your-database with the name of your database


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Database index size your-database
      check_command           check_mongodb_database!database_indexes!27017!50!100!your-database
}

Check index size of a collection

This will check the index size of a collection. Overlarge indexes eat up memory and indicate a need for compaction. Replace your-database with the name of your database and your-collection with the name of your collection


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Collection index size your-database your-collection
      check_command           check_mongodb_collection!collection_indexes!27017!50!100!your-database!your-collection
}

Check the primary server of replicaset

This will check the primary server of a replicaset. This is useful for catching unexpected stepdowns of the replica's primary server. Replace your-replicaset with the name of your replicaset


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Replicaset Master Monitor: your-replicaset
      check_command           check_mongodb_replicaset!replica_primary!27017!0!1!your-replicaset
}

Check the number of queries per second

This will check the number of queries per second on a server. Since MongoDB gives us the number as a running counter, we store the last value in the local database in the nagios_check collection. The following types are accepted: query|insert|update|delete|getmore|command

This command checks updates per second. It raises a critical alert if the rate is over 200 and a warning if it is over 150.


define service {
      use                     generic-service
      hostgroup_name          Mongo Servers
      service_description     MongoDB Updates per Second
      check_command           check_mongodb_query!queries_per_second!27017!150!200!update
}
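
Turning a running counter into a per-second rate means keeping the previous sample and dividing the difference by the elapsed time. The sketch below shows only that arithmetic; `ops_per_second` is a hypothetical helper, and how the previous sample is stored and loaded (the plugin keeps it in MongoDB) is left out.

```python
def ops_per_second(prev_count, prev_time, cur_count, cur_time):
    """Rate between two samples of a monotonically increasing counter.

    Times are Unix timestamps in seconds. Returns None when no rate can
    be computed, e.g. on the first run or after a counter reset.
    """
    elapsed = cur_time - prev_time
    delta = cur_count - prev_count
    if elapsed <= 0 or delta < 0:
        # A negative delta suggests the server restarted and the
        # counter started again from zero.
        return None
    return delta / elapsed


# Illustrative samples taken 4 seconds apart: 600 updates in between.
print(ops_per_second(1000, 100.0, 1600, 104.0))
```

The sample rate of 150 updates per second would sit exactly on the warning threshold described above.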

Check Primary Connection

This will check each host that is listed in the Mongo Servers group. It issues a warning if connecting to the primary server of the current replica set takes over 2 seconds and a critical error if it takes over 4 seconds.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Primary Check
    check_command           check_mongodb!connect_primary!27017!2!4
}

Check Collection State

This will check each host that is listed in the Mongo Servers group. It can be useful for checking the availability of a critical collection (locks, timeouts, config server unavailable...). It issues a critical error if the find_one query fails.


define service {
    use                 generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Collection State
    check_command           check_mongodb_collection!collection_state!27017!0!0!your-database!your-collection
}

nagios-plugin-mongodb's People

Contributors

adrianlzt, andyroyle, averstappen, bastianvoigt, dstockstad, epleterte, hdeheer, hedenface, hilli, hydrapolic, janboll, jhoff909, jvginkel, kamaradclimber, keralin, khodin, konishchevdmitry, krobertson, mmore, mvernimmen-cg, mzupan, notz, organicveggie, rectalogic, ruben-herold, sharon-tickell, shlomoid, srri, warrenpnz, zopyx

nagios-plugin-mongodb's Issues

No module named pymongo

Hi All,

We have a Nagios server.
[nagios-server nagios-plugin-mongodb]# ./check_mongodb.py --help
No module named pymongo
[nagios-server nagios-plugin-mongodb]# rpm -qa python-devel
python-devel-2.6.6-29.el6_2.2.x86_64

But on the client side:
[db01 plugins]# ./check_mongodb.py -H localhost -p 27017
OK - Connection took 0 seconds

Could someone please help me?

Status information: (null)

The subject should be self-explanatory. I've installed the plugin to /usr/local/libexec/nagios and configured everything, but it keeps returning (null) as status information.

When it failed with a generic installation, I tried modifying the config, but still with no results. Here's what I tried:

  1. I'm running a FreeBSD host, so I changed #!/usr/bin/env python to #!/usr/local/bin/python, no success

  2. I changed nagios command config to have full, absolute path to the script, no result

  3. I changed the command config to have hardcoded arguments, no result.

The script works fine if I execute it from the console, either from root or nagios user. I can see the output properly and valid return codes.

Feature Request

Would it be possible to return the stats in "performance data" format, so that Nagios can create RRD databases and graph them with the pnp4nagios plug-in?

Thanks,

Henry

No module named pymongo

Hi,
From the Nagios server the check works fine:
[root@nagio-server libexec]# ./check_mongodb.py -H 10.180.86.108 -p 27017
OK - Connection took 0 seconds

But when I run the check on the remote side it gives me an error:
[root@mogodb-server plugins]# ./check_mongodb.py -H localhost
No module named pymongo

Please advise.

Ali

replication lag output incorrect

from the nagios UI:

OK - Max replication lag: 2 [ip-10-2-31-239.ec2.internal:27017 lag=1:ip-10-122-161-138.ec2.internal:27017 lag=]

You can see it's cutting off the last character.

you can also see this from the command line;

ubuntu@ip-10-212-115-11:/etc/oggi/nagios$ /usr/lib/nagios/plugins/nagios-plugin-mongodb/check_mongodb.py -H 10.255.101.220 -A replication_lag
OK - Max replication lag: 1 [ip-10-2-31-239.ec2.internal:27017 lag=1;ip-10-122-161-138.ec2.internal:27017 lag=]

If I have time today, I will look at the code. This should be easy :)

Checking arbiter fails

OK - State: 7 (Arbiter)
CRITICAL - General MongoDB Error: 'int' object has no attribute 'admin'

I am using python 2.6, CentOS 6 and the server is running mongo 1.8

Check the number of queries on replicasets

Hi ,

I was trying out the "Check the number of queries per second" check on non-master nodes. I think we should allow "query" to be read from non-master nodes as well, right? We allow applications to query directly from replica set members instead of the primary node.

Thanks,

Henry

Use of replicaSet argument for pymongo.Connection() call on version 1.9 is not supported

Hello,

On RedHat 5, the pymongo package (version 1.9) in EPEL does not support the replicaSet argument in the pymongo.Connection() call:

> ./check_mongodb.py -H localhost -A replica_primary -P 27017 --replicaset "tst_set03"
CRITICAL - General MongoDB Error: __init__() got an unexpected keyword argument  'replicaSet'

I made a pull request with a workaround that checks the replica set name manually after the connection is established.

br,

--Jeremy

Improve performance data

Performance data lacks UOM modifier, and for database size it appears to be formatted wrongly, the database=foo is extra?

NRPE: Unable to read output

Hi

I have installed pymongo, many thanks.

The command runs successfully on both the Nagios server and the remote side:

[root@Mongo-server plugins]# ./check_mongodb.py -H localhost -p 27017
OK - Connection took 0 seconds

[root@Balancer libexec]# ./check_mongodb.py -H 10.180.86.108 -p 27017
OK - Connection took 0 seconds

But Nagios shows "NRPE: Unable to read output". I have tried my best to resolve it, but it's still not working. Any advice? The remaining checks work perfectly.

Licensing unclear

Hello,

There does not seem to be any kind of license attached to this work. Under which terms should we use it? BSD, GPL? Anything else?

I would like to use it, but I will only be able to if it's under a permissive license (BSD-like).

I guess all authors should agree.

In all cases, there should be a LICENSE statement somewhere...

Check replica_primary attempts update on secondary

When the replica_primary check finds that the current host has switched from primary to secondary, it attempts to update the relevant record, and then crashes.

Suggested behaviour: when on the secondary, the plugin should simply exit with a warning or error as appropriate, and leave updating the collection to the copy that executes on the master.

# check_mongodb.py -u nagios -p Aeng2yau -A replica_primary -r cubmongodev
Traceback (most recent call last):
  File "check_mongodb.py", line 1386, in <module>
    sys.exit(main(sys.argv[1:]))
  File "check_mongodb.py", line 236, in main
    return check_replica_primary(con, host, warning, critical, perf_data, replicaset)
  File "check_mongodb.py", line 1162, in check_replica_primary
    db.last_primary_server.update({"_id": "last_primary"}, {"$set": last_primary_server_record}, upsert=True, safe=True)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 405, in update
    _check_keys, self.__uuid_subtype), safe)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 748, in _send_message
    raise AutoReconnect(str(e))
pymongo.errors.AutoReconnect: not master

mongodb lock over 100%.

Hi,

We are using MongoDB 2.2 in production. Because locking changed from global to database level in this version, the lock check sometimes reports a lock percentage over 100%, as it's checking the lock of the local namespace (the one used for the oplog). Would it be possible to add an option to check only a specific database?

CRITICAL - Lock Percentage: 475.87%

If you have time, can you take a look at this, thanks for the great plugin.

Roderic Liu

check_collections fail on slaves

Trying to run check_mongodb.py -A collections -P 27017 -W 160 -C 200 will fail with:

CRITICAL - General MongoDB Error: not master and slaveOk=false.

However, trying to pass --replica <replicaname> is prevented with a warning:

passing a replicaset while not checking replica_primary does not work

If I comment-out the check:

    #elif not action == 'replica_primary' and replicaset:
    #    return "passing a replicaset while not checking replica_primary does not work"

Then it works fine. Happy to submit a pull request, to extend the test to other potential commands, but wanted to check first if there's something I'm perhaps missing?

Connection Timed Out

Hello!

I am using this plugin, and I consistently get error notifications from Icinga/Nagios for this plugin specifically about the connection timing out, regardless of which test it is trying to perform. I have tried increasing the timeout, but have had no luck.

If I run the script from the command line, I cannot reproduce the problem.

So it's hard for me to say what the specific issue is, but I am hoping someone here can start to point me in the direction of figuring out exactly what is happening with these errors. We will be woken up a lot less during the night if I can figure out what is going on :P

replSet State Check

The replset_state check no longer works the way it did in previous versions. I had to add values for -W and -C even though they appear to be ignored, i.e.:

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB state
    check_command           check_mongodb!replset_state!27017!0!0
}

Was this intentional?

Thanks,

-=Dusty

replset_state Check not reporting correctly

It looks like neither the warning nor critical states of the replset_state check are working correctly:

def check_replset_state(con, perf_data, warning="", critical=""):
    try:
        warning = [int(x) for x in warning.split(",")]
    except:
        warning = [0, 3, 5]
    try:
        critical = [int(x) for x in critical.split(",")]
    except:
        critical = [8, 4, -1]

    ok = range(-1, 8)  # should include the range of all possible values
...

Example results (should be WARNING):

OK - State: 3 (Recovering)

I hacked this in to get it going:

def check_replset_state(con, perf_data, warning="", critical=""):
    warning = []
    # most of these states are critical imho
    critical = [0,5,3,8,4,-1]
    ok = range(-1,8) #should include the range of all possible values
...

Thoughts?

Curious about replica set monitoring...

I am new to mongo, so I am sorry if this is totally off...

We had a replica set failure, since recovered with a full wipe and full sync of the mongo DB.

http://$masterip:28017/_replSet seems to show that all is well.

The replication nagios monitor, however, sees only:

CRITICAL - Max replication lag: 307496 [$slaveip lag=307496: $slaveip lag=3857: $slaveip lag=0: ]

Does this replication check support replica sets?

We are on mongo version 1.6, and I have less insight to replication than I wish I did.

Thank you,
Joshua

Query per second checks not updating mongo document

The queries per second check doesn't work consistently for me. As far as I can tell, it does not update the nagios-checks document/collection (sorry, I'm not very familiar with the terms) correctly.

I can manually watch queries of the various kinds being made with mongostat. Sometimes the check ALWAYS says "This is the first check ... " and never updates itself. I see some of the query types in the document/collection while others never get added. Other times the query number just won't update, so the document holds the same number after the check has run as before.

To me it seems inconsistent, so I have not been able to figure out what is causing the issue.

Wrong performance data

Performance data is being generated wrongly. It's missing the separators (spaces) between labels. It would also be great to have units.

Performance data should be:
'label'=value[UOM];[warn];[crit];[min];[max]

More info:
http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN201

Actual:

./check_mongodb.py -H my_mongo_server -P 27017 --perf-data -A memory

OK - Memory Usage: 0.49GB resident, 3.04GB virtual, 1.56GB mapped, |memory_usage=0.49;8;16memory_mapped=1.56memory_virtual=3.04mappedWithJournal=0.00

./check_mongodb.py -H my_mongo_server -P 27017 --perf-data -A connections

OK - 0 percent (30 of 9600 connections) used |used_percent=0;80;95current_connections=30.0available_connections=9570.0
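
A formatter that follows the 'label'=value[UOM];[warn];[crit] layout quoted above could look roughly like this. It is a sketch, not a patch for the plugin: `format_metric` and `format_perfdata` are hypothetical names, the min and max fields are omitted for brevity, and individual metrics are joined with spaces as the Nagios plugin guidelines describe.

```python
def format_metric(label, value, uom="", warn=None, crit=None):
    """Format one metric as 'label'=value[UOM];[warn];[crit]."""
    fields = [
        "'%s'=%s%s" % (label, value, uom),
        "" if warn is None else str(warn),
        "" if crit is None else str(crit),
    ]
    # Drop trailing empty fields so a bare metric has no dangling ';'.
    return ";".join(fields).rstrip(";")


def format_perfdata(metrics):
    """Join several metrics into one space-separated perfdata string."""
    return " ".join(format_metric(*metric) for metric in metrics)


# Values modelled on the connections output shown above.
print(format_perfdata([
    ("used_percent", 0, "%", 80, 95),
    ("current_connections", 30),
]))
# prints: 'used_percent'=0%;80;95 'current_connections'=30
```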

Repl Node name does not necessarily match hostname

If the nagios server's DNS uses different domain search than the mongo cluster, the host required by -H may not match the name in the replset. Suggest adding a --name option to specify an alternate node name.

check_mongodb.py can't check correctly on a host with multiple mongod instances

Hi Mike

Thanks for a powerful Nagios plugin for MongoDB and for your good work.

I found that when checking a host that runs multiple mongod instances with the replication_lag action and --max-lag, check_mongodb.py does not check the correct node. Take a replica set with a primary node, a secondary node and an arbiter node:
xxxx:SECONDARY> rs.status()
{
    "set" : "xxxx",
    "date" : ISODate("2013-05-15T05:35:14Z"),
    "myState" : 2,
    "syncingTo" : "10.136.24.88:27032",
    "members" : [
        {
            "_id" : 0,
            "name" : "10.136.24.88:27032",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 13724936,
            "optime" : Timestamp(1359596451000, 1),
            "optimeDate" : ISODate("2013-01-31T01:40:51Z"),
            "lastHeartbeat" : ISODate("2013-05-15T05:35:13Z"),
            "pingMs" : 0
        },
        {
            "_id" : 1,
            "name" : "10.136.24.89:27032",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 13724967,
            "optime" : Timestamp(1359596451000, 1),
            "optimeDate" : ISODate("2013-01-31T01:40:51Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "10.136.24.89:37032",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 13724936,
            "lastHeartbeat" : ISODate("2013-05-15T05:35:14Z"),
            "pingMs" : 0
        }
    ],
    "ok" : 1
}

Run the script

./check_mongodb.py -H 10.136.24.89 -D -P 27032 -A replication_lag --max-lag -W 5 -C 10

The variable host_node on line 367 is assigned the arbiter node's values, so I wrote a patch.

--- check_mongodb.py 2013-05-15 13:24:39.000000000 +0800
+++ monitor/check_mongodb.py 2013-05-15 13:23:51.000000000 +0800
@@ -185,9 +185,9 @@ def main(argv):
     if action == "connections":
         return check_connections(con, warning, critical, perf_data)
     elif action == "replication_lag":
-        return check_rep_lag(con, host, warning, critical, False, perf_data, max_lag, user, passwd)
+        return check_rep_lag(con, host, port, warning, critical, False, perf_data, max_lag, user, passwd)
     elif action == "replication_lag_percent":
-        return check_rep_lag(con, host, warning, critical, True, perf_data, max_lag, user, passwd)
+        return check_rep_lag(con, host, port, warning, critical, True, perf_data, max_lag, user, passwd)
     elif action == "replset_state":
         return check_replset_state(con, perf_data, warning, critical)
     elif action == "memory":
@@ -324,7 +324,7 @@ def check_connections(con, warning, crit
         return exit_with_general_critical(e)
 
-def check_rep_lag(con, host, warning, critical, percent, perf_data, max_lag, user, passwd):
+def check_rep_lag(con, host, port, warning, critical, percent, perf_data, max_lag, user, passwd):
     if percent:
         warning = warning or 50
         critical = critical or 75
@@ -363,7 +363,7 @@ def check_rep_lag(con, host, warning, cr
         for member in rs_status["members"]:
             if member["stateStr"] == "PRIMARY":
                 primary_node = member
-            if member["name"].split(':')[0] == host:
+            if member["name"].split(':')[0] == host and int(member["name"].split(':')[1]) == port:
                 host_node = member
 
         # Check if we're in the middle of an election and don't have a primary

No Connecting to Arbiter Servers

I think I found a little bug in this script.

When I try to monitor an arbiter server with the replset_state action, it fails with a Critical Error - State 2.

The problem seems to be that pymongo doesn't allow connecting to an arbiter (which is not so bad for non-monitoring purposes).
It fails with an exception (I removed the hostname):
pymongo.errors.AutoReconnect: <_host_>:27017 is an arbiter

You only get this debug output when you remove the try statement in the connection and login section.

So you can't monitor an arbiter with this plugin, because you fall into the exception path of the try statement and the script returns status 2 to Nagios, which ends up as a Critical Error.

Possible solution:
Catch that error in the exception path and do the arbiter check through the driver's error message.
But I'm not sure whether other errors would be caught too that we possibly don't want to catch...

Best Regards

Ingo Gottwald

General MongoDB Error: not authorized for update on local.nagios_check

I am using MongoDB version 2.4.4 and created a user named "omd" with read-only permission on the "admin" database, but when I run the query checks I get the following error:

General MongoDB Error: not authorized for update on local.nagios_check

If I use the "admin" user, everything appears normal except "insert" and "update". They report "connection took blah blah blah..." instead of "Queries blah blah blah...":

MDB_Query_Per_Second 0 OK - Queries / Sec: 0.750000 |query_per_sec=0.75;150.0;200.0
MDB_Insert_Per_Second 0 OK - Connection took 0 seconds |connection_time=0.0;150.0;200.0
MDB_Update_Per_Second 0 OK - Connection took 0 seconds |connection_time=0.0;150.0;200.0
MDB_Delete_Per_Second 0 OK - Queries / Sec: 0.000000 |delete_per_sec=0.0;150.0;200.0
MDB_Getmore_Per_Second 0 OK - Queries / Sec: 0.000000 |getmore_per_sec=0.0;150.0;200.0
MDB_Command_Per_Second 0 OK - Queries / Sec: 6.562500 |command_per_sec=6.5625;150.0;200.0

authentication

when running:

python check_mongodb.py -A replica_primary -u nagios -p password -r replset

on an auth enabled instance

result:

pymongo.errors.OperationFailure: database error: not authorized for query on nagios.system.namespaces

because on line 1147:

db = con["nagios"]

switches to the nagios db without authenticating on it first, so it fails.

a workaround is to add:

            db = con["nagios"]
            if not db.authenticate(user, passwd):
                sys.exit("Username/Password incorrect")

after line 271

Different result on nagios mongo check and mongostat

Please help me out. I am seeing different results from the Nagios mongo check and mongostat. Any suggestions?

[root@-node3 plugins]# ./check_mongo_lock -A lock -W 9 -C 10
CRITICAL - Lock Percentage: 1437.81%

insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
*7 1 *0 *0 0 5|0 0 13.2g 28.6g 7.32g 0 .:2.7% 0 0|0 0|0 598b 4k 1720 mongo3 SEC 05:31:05
*5 3 *0 *0 0 7|0 0 13.2g 28.6g 7.32g 0 serverdata:0.2% 0 0|0 0|0 737b 8k 1720 mongo3 SEC 05:31:06
*3 *0 *0 *0 0 9|0 0 13.2g 28.6g 7.32g 0 .:3.2% 0 0|0 0|0 745b 5k 1720 mongo3 SEC 05:31:07
*8 1 *0 *0 0 3|0 0 13.2g 28.6g 7.32g 0 .:0.6% 0 0|0 0|0 287b 4k 1720 mongo3 SEC 05:31:08
*8 *0 *0 *0 0 21|0 0 13.2g 28.6g 7.32g 0 serverdata:0.1% 0 0|0 0|0 1k 11k 1720 mongo3 SEC 05:31:09
*3 2 *0 *0 0 33|0 0 13.2g 28.6g 7.32g 0 .:1.1% 0 0|0 0|0 2k 20k 1720 mongo3 SEC 05:31:10
*4 1 *0 *0 0 15|0 0 13.2g 28.6g 7.32g 0 .:0.5% 0 0|0 0|0 1k 9k 1720 mongo

Please add mention of pymongo module in README

Hi, and thanks for making a plugin to check MongoDB.
It would be worth mentioning the installation of pymongo in the README, just below the git clone command.
On Debian this is relatively easy:

apt-get install python-pymongo

Just for the sake of completeness ;-)

max_lag incorrectly calculated if stateStr == STARTUP2

With replication_lag, if a node is not PRIMARY, we added a line to output in Nagios who the primary is. This I found on the web but forgot where.
OK - Primary server has not changed and is VP-L012-DBSTG-L01-CMS.pearsontc.com:27017

We also needed to skip member stateStr==STARTUP2 when calculating max_lag on Primary. For instance, we've got nodes in the process of being configured and do not want the max_lag from STARTUP2 status nodes' optimeDate to be used in the calculation.
I created a simple patch for this and applied in my environment:
    @@ -397,10 +397,11 @@
                 maximal_lag = 0
                 for member in rs_status['members']:
                     if not member['stateStr'] == "ARBITER":
    -                    lastSlaveOpTime = member['optimeDate']
    -                    replicationLag = abs(primary_node["optimeDate"] - lastSlaveOpTime).seconds - slaveDelays[member['name']]
    -                    data = data + member['name'] + " lag=%d;" % replicationLag
    -                    maximal_lag = max(maximal_lag, replicationLag)
    +                    if not member['stateStr'] == "STARTUP2":
    +                        lastSlaveOpTime = member['optimeDate']
    +                        replicationLag = abs(primary_node["optimeDate"] - lastSlaveOpTime).seconds - slaveDelays[member['name']]
    +                        data = data + member['name'] + " lag=%d;" % replicationLag
    +                        maximal_lag = max(maximal_lag, replicationLag)
                 if percent:
                     err, con = mongo_connect(primary_node['name'].split(':')[0], int(primary_node['name'].split(':')[1]), False, user, passwd)
                     if err != 0:

    @@ -1161,6 +1162,8 @@
                 db.last_primary_server.update({"_id": "last_primary"}, {"$set": last_primary_server_record}, upsert=True, safe=True)
                 message = "Primary server has changed from %s to %s" % (saved_primary, current_primary)
                 primary_status = 1
    +    if current_primary == saved_primary:
    +        message = "Primary server has not changed and is %s" % (current_primary)

         return check_levels(primary_status, warning, critical, message)
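
For readers who want to try the idea outside the plugin, here is a standalone sketch of the same filtering. It is not the plugin's code: the function name, the `SKIPPED_STATES` tuple and the sample data are assumptions made for illustration. Unlike the patch, it also counts whole days in the lag instead of reading only the `.seconds` component.

```python
from datetime import datetime, timedelta

# Member states that should not contribute to the maximum lag.
SKIPPED_STATES = ("ARBITER", "STARTUP2")


def max_replication_lag(members, primary_optime, slave_delays=None):
    """Return (max lag in seconds, per-member detail string)."""
    slave_delays = slave_delays or {}
    maximal_lag = 0
    details = []
    for member in members:
        if member["stateStr"] in SKIPPED_STATES:
            continue
        delta = abs(primary_optime - member["optimeDate"])
        # Include whole days so that very large lags are not truncated.
        lag = delta.days * 86400 + delta.seconds
        lag -= slave_delays.get(member["name"], 0)
        details.append("%s lag=%d;" % (member["name"], lag))
        maximal_lag = max(maximal_lag, lag)
    return maximal_lag, " ".join(details)


# Made-up replica set status: one node is still in STARTUP2.
now = datetime(2013, 1, 1, 12, 0, 0)
members = [
    {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
    {"name": "db2:27017", "stateStr": "SECONDARY",
     "optimeDate": now - timedelta(seconds=5)},
    {"name": "db3:27017", "stateStr": "STARTUP2",
     "optimeDate": datetime(1970, 1, 1)},
    {"name": "arb:27017", "stateStr": "ARBITER", "optimeDate": now},
]
lag, data = max_replication_lag(members, now)
print("Max replication lag: %d [%s]" % (lag, data))
```

Without the STARTUP2 filter, the node with the epoch `optimeDate` would dominate the result.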

MongoDB Error: 'datetime.timedelta'

Mongodb version 2.2.1

I am getting this error on the non-primary servers of my replica sets:

CRITICAL - General MongoDB Error: 'datetime.timedelta' object has no attribute 'total_seconds'

I am using this service:

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag
    check_command           check_mongodb!replication_lag!27017!15!30
}
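
`timedelta.total_seconds()` was added in Python 2.7, so this message is what an older interpreter such as Python 2.6 produces. A minimal compatibility sketch follows, using the equivalent formula given in the Python documentation. The helper names are hypothetical and are not part of the plugin.

```python
from datetime import timedelta


def fallback_total_seconds(td):
    """Equivalent of timedelta.total_seconds() for Python < 2.7."""
    micro = td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6
    return micro / float(10**6)


def timedelta_total_seconds(td):
    """Use the built-in method when present, else the manual formula."""
    if hasattr(td, "total_seconds"):
        return td.total_seconds()
    return fallback_total_seconds(td)


print(timedelta_total_seconds(timedelta(days=1, seconds=30)))
```

Replacing a call such as `delta.total_seconds()` with `timedelta_total_seconds(delta)` would keep the check working on both old and new interpreters.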

PyPI release

We use this module, but we don't really want to install it on production servers via git clone, because:

  • we want to install versioned artefacts to servers
  • we already have a dependency on PyPI for production deployments, we don't want one on github as well
  • cloning into /usr/lib/nagios/plugins only allows for one git repository to use this trick -- if we had another nagios plugin with the same installation mechanism we couldn't use both.

Would it be possible to publish this to PyPI so that we can install using pip or easy_install?

replication lag doesn't check ALL replica set slaves

Hi, I just discovered that your script checks replication lag for one slave only, because the section with "if lag >= critical:..." is inside the for loop instead of after it.

I also added some extra output, namely the slave host IPs and the lag value per IP, which helps pinpoint exactly which slave has replication problems.

The code is below, hope it helps.

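    # Fragment: slaves, lastMasterOpTime, lag, warning and critical
    # are defined earlier in the script.
    data = ";"
    for slave in slaves:
        lastSlaveOpTime = slave['syncedTo'].time
        replicationLag = lastMasterOpTime - lastSlaveOpTime
        # Record the lag of every slave so the output shows which one is behind.
        data = data + slave["host"] + " lag=" + str(replicationLag) + "; "
        lag = max(lag, replicationLag)

    # Drop the leading ";" placeholder.
    data = data[1:]

    # Compare against the thresholds once, after all slaves were inspected.
    if lag >= critical:
        print("CRITICAL - Max replication lag: %i [%s]" % (lag, data))
        sys.exit(2)
    elif lag >= warning:
        print("WARNING - Max replication lag: %i [%s]" % (lag, data))
        sys.exit(1)
    else:
        print("OK - Max replication lag: %i [%s]" % (lag, data))
        sys.exit(0)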

Timur Evdokimov
http://www.jacum.com

Feature: Monitor Clock Skew

I am missing monitoring of clock skew (for replica sets). It is an important metric that should be monitored.

Check plugin should be consistent in where it saves runtime data

From an operations point of view, it is recommended to give monitoring users no more privileges than the bare minimum they need for the job. I'm currently in the process of figuring out what is required, and am running into a rather annoying inconsistency:

The replica_primary check writes its data to the nagios database. This is proper and not a problem; I merely need to give the user readWrite on it.

The queries_per_second check, however, tries to write its data to a collection in the local database. Apart from the inconsistency between checks, this would require a monitoring user to have write permission on a system database...

Would you take this into account in a future release, as well as document the necessary roles for each check?

Thank you,
Johan

mongodb user permissions

Hello,

I'm currently trying to set up the plugin to monitor a MongoDB cluster where auth is enabled. Here's my question: can you tell me which privileges the MongoDB user must have to get this working?

regards

Replication status checks do not work for secondary - "Address family not supported by protocol".

Hi.

I am using:

/usr/local/nagios/plugins/check_mongodb.py -A replication_lag -H host1 -u xxx -p xxx -P 27017 -C 1 -W 1

OK - This is the primary.

and:

/usr/local/nagios/plugins/check_mongodb.py -A replication_lag -H host2 -u xxx -p xxx -P 27017 -C 1 -W 1
CRITICAL - General MongoDB Error: could not connect to host2:27017: [Errno 97] Address family not supported by protocol

I am using pymongo-2.1.1-1 on Centos 6.3 x86_64.

host1 is primary, host2 is secondary. Running db.runCommand({ replSetGetStatus: 1 }) from monitoring server for both hosts works ok.

Any clues?

Best regards,
Rafal Radecki.

Single Mongo Server support

Is there any chance of support for single MongoDB server setups, i.e. without replica sets?

Currently all tests fail with "CRITICAL - General MongoDB Error: could not find master"

General MongoDB Error: 'module' object has no attribute 'MongoClient'

When I'm running:
/usr/lib/nagios/plugins/check_mongodb.py -H localhost -p 27017
I receive the following error:
CRITICAL - General MongoDB Error: 'module' object has no attribute 'MongoClient'

pymongo is installed using the package python-pymongo (v:2.1-ubuntu0.1)

Do you have any idea?
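
`MongoClient` was introduced in pymongo 2.4, so a 2.1 package does not provide it; upgrading pymongo is the straightforward fix. Where upgrading is not possible, one option is to fall back to the older `Connection` class. The sketch below illustrates that fallback under stated assumptions: `pick_client_class` is a hypothetical helper, not something the plugin ships, and the two stand-in classes only imitate old and new pymongo modules so the example runs without pymongo installed.

```python
def pick_client_class(pymongo_module):
    """Return MongoClient when available, else the legacy Connection."""
    client = getattr(pymongo_module, "MongoClient", None)
    if client is not None:
        return client
    # pymongo releases before 2.4 only offer Connection.
    return getattr(pymongo_module, "Connection")


class OldPymongo(object):
    """Stand-in for a pre-2.4 pymongo module."""

    class Connection(object):
        pass


class NewPymongo(object):
    """Stand-in for a pymongo module that has MongoClient."""

    class MongoClient(object):
        pass


print(pick_client_class(OldPymongo).__name__)
print(pick_client_class(NewPymongo).__name__)
```

In real use the argument would be the imported `pymongo` module itself.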

replication lag

One more issue:

./check_mongodb.py -H hostname -A replication_lag -W 15 -C 30
'me'
CRITICAL - General MongoDB Error: 'me'

deprecated warning about using safe parameter (mongo v2.4.6)

/usr/lib64/nagios/plugins/check_mongodb.py:1187: DeprecationWarning: The safe parameter is deprecated. Please use write concern options instead.
db.last_primary_server.update({"_id": "last_primary"}, {"$set": last_primary_server_record}, upsert=True, safe=True)
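
The warning can be avoided by expressing the write concern differently. On pymongo 2.4 and later 2.x releases, passing `w=1` instead of `safe=True` should be enough. On pymongo 3.0 and later, acknowledged writes are the default for `MongoClient` and `update_one()` is available. The sketch below shows the latter form; `save_last_primary` and `FakeCollection` are hypothetical names used only so the example runs without a server.

```python
def save_last_primary(collection, record):
    """Upsert the last-primary record without the deprecated safe flag.

    Assumes a pymongo 3.0+ style collection that exposes update_one().
    """
    return collection.update_one(
        {"_id": "last_primary"},
        {"$set": record},
        upsert=True,
    )


class FakeCollection(object):
    """Stand-in that records calls instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def update_one(self, spec, update, upsert=False):
        self.calls.append((spec, update, upsert))
        return None


fake = FakeCollection()
save_last_primary(fake, {"server": "db1:27017"})
print(fake.calls)
```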
