angoca / monitor-db2-with-nagios

Set of plugins / scripts to monitor DB2 from Nagios.

License: Apache License 2.0

Shell 100.00%

monitor-db2-with-nagios's Introduction

monitor-db2-with-nagios

Welcome to the monitor-db2-with-nagios project!

Here you will find the sources, the wiki, and a bug tracker.

This project aims to provide a set of open source tools to monitor DB2. The monitoring is limited to checking that a value falls within a range. The output of the scripts allows you to create graphs and see the behavior of the monitored elements.

Each script is autonomous: there are no dependencies between files, so any modification of the behaviour affects only that script.

A template file is provided as a base for new scripts. It was written so that you just need to fill in the TODO sections with what you want to monitor.
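
As an illustration of the "value within a range" idea, here is a minimal sketch of a threshold check that prints a Nagios-style line with performance data. The function name `check_threshold`, the label, and the numbers are assumptions made for this example; the real template may be organized differently.

```shell
# Hypothetical helper: compare a value with warning/critical thresholds
# and print one Nagios-style line followed by performance data.
OK=0; WARNING=1; CRITICAL=2

check_threshold() {
  local label="$1" value="$2" warn="$3" crit="$4"
  local status="$OK" text="OK"
  if [ "$value" -ge "$crit" ]; then
    status="$CRITICAL"; text="CRITICAL"
  elif [ "$value" -ge "$warn" ]; then
    status="$WARNING"; text="WARNING"
  fi
  # Text before the pipe is for humans; text after it is performance data
  # in the form 'label'=value;warn;crit, which is what gets graphed.
  echo "${text} - ${label} is ${value}|'${label}'=${value};${warn};${crit}"
  return "$status"
}
```

A call such as `check_threshold connections 42 50 80` prints the message and returns the state as the exit code, which is what Nagios reads.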

For more information about how to use these scripts, please visit the Wiki: [https://github.com/angoca/monitor-db2-with-nagios/wiki]

If you find a problem or have any comments, please feel free to open an issue: [https://github.com/angoca/monitor-db2-with-nagios/issues]

References:

Nagios plug-in development guidelines. http://nagiosplug.sourceforge.net/developer-guidelines.html
Nagios Plugin API. http://nagios.sourceforge.net/docs/3_0/pluginapi.html
Nagios Plugins. http://nagios.sourceforge.net/docs/3_0/plugins.html

monitor-db2-with-nagios's Issues

Monitoring snapshots

Several DB2 elements can be monitored through daily or hourly snapshots, so that you can see at what moment loads of a given type occur.

The article http://www.ibm.com/developerworks/data/library/techarticle/dm-1009db2monitoring1/index.html?ca=drs- can be used to monitor this.

The goal is to take information about bufferpools and tablespaces and extract it to see the usage. With the article, the instantaneous information would be stored, and Nagios would graph the difference between snapshots.

Trace files with instance and database name

The trace files must include the instance name, and the database name if applicable, in the file name, so that files do not get mixed up.

Monitor ADMIN_MOVE_TABLES

Have a check that monitors the states of the ADMIN_MOVE_TABLE command.
This makes it possible to detect processes that got stuck but were never cancelled.
Delayed processes

A query could be run against the SYSTOOLS.ADMIN_MOVE_TABLE table:

db2 "select * from SYSTOOLS.ADMIN_MOVE_TABLE where key = 'LOCK' "

In addition, a select on the semaphore table: SYSTOOLS.OTM_SEMAPHORE_TABLE

This gives an idea of how that utility is being used.

Monitor memory pools

select pool_id, pool_secondary_id, pool_cur_size, pool_watermark
from sysibmadm.snapdd_memory_pool

Requirements

Fulfill the "requirements" listed in this forum thread:
http://database.ittoolbox.com/groups/technical-functional/db2-l/3rd-party-db-monitoring-software-4870118#M4897393

The set of plugins should offer this.

Script to monitor

These scripts may contain interesting information:

db2 -x "select TOTAL_LOG_USED,TOTAL_LOG_AVAILABLE,SEC_LOG_USED_TOP,LOCKS_HELD,LOCK_WAITS,LOCK_WAIT_TIME,LOCK_LIST_IN_USE,DEADLOCKS,LOCK_ESCALS,X_LOCK_ESCALS,LOCKS_WAITING,SORT_HEAP_ALLOCATED,SEC_LOGS_ALLOCATED,DB_STATUS,LOCK_TIMEOUTS
FROM TABLE(SNAPSHOT_DATABASE('$DB', -1))" | xargs echo | awk '{ print "\nTOTAL_LOG_USED=" $1 "\nTOTAL_LOG_AVAILABLE=" $2 "\nSEC_LOG_USED_TOP=" $3 "\nLOCKS_HELD=" $4 "\nLOCK_WAITS=" $5 "\nLOCK_WAIT_TIME=" $6 "\nLOCK_LIST_IN_USE=" $7 "\nDEADLOCKS=" $8 "\nLOCK_ESCALS=" $9 "\nX_LOCK_ESCALS=" $10 "\nLOCKS_WAITING=" $11 "\nSORT_HEAP_ALLOCATED=" $12 "\nSEC_LOGS_ALLOCATED=" $13 "\nDB_STATUS=" $14 "\nLOCK_TIMEOUTS=" $15}'

db2 -x "WITH BPMETRICS AS (
SELECT bp_name,
pool_data_l_reads + pool_temp_data_l_reads +
pool_index_l_reads + pool_temp_index_l_reads +
pool_xda_l_reads + pool_temp_xda_l_reads as logical_reads,
pool_data_p_reads + pool_temp_data_p_reads +
pool_index_p_reads + pool_temp_index_p_reads +
pool_xda_p_reads + pool_temp_xda_p_reads as physical_reads,
member
FROM TABLE(MON_GET_BUFFERPOOL('',-2)) AS METRICS)
SELECT
VARCHAR(bp_name,20) AS bp_name,
CASE WHEN logical_reads > 0
THEN DEC((1 - (FLOAT(physical_reads) / FLOAT(logical_reads))) * 100,5,2)
ELSE NULL
END AS HIT_RATIO
FROM BPMETRICS" | tr '-' '0' | awk '{print $1"="$2}'

db2 -x "select TABLESPACE_NAME,(USED_PAGES),USABLE_PAGES from table (snapshot_tbs_cfg('$1', 0))" | awk '{print $1"="$2/($3+1)*100"%"}'

HADR script

The HADR script returns an empty output when the state is unknown. This is possibly due to grep.

Message received:

Database is primary and not peer:  ().

Code:

  else
    OUTPUT="Database is primary and not peer: $HADR_STATUS ($CONNECTED)."
    RETURN=$UNKNOWN
  fi

As can be seen, $HADR_STATUS and $CONNECTED are empty.

HADR_STATUS=`printf '%s\n' "$OUTPUT_HADR" | awk '/^Primary / {print $2}'`
CONNECTED=`printf '%s\n' "$OUTPUT_HADR" | awk '/onnected/ {print $1}' | tail -1`

If either of those strings is empty, show the complete output of the command:

COMMAND_HADR="db2pd -db wfscpd -hadr"
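
The fallback could look roughly like the sketch below. It reuses the two awk expressions quoted in this issue; the function name `describe_hadr` and the wording of the fallback message are assumptions.

```shell
# Sketch: build the HADR message, falling back to the raw command output
# when either parsed field is empty. describe_hadr is a hypothetical name.
describe_hadr() {
  local output_hadr="$1" hadr_status connected
  hadr_status=$(printf '%s\n' "$output_hadr" | awk '/^Primary / {print $2}')
  connected=$(printf '%s\n' "$output_hadr" | awk '/onnected/ {print $1}' | tail -1)
  if [ -z "$hadr_status" ] || [ -z "$connected" ]; then
    # Parsing failed: show everything so the operator can see the real state.
    printf 'Database is primary and not peer. Raw output:\n%s\n' "$output_hadr"
  else
    echo "Database is primary and not peer: ${hadr_status} (${connected})."
  fi
}
```

In the script it would be called with the captured output of the db2pd command, for example `describe_hadr "$OUTPUT_HADR"`.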

Log files usage when log full is not due to a big transaction

Sometimes the online directory becomes full because the files cannot be externalized and the file system is full. In this condition, the message should not show a handle.

./bin/db2_local_ps for instance check

This is another command that shows whether the instance is running.

ps -ef | grep db2sysc
to see whether the process is active

netstat | grep servName
to see that it is listening on the port

./bin/db2_local_ps
to see that all the processes associated with DB2 were started properly in a given instance

Script for custom script

Create a check that runs a script and verifies the output.

It would receive as parameters the query to execute and the warning and critical values. The problem arises when the values do not fall within a range.

There would be problems for the performance data part. The obtained value could be returned, and the limits would not be shown.

Check the expiration date of the passwords

DB2 security relies on the operating system or Kerberos.
Since users in Linux can expire, causing an application to stop working, it would be useful to have a script that verifies the validity of the passwords.

Warning if it is about to expire.
Critical if it has already expired.
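
A minimal sketch of that classification follows. It only maps a number of remaining days to a state; how the remaining days are obtained is left open (on Linux they could be derived from `chage -l <user>`, which is an assumption and not something this project provides). The name `password_state` and the rule that 0 or fewer days means "expired" are also assumptions.

```shell
# Sketch: map "days until the password expires" to a Nagios state.
# Assumption: 0 or a negative number means the password has already expired.
OK=0; WARNING=1; CRITICAL=2

password_state() {
  local days_left="$1" warn_days="$2"
  if [ "$days_left" -le 0 ]; then
    echo "CRITICAL - Password has expired"; return "$CRITICAL"
  elif [ "$days_left" -le "$warn_days" ]; then
    echo "WARNING - Password expires in ${days_left} day(s)"; return "$WARNING"
  fi
  echo "OK - Password valid for ${days_left} more day(s)"; return "$OK"
}
```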

Table size monitoring

Create a check that monitors the size of a given table.
It can include pagesve values.

Another check could monitor the number of rows in a table.

I am unable to monitor db2 databases by using Nagios tool

Hi Team,

I have a DB2 production system that is monitored by Nagios. It was configured by a teammate who is no longer here.
I looked for scripts such as disk_utilization and some others, but I cannot find them on the server where Nagios is installed. The database-level monitoring scripts are present on the database machine, but unfortunately they are in two different locations.
I would like to find out which path the monitoring actually runs from.

Kindly let me know if you need any additional information.

Thanks in advance!

Applications states

These are all the application states:

http://www-01.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0022011.html?cp=SSEPGG_10.5.0%2F3-6-1-4-14&lang=en

Monitor catalog cache

(1 - (cat_cache_inserts / cat_cache_lookups))
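
The formula can be evaluated from the shell with awk, as in the sketch below. The helper name `cat_cache_hit_ratio` is hypothetical, the result is expressed as a percentage, and the guard against zero lookups is an addition not mentioned in the issue.

```shell
# Sketch: catalog cache hit ratio as a percentage.
# The two counters are passed as arguments: inserts first, then lookups.
cat_cache_hit_ratio() {
  awk -v inserts="$1" -v lookups="$2" 'BEGIN {
    if (lookups > 0)
      printf "%.2f\n", (1 - inserts / lookups) * 100
    else
      print "NA"   # no lookups yet, so the ratio is undefined
  }'
}
```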

More recent backup

Create a check that monitors how old the most recent backup is.

The graph would show three lines: one for full backups, another for incremental ones, and another for deltas. This way you can see how old the backups are and react if necessary.

A warning limit could be set if there is no backup of any kind during one day, and critical if two days pass.

Similarly, a warning if more than a week passes without a full backup, and critical after a week and a half.

The graph would have a sawtooth shape, and the three lines (the three backup types) would overlap after a full backup.
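
A sketch of the threshold logic, with the backup age given in hours. The function name `backup_age_state` is hypothetical, and obtaining the age itself (for instance from the backup history) is not shown here.

```shell
# Sketch: classify the age of the most recent backup, in hours.
# Defaults follow the proposal: warning after 24 h, critical after 48 h.
backup_age_state() {
  local age_hours="$1" warn="${2:-24}" crit="${3:-48}"
  if [ "$age_hours" -gt "$crit" ]; then
    echo "CRITICAL - Last backup is ${age_hours} hours old"; return 2
  elif [ "$age_hours" -gt "$warn" ]; then
    echo "WARNING - Last backup is ${age_hours} hours old"; return 1
  fi
  echo "OK - Last backup is ${age_hours} hours old"; return 0
}
```

For full backups the same function could be called with the weekly limits, for example `backup_age_state "$AGE" 168 252` (one week and a week and a half).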

Log usage standard input error

In a special situation (I think when the instance is down), db2 returns unexpected messages, and the script generates errors.

Message received:

Remote command execution failed: (standard_in) 1: syntax error

The affected code is probably the loop that walks through the output:

while IFS= read -r LINE ; do
done < <( printf '%s\n' "$COMMAND_OUTPUT" | grep -i log )

The rest of the code is normal and has been tested in several situations.

Check that the output is not empty before continuing.
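
One possible shape for that guard is sketched below. The helper `extract_log_lines` is hypothetical; it applies the same `grep -i log` filter as the loop and stops with the Nagios UNKNOWN code (3) when nothing is left to parse.

```shell
# Sketch: refuse to parse when the filtered command output is empty.
extract_log_lines() {
  local lines
  # "|| true" keeps a non-matching grep from aborting under "set -e".
  lines=$(printf '%s\n' "$1" | grep -i log || true)
  if [ -z "$lines" ]; then
    echo "UNKNOWN - No log information in the command output"
    return 3
  fi
  printf '%s\n' "$lines"
}
```

The loop would then read from the result of `extract_log_lines "$COMMAND_OUTPUT"` only when the function returns 0.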

Log utilization

Change the check, or create an additional one, with the following information:

db2 select * from sysibmadm.log_utilization

Output with the same pattern

Write the outputs in the same way, so that problems can be identified faster:
OK - Description (Problem)
WARN - Description (Problem)
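
A single formatter shared by all checks is one way to keep the pattern uniform. The sketch below assumes a hypothetical function name, `format_output`, and omits the parentheses when there is no problem text.

```shell
# Sketch: one formatter so that every message follows
# "STATE - Description (Problem)".
format_output() {
  local state="$1" description="$2" problem="$3"
  if [ -n "$problem" ]; then
    echo "${state} - ${description} (${problem})"
  else
    echo "${state} - ${description}"
  fi
}
```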

Monitor lock escalations and table locks

Monitor these two elements:

The number of lock escalations in the last hour (too many locks are being held)
The number of table-level locks (shows that concurrency is being reduced)

Log consumption output

The output of this script in the mail should end with a period.

if [[ $HADR == true ]] ; then
  OUTPUT="HADR does not perform archiving"
  RETURN=$OK
else
  OUTPUT="Archive logs counted"
  RETURN=$OK

The output should show a clearer message that gives the impression there is no error.

Make use of this documentation

https://www.ibm.com/developerworks/community/blogs/IMSupport/entry/weekly_tips_from_db2_experts_maintaining_mon_get_pkg_cache_stmt_result_into_a_table?lang=en

Validate db2 database structure

Validate the DB2 directory and file structure.

Sometimes permissions are changed or directories are moved, and this can affect how DB2 works.
Verify this for:
archivelogs
backups
onlinelog
mirrorlog
dbfiles
containers

Ignore connections for activate databases

When a database is activated, or with the first connection, there are several extra connections in list applications. These should be ignored because they are part of the activation.

Closed files

Monitor the number of open files and the frequency of closed files.
This shows whether maxfilop is too small.

Check event monitor state

Due to space or other reasons, an event monitor can be off when it is supposed to be on. This check reports the state of the event monitor.

db2diag check

Check the size in KB
The number of lines, in thousands
The number of blank spaces (messages)

Check lock waits with admin view

select * from sysibmadm.lockwaits

Elapse CPU time

Monitor the processes that have spent more than a certain amount of time on CPU.

List of open files

This is how the number of open files can be obtained:

ps -ef | awk '/db2sysc/ && /db2ins01/ && ! /awk/ {print "lsof -p",$2}' | source /dev/stdin

ps -ef | awk '/db2sysc/ && /db2ins01/ && ! /awk/ {print "ls -l /proc/"$2"/fd"}' | source /dev/stdin

Documentation

This link explains several of the values that should be monitored in DB2.

Include it in the documentation and, if necessary, write several of the scripts it proposes.

Monitor the db2 processes

Monitor the number of processes:
the number run by the instance and by the fenced user.

ps -ef | grep ^db2inst1 | grep -v grep
ps -ef | grep ^db2fenc1 | grep -v grep
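
To turn those listings into a number, the owner column can be matched exactly with awk, which also avoids the extra `grep -v grep`. This is only a sketch; `count_user_processes` is a hypothetical helper that reads `ps -ef` output on standard input.

```shell
# Sketch: count the lines of `ps -ef` output whose first column (the owner)
# is exactly the given user. Prints 0 when there is no match.
count_user_processes() {
  awk -v user="$1" '$1 == user {n++} END {print n + 0}'
}

# Possible usage:
#   ps -ef | count_user_processes db2inst1
#   ps -ef | count_user_processes db2fenc1
```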

No connections in check_connection_qty

When nobody is connected to the databases of an instance and the warning SQL1611W is returned, the check should return OK rather than unknown.
It is not an error; there is simply nobody connected.
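
A sketch of how the script could recognize that case. The helper name `no_connections` and the variables in the usage comment are assumptions; only the SQL1611W code comes from this issue.

```shell
# Sketch: succeed when the command output carries the SQL1611W warning,
# which here means that nobody is connected.
no_connections() {
  case "$1" in
    *SQL1611W*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Possible usage inside the check (variable names are illustrative):
#   if no_connections "$COMMAND_OUTPUT" ; then
#     OUTPUT="OK - No connections."
#     RETURN=0
#   fi
```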

Put umask in the scripts

Set umask so that the files can be modified by several processes.

check instance attach

Perform an attach in the check instance script.
This shows that the instance is working properly.

Problem running plugins on AIX

I tried the scripts on AIX but I can't get them running. AIX Version 7.1 TL2 SP4, Bash 3.2.1, getopt-1.1.4-3

Here is the output with -vvv

./check_database_connection -i /db2/db2ez1 -d EZ1 -vvv
Usage: check_database_connection { -i instanceHomeDirectory -d databaseName [-K]
| -h | -V } [-T][-v]
Note: The test was not executed.|
|

Here the output with debug on

db2ez1> ./check_database_connection -i /db2/db2ez1 -d EZ1 -vvv

Locale to print messages in English. Prevent language problems.

export LANG=en_US

export LANG=en_US
LANG=en_US

Version of this script.

function print_revision {
echo Andres Gomez Casanova - AngocA
echo v1.1 2013-05-25
}

Function to show the help

function print_usage {
/bin/cat <<__EOT
Usage: ${1} { -i instanceHomeDirectory -d databaseName [-K]
| -h | -V } [-T][-v]
__EOT
}

function print_help {
print_revision
print_usage ${1}

Max 80 chars width.

/bin/cat <<__EOT

This script checks the connectivity to a database.
-d | --database STRING
Database name.
-h | --help
Shows the current documentation.
-i | --instance STRING
Instance home directory. It is usually /home/db2inst1.
-K | --mk
Changes the output to be compatible with Check_MK.
-T | --trace
Trace mode: writes date and output in /tmp.
-v | --verbose
Executes the script in verbose mode (multiple times).
-V | --version
Shows the current version of this script.
__EOT
}

Variable to control the flow execution. Prevent Spaghetti code.

CONTINUE=true

CONTINUE=true

Nagios return codes

OK=0

OK=0
WARNING=1
WARNING=1
CRITICAL=2
CRITICAL=2
UNKNOWN=3
UNKNOWN=3
This is the returned code.
RETURN=${UNKNOWN}
RETURN=3

Nagios Output

Text output 80 chars | Optional Perf Data Line 1

Long text Line 1

Long text Line 2 | Optional Perf Data Line 2

Optional Perf Data Line 3

OUTPUT=

OUTPUT=
PERFORMANCE=
PERFORMANCE=
LONG_OUTPUT=
LONG_OUTPUT=
LONG_PERFORMANCE=
LONG_PERFORMANCE=
PERF_MK="-"
PERF_MK=-

APPL_NAME=$(basename ${0})
basename ${0}
++ basename ./check_database_connection

APPL_NAME=check_database_connection

if [[ ${#} -eq 0 ]] ; then
print_usage ${APPL_NAME}
RETURN=${UNKNOWN}
CONTINUE=false
fi

[[ 5 -eq 0 ]]

The following requires GNU getopt. See the following discussion.

http://stackoverflow.com/questions/402377

TEMP=$(getopt -o d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version
-n ${APPL_NAME} -- "${@}")
getopt -o d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version -n ${APPL_NAME} -- "${@}"
++ getopt -o d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version -n check_database_connection -- -i /db2/db2ez1 -d EZ1 -vvv

TEMP='-- d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version -n check_database_connection -- -i /db2/db2ez1 -d EZ1 -vvv '

if [[ ${?} -ne 0 ]] ; then
print_usage ${APPL_NAME}
RETURN=${UNKNOWN}
CONTINUE=false
fi

[[ 0 -ne 0 ]]

if [[ ${CONTINUE} == true ]] ; then

Note the quotes around ${TEMP}: they are essential!

eval set -- "${TEMP}"

HELP=false
VERSION=false
CHECK_MK=false

Verbosity level

VERBOSE=0

Trace activated

TRACE=false
LOG=/tmp/${APPL_NAME}.log
INSTANCE_HOME=
DATABASE_NAME=
while true; do
case "${1}" in
-d | --database ) DATABASE_NAME=$(echo ${2} | cut -d' ' -f1) ; shift 2 ;;
-h | --help ) HELP=true ; shift ;;
-i | --instance ) INSTANCE_HOME=$(echo ${2} | cut -d' ' -f1) ; shift 2 ;;
-K | --mk ) CHECK_MK=true ; shift ;;
-T | --trace ) TRACE=true ; shift ;;
-v | --verbose ) VERBOSE=$(( ${VERBOSE} + 1 )) ; shift ;;
-V | --version ) VERSION=true ; shift ;;
-- ) shift; break ;;
* ) break ;;
esac
done
fi

[[ true == true ]]
eval set -- '-- d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version -n check_database_connection -- -i /db2/db2ez1 -d EZ1 -vvv '
set -- -- d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version -n check_database_connection -- -i /db2/db2ez1 -d EZ1 -vvv
++ set -- -- d:hi:KTvV --long database:,help,instance:,mk,trace,verbose,version -n check_database_connection -- -i /db2/db2ez1 -d EZ1 -vvv
HELP=false
VERSION=false
CHECK_MK=false
VERBOSE=0
TRACE=false
LOG=/tmp/check_database_connection.log
INSTANCE_HOME=
DATABASE_NAME=
true
case "${1}" in
shift
break

if [[ ${TRACE} == true ]] ; then
echo ">>>>>" >> ${LOG}
date >> ${LOG}
echo "Instance at ${INSTANCE_HOME}" >> ${LOG}
echo "PID ${$}" >> ${LOG}
fi

[[ false == true ]]

ECHO="help:${HELP}, version:${VERSION}, verbose:${VERBOSE}"

ECHO='help:false, version:false, verbose:0'
ECHO="${ECHO}, check_mk:${CHECK_MK}"
ECHO='help:false, version:false, verbose:0, check_mk:false'
ECHO="${ECHO}, directory:${INSTANCE_HOME}, database:${DATABASE_NAME}"
ECHO='help:false, version:false, verbose:0, check_mk:false, directory:, database:'

if [[ ${VERBOSE} -ge 2 ]] ; then
echo ${ECHO}
fi

[[ 0 -ge 2 ]]

if [[ ${TRACE} == true ]] ; then
echo "PARAMS:${ECHO}" >> ${LOG}
fi

[[ false == true ]]

if [[ ${CONTINUE} == true && ${HELP} == true ]] ; then
print_help ${APPL_NAME}
RETURN=${UNKNOWN}
CONTINUE=false
fi

[[ true == true ]]
[[ false == true ]]

if [[ ${CONTINUE} == true && ${VERSION} == true ]] ; then
print_revision ${APPL_NAME}
RETURN=${UNKNOWN}
CONTINUE=false
fi

[[ true == true ]]
[[ false == true ]]

if [[ ${CONTINUE} == true && ${INSTANCE_HOME} == "" ]] ; then
print_usage ${APPL_NAME}
RETURN=${UNKNOWN}
CONTINUE=false
fi

[[ true == true ]]
[[ '' == '' ]]
print_usage check_database_connection
/bin/cat
Usage: check_database_connection { -i instanceHomeDirectory -d databaseName [-K]
| -h | -V } [-T][-v]
RETURN=3
CONTINUE=false

if [[ ${CONTINUE} == true && ${DATABASE_NAME} == "" ]] ; then
print_usage ${APPL_NAME}
RETURN=${UNKNOWN}
CONTINUE=false
fi

[[ false == true ]]

if [[ ${CONTINUE} == true ]] ; then
if [[ -d ${INSTANCE_HOME} && -e ${INSTANCE_HOME}/sqllib/db2profile ]] ; then
# Load the DB2 profile.
. ${INSTANCE_HOME}/sqllib/db2profile
INSTANCE_NAME=$(db2 get instance | awk '/instance/ {print $7}')
else
OUTPUT="Instance directory is invalid."
RETURN=${UNKNOWN}
CONTINUE=false
fi
fi

[[ false == true ]]

if [[ ${CONTINUE} == true ]] ; then
COMMAND_DATABASE="db2 list db directory"
if [[ ${VERBOSE} -ge 2 ]] ; then
echo "COMMAND: ${COMMAND_DATABASE}"
fi
DATABASE=$(${COMMAND_DATABASE})
if [[ ${TRACE} == true ]] ; then
echo "RESULT:'${DATABASE}'" >> ${LOG}
fi
DATABASE=$(printf '%s\n' "${DATABASE}" | awk '/Database alias/ {print $4}' | grep -iw ${DATABASE_NAME})
if [[ ${VERBOSE} -ge 3 ]] ; then
echo "RESULT:'${DATABASE}'"
fi

if [[ ${DATABASE} == "" ]] ; then
OUTPUT="The database ${DATABASE_NAME} is not cataloged."
RETURN=${UNKNOWN}
CONTINUE=false
fi
fi

[[ false == true ]]

if [[ ${CONTINUE} == true ]] ; then
COMMAND_ACTIVE="db2 list active databases"
if [[ ${VERBOSE} -ge 2 ]] ; then
echo "COMMAND: ${COMMAND_ACTIVE}"
fi
ACTIVE=$(${COMMAND_ACTIVE})
if [[ ${TRACE} == true ]] ; then
echo "RESULT:'${ACTIVE}'" >> ${LOG}
fi
ACTIVE=$(printf '%s\n' "${ACTIVE}" | awk '/Database name/ {print $4}' | grep -iw ${DATABASE_NAME})
if [[ ${VERBOSE} -ge 3 ]] ; then
echo "RESULT:'${ACTIVE}'"
fi

if [[ ${ACTIVE} == "" ]] ; then
OUTPUT_ACTIVE="The database is not active. "
LONG_OUTPUT="${OUTPUT_ACTIVE}"
LONG_PERFORMANCE_1="'Database_Active'=0.2;0.5"
else
OUTPUT_ACTIVE="The database is active. "
LONG_OUTPUT="${OUTPUT_ACTIVE}"
LONG_PERFORMANCE_1="'Database_Active'=0.8;0.5"
fi

COMMAND_CONNECTABLE="db2 -a connect to ${DATABASE_NAME}"
if [[ ${VERBOSE} -ge 2 ]] ; then
echo "COMMAND: ${COMMAND_CONNECTABLE}"
fi
CONNECTABLE=$(${COMMAND_CONNECTABLE})
if [[ ${TRACE} == true ]] ; then
echo "RESULT:'${CONNECTABLE}'" >> ${LOG}
fi
CONNECTABLE=$(printf '%s\n' "${CONNECTABLE}" | awk '/sqlcode/ {print $7}')
if [[ ${VERBOSE} -ge 3 ]] ; then
echo "RESULT:'${CONNECTABLE}'"
fi

if [[ ${CONNECTABLE} -eq 0 ]] ; then
OUTPUT="OK Connection to database ${DATABASE_NAME}. "${OUTPUT_ACTIVE}
RETURN=${OK}
PERFORMANCE="'Connectable_Database'=0.9;0.6;0.3"
elif [[ ${CONNECTABLE} -eq -20157 ]] ; then
OUTPUT="The database is in quiesce mode. "${OUTPUT_ACTIVE}
RETURN=${WARNING}
PERFORMANCE="'Connectable_Database'=0.4;0.6;0.3"
else
OUTPUT="A connection to database ${DATABASE_NAME} was not succesful. "${OUTPUT_ACTIVE}
LONG_OUTPUT="${LONG_OUTPUT} ${CONNECTABLE}"
RETURN=${CRITICAL}
PERFORMANCE="'Connectable_Database'=0.1;0.6;0.3"
fi

Check for HADR Window replay

COMMAND_ROLE="db2 get db cfg for ${DATABASE_NAME}"
if [[ ${VERBOSE} -ge 2 ]] ; then
echo "COMMAND: ${COMMAND_ROLE}"
fi
ROLE=$(${COMMAND_ROLE})
if [[ ${TRACE} == true ]] ; then
echo "RESULT:'${ROLE}'" >> ${LOG}
fi
ROLE=$(printf '%s\n' "${ROLE}" | awk '/HADR database role/ {print $5}')
if [[ ${VERBOSE} -ge 3 ]] ; then
echo "RESULT:'${ROLE}'"
fi
if [[ ${ROLE} == "STANDBY" ]] ; then
COMMAND_REPLAY="db2pd -db wfscpd -hadr"
if [[ ${VERBOSE} -ge 2 ]] ; then
echo "COMMAND: ${COMMAND_REPLAY}"
fi
REPLAY=$(${COMMAND_REPLAY})
if [[ ${TRACE} == true ]] ; then
echo "RESULT:'${REPLAY}'" >> ${LOG}
fi
REPLAY=$(printf '%s\n' "${REPLAY}" | awk '/^Active/ {print "active"}')
if [[ ${VERBOSE} -ge 3 ]] ; then
echo "RESULT:'${REPLAY}'"
fi
if [[ ${REPLAY} == "active" ]] ; then
LONG_PERFORMANCE_2="HADR-replay=0.3;0.5"
else
LONG_PERFORMANCE_2="HADR-replay=0.7;0.5"
fi
fi
LONG_PERFORMANCE="${LONG_PERFORMANCE_1} ${LONG_PERFORMANCE_2}"
if [[ ${LONG_PERFORMANCE_2} == "" ]] ; then
PERF_MK="${PERFORMANCE}|${LONG_PERFORMANCE_1}"
else
PERF_MK="${PERFORMANCE}|${LONG_PERFORMANCE_1}|${LONG_PERFORMANCE_2}"
fi
fi

[[ false == true ]]

Prints the output.

if [[ ${OUTPUT} == "" ]] ; then
OUTPUT="Note: The test was not executed."
fi

[[ '' == '' ]]
OUTPUT='Note: The test was not executed.'
Builds the output.
if [[ ${CHECK_MK} == true ]] ; then
echo "${RETURN} databaseConnection-${INSTANCE_NAME}-${DATABASE_NAME} ${PERF_MK} ${OUTPUT}"
else
echo -e "${OUTPUT}|${PERFORMANCE}\n${LONG_OUTPUT}|${LONG_PERFORMANCE}"
fi
[[ false == true ]]
echo -e 'Note: The test was not executed.|\n|'
Note: The test was not executed.|
|
Returns the error code.
if [[ ${VERBOSE} -ge 2 ]] ; then
echo "Return code: ${RETURN}"
fi
[[ 0 -ge 2 ]]
if [[ ${TRACE} == true ]] ; then
echo -e "OUTPUT:${OUTPUT}\nPERF:${PERFORMANCE}\nLONG_OUT:${LONG_OUTPUT}\nLONGPERF:${LONG_PERFORMANCE}\nRET_CODE:${RETURN}" >> ${LOG}
date >> ${LOG}
echo -e "<<<<<\n" >> ${LOG}
fi
[[ false == true ]]
exit ${RETURN}
exit 3

Best regards
Stephan
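
In the trace above, TEMP ends up holding the option specification itself, which suggests that a non-GNU getopt was picked up from the PATH even though a GNU package is installed. A script could detect this early with the self-test of the enhanced getopt, which exits with status 4 for `getopt -T`. The sketch below is not part of the plugins; the function name and its optional argument (the command to test) are assumptions.

```shell
# Sketch: detect whether a getopt command is the enhanced (GNU / util-linux)
# version. Enhanced getopt prints nothing and exits with status 4 for "-T".
getopt_is_enhanced() {
  local cmd="${1:-getopt}" rc=0
  "$cmd" -T >/dev/null 2>&1 || rc=$?
  [ "$rc" -eq 4 ]
}

# Possible usage at the top of a plugin:
#   if ! getopt_is_enhanced ; then
#     echo "UNKNOWN - GNU getopt is required."
#     exit 3
#   fi
```

If the check fails on AIX, putting the directory of the GNU getopt first in the PATH of the monitoring user is one thing to try.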

check_instance_memory: Database connection errors overwritten in loop

In the loop that checks for SELECT and CONNECT errors, the LONG_OUTPUT variable is overwritten on each pass. This means that if more than one database has errors, only the last database's errors are captured.

https://github.com/angoca/monitor-db2-with-nagios/blob/master/check_instance_memory#L392-L421
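
One way to keep every error is to append to LONG_OUTPUT instead of assigning it. The sketch below is not the code of check_instance_memory: `check_db` is a stub standing in for the SELECT/CONNECT test, and the database names are placeholders.

```shell
# Stub for the real test: fails for any name that starts with "bad".
check_db() {
  case "$1" in
    bad*) echo "connect failed"; return 1 ;;
  esac
}

# Sketch: collect the errors of all databases in one variable.
collect_errors() {
  local db msg LONG_OUTPUT=""
  for db in "$@"; do
    if ! msg=$(check_db "$db"); then
      # Append, separated by a space, so that earlier errors are kept.
      LONG_OUTPUT="${LONG_OUTPUT:+${LONG_OUTPUT} }${db}: ${msg}."
    fi
  done
  printf '%s\n' "$LONG_OUTPUT"
}
```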

Use service parents

This is a new feature in Nagios that allows checking some elements before others.

For example, a TS check should run only if the connection check has passed.

check lock wait should check if db2 is replaying

When DB2 is in replay mode, there is no need to check in HADR, since there are no locks.
Return OK and that is all.
Do this with db2 list applications, since get snapshot is heavier.

Long lock wait results

This check has to show:

The number of processes with delayed locks
The number of connections waiting. This would probably be a different check from the previous one

Quantity of deadlock

Monitor the number of deadlocks detected.

Perhaps show averages for the last 10 minutes, 30 minutes, and hour.

pass credentials

Is there a way to run one of the check commands with a user and password in the service command line?

Check changes of /var/db2

This file should be checked for changes.
The same goes for db2nodes.cfg.
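
A simple way to detect a change is to compare a checksum with a stored baseline, as sketched below with the POSIX `cksum` utility. The function name `file_changed` and the location of the baseline file are assumptions; the first run only records the baseline.

```shell
# Sketch: return 0 when the file differs from the recorded baseline,
# and 1 when it is unchanged or when the baseline has just been created.
file_changed() {
  local file="$1" baseline="$2" current
  current=$(cksum < "$file")
  if [ ! -f "$baseline" ]; then
    # First run: record the checksum and report "unchanged".
    printf '%s\n' "$current" > "$baseline"
    return 1
  fi
  [ "$current" != "$(cat "$baseline")" ]
}
```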

Monitor the quantity of EDUs

db2pd -edus | sed 's/[()]/ /g' | awk '/db2/ {print $3,$4}' | grep db2 | sort | uniq -c

Sticky bit T in documentation

Document the sticky bit T on the /tmp directory, which is needed to be able to write there.

Quantity of timeouts

Count the number of timeouts.

Show averages for 5, 10, and 30 minutes.

Check the log configuration in HADR

Check that both machines have the same configuration

Set of views to monitor

This is an article with DDL to monitor DB2 with functions and many other things.

Monitor the minimal quantity of connections

Sometimes, a web server should keep a minimum quantity of connections to the database. When this minimum is not satisfied, the check should raise an error.

That means that check_connection_qty should have a maximum (which already exists) and a minimum, 0 by default.

Eventually, the ranges should be implemented with the Nagios notation (a little difficult), but this way we can verify from the application server that the database is working properly.
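
A sketch of the minimum/maximum test follows. It is a plain comparison, not the full Nagios range notation; the function name, the argument order, and the choice of CRITICAL for both violations are assumptions.

```shell
# Sketch: check that a connection count lies between a minimum and a maximum.
# Arguments: quantity, maximum, optional minimum (0 by default).
check_connection_range() {
  local qty="$1" max="$2" min="${3:-0}"
  if [ "$qty" -lt "$min" ]; then
    echo "CRITICAL - ${qty} connections, fewer than the minimum of ${min}"; return 2
  elif [ "$qty" -gt "$max" ]; then
    echo "CRITICAL - ${qty} connections, more than the maximum of ${max}"; return 2
  fi
  echo "OK - ${qty} connections"; return 0
}
```

With the default minimum of 0 the behaviour matches a check that only has a maximum, so existing service definitions would not need to change.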

Fred's presentation

http://www.iiug.org/idug06/a12.pdf

Key DB2 performance indicators
• Bufferpool hit ratio
• Page-level I/O stats
• Prefetch efficiency
• Piped vs. overflowed sorts
• Statements per transaction
• Average lock wait time
• O/S level CPU load average
• Database files closed
• Stolen agents
• Package cache overflows
• Secondary log files open
• Longest running UOW
• Rows read per statement
• O/S level iowait percentage
…and many others as well.
Learn which ones are problems in your shop and watch them.
Don’t forget to monitor your application response time, too.