Code Monkey home page Code Monkey logo

pingcheck's Introduction

PingCheck

The purpose of this utility is to test ICMP reachability, alert if its down and run a config change. On recovery run another list of config changes.

Author

Jeremy Georges - Arista Networks - [email protected]

Description

PingCheck Utility

The purpose of this utility is to test ICMP reachability, alert if its down and run a configuration change. On recovery run another list of config changes specified in a recovery file.

The IP host test is a logical 'OR'. Therefore, if you have two IP Hosts listed, both of them need to fail in order for a config change to be triggered. Same situation if you have 3 IP hosts listed, in that all 3 will have to fail in order for a configuration change to be triggered. In other words, only 1 IP needs to be ICMP reachable to maintain a HealthStatus of GOOD, or not fail the health check.

This extension is similar to TCPCheck found here: https://github.com/arista-eosext/TCPCheck but PingCheck uses ICMP instead of HTTP.

To use PingCheck, after installing, add the following configuration snippets to change the default behavior.

daemon PingCheck
   exec /usr/local/bin/PingCheck
   option CHECKINTERVAL value 5
   option CONF_FAIL value /mnt/flash/failed.conf
   option CONF_RECOVER value /mnt/flash/recover.conf
   option PINGCOUNT value 2
   option PINGTIMEOUT value 2
   option HOLDDOWN value 1
   option HOLDUP value 1
   option IPv4 value 10.1.1.1,10.1.2.1
   option SOURCE value et1
   option VRF value mgmt
   no shutdown
Config Option explanation:
    - CHECKINTERVAL is the time in seconds to check hosts. Default is 5 seconds.
    - IPv4 is the address(s) to check. Mandatory parameter. Multiple addresses are comma separated
    - CONF_FAIL is the config file to apply the snippets of config changes. Mandatory parameter.
    - CONF_RECOVER is the config file to apply the snippets of config changes
      after recovery of Neighbor. Mandatory parameter.
    - PINGCOUNT is the number of ICMP Ping Request messages to send. Default is 2.
    - HOLDDOWN is the number of iterations to wait before declaring all hosts up. Default is 1
      which means take immediate action.
    - HOLDUP is the number of iterations to wait before declaring all hosts down. Default is 1
    - SOURCE is the source interface (as instantiated to the kernel) to generate the pings fromself.
      This is optional. Default is to use RIB/FIB route to determine which interface to use as sourceself.
    - VRF is the name of the VRF in which to source the pings. This is optional. For default VRF, no need
      to specify this option.
    - PINGTIMEOUT is the ICMP ping timeout in seconds. Default value is 2 seconds.

The CONF_FAIL and CONF_RECOVER files are just a list of commands to run at failure or at recovery. These commands should be full commands just as if you were configuration the switch from the CLI (i.e. not abbreviated commands).

For example the above referenced /mnt/flash/failed.conf file could include the following commands, which would shutdown the BGP neighbor on failure:

router bgp 65001.65500
neighbor 10.1.1.1 shutdown

The recover.conf file would do the opposite and remove the shutdown statement:

router bgp 65001.65500
no neighbor 10.1.1.1 shutdown

This is of course just an example, and your use case would determine what config changes you'd make.

Please note, this uses the EOS SDK eAPI interaction module. You do not need to specify 'enable' and 'configure' in your configuration files, because it automatically goes into configuration mode.

Additional dependencies and caveats:

  • This requires EOS SDK.
  • All new EOS releases include the SDK.
  • Tested on EOS release 4.24.1, 4.22.3, 4.20.10 & 4.20.1

Example

Output of 'show daemon' command

Agent: PingCheck (running with PID 14058)
Uptime: 0:37:46 (Start time: Tue Nov 06 16:32:34 2018)
Configuration:
Option              Value                   
------------------- ----------------------- 
CHECKINTERVAL       10                      
CONF_FAIL           /mnt/flash/failed.conf  
CONF_RECOVER        /mnt/flash/recover.conf 
HOLDDOWN            2                       
IPv4                10.1.1.2,10.1.1.6       
PINGCOUNT           3                       
SOURCE              lo0                     

Status:
Data                  Value                   
--------------------- ----------------------- 
CHECKINTERVAL:        10                      
CONF_FAIL:            /mnt/flash/failed.conf  
CONF_RECOVER:         /mnt/flash/recover.conf 
HOLDDOWN:             2                       
HealthStatus:         GOOD                    
IPv4 Ping List:       10.1.1.2,10.1.1.6       
PINGCOUNT:            3                       
Status:               Administratively Up   

Syslog Messages

Nov  6 16:31:27 edge-leaf-A Launcher: %LAUNCHER-6-PROCESS_STOP: Configuring process 'PingCheck' to stop in role 'ActiveSupervisor'
Nov  6 16:32:34 edge-leaf-A Launcher: %LAUNCHER-6-PROCESS_START: Configuring process 'PingCheck' to start in role 'ActiveSupervisor'
Nov  6 16:32:34 edge-leaf-A PingCheck-ALERT-AGENT[14058]: %AGENT-6-INITIALIZED: Agent 'PingCheck-PingCheck' initialized; pid=14058
Nov  6 16:32:34 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck Initialized
Nov  6 16:32:34 edge-leaf-A PingCheck-ALERT-AGENT[14058]: Source Interface lo0 and Src IP 172.1.1.4 will be used.
.
After Host(s) goes down...
.
Nov  6 16:32:46 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck host 10.1.1.2 is down
Nov  6 16:59:26 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck host 10.1.1.6 is down
Nov  6 17:00:00 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck Failure State. Changing configuration for failed state
Nov  6 17:00:00 edge-leaf-A PingCheck-ALERT-AGENT[14058]: Applied Configuration changes from /mnt/flash/failed.conf
.
After Recovery of Host...
.
Nov  6 17:00:25 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck host 10.1.1.6 is back up
Nov  6 17:00:37 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck host 10.1.1.2 is back up
Nov  6 17:00:39 edge-leaf-A PingCheck-ALERT-AGENT[14058]: PingCheck Recovering. Changing configure for recovered state.
Nov  6 17:00:39 edge-leaf-A PingCheck-ALERT-AGENT[14058]: Applied Configuration changes from /mnt/flash/recover.conf

INSTALLATION:

Because newer releases of EOS require a SysdbMountProfile, you'll need two files - PingCheck.py and PingCheck.sysdb PingCheck.py will need to go to an appropriate location such as /mnt/flash and PingCheck.sysdb will need to be placed in /usr/lib/SysdbMountProfiles. The mount profile file name MUST match the python file name. In other words, if you place the mount profile PingCheck in /usr/lib/SysdbMountProfiles as PingCheck, then the executable filename PingCheck.py must be changed to PingCheck. The filename (agent name) and mount profile name must be the same.

An RPM has been included that allows you to easily just install PingCheck as an extension and it takes care of all the file requirements. The RPM also installs the PingCheck SDK app in /usr/local/bin. This is the preferred distribution method for this application.

This release has been tested on EOS 4.24.1F,4.22.3 & 4.21.1F.

WHAT'S NEW:

  • RPM now supports both 32 & 64bit EOS
  • VRF support
  • Additional syntax checking

License

BSD-3, See LICENSE file

pingcheck's People

Contributors

archjeb avatar brokenpackets avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.