Monitoring OMD sites with check_omd

With OMD, it is possible to set up a working monitoring system for checking infrastructure within minutes.

But what monitors the OMD-relevant processes themselves? Usually, an OMD site consists of multiple pre-configured services such as:

  • Icinga or Nagios
  • Apache web server
  • Cron service
  • rrdcached (for RRD graphs)
  • npcd (performance data)

For checking the functionality of a site, OMD offers a special command:

# omd status hansel
Doing 'status' on site hansel:
rrdcached:      running
npcd:           running
nagios:         stopped
apache:         running
crontab:        running
-----------------------
Overall state:  partially running

In this case, one of the required services - Nagios - has crashed. As a result, the OMD site is only partially running.
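To illustrate how this output can be evaluated by a program, here is a minimal Python sketch that parses the text shown above into a dictionary of service states. The function name parse_omd_status and the SAMPLE constant are hypothetical and only used for illustration; this is not the code of the plugin itself, and it assumes the output format is exactly the one printed above.

```python
def parse_omd_status(output):
    """Parse the text printed by `omd status <site>`.

    Returns a tuple (services, overall): a dict mapping each service
    name to its state, and the overall state (or None if not found).
    """
    services = {}
    overall = None
    for line in output.splitlines():
        name, sep, state = line.partition(":")
        name, state = name.strip(), state.strip()
        # Skip the separator line (no colon) and the
        # "Doing 'status' on site ..." header (nothing after the colon).
        if not sep or not state:
            continue
        if name.lower() == "overall state":
            overall = state
        else:
            services[name] = state
    return services, overall


# Sample input copied from the listing above.
SAMPLE = """Doing 'status' on site hansel:
rrdcached:      running
npcd:           running
nagios:         stopped
apache:         running
crontab:        running
-----------------------
Overall state:  partially running
"""

services, overall = parse_omd_status(SAMPLE)
print(services["nagios"], "/", overall)
```

Reading the per-service lines rather than only the overall state makes it possible to name the failed service in the check result.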

Of course, it is possible to monitor the individual services, e.g. using NRPE. But if you often play around with OMD sites, a simpler solution would be nice.

I created a Python plugin for monitoring OMD sites: check_omd.

The script monitors the individual services of an OMD site and reports failures. Because not all services are essential for a site, it is possible to ignore particular services. Because the script uses the omd status command internally, there is no need to alter the configuration after OMD upgrades or reconfigurations.
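The idea of ignoring non-essential services can be sketched in a few lines of Python. This is an illustrative sketch, not the plugin's actual implementation: the function evaluate, its ignore parameter, and the input dictionary are assumptions made for this example. The exit codes 0 (OK) and 2 (CRITICAL) follow the usual Nagios plugin convention.

```python
OK, CRITICAL = 0, 2  # conventional Nagios plugin exit codes


def evaluate(site, services, ignore=()):
    """Turn a dict of service states into (exit_code, message).

    `services` maps service names to states such as "running" or
    "stopped"; names listed in `ignore` never cause a failure.
    Hypothetical helper, for illustration only.
    """
    failed = sorted(
        name for name, state in services.items()
        if state != "running" and name not in ignore
    )
    if failed:
        names = ", ".join("'%s'" % name for name in failed)
        return CRITICAL, "CRITICAL: OMD site '%s' has failed service(s): %s" % (site, names)
    return OK, "OK: OMD site '%s' services are running." % site


states = {"rrdcached": "running", "npcd": "running", "nagios": "stopped"}

# Without an ignore list, the stopped service is reported.
print(evaluate("hansel", states))

# With nagios ignored, the site counts as healthy.
print(evaluate("hansel", states, ignore=("nagios",)))
```

A real plugin would print the message and pass the code to sys.exit() so that the monitoring core can pick up the state.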

The script needs to be executed in the context of a site, which requires a sudo rule. You can find an RPM spec file and a sudo template in the GitHub repository.

The following examples demonstrate the plugin with two OMD sites: a working one, and a crashed site:

$ /opt/check_omd.py
OK: OMD site 'stankowic' services are running.

$ /opt/check_omd.py
CRITICAL: OMD site 'hansel' has failed service(s): 'nagios'

You can find detailed installation and configuration information on GitHub and the Icinga Exchange website.
