Admin:Nagios

Nagios monitors services for problems. When something breaks, Nagios notifies the administrators and also reports to the IRC channel #wikimedia-toolserver.

Nagios runs on the HA cluster in the 'nagios' resource group. This group has a strong positive resource affinity with the 'www' group, so it runs on the same node as the HA web server. This is necessary for the web interface to work.

The configuration is in /global/misc/nagios/etc. After changing it, restart Nagios:

1. Check the new configuration for errors first:
   /opt/ts/nagios/bin/nagios -v /global/misc/nagios/etc/nagios.cfg
2. Take the resource offline: clrs disable nagios
3. Bring it back online: clrs enable nagios
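The three steps above can be wrapped so that the resource is only restarted when the configuration check passes. This is a minimal sketch, not an existing Toolserver script. It assumes that `nagios -v` exits with a non-zero status when it finds configuration errors. The variables NAGIOS_BIN, NAGIOS_CFG and CLRS and the function name restart_nagios are hypothetical; they default to the paths and command on this page and exist only so the commands can be swapped out.

```shell
#!/bin/sh
# Sketch only: restart_nagios and the override variables are hypothetical.
# Defaults are the binary, config file and cluster command from this page.
NAGIOS_BIN=${NAGIOS_BIN:-/opt/ts/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/global/misc/nagios/etc/nagios.cfg}
CLRS=${CLRS:-clrs}

restart_nagios() {
    # Step 1: verify the configuration (assumes non-zero exit on errors).
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "configuration check failed; not restarting nagios" >&2
        return 1
    fi
    # Step 2: take the 'nagios' resource offline; stop if that fails.
    "$CLRS" disable nagios || return 1
    # Step 3: bring it back online.
    "$CLRS" enable nagios
}
```

To use it, source the file and call the function, e.g. `. ./restart-nagios.sh && restart_nagios`. Keeping the check and the restart in one function means a broken configuration never takes the running instance offline.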