Icinga, Nagios and other monitoring tools can check that a specified daemon or process is running. They can even check their own daemon, but what happens if the Icinga or Nagios daemon itself stops? Nothing is left to notice the failure.
Monit can monitor a daemon by checking a specified process or port, and it can restart or even stop the daemon when a check fails.
"Monit is a free open source utility for managing and monitoring, processes, programs, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations." MONIT Official
I'd like to cover installing monit first, and then how to monitor Icinga with monit.
The configurations are published on my GitHub, here.
Install monit
- setup rpmforge repository
# rpm -ivh http://pkgs.repoforge.org/rpmforge-release/rpmforge-release-0.5.2-2.el6.rf.x86_64.rpm
# sed -i 's/enabled = 1/enabled = 0/' /etc/yum.repos.d/rpmforge.repo
# yum -y --enablerepo=rpmforge install monit
# monit -V
This is Monit version 5.3.2
Copyright (C) 2000-2011 Tildeslash Ltd. All Rights Reserved.
Configuration
- /etc/monitrc (monit control file)
Please see the official documentation if you need further information about the monit control file.
The set alert directive below makes monit send alerts for every event type except the ones listed inside the braces (checksum through timestamp).
# cat > /etc/monitrc << 'EOF'
set daemon 120 with start delay 30
set logfile /var/log/monit/monit.log
## To send e-mail alerts, keep the lines below uncommented
set mailserver localhost
set alert username@domain not {
checksum
content
data
exec
gid
icmp
invalid
fsflags
permission
pid
ppid
size
timestamp
#action
#nonexist
#timeout
}
mail-format {
from: monit@$HOST
subject: Monit Alert -- $SERVICE $EVENT --
message:
Hostname: $HOST
Service: $SERVICE
Action: $ACTION
Date/Time: $DATE
Info: $DESCRIPTION
}
set idfile /var/monit/id
set statefile /var/monit/state
set eventqueue
basedir /var/monit
slots 100
set httpd port 2812 and
allow localhost
allow 192.168.0.0/24
allow admin:monit
include /etc/monit.d/*.conf
EOF
# mkdir /var/log/monit
# cat > /etc/logrotate.d/monit <<EOF
/var/log/monit/*.log {
missingok
notifempty
rotate 12
weekly
compress
postrotate
/usr/bin/monit quit
endscript
}
EOF
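Monit refuses to load a control file whose permissions are wider than 0700, and a file newly created with cat normally inherits the default umask (often 0644). A minimal sketch of a check, assuming a hypothetical helper named monitrc_perms_ok that is not part of monit:

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): succeed only if the file grants
# no permissions to group or other, i.e. it is 0700 or stricter.
monitrc_perms_ok() {
    # Characters 5-10 of the "ls -l" mode string are the group/other bits.
    [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}

# Example: tighten the control file only when needed.
# monitrc_perms_ok /etc/monitrc || chmod 600 /etc/monitrc
```

If the package already installed /etc/monitrc with strict permissions, overwriting it with cat keeps them and the check simply passes.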
- set up an include file (service entry statement)
The following is an example of monitoring ntpd.
# cat > /etc/monit.d/ntpd.conf <<EOF
check process ntpd
with pidfile "/var/run/ntpd.pid"
start program = "/etc/init.d/ntpd start"
stop program = "/etc/init.d/ntpd stop"
if 3 restarts within 3 cycles then alert
EOF
# monit -t
Control file syntax OK
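The ntpd entry above can serve as a template for other daemons, including Icinga itself. The following is a minimal sketch, assuming a hypothetical helper named gen_monit_check; the Icinga pidfile and init script paths in the usage line are guesses and must be checked against your own installation.

```shell
#!/bin/sh
# Hypothetical helper: print a monit "check process" entry for one daemon.
# Arguments: service name, pidfile path, init script path.
gen_monit_check() {
    name=$1
    pidfile=$2
    initscript=$3
    cat <<EOF
check process $name
  with pidfile "$pidfile"
  start program = "$initscript start"
  stop program = "$initscript stop"
  if 3 restarts within 3 cycles then alert
EOF
}

# Example (paths are assumptions; verify them on your host):
gen_monit_check icinga /var/run/icinga.pid /etc/init.d/icinga
```

Redirect the output to a file such as /etc/monit.d/icinga.conf and run monit -t again before reloading.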
Start up
- run monit from init
It is possible to run monit from an init script, but I want to be certain that a Monit daemon is always running on the system, so I let init respawn it.
# cat >> /etc/inittab <<EOF
mo:2345:respawn:/usr/bin/monit -Ic /etc/monitrc
EOF
# telinit q
# tail -f /var/log/messages
May 13 12:34:35 ha-mgr02 init: Re-reading inittab
# ps awuxc | grep 'monit'
root 1431 0.0 0.0 57432 1876 ? Ssl 11:38 0:00 monit
- stop the monit process and check that init restarts monit
# kill `pgrep monit` ; ps cawux | grep 'monit'
root 13661 0.0 0.0 57432 1780 ? Ssl 13:31 0:00 monit
# monit status
Process 'ntpd'
status Running
monitoring status Monitored
pid 32307
parent pid 1
uptime 12d 17h 44m
children 0
memory kilobytes 5040
memory kilobytes total 5040
memory percent 0.2%
memory percent total 0.2%
cpu percent 0.0%
cpu percent total 0.0%
data collected Sun, 13 May 2012 12:34:35
System 'system_ha-mgr02.forschooner.net'
status Running
monitoring status Monitored
load average [0.09] [0.20] [0.14]
cpu 1.6%us 3.2%sy 0.3%wa
memory usage 672540 kB [32.6%]
swap usage 120 kB [0.0%]
data collected Sun, 13 May 2012 12:32:35
# monit summary
The Monit daemon 5.3.2 uptime: 58m
Process 'sshd' Running
Process 'ntpd' Running
System 'system_ha-mgr02.forschooner.net' Running
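For scripted checks, the summary output can be filtered down to the entries that need attention. This is a minimal sketch, assuming a hypothetical helper named not_running and assuming the line layout shown above for monit 5.3.2 (type, quoted name, status); other monit versions may format the summary differently.

```shell
#!/bin/sh
# Hypothetical helper: read "monit summary" output on stdin and print the
# names of checks whose status is not exactly "Running".
# Assumes the layout shown above: <Type> '<name>' <status...>
not_running() {
    awk -F"'" '/^(Process|System) / {
        status = $3
        # Trim the whitespace around the status column.
        gsub(/^[ \t]+|[ \t]+$/, "", status)
        if (status != "Running") print $2
    }'
}

# Example on a live system:
# monit summary | not_running
```

An empty result means every listed process and system check reported Running.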
Start up from upstart
Because RHEL 6.x and CentOS 6.x use upstart, monit has to be respawned through an upstart job instead of /etc/inittab on those systems.
- set up /etc/init/monit.conf
# monit_bin=$(which monit)
# cat > /etc/init/monit.conf << EOF
# monit respawn
description "Monit"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec $monit_bin -Ic /etc/monitrc
EOF
- show a list of the known jobs and instances
# initctl list
rc stop/waiting
tty (/dev/tty3) start/running, process 1249
...
monit stop/waiting
serial (hvc0) start/running, process 1239
rcS-sulogin stop/waiting
# initctl start monit
monit start/running, process 6873
- see the status of the job (monit)
# initctl status monit
monit start/running, process 6873
# kill `pgrep monit`
- check that upstart restarts monit
# ps cawux | grep monit
root 7140 0.0 0.1 7004 1840 ? Ss 21:42 0:00 monit
- check the log file to confirm that monit was respawned
# tail -1 /var/log/messages
Oct 20 12:42:41 ip-10-171-47-212 init: monit main process ended, respawning
Verification
- open the monit service manager in a browser (http://<IP address>:2812)
- check that monit restarts the ntp daemon when it stops
# /etc/init.d/ntpd status
ntpd (pid 32307) is running...
# /etc/init.d/ntpd stop
Shutting down ntpd: [ OK ]
- check the log file to confirm that monit started ntpd
# cat /var/log/monit/monit.log
[JST May 13 12:52:24] error : 'ntpd' process is not running
[JST May 13 12:52:24] info : 'ntpd' trying to restart
[JST May 13 12:52:24] info : 'ntpd' start: /etc/init.d/ntpd
# /etc/init.d/ntpd status
ntpd (pid 9475) is running...
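The same log can be used to see how often monit has had to step in. This is a minimal sketch, assuming a hypothetical helper named count_restarts and assuming the "trying to restart" wording shown in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count how many times monit logged a restart attempt
# for the given service. Reads log text on stdin.
count_restarts() {
    service=$1
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c "'$service' trying to restart" || true
}

# Example:
# count_restarts ntpd < /var/log/monit/monit.log
```

Because logrotate rotates the file weekly, the count only covers the current log.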
Mail sample format
The following are examples of the alert mails that monit sends.
- notifying that the daemon is stopped
<Subject>
Monit Alert -- ntpd Does not exist --
<Body>
Hostname: ha-mgr02.forschooner.net
Service: ntpd
Action: restart
Date/Time: Sun, 13 May 2012 12:52:24
Info: process is not running
- notifying that the start action was done
<Subject>
Monit Alert -- ntpd Action done --
<Body>
Hostname: ha-mgr02.forschooner.net
Service: ntpd
Action: alert
Date/Time: Sun, 13 May 2012 12:54:15
Info: start action done
- notifying that the daemon is running again
<Subject>
Monit Alert -- ntpd Exists --
<Body>
Hostname: ha-mgr02.forschooner.net
Service: ntpd
Action: alert
Date/Time: Sun, 13 May 2012 12:54:15
Info: process is running with pid 9475