It is necessary to construct a redundant system with High Availability to serve it all times or reduce downtime.
High availability is a system design
approach and associated service implementation that ensures a
prearranged level of operational performance will be met during a
contractual measurement period. Wikipedia High Availability
And, it is also important to assure that a system with high availability i s running under such a condition, Master/Slave or Primary/Secondary.
I am going to introduce of how to monitor a such a system with nagios in the following examples and let's see the mysql replication first.
- MySQL Replication
- PostgresSQL Replication
- HA Cluster with DRBD & Pacemaker
MySQL Replication
It is important to monitor a master server binary dump running and a slave server I/O, SQL thread running and slave lag(seconds behind master). MySQL official introduces about the details about MySQL replication implementation ,
here. I would like to show you about monitoring the status of slave server( I/O and SQL thread) with a nagios plug-in called
check_mysql_health, released by
Console Labs.
This plug-in, by the way, is absolutely useful because it is enable to check the various mysql parameters, such as the number of connections, query cache hit rate, or the number of slow queries including the health of mysql replication.
System Structure
OS |
CentOS-5.8 |
Kernel |
2.6.18-274.el5 |
DB |
mysql-5.5.24 |
Scripting Language |
perl-5.14.2 |
Nagios Plugin |
check_mysql_health-2.1.5.1 |
icinga core |
icinga-1.6.1 |
Install check_mysql_health
# wget http://labs.consol.de/wp-content/uploads/2011/04/check_mysql_health-2.1.5.1.tar.gz
# tar zxf check_mysql_health-2.1.5.1.tar.gz
# cd check_mysql_health-2.1.5.1
# ./configure \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--with-mymodules-dir=/usr/lib64/nagios/plugins
# make
# make instal
# cp -p plugins-scripts/check_mysql_health /usr/local/nagios/libexec
# for modules in \
DBI \
DBD::mysql \
Time::HiRes \
IO::File \
File::Copy \
File::Temp \
Time::HiRes \
IO::File \
Data::Dumper \File::Basename \
Getopt::Long
do cpan -i $modules
done
- grant privileges for mysql user
# mysql -uroot -p mysql -e "GRANT SELECT, SUPER,REPLICATION CLIENT ON *.* TO nagios@'localhost' IDENTIFIED BY 'nagios'; FLUSH PRIVILEGES ;"
# mysql -uroot -p mysql -e "SELECT * FROM user WHERE User = 'nagios'\G;"
*************************** 1. row ***************************
Host: localhost
User: nagios
Password: *82802C50A7A5CDFDEA2653A1503FC4B8939C4047
Select_priv: Y
Insert_priv: N
Update_priv: N
Delete_priv: N
Create_priv: N
Drop_priv: N
Reload_priv: N
Shutdown_priv: N
Process_priv: N
File_priv: N
Grant_priv: N
References_priv: N
Index_priv: N
Alter_priv: N
Show_db_priv: N
Super_priv: Y
Create_tmp_table_priv: N
Lock_tables_priv: N
Execute_priv: N
Repl_slave_priv: N
Repl_client_priv: Y
Create_view_priv: N
Show_view_priv: N
Create_routine_priv: N
Alter_routine_priv: N
Create_user_priv: N
Event_priv: N
Trigger_priv: N
Create_tablespace_priv: N
ssl_type:
ssl_cipher:
x509_issuer:
x509_subject:
max_questions: 0
max_updates: 0
max_connections: 0
max_user_connections: 0
plugin:
authentication_string: NULL
- revise parentheses deprecated error
Please just revise the line below if parentheses deprecated error detected.
# check_mysql_health --hostname localhost --username root --mode uptime
Use of qw(...) as parentheses is deprecated at check_mysql_health line 1247.
Use of qw(...) as parentheses is deprecated at check_mysql_health line 2596.
Use of qw(...) as parentheses is deprecated at check_mysql_health line 3473.
OK - database is up since 2677 minutes | uptime=160628s
# cp -p check_mysql_health{,.bak}
# vi check_mysql_health
...
# diff -u check_mysql_health.bak check_mysql_health
--- check_mysql_health.bak 2011-07-15 17:46:28.000000000 +0900
+++ check_mysql_health 2011-07-17 14:04:45.000000000 +0900
@@ -1244,7 +1244,7 @@
my $message = shift;
push(@{$self->{nagios}->{messages}->{$level}}, $message);
# recalc current level
- foreach my $llevel qw(CRITICAL WARNING UNKNOWN OK) {
+ foreach my $llevel (qw(CRITICAL WARNING UNKNOWN OK)) {
if (scalar(@{$self->{nagios}->{messages}->{$ERRORS{$llevel}}})) {
$self->{nagios_level} = $ERRORS{$llevel};
}
@@ -2593,7 +2593,7 @@
my $message = shift;
push(@{$self->{nagios}->{messages}->{$level}}, $message);
# recalc current level
- foreach my $llevel qw(CRITICAL WARNING UNKNOWN OK) {
+ foreach my $llevel (qw(CRITICAL WARNING UNKNOWN OK)) {
if (scalar(@{$self->{nagios}->{messages}->{$ERRORS{$llevel}}})) {
$self->{nagios_level} = $ERRORS{$llevel};
}
@@ -3469,8 +3469,8 @@
$needs_restart = 1;
# if the calling script has a path for shared libs and there is no --environment
# parameter then the called script surely needs the variable too.
- foreach my $important_env qw(LD_LIBRARY_PATH SHLIB_PATH
- ORACLE_HOME TNS_ADMIN ORA_NLS ORA_NLS33 ORA_NLS10) {
+ foreach my $important_env (qw(LD_LIBRARY_PATH SHLIB_PATH
+ ORACLE_HOME TNS_ADMIN ORA_NLS ORA_NLS33 ORA_NLS10)) {
if ($ENV{$important_env} && ! scalar(grep { /^$important_env=/ }
keys %{$commandline{environment}})) {
$commandline{environment}->{$important_env} = $ENV{$important_env};
Verification
I am going to verify the mysql replication status about slave lag, I/O thread and SQL thread in the following condition, supposing that mysql replication is running.
Please see
the official information of how to setup mysql replication.
- Both I/O thread and SQL thread running
- I/O thread stopped, SQL thread running
- I/O thread running, SQL thread stopped
- Both I/O thread and SQL thread running
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-lag
OK - Slave is 0 seconds behind master | slave_lag=0;5;1
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-io-running
OK - Slave io is running
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-sql-running
OK - Slave sql is running
# mysql -uroot -p myql -e "STOP SLAVE IO_THREAD;"
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-lag
CRITICAL - unable to get slave lag, because io thread is not running
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-io-running
CRITICAL - Slave io is not running
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-sql-running
OK - Slave sql is running
- I/O thread running, SQL thread stopped
# mysql -uroot -p myql -e "STOP SLAVE SQL_THREAD;"
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-lag
CRITICAL - unable to get replication inf
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-io-running
OK - Slave io is running
# check_mysql_health --hostname localhost --username nagios --password nagios --warning 5 --critical 10 --mode slave-sql-running
CRITICAL - Slave sql is not running
Let's see how to monitor PostgreSQL streaming replication, next.