hashnao's Blog: mha

Showing posts with label mha. Show all posts

Saturday, November 19, 2011

MySQL MHA - init script

As I wrote a typical init script for mhamanager and verified the each process of running, stopping, showing the status.
Tough I used to open the script here, I moved it to my github.

start daemon

# /etc/init.d/mhamanager start
Starting /etc/init.d/mhamanager: nohup: appending output to `nohup.out'

verify if daemon stops

# /etc/init.d/mhamanager status
app1 (pid:15293) is running(0:PING_OK), master:ha-db01

stop

# /etc/init.d/mhamanager stop
Shutting down /etc/init.d/mhamanager: Stopped app1 successfully.

verify if daemon is running

# /etc/init.d/mhamanager status
app1 is stopped(2:NOT_RUNNING).

show usage

# /etc/init.d/mhamanager
Usage: /etc/init.d/mhamanager {start|stop|restart|condrestart|status|checkrepl}

Lastly, We'll see the mysql master failover. I'm going to make it happen by stopping mysql daemon.
The relations between Master and Slave, MHA Node and MHA Master have been the same before.

Moreover, configuration files such as global and application are too.

Please see the MySQL MHA - Switchover if you need to read the detail of those configuration files.

DB(Master) + MHA Manager	192.168.100.200(ha-mgr01)
DB(Slave) + MHA Node	192.168.100.197(ha-db01)
DB(Slave) + MHA Node	192.168.100.198(ha-db02)

Firstly, stop Master DB(ha-db01)
Secondly, transfer Master_Host from Master(ha-db01) to Slave(ha-db02)

Currently logged in MHA Manger(ha-mgr01)
run node manager

# masterha_manager --conf=/etc/app1.cnf
Thu Jul 28 12:27:19 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Jul 28 12:27:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Thu Jul 28 12:27:19 2011 - [info] Reading server configurations from /etc/app1.cnf..

check the status of Master server

# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:26638) is running(0:PING_OK), master:ha-db01

stop mysql daemon on Master DB

# ssh ha-db01 '/etc/init.d/mysql stop'

check the status of Master server

# masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(2:NOT_RUNNING).

check the status of slave hosts

# mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db02
*************************** 1. row ***************************
Server_id: 300
     Host: 
     Port: 3306
Master_id: 200

verify if the Master_Host has been transferred from Mster(ha-db01) to Slave(ha-db02)

# mysql -uroo -p mysql -e 'SHOW SLAVE STATUS\G' -h localhost
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.100.198
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 107
               Relay_Log_File: ha-mgr01-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 412
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 200

# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.100.197
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 126
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 100

Watching the MHA Master server's log over the failover

# tail -f /var/log/masterha/app1/app1.log 
Wed Jul 27 17:21:34 2011 - [info] MHA::MasterMonitor version 0.50.
Wed Jul 27 17:21:35 2011 - [info] Dead Servers:
Wed Jul 27 17:21:35 2011 - [info] Alive Servers:
Wed Jul 27 17:21:35 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:21:35 2011 - [info] Alive Slaves:
Wed Jul 27 17:21:35 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:21:35 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:21:35 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info] Current Master: ha-db01(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info] Checking slave configurations..
Wed Jul 27 17:21:35 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:21:35 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:21:35 2011 - [warn]  read_only=1 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:21:35 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:21:35 2011 - [info] Checking replication filtering settings..
Wed Jul 27 17:21:35 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 27 17:21:35 2011 - [info]  Replication filtering check ok.
Wed Jul 27 17:21:35 2011 - [info] Starting SSH connection tests..
Wed Jul 27 17:21:36 2011 - [info] All SSH connection tests passed successfully.
Wed Jul 27 17:21:36 2011 - [info] Checking MHA Node version..
Wed Jul 27 17:21:37 2011 - [info]  Version check ok.
Wed Jul 27 17:21:37 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Wed Jul 27 17:21:37 2011 - [info]   Executing command: save_binary_logs --command=test --start_file=mysql-bin.000044 --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.50 
Wed Jul 27 17:21:37 2011 - [info]   Connecting to root@ha-db01(ha-db01).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000044
Wed Jul 27 17:21:37 2011 - [info] Master setting check done.
Wed Jul 27 17:21:37 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 27 17:21:37 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Wed Jul 27 17:21:37 2011 - [info]   Connecting to root@192.168.100.198(ha-db02).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000018
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000018
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 27 17:21:38 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Wed Jul 27 17:21:38 2011 - [info]   Connecting to root@192.168.100.200(ha-mgr01).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to ha-mgr01-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/ha-mgr01-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 27 17:21:38 2011 - [info] Slaves settings check done.
Wed Jul 27 17:21:38 2011 - [info] 
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
Wed Jul 27 17:21:38 2011 - [warn] master_ip_failover_script is not defined.
Wed Jul 27 17:21:38 2011 - [warn] shutdown_script is not defined.
Wed Jul 27 17:21:38 2011 - [info] Set master ping interval 3 seconds.
Wed Jul 27 17:21:38 2011 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.198 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s 192.168.100.197
Wed Jul 27 17:21:38 2011 - [info] Starting ping health check on ha-db01(192.168.100.197:3306)..
Wed Jul 27 17:21:38 2011 - [info] Ping succeeded, sleeping until it doesn't respond..
Wed Jul 27 17:22:53 2011 - [warn] Got error on MySQL ping: 2006 (MySQL server has gone away)
Wed Jul 27 17:22:53 2011 - [info] Executing seconary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.198 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s 192.168.100.197  --user=root  --master_host=ha-db01  --master_ip=192.168.100.197  --master_port=3306
Wed Jul 27 17:22:53 2011 - [info] HealthCheck: SSH to ha-db01 is reachable.
Monitoring server 192.168.100.198 is reachable, Master is not reachable from 192.168.100.198. OK.
Monitoring server 192.168.100.197 is reachable, Master is not reachable from 192.168.100.197. OK.
Wed Jul 27 17:22:54 2011 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Wed Jul 27 17:22:56 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:22:56 2011 - [warn] Connection failed 1 time(s)..
Wed Jul 27 17:22:59 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:22:59 2011 - [warn] Connection failed 2 time(s)..
Wed Jul 27 17:23:02 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:23:02 2011 - [warn] Connection failed 3 time(s)..
Wed Jul 27 17:23:02 2011 - [warn] Master is not reachable from health checker!
Wed Jul 27 17:23:02 2011 - [warn] Master ha-db01(192.168.100.197:3306) is not reachable!
Wed Jul 27 17:23:02 2011 - [warn] SSH is reachable.
Wed Jul 27 17:23:02 2011 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Wed Jul 27 17:23:02 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Wed Jul 27 17:23:02 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Wed Jul 27 17:23:02 2011 - [info] Reading server configurations from /etc/app1.cnf..
Wed Jul 27 17:23:02 2011 - [info] Dead Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Checking slave configurations..
Wed Jul 27 17:23:02 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:23:02 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:23:02 2011 - [warn]  read_only=1 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:23:02 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:23:02 2011 - [info] Checking replication filtering settings..
Wed Jul 27 17:23:02 2011 - [info]  Replication filtering check ok.
Wed Jul 27 17:23:02 2011 - [info] Master is down!
Wed Jul 27 17:23:02 2011 - [info] Terminating monitoring script.
Wed Jul 27 17:23:02 2011 - [info] Got exit code 20 (Master dead).
Wed Jul 27 17:23:02 2011 - [info] MHA::MasterFailover version 0.50.
Wed Jul 27 17:23:02 2011 - [info] Starting master failover.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 1: Configuration Check Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Dead Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Checking master reachability via mysql(double check)..
Wed Jul 27 17:23:02 2011 - [info]  ok.
Wed Jul 27 17:23:02 2011 - [info] Alive Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Jul 27 17:23:02 2011 - [warn] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Wed Jul 27 17:23:02 2011 - [warn] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Jul 27 17:23:02 2011 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3: Master Recovery Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] The latest binary log file/position on all slaves is mysql-bin.000044:107
Wed Jul 27 17:23:02 2011 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] The oldest binary log file/position on all slaves is mysql-bin.000044:107
Wed Jul 27 17:23:02 2011 - [info] Oldest slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Fetching dead master's binary logs..
Wed Jul 27 17:23:02 2011 - [info] Executing command on the dead master ha-db01(192.168.100.197:3306): save_binary_logs --command=save --start_file=mysql-bin.000044  --start_pos=107 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000044 pos 107 to mysql-bin.000044 EOF into /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000044 position 107 to tail(126).. ok.
 Concat succeeded.
Wed Jul 27 17:23:03 2011 - [info] scp from root@192.168.100.197:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:03 2011 - [info] HealthCheck: SSH to ha-db02 is reachable.
Wed Jul 27 17:23:04 2011 - [info] HealthCheck: SSH to ha-mgr01 is reachable.
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jul 27 17:23:04 2011 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jul 27 17:23:04 2011 - [info] Searching new master from slaves..
Wed Jul 27 17:23:04 2011 - [info]  Candidate masters from the configuration file:
Wed Jul 27 17:23:04 2011 - [info]  Non-candidate masters:
Wed Jul 27 17:23:04 2011 - [info] New master is ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:04 2011 - [info] Starting master failover..
Wed Jul 27 17:23:04 2011 - [info] 
From:
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
To:
ha-db02 (new master)
 +--ha-mgr01
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 27 17:23:04 2011 - [info] Sending binlog..
Wed Jul 27 17:23:04 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to root@ha-db02:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jul 27 17:23:04 2011 - [info] Starting recovery on ha-db02(192.168.100.198:3306)..
Wed Jul 27 17:23:04 2011 - [info]  Generating diffs succeeded.
Wed Jul 27 17:23:04 2011 - [info] Waiting until all relay logs are applied.
Wed Jul 27 17:23:04 2011 - [info]  done.
Wed Jul 27 17:23:05 2011 - [info] Getting slave status..
Wed Jul 27 17:23:05 2011 - [info] This slave(ha-db02)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000044:107). No need to recover from Exec_Master_Log_Pos.
Wed Jul 27 17:23:05 2011 - [info] Connecting to the target slave host ha-db02, running recover script..
Wed Jul 27 17:23:05 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110727172302 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Wed Jul 27 17:23:05 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog on ha-db02:3306. This may take long time...
Applying log files succeeded.
Wed Jul 27 17:23:05 2011 - [info]  All relay logs were successfully applied.
Wed Jul 27 17:23:05 2011 - [info] Getting new master's binlog name and position..
Wed Jul 27 17:23:05 2011 - [info]  mysql-bin.000006:107
Wed Jul 27 17:23:05 2011 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='ha-db02 or 192.168.100.198', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=107, MASTER_USER='replication', MASTER_PASSWORD='xxx';
Wed Jul 27 17:23:05 2011 - [warn] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Jul 27 17:23:05 2011 - [info] ** Finished master recovery successfully.
Wed Jul 27 17:23:05 2011 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] -- Slave diff file generation on host ha-mgr01(192.168.100.200:3306) started, pid: 23533. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110727172302.log if it takes time..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] Log messages from ha-mgr01 ...
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 27 17:23:05 2011 - [info] End of log messages from ha-mgr01.
Wed Jul 27 17:23:05 2011 - [info] -- ha-mgr01(192.168.100.200:3306) has the latest relay log events.
Wed Jul 27 17:23:05 2011 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) started, pid: 23535. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110727172302.log if it takes time..
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] Log messages from ha-mgr01 ...
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] Sending binlog..
Wed Jul 27 17:23:05 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to root@ha-mgr01:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:05 2011 - [info] Starting recovery on ha-mgr01(192.168.100.200:3306)..
Wed Jul 27 17:23:05 2011 - [info]  Generating diffs succeeded.
Wed Jul 27 17:23:05 2011 - [info] Waiting until all relay logs are applied.
Wed Jul 27 17:23:05 2011 - [info]  done.
Wed Jul 27 17:23:05 2011 - [info] Getting slave status..
Wed Jul 27 17:23:05 2011 - [info] This slave(ha-mgr01)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000044:107). No need to recover from Exec_Master_Log_Pos.
Wed Jul 27 17:23:05 2011 - [info] Connecting to the target slave host ha-mgr01, running recover script..
Wed Jul 27 17:23:05 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110727172302 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Wed Jul 27 17:23:06 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog on ha-mgr01:3306. This may take long time...
Applying log files succeeded.
Wed Jul 27 17:23:06 2011 - [info]  All relay logs were successfully applied.
Wed Jul 27 17:23:06 2011 - [info]  Resetting slave ha-mgr01(192.168.100.200:3306) and starting replication from the new master ha-db02(192.168.100.198:3306)..
Wed Jul 27 17:23:06 2011 - [info]  Executed CHANGE MASTER.
Wed Jul 27 17:23:06 2011 - [info]  Slave started.
Wed Jul 27 17:23:06 2011 - [info] End of log messages from ha-mgr01.
Wed Jul 27 17:23:06 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) succeeded.
Wed Jul 27 17:23:06 2011 - [info] All new slave servers recovered successfully.
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] * Phase 5: New master cleanup phease..
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] Resetting slave info on the new master..
Wed Jul 27 17:23:06 2011 - [info] Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Wed Jul 27 17:23:06 2011 - [info] 
----- Failover Report -----
app1: MySQL Master failover ha-db01 to ha-db02 succeeded
Master ha-db01 is down!
Check MHA Manager logs at ha-mgr01.forschooner.net:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
The latest slave ha-db02(192.168.100.198:3306) has all relay logs for recovery.
Selected ha-db02 as a new master.
ha-db02: OK: Applying all logs succeeded.
ha-mgr01: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
ha-mgr01: OK: Applying all logs succeeded. Slave started, replicating from ha-db02.
ha-db02: Resetting slave info succeeded.
Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Wed Jul 27 17:23:06 2011 - [info] Sending mail..
Unknown option: conf

Wednesday, August 3, 2011

MySQL MHA - switchover

As MySQL MHA Node and Manager have been installed, next is switching master server manually.

The relation Master and Slaves, including MHA manager and nodes, is below.

ha-db01: Master + MHA Node → MySQL stopped
ha-db02: Slave + MHA Node → Master
ha-mgr01: Slave + MHA Manager → Slave

This is how switchover occurs.

stop mysql on master.
switch master to Slave(ha-db02) manually.
slave server(ha-mgr02) recognizes ha-db02 as master.

Currently logged in MHA Manger(ha-mgr01)
Setting up hosts file to connect via hostname

# cat >> /etc/hosts <<EOF
192.168.100.197 ha-db01
192.168.100.198 ha-db02
192.168.100.200 ha-mgr01
EOF

Displaying a list of replication slaves registered with Master

# mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db01
*************************** 1. row ***************************
Server_id: 300
     Host: 
     Port: 3306
Master_id: 100
*************************** 2. row ***************************
Server_id: 200
     Host: 
     Port: 3306
Master_id: 100

Checking the replication status of the slave servers

# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.100.197
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000045
          Read_Master_Log_Pos: 107
               Relay_Log_File: mysqld-relay-bin.00001
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000045
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 556
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 100

# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h localhost
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.100.197
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000045
          Read_Master_Log_Pos: 107
               Relay_Log_File: ha-mgr01-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000045
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 412
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 100

Setting up the Global configuration file

# mkdir -p /var/log/masterha/app1
# cat > /etc/masterha_default.cnf <<EOF
[server default]
user=root
password=mysql
ssh_user=root
master_binlog_dir=/var/lib/mysql
remote_workdir=/tmp/masterha
secondary_check_script=/usr/local/bin/masterha_secondary_check -s ha-db02 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s ha-db01
ping_interval=3
#master_ip_failover_script=/usr/local/bin/master_ip_failover
#master_ip_online_change_script=/usr/local/bin/master_ip_online_change
#shutdown_script=/usr/local/bin/power_manager
report_script=/usr/local/bin/send_report
EOF

Setting up the Application configuration file

# cat > /etc/app1.cnf <<EOF
[server default]
user=root
password=mysql
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1
[server1]
hostname=ha-db01
[server2]
hostname=ha-db02
[server2]
hostname=ha-mgr01
EOF

Checking to connect via ssh with MHA Manager and Nodes

# masterha_check_ssh --conf=/etc/app1.cnf 
Thu Jul 28 10:32:27 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Jul 28 10:32:27 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Thu Jul 28 10:32:27 2011 - [info] Reading server configurations from /etc/app1.cnf..
Thu Jul 28 10:32:27 2011 - [info] Starting SSH connection tests..
Thu Jul 28 10:32:28 2011 - [debug] 
Thu Jul 28 10:32:27 2011 - [debug]  Connecting via SSH from root@ha-db01(192.168.100.197) to root@ha-db02(192.168.100.198)..
Thu Jul 28 10:32:28 2011 - [debug]   ok.
Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-db01(192.168.100.197) to root@ha-mgr02(192.168.100.200)..
Thu Jul 28 10:32:28 2011 - [debug]   ok.
Thu Jul 28 10:32:29 2011 - [debug] 
Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-db02(192.168.100.198) to root@ha-db01(192.168.100.197)..
Thu Jul 28 10:32:28 2011 - [debug]   ok.
Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-db02(192.168.100.198) to root@ha-mgr02(192.168.100.200)..
Thu Jul 28 10:32:28 2011 - [debug]   ok.
Thu Jul 28 10:32:29 2011 - [debug] 
Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-mgr02(192.168.100.200) to root@ha-db01(192.168.100.197)..
Thu Jul 28 10:32:29 2011 - [debug]   ok.
Thu Jul 28 10:32:29 2011 - [debug]  Connecting via SSH from root@ha-mgr02(192.168.100.200) to root@ha-db02(192.168.100.198)..
Thu Jul 28 10:32:29 2011 - [debug]   ok.
Thu Jul 28 10:32:29 2011 - [info] All SSH connection tests passed successfully.

Checking if MySQL replication is enabled

# masterha_check_repl --conf=/etc/app1.cnf 
Thu Jul 28 10:32:39 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Jul 28 10:32:39 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Thu Jul 28 10:32:39 2011 - [info] Reading server configurations from /etc/app1.cnf..
Thu Jul 28 10:32:39 2011 - [info] MHA::MasterMonitor version 0.50.
Thu Jul 28 10:32:39 2011 - [info] Dead Servers:
Thu Jul 28 10:32:39 2011 - [info] Alive Servers:
Thu Jul 28 10:32:39 2011 - [info]   ha-db01(192.168.100.197:3306)
Thu Jul 28 10:32:39 2011 - [info]   ha-db02(192.168.100.198:3306)
Thu Jul 28 10:32:39 2011 - [info]   ha-mgr02(192.168.100.200:3306)
Thu Jul 28 10:32:39 2011 - [info] Alive Slaves:
Thu Jul 28 10:32:39 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:32:39 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:32:39 2011 - [info]   ha-mgr02(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:32:39 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:32:39 2011 - [info] Current Master: ha-db01(192.168.100.197:3306)
Thu Jul 28 10:32:39 2011 - [info] Checking slave configurations..
Thu Jul 28 10:32:39 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Thu Jul 28 10:32:39 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Thu Jul 28 10:32:39 2011 - [warn]  read_only=1 is not set on slave ha-mgr02(192.168.100.200:3306).
Thu Jul 28 10:32:39 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr02(192.168.100.200:3306).
Thu Jul 28 10:32:39 2011 - [info] Checking replication filtering settings..
Thu Jul 28 10:32:39 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Jul 28 10:32:39 2011 - [info]  Replication filtering check ok.
Thu Jul 28 10:32:39 2011 - [info] Starting SSH connection tests..
Thu Jul 28 10:32:41 2011 - [info] All SSH connection tests passed successfully.
Thu Jul 28 10:32:41 2011 - [info] Checking MHA Node version..
Thu Jul 28 10:32:41 2011 - [info]  Version check ok.
Thu Jul 28 10:32:41 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Thu Jul 28 10:32:42 2011 - [info]   Executing command: save_binary_logs --command=test --start_file=mysql-bin.000045 --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.50 
Thu Jul 28 10:32:42 2011 - [info]   Connecting to root@ha-db01(ha-db01).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000045
Thu Jul 28 10:32:42 2011 - [info] Master setting check done.
Thu Jul 28 10:32:42 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Jul 28 10:32:42 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Thu Jul 28 10:32:42 2011 - [info]   Connecting to root@192.168.100.198(ha-db02).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000017
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000017
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 28 10:32:42 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-mgr02 --slave_ip=192.168.100.200 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Thu Jul 28 10:32:42 2011 - [info]   Connecting to root@192.168.100.200(ha-mgr02).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to ha-mgr02-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/ha-mgr02-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Thu Jul 28 10:32:42 2011 - [info] Slaves settings check done.
Thu Jul 28 10:32:42 2011 - [info] 
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr02
Thu Jul 28 10:32:42 2011 - [info] Checking replication health on ha-db02..
Thu Jul 28 10:32:42 2011 - [info]  ok.
Thu Jul 28 10:32:42 2011 - [info] Checking replication health on ha-mgr02..
Thu Jul 28 10:32:42 2011 - [info]  ok.
Thu Jul 28 10:32:42 2011 - [warn] master_ip_failover_script is not defined.
Thu Jul 28 10:32:42 2011 - [warn] shutdown_script is not defined.
Thu Jul 28 10:32:42 2011 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

Starting up the Node Manager

# masterha_manager --conf=/etc/app1.cnf

Checking the status

# masterha_check_status --conf=/etc/app1.cnf

Stopping MySQL Daemon

# ssh ha-db01 '/etc/init.d/mysql stop'

Changing Master Server ha-db01 to ha-db02 manually

# masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=ha-db01 --new_master_host=ha-db02
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.100.197.
--dead_master_port=<dead_master_port> is not set. Using 3306.
Thu Jul 28 10:39:09 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Jul 28 10:39:09 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Thu Jul 28 10:39:09 2011 - [info] Reading server configurations from /etc/app1.cnf..
Thu Jul 28 10:39:09 2011 - [info] MHA::MasterFailover version 0.50.
Thu Jul 28 10:39:09 2011 - [info] Starting master failover.
Thu Jul 28 10:39:09 2011 - [info] 
Thu Jul 28 10:39:09 2011 - [info] * Phase 1: Configuration Check Phase..
Thu Jul 28 10:39:09 2011 - [info] 
Thu Jul 28 10:39:09 2011 - [info] Dead Servers:
Thu Jul 28 10:39:09 2011 - [info]   ha-db01(192.168.100.197:3306)
Thu Jul 28 10:39:09 2011 - [info] Checking master reachability via mysql(double check)..
Thu Jul 28 10:39:09 2011 - [info]  ok.
Thu Jul 28 10:39:09 2011 - [info] Alive Servers:
Thu Jul 28 10:39:09 2011 - [info]   ha-db02(192.168.100.198:3306)
Thu Jul 28 10:39:09 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Thu Jul 28 10:39:09 2011 - [info] Alive Slaves:
Thu Jul 28 10:39:09 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:39:09 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:39:09 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:39:09 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Master ha-db01 is dead. Proceed? (yes/NO): yes
Thu Jul 28 10:39:19 2011 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] HealthCheck: SSH to ha-db01 is reachable.
Thu Jul 28 10:39:19 2011 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Jul 28 10:39:19 2011 - [warn] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Thu Jul 28 10:39:19 2011 - [warn] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Jul 28 10:39:19 2011 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] * Phase 3: Master Recovery Phase..
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] The latest binary log file/position on all slaves is mysql-bin.000045:107
Thu Jul 28 10:39:19 2011 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Jul 28 10:39:19 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:39:19 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:39:19 2011 - [info] The oldest binary log file/position on all slaves is mysql-bin.000045:107
Thu Jul 28 10:39:19 2011 - [info] Oldest slaves:
Thu Jul 28 10:39:19 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:39:19 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Jul 28 10:39:19 2011 - [info] 
Thu Jul 28 10:39:19 2011 - [info] Fetching dead master's binary logs..
Thu Jul 28 10:39:19 2011 - [info] Executing command on the dead master ha-db01(192.168.100.197:3306): save_binary_logs --command=save --start_file=mysql-bin.000045  --start_pos=107 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000045 pos 107 to mysql-bin.000045 EOF into /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000045 position 107 to tail(126).. ok.
 Concat succeeded.
saved_master_binlog_from_ha-db01_3306_20110728103909.binlog                                    100%  126     0.1KB/s   00:00    
Thu Jul 28 10:39:20 2011 - [info] scp from root@192.168.100.197:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog succeeded.
Thu Jul 28 10:39:20 2011 - [info] HealthCheck: SSH to ha-db02 is reachable.
Thu Jul 28 10:39:21 2011 - [info] HealthCheck: SSH to ha-mgr01 is reachable.
Thu Jul 28 10:39:21 2011 - [info] 
Thu Jul 28 10:39:21 2011 - [info] * Phase 3.3: Determining New Master Phase..
Thu Jul 28 10:39:21 2011 - [info] 
Thu Jul 28 10:39:21 2011 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Jul 28 10:39:21 2011 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Jul 28 10:39:21 2011 - [info] ha-db02 can be new master.
Thu Jul 28 10:39:21 2011 - [info] New master is ha-db02(192.168.100.198:3306)
Thu Jul 28 10:39:21 2011 - [info] Starting master failover..
Thu Jul 28 10:39:21 2011 - [info] 
From:
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
To:
ha-db02 (new master)
 +--ha-mgr01
Starting master switch from ha-db01(192.168.100.197:3306) to ha-db02(192.168.100.198:3306)? (yes/NO): yes
Thu Jul 28 10:39:29 2011 - [info] New master decided manually is ha-db02(192.168.100.198:3306)
Thu Jul 28 10:39:29 2011 - [info] 
Thu Jul 28 10:39:29 2011 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Jul 28 10:39:29 2011 - [info] 
Thu Jul 28 10:39:29 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 10:39:29 2011 - [info] Sending binlog..
saved_master_binlog_from_ha-db01_3306_20110728103909.binlog                                    100%  126     0.1KB/s   00:00    
Thu Jul 28 10:39:30 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog to root@ha-db02:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog succeeded.
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Jul 28 10:39:30 2011 - [info] Starting recovery on ha-db02(192.168.100.198:3306)..
Thu Jul 28 10:39:30 2011 - [info]  Generating diffs succeeded.
Thu Jul 28 10:39:30 2011 - [info] Waiting until all relay logs are applied.
Thu Jul 28 10:39:30 2011 - [info]  done.
Thu Jul 28 10:39:30 2011 - [info] Getting slave status..
Thu Jul 28 10:39:30 2011 - [info] This slave(ha-db02)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000045:107). No need to recover from Exec_Master_Log_Pos.
Thu Jul 28 10:39:30 2011 - [info] Connecting to the target slave host ha-db02, running recover script..
Thu Jul 28 10:39:30 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110728103909 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Thu Jul 28 10:39:30 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog on ha-db02:3306. This may take long time...
Applying log files succeeded.
Thu Jul 28 10:39:30 2011 - [info]  All relay logs were successfully applied.
Thu Jul 28 10:39:30 2011 - [info] Getting new master's binlog name and position..
Thu Jul 28 10:39:30 2011 - [info]  mysql-bin.000006:107
Thu Jul 28 10:39:30 2011 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='ha-db02 or 192.168.100.198', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=107, MASTER_USER='replication', MASTER_PASSWORD='xxx';
Thu Jul 28 10:39:30 2011 - [warn] master_ip_failover_script is not set. Skipping taking over new master ip address.
Thu Jul 28 10:39:30 2011 - [info] ** Finished master recovery successfully.
Thu Jul 28 10:39:30 2011 - [info] * Phase 3: Master Recovery Phase completed.
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] * Phase 4: Slaves Recovery Phase..
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] -- Slave diff file generation on host ha-mgr01(192.168.100.200:3306) started, pid: 18983. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110728103909.log if it takes time..
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] Log messages from ha-mgr01 ...
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Thu Jul 28 10:39:30 2011 - [info] End of log messages from ha-mgr01.
Thu Jul 28 10:39:30 2011 - [info] -- ha-mgr01(192.168.100.200:3306) has the latest relay log events.
Thu Jul 28 10:39:30 2011 - [info] Generating relay diff files from the latest slave succeeded.
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Jul 28 10:39:30 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) started, pid: 18985. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110728103909.log if it takes time..
saved_master_binlog_from_ha-db01_3306_20110728103909.binlog                                    100%  126     0.1KB/s   00:00    
Thu Jul 28 10:39:31 2011 - [info] 
Thu Jul 28 10:39:31 2011 - [info] Log messages from ha-mgr01 ...
Thu Jul 28 10:39:31 2011 - [info] 
Thu Jul 28 10:39:30 2011 - [info] Sending binlog..
Thu Jul 28 10:39:31 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog to root@ha-mgr01:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog succeeded.
Thu Jul 28 10:39:31 2011 - [info] Starting recovery on ha-mgr01(192.168.100.200:3306)..
Thu Jul 28 10:39:31 2011 - [info]  Generating diffs succeeded.
Thu Jul 28 10:39:31 2011 - [info] Waiting until all relay logs are applied.
Thu Jul 28 10:39:31 2011 - [info]  done.
Thu Jul 28 10:39:31 2011 - [info] Getting slave status..
Thu Jul 28 10:39:31 2011 - [info] This slave(ha-mgr01)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000045:107). No need to recover from Exec_Master_Log_Pos.
Thu Jul 28 10:39:31 2011 - [info] Connecting to the target slave host ha-mgr01, running recover script..
Thu Jul 28 10:39:31 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110728103909 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Thu Jul 28 10:39:31 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog on ha-mgr01:3306. This may take long time...
Applying log files succeeded.
Thu Jul 28 10:39:31 2011 - [info]  All relay logs were successfully applied.
Thu Jul 28 10:39:31 2011 - [info]  Resetting slave ha-mgr01(192.168.100.200:3306) and starting replication from the new master ha-db02(192.168.100.198:3306)..
Thu Jul 28 10:39:31 2011 - [info]  Executed CHANGE MASTER.
Thu Jul 28 10:39:31 2011 - [info]  Slave started.
Thu Jul 28 10:39:31 2011 - [info] End of log messages from ha-mgr01.
Thu Jul 28 10:39:31 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) succeeded.
Thu Jul 28 10:39:31 2011 - [info] All new slave servers recovered successfully.
Thu Jul 28 10:39:31 2011 - [info] 
Thu Jul 28 10:39:31 2011 - [info] * Phase 5: New master cleanup phease..
Thu Jul 28 10:39:31 2011 - [info] 
Thu Jul 28 10:39:31 2011 - [info] Resetting slave info on the new master..
Thu Jul 28 10:39:31 2011 - [info] Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Thu Jul 28 10:39:31 2011 - [info] 
----- Failover Report -----
app1: MySQL Master failover ha-db01 to ha-db02 succeeded
Master ha-db01 is down!

Check MHA Manager logs at ha-mgr01 for details.


Started manual(interactive) failover.
The latest slave ha-db02(192.168.100.198:3306) has all relay logs for recovery.
Selected ha-db02 as a new master.
ha-db02: OK: Applying all logs succeeded.
ha-mgr01: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
ha-mgr01: OK: Applying all logs succeeded. Slave started, replicating from ha-db02.
ha-db02: Resetting slave info succeeded.
Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Thu Jul 28 10:39:31 2011 - [info] Sending mail..
Unknown option: conf

Displaying a list of replication slaves registered with new Master

# mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db02
*************************** 1. row ***************************
Server_id: 300
     Host: 
     Port: 3306
Master_id: 200

Checking the replication status of the slave servers

# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h localhost
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.100.198
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 107
               Relay_Log_File: ha-mgr01-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 412
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 200

# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.100.197
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 126
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 100

Lastly, let's see how master server fails over automatically.

Monday, August 1, 2011

MySQL MHA - Installation

I'm going to start my blog, mainly about OSS. The reason is that I was inspired by an article on Yoshinori Matsumoto's blog. The title of the article is Announcing MySQL-MHA: "MySQL Master High Availability manager and tools"
I'm thinking that the first article of mine should be MySQL MHA, as I was just looking for something nice of how to automate mysql master failover.

A primary objective of MHA is automating master failover and slave promotion within short (usually 10-30 seconds) downtime, without suffering from replication consistency problems, without spending money for lots of new servers, without performance penalty, without complexity (easy-to-install), and without changing existing deployments.

The official project is here, MySQL MHA.

Before installing MySQL MHA, you need to install mysql server and enable replication between master and slave database server.

MHA Nodes & Manager

DB(Master) + MHA Manager	192.168.100.200(ha-mgr01)
DB(Slave) + MHA Node	192.168.100.197(ha-db01)
DB(Slave) + MHA Node	192.168.100.198(ha-db02)

*Make sure that you need to setup public key authentication to login as root user between those hosts without passphrase.

Installing MHA Node

$ wget http://mysql-master-ha.googlecode.com/files/MySQL-MasterHA-Node-0.50.tar.gz
$ tar zxf MySQL-MasterHA-Node-0.50.tar.gz
$ cd MySQL-MasterHA-Node-0.50
$ perl Makefile.PL
$ make
$ sudo make install

Installing MHA Manager

$ for modules in ¥ DBD::mysql ¥ Config::Tiny ¥
Log::Dispatch ¥
Parallel::ForkManager ¥
Time::HiRes ; do 
sudo cpan install $modules
done
$ wget http://mysql-master-ha.googlecode.com/files/MySQL-MasterHA-Manager-0.50.tar.gz
$ tar zxf MySQL-MasterHA-Manager-0.50.tar.gz
$ cd MySQL-MasterHA-Manager-0.50
$ perl Makefile.PL
$ make
$ sudo make install
$ sudo cp -p samples/scripts/* /usr/local/bin/

Next will be about automating fail-over

My bookshelf on Goodreads

Naoya's bookshelf: read

Building a Home Security System with BeagleBone

by Bill Pretty

Getting Started with Phalcon

by Stephan A Miller

I had known nothing about the Phalcon PHP framework before reading "Getting Started with Phalcon", but I learned the basic step by step with the following chapters: * Chapter 1: Installing Phalcon * Chapter 2: Setting Up a Phalcon Projec...

Mastering Zabbix

by Andrea Dalle Vacche

Mastering Zabbix is quite helpful for the beginners or the intermediate those who need to manage and monitor IT infrastructure especially for mission critical environment or with hundreds of nodes because it covers database-sizing, capac...

hashnao's Blog

ページ

Saturday, November 19, 2011

MySQL MHA - init script

MySQL MHA - Failover

Wednesday, August 3, 2011

MySQL MHA - switchover

Monday, August 1, 2011

MySQL MHA - Installation

iJAWS@Doorkeeper

AWS Certified SysOps Administrator - Associate Level

AWS Certified Solutions Architect Associate Level