Showing posts with label mha. Show all posts
Showing posts with label mha. Show all posts

Saturday, November 19, 2011

MySQL MHA - init script

As I wrote a typical init script for mhamanager and verified the each process of running, stopping, showing the status.
Tough I used to open the script here, I moved it to my github.

  • start daemon
# /etc/init.d/mhamanager start
Starting /etc/init.d/mhamanager: nohup: appending output to `nohup.out' 
  • verify if daemon stops
# /etc/init.d/mhamanager status
app1 (pid:15293) is running(0:PING_OK), master:ha-db01
  • stop
# /etc/init.d/mhamanager stop
Shutting down /etc/init.d/mhamanager: Stopped app1 successfully.
  • verify if daemon is running
# /etc/init.d/mhamanager status
app1 is stopped(2:NOT_RUNNING). 
  • show usage
# /etc/init.d/mhamanager
Usage: /etc/init.d/mhamanager {start|stop|restart|condrestart|status|checkrepl}

MySQL MHA - Failover

Lastly, We'll see the mysql master failover. I'm going to make it happen by stopping mysql daemon.
The relations between Master and Slave, MHA Node and MHA Master have been the same before.

Moreover, configuration files such as global and application are too.
Please see the MySQL MHA - Switchover if you need to read the detail of those configuration files.

DB(Master) + MHA Manager192.168.100.200(ha-mgr01)
DB(Slave) + MHA Node192.168.100.197(ha-db01)
DB(Slave) + MHA Node192.168.100.198(ha-db02)

Firstly, stop Master DB(ha-db01)
Secondly, transfer Master_Host from Master(ha-db01) to Slave(ha-db02)

  • Currently logged in MHA Manger(ha-mgr01)
  • run node manager
# masterha_manager --conf=/etc/app1.cnf
Thu Jul 28 12:27:19 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Thu Jul 28 12:27:19 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Thu Jul 28 12:27:19 2011 - [info] Reading server configurations from /etc/app1.cnf.. 
  • check the status of Master server
# masterha_check_status --conf=/etc/app1.cnf
app1 (pid:26638) is running(0:PING_OK), master:ha-db01 
  • stop mysql daemon on Master DB
# ssh ha-db01 '/etc/init.d/mysql stop'
  • check the status of Master server
# masterha_check_status --conf=/etc/app1.cnf
app1 is stopped(2:NOT_RUNNING).
  • check the status of slave hosts
# mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db02
*************************** 1. row ***************************
Server_id: 300
     Host: 
     Port: 3306
Master_id: 200
  • verify if the Master_Host has been transferred from Mster(ha-db01) to Slave(ha-db02)
# mysql -uroo -p mysql -e 'SHOW SLAVE STATUS\G' -h localhost
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.100.198
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000006
          Read_Master_Log_Pos: 107
               Relay_Log_File: ha-mgr01-relay-bin.000002
                Relay_Log_Pos: 253
        Relay_Master_Log_File: mysql-bin.000006
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 412
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 200
# mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.100.197
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 
          Read_Master_Log_Pos: 4
               Relay_Log_File: mysqld-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: 
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 126
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 100
  • Watching the MHA Master server's log over the failover
# tail -f /var/log/masterha/app1/app1.log 
Wed Jul 27 17:21:34 2011 - [info] MHA::MasterMonitor version 0.50.
Wed Jul 27 17:21:35 2011 - [info] Dead Servers:
Wed Jul 27 17:21:35 2011 - [info] Alive Servers:
Wed Jul 27 17:21:35 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:21:35 2011 - [info] Alive Slaves:
Wed Jul 27 17:21:35 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:21:35 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:21:35 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info] Current Master: ha-db01(192.168.100.197:3306)
Wed Jul 27 17:21:35 2011 - [info] Checking slave configurations..
Wed Jul 27 17:21:35 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:21:35 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:21:35 2011 - [warn]  read_only=1 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:21:35 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:21:35 2011 - [info] Checking replication filtering settings..
Wed Jul 27 17:21:35 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
Wed Jul 27 17:21:35 2011 - [info]  Replication filtering check ok.
Wed Jul 27 17:21:35 2011 - [info] Starting SSH connection tests..
Wed Jul 27 17:21:36 2011 - [info] All SSH connection tests passed successfully.
Wed Jul 27 17:21:36 2011 - [info] Checking MHA Node version..
Wed Jul 27 17:21:37 2011 - [info]  Version check ok.
Wed Jul 27 17:21:37 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
Wed Jul 27 17:21:37 2011 - [info]   Executing command: save_binary_logs --command=test --start_file=mysql-bin.000044 --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.50 
Wed Jul 27 17:21:37 2011 - [info]   Connecting to root@ha-db01(ha-db01).. 
  Creating /var/log/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000044
Wed Jul 27 17:21:37 2011 - [info] Master setting check done.
Wed Jul 27 17:21:37 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jul 27 17:21:37 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Wed Jul 27 17:21:37 2011 - [info]   Connecting to root@192.168.100.198(ha-db02).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000018
    Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000018
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 27 17:21:38 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
Wed Jul 27 17:21:38 2011 - [info]   Connecting to root@192.168.100.200(ha-mgr01).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to ha-mgr01-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/ha-mgr01-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Wed Jul 27 17:21:38 2011 - [info] Slaves settings check done.
Wed Jul 27 17:21:38 2011 - [info] 
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
Wed Jul 27 17:21:38 2011 - [warn] master_ip_failover_script is not defined.
Wed Jul 27 17:21:38 2011 - [warn] shutdown_script is not defined.
Wed Jul 27 17:21:38 2011 - [info] Set master ping interval 3 seconds.
Wed Jul 27 17:21:38 2011 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.198 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s 192.168.100.197
Wed Jul 27 17:21:38 2011 - [info] Starting ping health check on ha-db01(192.168.100.197:3306)..
Wed Jul 27 17:21:38 2011 - [info] Ping succeeded, sleeping until it doesn't respond..
Wed Jul 27 17:22:53 2011 - [warn] Got error on MySQL ping: 2006 (MySQL server has gone away)
Wed Jul 27 17:22:53 2011 - [info] Executing seconary network check script: /usr/local/bin/masterha_secondary_check -s 192.168.100.198 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s 192.168.100.197  --user=root  --master_host=ha-db01  --master_ip=192.168.100.197  --master_port=3306
Wed Jul 27 17:22:53 2011 - [info] HealthCheck: SSH to ha-db01 is reachable.
Monitoring server 192.168.100.198 is reachable, Master is not reachable from 192.168.100.198. OK.
Monitoring server 192.168.100.197 is reachable, Master is not reachable from 192.168.100.197. OK.
Wed Jul 27 17:22:54 2011 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Wed Jul 27 17:22:56 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:22:56 2011 - [warn] Connection failed 1 time(s)..
Wed Jul 27 17:22:59 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:22:59 2011 - [warn] Connection failed 2 time(s)..
Wed Jul 27 17:23:02 2011 - [warn] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.100.197' (111))
Wed Jul 27 17:23:02 2011 - [warn] Connection failed 3 time(s)..
Wed Jul 27 17:23:02 2011 - [warn] Master is not reachable from health checker!
Wed Jul 27 17:23:02 2011 - [warn] Master ha-db01(192.168.100.197:3306) is not reachable!
Wed Jul 27 17:23:02 2011 - [warn] SSH is reachable.
Wed Jul 27 17:23:02 2011 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/app1.cnf again, and trying to connect to all servers to check server status..
Wed Jul 27 17:23:02 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Wed Jul 27 17:23:02 2011 - [info] Reading application default configurations from /etc/app1.cnf..
Wed Jul 27 17:23:02 2011 - [info] Reading server configurations from /etc/app1.cnf..
Wed Jul 27 17:23:02 2011 - [info] Dead Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Checking slave configurations..
Wed Jul 27 17:23:02 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:23:02 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
Wed Jul 27 17:23:02 2011 - [warn]  read_only=1 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:23:02 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr01(192.168.100.200:3306).
Wed Jul 27 17:23:02 2011 - [info] Checking replication filtering settings..
Wed Jul 27 17:23:02 2011 - [info]  Replication filtering check ok.
Wed Jul 27 17:23:02 2011 - [info] Master is down!
Wed Jul 27 17:23:02 2011 - [info] Terminating monitoring script.
Wed Jul 27 17:23:02 2011 - [info] Got exit code 20 (Master dead).
Wed Jul 27 17:23:02 2011 - [info] MHA::MasterFailover version 0.50.
Wed Jul 27 17:23:02 2011 - [info] Starting master failover.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 1: Configuration Check Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Dead Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db01(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] Checking master reachability via mysql(double check)..
Wed Jul 27 17:23:02 2011 - [info]  ok.
Wed Jul 27 17:23:02 2011 - [info] Alive Servers:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)
Wed Jul 27 17:23:02 2011 - [info] Alive Slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Jul 27 17:23:02 2011 - [warn] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Wed Jul 27 17:23:02 2011 - [warn] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Jul 27 17:23:02 2011 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3: Master Recovery Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] The latest binary log file/position on all slaves is mysql-bin.000044:107
Wed Jul 27 17:23:02 2011 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] The oldest binary log file/position on all slaves is mysql-bin.000044:107
Wed Jul 27 17:23:02 2011 - [info] Oldest slaves:
Wed Jul 27 17:23:02 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
Wed Jul 27 17:23:02 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jul 27 17:23:02 2011 - [info] 
Wed Jul 27 17:23:02 2011 - [info] Fetching dead master's binary logs..
Wed Jul 27 17:23:02 2011 - [info] Executing command on the dead master ha-db01(192.168.100.197:3306): save_binary_logs --command=save --start_file=mysql-bin.000044  --start_pos=107 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000044 pos 107 to mysql-bin.000044 EOF into /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /var/lib/mysql/mysql-bin.000044 position 107 to tail(126).. ok.
 Concat succeeded.
Wed Jul 27 17:23:03 2011 - [info] scp from root@192.168.100.197:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:03 2011 - [info] HealthCheck: SSH to ha-db02 is reachable.
Wed Jul 27 17:23:04 2011 - [info] HealthCheck: SSH to ha-mgr01 is reachable.
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jul 27 17:23:04 2011 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jul 27 17:23:04 2011 - [info] Searching new master from slaves..
Wed Jul 27 17:23:04 2011 - [info]  Candidate masters from the configuration file:
Wed Jul 27 17:23:04 2011 - [info]  Non-candidate masters:
Wed Jul 27 17:23:04 2011 - [info] New master is ha-db02(192.168.100.198:3306)
Wed Jul 27 17:23:04 2011 - [info] Starting master failover..
Wed Jul 27 17:23:04 2011 - [info] 
From:
ha-db01 (current master)
 +--ha-db02
 +--ha-mgr01
To:
ha-db02 (new master)
 +--ha-mgr01
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 27 17:23:04 2011 - [info] Sending binlog..
Wed Jul 27 17:23:04 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to root@ha-db02:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] * Phase 3.4: Master Log Apply Phase..
Wed Jul 27 17:23:04 2011 - [info] 
Wed Jul 27 17:23:04 2011 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jul 27 17:23:04 2011 - [info] Starting recovery on ha-db02(192.168.100.198:3306)..
Wed Jul 27 17:23:04 2011 - [info]  Generating diffs succeeded.
Wed Jul 27 17:23:04 2011 - [info] Waiting until all relay logs are applied.
Wed Jul 27 17:23:04 2011 - [info]  done.
Wed Jul 27 17:23:05 2011 - [info] Getting slave status..
Wed Jul 27 17:23:05 2011 - [info] This slave(ha-db02)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000044:107). No need to recover from Exec_Master_Log_Pos.
Wed Jul 27 17:23:05 2011 - [info] Connecting to the target slave host ha-db02, running recover script..
Wed Jul 27 17:23:05 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110727172302 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Wed Jul 27 17:23:05 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog on ha-db02:3306. This may take long time...
Applying log files succeeded.
Wed Jul 27 17:23:05 2011 - [info]  All relay logs were successfully applied.
Wed Jul 27 17:23:05 2011 - [info] Getting new master's binlog name and position..
Wed Jul 27 17:23:05 2011 - [info]  mysql-bin.000006:107
Wed Jul 27 17:23:05 2011 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='ha-db02 or 192.168.100.198', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=107, MASTER_USER='replication', MASTER_PASSWORD='xxx';
Wed Jul 27 17:23:05 2011 - [warn] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Jul 27 17:23:05 2011 - [info] ** Finished master recovery successfully.
Wed Jul 27 17:23:05 2011 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] -- Slave diff file generation on host ha-mgr01(192.168.100.200:3306) started, pid: 23533. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110727172302.log if it takes time..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] Log messages from ha-mgr01 ...
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jul 27 17:23:05 2011 - [info] End of log messages from ha-mgr01.
Wed Jul 27 17:23:05 2011 - [info] -- ha-mgr01(192.168.100.200:3306) has the latest relay log events.
Wed Jul 27 17:23:05 2011 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jul 27 17:23:05 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) started, pid: 23535. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110727172302.log if it takes time..
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] Log messages from ha-mgr01 ...
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:05 2011 - [info] Sending binlog..
Wed Jul 27 17:23:05 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog to root@ha-mgr01:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog succeeded.
Wed Jul 27 17:23:05 2011 - [info] Starting recovery on ha-mgr01(192.168.100.200:3306)..
Wed Jul 27 17:23:05 2011 - [info]  Generating diffs succeeded.
Wed Jul 27 17:23:05 2011 - [info] Waiting until all relay logs are applied.
Wed Jul 27 17:23:05 2011 - [info]  done.
Wed Jul 27 17:23:05 2011 - [info] Getting slave status..
Wed Jul 27 17:23:05 2011 - [info] This slave(ha-mgr01)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000044:107). No need to recover from Exec_Master_Log_Pos.
Wed Jul 27 17:23:05 2011 - [info] Connecting to the target slave host ha-mgr01, running recover script..
Wed Jul 27 17:23:05 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110727172302 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
Wed Jul 27 17:23:06 2011 - [info] 
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110727172302.binlog on ha-mgr01:3306. This may take long time...
Applying log files succeeded.
Wed Jul 27 17:23:06 2011 - [info]  All relay logs were successfully applied.
Wed Jul 27 17:23:06 2011 - [info]  Resetting slave ha-mgr01(192.168.100.200:3306) and starting replication from the new master ha-db02(192.168.100.198:3306)..
Wed Jul 27 17:23:06 2011 - [info]  Executed CHANGE MASTER.
Wed Jul 27 17:23:06 2011 - [info]  Slave started.
Wed Jul 27 17:23:06 2011 - [info] End of log messages from ha-mgr01.
Wed Jul 27 17:23:06 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) succeeded.
Wed Jul 27 17:23:06 2011 - [info] All new slave servers recovered successfully.
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] * Phase 5: New master cleanup phease..
Wed Jul 27 17:23:06 2011 - [info] 
Wed Jul 27 17:23:06 2011 - [info] Resetting slave info on the new master..
Wed Jul 27 17:23:06 2011 - [info] Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Wed Jul 27 17:23:06 2011 - [info] 
----- Failover Report -----
app1: MySQL Master failover ha-db01 to ha-db02 succeeded
Master ha-db01 is down!
Check MHA Manager logs at ha-mgr01.forschooner.net:/var/log/masterha/app1/app1.log for details.
Started automated(non-interactive) failover.
The latest slave ha-db02(192.168.100.198:3306) has all relay logs for recovery.
Selected ha-db02 as a new master.
ha-db02: OK: Applying all logs succeeded.
ha-mgr01: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
ha-mgr01: OK: Applying all logs succeeded. Slave started, replicating from ha-db02.
ha-db02: Resetting slave info succeeded.
Master failover to ha-db02(192.168.100.198:3306) completed successfully.
Wed Jul 27 17:23:06 2011 - [info] Sending mail..
Unknown option: conf

    Wednesday, August 3, 2011

    MySQL MHA - switchover

    As MySQL MHA Node and Manager have been installed, next is switching master server manually. 

    The relation Master and Slaves, including MHA manager and nodes, is below. 

    ha-db01:   Master + MHA Node → MySQL stopped 
    ha-db02:   Slave + MHA Node → Master  
    ha-mgr01: Slave + MHA Manager → Slave 

    This is how switchover occurs.
    1. stop mysql on master. 
    2. switch master to Slave(ha-db02) manually.
    3. slave server(ha-mgr02) recognizes ha-db02 as master.
    • Currently logged in MHA Manger(ha-mgr01)
    • Setting up hosts file to connect via hostname
    # cat >> /etc/hosts <<EOF
    192.168.100.197 ha-db01
    192.168.100.198 ha-db02
    192.168.100.200 ha-mgr01
    EOF
    • Displaying a list of replication slaves registered with Master
    # mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db01
    *************************** 1. row ***************************
    Server_id: 300
         Host: 
         Port: 3306
    Master_id: 100
    *************************** 2. row ***************************
    Server_id: 200
         Host: 
         Port: 3306
    Master_id: 100
    
    • Checking the replication status of the slave servers
    # mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.100.197
                      Master_User: replication
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000045
              Read_Master_Log_Pos: 107
                   Relay_Log_File: mysqld-relay-bin.00001
                    Relay_Log_Pos: 253
            Relay_Master_Log_File: mysql-bin.000045
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 107
                  Relay_Log_Space: 556
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 100 
    
    # mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h localhost
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.100.197
                      Master_User: replication
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000045
              Read_Master_Log_Pos: 107
                   Relay_Log_File: ha-mgr01-relay-bin.000002
                    Relay_Log_Pos: 253
            Relay_Master_Log_File: mysql-bin.000045
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 107
                  Relay_Log_Space: 412
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 100 
    
    • Setting up the Global configuration file
    # mkdir -p /var/log/masterha/app1
    # cat > /etc/masterha_default.cnf <<EOF
    [server default]
    user=root
    password=mysql
    ssh_user=root
    master_binlog_dir=/var/lib/mysql
    remote_workdir=/tmp/masterha
    secondary_check_script=/usr/local/bin/masterha_secondary_check -s ha-db02 --user=root --master_ip=192.168.100.197 --master_port=3306 --master_host=ha-db01 -s ha-db01
    ping_interval=3
    #master_ip_failover_script=/usr/local/bin/master_ip_failover
    #master_ip_online_change_script=/usr/local/bin/master_ip_online_change
    #shutdown_script=/usr/local/bin/power_manager
    report_script=/usr/local/bin/send_report
    EOF
    
    • Setting up the Application configuration file
    # cat > /etc/app1.cnf <<EOF
    [server default]
    user=root
    password=mysql
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/app1.log
    remote_workdir=/var/log/masterha/app1
    [server1]
    hostname=ha-db01
    [server2]
    hostname=ha-db02
    [server2]
    hostname=ha-mgr01
    EOF
    
    • Checking to connect via ssh with MHA Manager and Nodes
    # masterha_check_ssh --conf=/etc/app1.cnf 
    Thu Jul 28 10:32:27 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
    Thu Jul 28 10:32:27 2011 - [info] Reading application default configurations from /etc/app1.cnf..
    Thu Jul 28 10:32:27 2011 - [info] Reading server configurations from /etc/app1.cnf..
    Thu Jul 28 10:32:27 2011 - [info] Starting SSH connection tests..
    Thu Jul 28 10:32:28 2011 - [debug] 
    Thu Jul 28 10:32:27 2011 - [debug]  Connecting via SSH from root@ha-db01(192.168.100.197) to root@ha-db02(192.168.100.198)..
    Thu Jul 28 10:32:28 2011 - [debug]   ok.
    Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-db01(192.168.100.197) to root@ha-mgr02(192.168.100.200)..
    Thu Jul 28 10:32:28 2011 - [debug]   ok.
    Thu Jul 28 10:32:29 2011 - [debug] 
    Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-db02(192.168.100.198) to root@ha-db01(192.168.100.197)..
    Thu Jul 28 10:32:28 2011 - [debug]   ok.
    Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-db02(192.168.100.198) to root@ha-mgr02(192.168.100.200)..
    Thu Jul 28 10:32:28 2011 - [debug]   ok.
    Thu Jul 28 10:32:29 2011 - [debug] 
    Thu Jul 28 10:32:28 2011 - [debug]  Connecting via SSH from root@ha-mgr02(192.168.100.200) to root@ha-db01(192.168.100.197)..
    Thu Jul 28 10:32:29 2011 - [debug]   ok.
    Thu Jul 28 10:32:29 2011 - [debug]  Connecting via SSH from root@ha-mgr02(192.168.100.200) to root@ha-db02(192.168.100.198)..
    Thu Jul 28 10:32:29 2011 - [debug]   ok.
    Thu Jul 28 10:32:29 2011 - [info] All SSH connection tests passed successfully.
    
    • Checking if MySQL replication is enabled
    # masterha_check_repl --conf=/etc/app1.cnf 
    Thu Jul 28 10:32:39 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
    Thu Jul 28 10:32:39 2011 - [info] Reading application default configurations from /etc/app1.cnf..
    Thu Jul 28 10:32:39 2011 - [info] Reading server configurations from /etc/app1.cnf..
    Thu Jul 28 10:32:39 2011 - [info] MHA::MasterMonitor version 0.50.
    Thu Jul 28 10:32:39 2011 - [info] Dead Servers:
    Thu Jul 28 10:32:39 2011 - [info] Alive Servers:
    Thu Jul 28 10:32:39 2011 - [info]   ha-db01(192.168.100.197:3306)
    Thu Jul 28 10:32:39 2011 - [info]   ha-db02(192.168.100.198:3306)
    Thu Jul 28 10:32:39 2011 - [info]   ha-mgr02(192.168.100.200:3306)
    Thu Jul 28 10:32:39 2011 - [info] Alive Slaves:
    Thu Jul 28 10:32:39 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:32:39 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:32:39 2011 - [info]   ha-mgr02(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:32:39 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:32:39 2011 - [info] Current Master: ha-db01(192.168.100.197:3306)
    Thu Jul 28 10:32:39 2011 - [info] Checking slave configurations..
    Thu Jul 28 10:32:39 2011 - [warn]  read_only=1 is not set on slave ha-db02(192.168.100.198:3306).
    Thu Jul 28 10:32:39 2011 - [warn]  relay_log_purge=0 is not set on slave ha-db02(192.168.100.198:3306).
    Thu Jul 28 10:32:39 2011 - [warn]  read_only=1 is not set on slave ha-mgr02(192.168.100.200:3306).
    Thu Jul 28 10:32:39 2011 - [warn]  relay_log_purge=0 is not set on slave ha-mgr02(192.168.100.200:3306).
    Thu Jul 28 10:32:39 2011 - [info] Checking replication filtering settings..
    Thu Jul 28 10:32:39 2011 - [info]  binlog_do_db= , binlog_ignore_db= 
    Thu Jul 28 10:32:39 2011 - [info]  Replication filtering check ok.
    Thu Jul 28 10:32:39 2011 - [info] Starting SSH connection tests..
    Thu Jul 28 10:32:41 2011 - [info] All SSH connection tests passed successfully.
    Thu Jul 28 10:32:41 2011 - [info] Checking MHA Node version..
    Thu Jul 28 10:32:41 2011 - [info]  Version check ok.
    Thu Jul 28 10:32:41 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on the current master..
    Thu Jul 28 10:32:42 2011 - [info]   Executing command: save_binary_logs --command=test --start_file=mysql-bin.000045 --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.50 
    Thu Jul 28 10:32:42 2011 - [info]   Connecting to root@ha-db01(ha-db01).. 
      Creating /var/log/masterha/app1 if not exists..    ok.
      Checking output directory is accessible or not..
       ok.
      Binlog found at /var/lib/mysql, up to mysql-bin.000045
    Thu Jul 28 10:32:42 2011 - [info] Master setting check done.
    Thu Jul 28 10:32:42 2011 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
    Thu Jul 28 10:32:42 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
    Thu Jul 28 10:32:42 2011 - [info]   Connecting to root@192.168.100.198(ha-db02).. 
      Checking slave recovery environment settings..
        Opening /var/lib/mysql/relay-log.info ... ok.
        Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000017
        Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000017
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Thu Jul 28 10:32:42 2011 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=ha-mgr02 --slave_ip=192.168.100.200 --slave_port=3306 --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --manager_version=0.50 --relay_log_info=/var/lib/mysql/relay-log.info  --slave_pass=xxx
    Thu Jul 28 10:32:42 2011 - [info]   Connecting to root@192.168.100.200(ha-mgr02).. 
      Checking slave recovery environment settings..
        Opening /var/lib/mysql/relay-log.info ... ok.
        Relay log found at /var/lib/mysql, up to ha-mgr02-relay-bin.000002
        Temporary relay log file is /var/lib/mysql/ha-mgr02-relay-bin.000002
        Testing mysql connection and privileges.. done.
        Testing mysqlbinlog output.. done.
        Cleaning up test file(s).. done.
    Thu Jul 28 10:32:42 2011 - [info] Slaves settings check done.
    Thu Jul 28 10:32:42 2011 - [info] 
    ha-db01 (current master)
     +--ha-db02
     +--ha-mgr02
    Thu Jul 28 10:32:42 2011 - [info] Checking replication health on ha-db02..
    Thu Jul 28 10:32:42 2011 - [info]  ok.
    Thu Jul 28 10:32:42 2011 - [info] Checking replication health on ha-mgr02..
    Thu Jul 28 10:32:42 2011 - [info]  ok.
    Thu Jul 28 10:32:42 2011 - [warn] master_ip_failover_script is not defined.
    Thu Jul 28 10:32:42 2011 - [warn] shutdown_script is not defined.
    Thu Jul 28 10:32:42 2011 - [info] Got exit code 0 (Not master dead).
    MySQL Replication Health is OK. 
    
    • Starting up the Node Manager
    # masterha_manager --conf=/etc/app1.cnf 
    • Checking the status 
    # masterha_check_status --conf=/etc/app1.cnf 
    • Stopping MySQL Daemon
    # ssh ha-db01 '/etc/init.d/mysql stop' 
    • Changing Master Server ha-db01 to ha-db02 manually
    # masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=ha-db01 --new_master_host=ha-db02
    --dead_master_ip=<dead_master_ip> is not set. Using 192.168.100.197.
    --dead_master_port=<dead_master_port> is not set. Using 3306.
    Thu Jul 28 10:39:09 2011 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
    Thu Jul 28 10:39:09 2011 - [info] Reading application default configurations from /etc/app1.cnf..
    Thu Jul 28 10:39:09 2011 - [info] Reading server configurations from /etc/app1.cnf..
    Thu Jul 28 10:39:09 2011 - [info] MHA::MasterFailover version 0.50.
    Thu Jul 28 10:39:09 2011 - [info] Starting master failover.
    Thu Jul 28 10:39:09 2011 - [info] 
    Thu Jul 28 10:39:09 2011 - [info] * Phase 1: Configuration Check Phase..
    Thu Jul 28 10:39:09 2011 - [info] 
    Thu Jul 28 10:39:09 2011 - [info] Dead Servers:
    Thu Jul 28 10:39:09 2011 - [info]   ha-db01(192.168.100.197:3306)
    Thu Jul 28 10:39:09 2011 - [info] Checking master reachability via mysql(double check)..
    Thu Jul 28 10:39:09 2011 - [info]  ok.
    Thu Jul 28 10:39:09 2011 - [info] Alive Servers:
    Thu Jul 28 10:39:09 2011 - [info]   ha-db02(192.168.100.198:3306)
    Thu Jul 28 10:39:09 2011 - [info]   ha-mgr01(192.168.100.200:3306)
    Thu Jul 28 10:39:09 2011 - [info] Alive Slaves:
    Thu Jul 28 10:39:09 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:39:09 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:39:09 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:39:09 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Master ha-db01 is dead. Proceed? (yes/NO): yes
    Thu Jul 28 10:39:19 2011 - [info] ** Phase 1: Configuration Check Phase completed.
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] * Phase 2: Dead Master Shutdown Phase..
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] HealthCheck: SSH to ha-db01 is reachable.
    Thu Jul 28 10:39:19 2011 - [info] Forcing shutdown so that applications never connect to the current master..
    Thu Jul 28 10:39:19 2011 - [warn] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
    Thu Jul 28 10:39:19 2011 - [warn] shutdown_script is not set. Skipping explicit shutting down of the dead master.
    Thu Jul 28 10:39:19 2011 - [info] * Phase 2: Dead Master Shutdown Phase completed.
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] * Phase 3: Master Recovery Phase..
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] * Phase 3.1: Getting Latest Slaves Phase..
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] The latest binary log file/position on all slaves is mysql-bin.000045:107
    Thu Jul 28 10:39:19 2011 - [info] Latest slaves (Slaves that received relay log files to the latest):
    Thu Jul 28 10:39:19 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:39:19 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:39:19 2011 - [info] The oldest binary log file/position on all slaves is mysql-bin.000045:107
    Thu Jul 28 10:39:19 2011 - [info] Oldest slaves:
    Thu Jul 28 10:39:19 2011 - [info]   ha-db02(192.168.100.198:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:39:19 2011 - [info]   ha-mgr01(192.168.100.200:3306)  Version=5.5.12-log (oldest major version between slaves) log-bin:enabled
    Thu Jul 28 10:39:19 2011 - [info]     Replicating from 192.168.100.197(192.168.100.197:3306)
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
    Thu Jul 28 10:39:19 2011 - [info] 
    Thu Jul 28 10:39:19 2011 - [info] Fetching dead master's binary logs..
    Thu Jul 28 10:39:19 2011 - [info] Executing command on the dead master ha-db01(192.168.100.197:3306): save_binary_logs --command=save --start_file=mysql-bin.000045  --start_pos=107 --binlog_dir=/var/lib/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50
      Creating /var/log/masterha/app1 if not exists..    ok.
     Concat binary/relay logs from mysql-bin.000045 pos 107 to mysql-bin.000045 EOF into /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog ..
      Dumping binlog format description event, from position 0 to 107.. ok.
      Dumping effective binlog data from /var/lib/mysql/mysql-bin.000045 position 107 to tail(126).. ok.
     Concat succeeded.
    saved_master_binlog_from_ha-db01_3306_20110728103909.binlog                                    100%  126     0.1KB/s   00:00    
    Thu Jul 28 10:39:20 2011 - [info] scp from root@192.168.100.197:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog succeeded.
    Thu Jul 28 10:39:20 2011 - [info] HealthCheck: SSH to ha-db02 is reachable.
    Thu Jul 28 10:39:21 2011 - [info] HealthCheck: SSH to ha-mgr01 is reachable.
    Thu Jul 28 10:39:21 2011 - [info] 
    Thu Jul 28 10:39:21 2011 - [info] * Phase 3.3: Determining New Master Phase..
    Thu Jul 28 10:39:21 2011 - [info] 
    Thu Jul 28 10:39:21 2011 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
    Thu Jul 28 10:39:21 2011 - [info] All slaves received relay logs to the same position. No need to resync each other.
    Thu Jul 28 10:39:21 2011 - [info] ha-db02 can be new master.
    Thu Jul 28 10:39:21 2011 - [info] New master is ha-db02(192.168.100.198:3306)
    Thu Jul 28 10:39:21 2011 - [info] Starting master failover..
    Thu Jul 28 10:39:21 2011 - [info] 
    From:
    ha-db01 (current master)
     +--ha-db02
     +--ha-mgr01
    To:
    ha-db02 (new master)
     +--ha-mgr01
    Starting master switch from ha-db01(192.168.100.197:3306) to ha-db02(192.168.100.198:3306)? (yes/NO): yes
    Thu Jul 28 10:39:29 2011 - [info] New master decided manually is ha-db02(192.168.100.198:3306)
    Thu Jul 28 10:39:29 2011 - [info] 
    Thu Jul 28 10:39:29 2011 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
    Thu Jul 28 10:39:29 2011 - [info] 
    Thu Jul 28 10:39:29 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
    Thu Jul 28 10:39:29 2011 - [info] Sending binlog..
    saved_master_binlog_from_ha-db01_3306_20110728103909.binlog                                    100%  126     0.1KB/s   00:00    
    Thu Jul 28 10:39:30 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog to root@ha-db02:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog succeeded.
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] * Phase 3.4: Master Log Apply Phase..
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
    Thu Jul 28 10:39:30 2011 - [info] Starting recovery on ha-db02(192.168.100.198:3306)..
    Thu Jul 28 10:39:30 2011 - [info]  Generating diffs succeeded.
    Thu Jul 28 10:39:30 2011 - [info] Waiting until all relay logs are applied.
    Thu Jul 28 10:39:30 2011 - [info]  done.
    Thu Jul 28 10:39:30 2011 - [info] Getting slave status..
    Thu Jul 28 10:39:30 2011 - [info] This slave(ha-db02)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000045:107). No need to recover from Exec_Master_Log_Pos.
    Thu Jul 28 10:39:30 2011 - [info] Connecting to the target slave host ha-db02, running recover script..
    Thu Jul 28 10:39:30 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-db02 --slave_ip=192.168.100.198  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110728103909 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
    Thu Jul 28 10:39:30 2011 - [info] 
    Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog on ha-db02:3306. This may take long time...
    Applying log files succeeded.
    Thu Jul 28 10:39:30 2011 - [info]  All relay logs were successfully applied.
    Thu Jul 28 10:39:30 2011 - [info] Getting new master's binlog name and position..
    Thu Jul 28 10:39:30 2011 - [info]  mysql-bin.000006:107
    Thu Jul 28 10:39:30 2011 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='ha-db02 or 192.168.100.198', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=107, MASTER_USER='replication', MASTER_PASSWORD='xxx';
    Thu Jul 28 10:39:30 2011 - [warn] master_ip_failover_script is not set. Skipping taking over new master ip address.
    Thu Jul 28 10:39:30 2011 - [info] ** Finished master recovery successfully.
    Thu Jul 28 10:39:30 2011 - [info] * Phase 3: Master Recovery Phase completed.
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] * Phase 4: Slaves Recovery Phase..
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] -- Slave diff file generation on host ha-mgr01(192.168.100.200:3306) started, pid: 18983. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110728103909.log if it takes time..
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] Log messages from ha-mgr01 ...
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
    Thu Jul 28 10:39:30 2011 - [info] End of log messages from ha-mgr01.
    Thu Jul 28 10:39:30 2011 - [info] -- ha-mgr01(192.168.100.200:3306) has the latest relay log events.
    Thu Jul 28 10:39:30 2011 - [info] Generating relay diff files from the latest slave succeeded.
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
    Thu Jul 28 10:39:30 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) started, pid: 18985. Check tmp log /var/log/masterha/app1/ha-mgr02_3306_20110728103909.log if it takes time..
    saved_master_binlog_from_ha-db01_3306_20110728103909.binlog                                    100%  126     0.1KB/s   00:00    
    Thu Jul 28 10:39:31 2011 - [info] 
    Thu Jul 28 10:39:31 2011 - [info] Log messages from ha-mgr01 ...
    Thu Jul 28 10:39:31 2011 - [info] 
    Thu Jul 28 10:39:30 2011 - [info] Sending binlog..
    Thu Jul 28 10:39:31 2011 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog to root@ha-mgr01:/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog succeeded.
    Thu Jul 28 10:39:31 2011 - [info] Starting recovery on ha-mgr01(192.168.100.200:3306)..
    Thu Jul 28 10:39:31 2011 - [info]  Generating diffs succeeded.
    Thu Jul 28 10:39:31 2011 - [info] Waiting until all relay logs are applied.
    Thu Jul 28 10:39:31 2011 - [info]  done.
    Thu Jul 28 10:39:31 2011 - [info] Getting slave status..
    Thu Jul 28 10:39:31 2011 - [info] This slave(ha-mgr01)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000045:107). No need to recover from Exec_Master_Log_Pos.
    Thu Jul 28 10:39:31 2011 - [info] Connecting to the target slave host ha-mgr01, running recover script..
    Thu Jul 28 10:39:31 2011 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=ha-mgr01 --slave_ip=192.168.100.200  --slave_port=3306 --apply_files=/var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog --workdir=/var/log/masterha/app1 --target_version=5.5.12-log --timestamp=20110728103909 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.50 --slave_pass=xxx
    Thu Jul 28 10:39:31 2011 - [info] 
    Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_ha-db01_3306_20110728103909.binlog on ha-mgr01:3306. This may take long time...
    Applying log files succeeded.
    Thu Jul 28 10:39:31 2011 - [info]  All relay logs were successfully applied.
    Thu Jul 28 10:39:31 2011 - [info]  Resetting slave ha-mgr01(192.168.100.200:3306) and starting replication from the new master ha-db02(192.168.100.198:3306)..
    Thu Jul 28 10:39:31 2011 - [info]  Executed CHANGE MASTER.
    Thu Jul 28 10:39:31 2011 - [info]  Slave started.
    Thu Jul 28 10:39:31 2011 - [info] End of log messages from ha-mgr01.
    Thu Jul 28 10:39:31 2011 - [info] -- Slave recovery on host ha-mgr01(192.168.100.200:3306) succeeded.
    Thu Jul 28 10:39:31 2011 - [info] All new slave servers recovered successfully.
    Thu Jul 28 10:39:31 2011 - [info] 
    Thu Jul 28 10:39:31 2011 - [info] * Phase 5: New master cleanup phease..
    Thu Jul 28 10:39:31 2011 - [info] 
    Thu Jul 28 10:39:31 2011 - [info] Resetting slave info on the new master..
    Thu Jul 28 10:39:31 2011 - [info] Master failover to ha-db02(192.168.100.198:3306) completed successfully.
    Thu Jul 28 10:39:31 2011 - [info] 
    ----- Failover Report -----
    app1: MySQL Master failover ha-db01 to ha-db02 succeeded
    Master ha-db01 is down!
    
    • Check MHA Manager logs at ha-mgr01 for details.
    
    Started manual(interactive) failover.
    The latest slave ha-db02(192.168.100.198:3306) has all relay logs for recovery.
    Selected ha-db02 as a new master.
    ha-db02: OK: Applying all logs succeeded.
    ha-mgr01: This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    ha-mgr01: OK: Applying all logs succeeded. Slave started, replicating from ha-db02.
    ha-db02: Resetting slave info succeeded.
    Master failover to ha-db02(192.168.100.198:3306) completed successfully.
    Thu Jul 28 10:39:31 2011 - [info] Sending mail..
    Unknown option: conf
    
    • Displaying a list of replication slaves registered with new Master
    # mysql -uroot -pmysql -e 'SHOW SLAVE HOSTS\G' -h ha-db02
    *************************** 1. row ***************************
    Server_id: 300
         Host: 
         Port: 3306
    Master_id: 200 
    
    • Checking the replication status of the slave servers
    # mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h localhost
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
                      Master_Host: 192.168.100.198
                      Master_User: replication
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: mysql-bin.000006
              Read_Master_Log_Pos: 107
                   Relay_Log_File: ha-mgr01-relay-bin.000002
                    Relay_Log_Pos: 253
            Relay_Master_Log_File: mysql-bin.000006
                 Slave_IO_Running: Yes
                Slave_SQL_Running: Yes
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 107
                  Relay_Log_Space: 412
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 200
    
    # mysql -uroot -pmysql -e 'SHOW SLAVE STATUS\G' -h ha-db02
    *************************** 1. row ***************************
                   Slave_IO_State: 
                      Master_Host: 192.168.100.197
                      Master_User: replication
                      Master_Port: 3306
                    Connect_Retry: 60
                  Master_Log_File: 
              Read_Master_Log_Pos: 4
                   Relay_Log_File: mysqld-relay-bin.000001
                    Relay_Log_Pos: 4
            Relay_Master_Log_File: 
                 Slave_IO_Running: No
                Slave_SQL_Running: No
                  Replicate_Do_DB: 
              Replicate_Ignore_DB: 
               Replicate_Do_Table: 
           Replicate_Ignore_Table: 
          Replicate_Wild_Do_Table: 
      Replicate_Wild_Ignore_Table: 
                       Last_Errno: 0
                       Last_Error: 
                     Skip_Counter: 0
              Exec_Master_Log_Pos: 0
                  Relay_Log_Space: 126
                  Until_Condition: None
                   Until_Log_File: 
                    Until_Log_Pos: 0
               Master_SSL_Allowed: No
               Master_SSL_CA_File: 
               Master_SSL_CA_Path: 
                  Master_SSL_Cert: 
                Master_SSL_Cipher: 
                   Master_SSL_Key: 
            Seconds_Behind_Master: NULL
    Master_SSL_Verify_Server_Cert: No
                    Last_IO_Errno: 0
                    Last_IO_Error: 
                   Last_SQL_Errno: 0
                   Last_SQL_Error: 
      Replicate_Ignore_Server_Ids: 
                 Master_Server_Id: 100
    

    Lastly, let's see how master server fails over automatically.

    Monday, August 1, 2011

    MySQL MHA - Installation

    I'm going to start my blog, mainly about OSS. The reason is that I was inspired by an article on Yoshinori Matsumoto's blog. The title of the article is Announcing MySQL-MHA: "MySQL Master High Availability manager and tools"
    I'm thinking that the first article of mine should be MySQL MHA, as I was just looking for something nice of how to automate mysql master failover.
    A primary objective of MHA is automating master failover and slave promotion within short (usually 10-30 seconds) downtime, without suffering from replication consistency problems, without spending money for lots of new servers, without performance penalty, without complexity (easy-to-install), and without changing existing deployments.

    The official project is here, MySQL MHA.

    Before installing MySQL MHA, you need to install mysql server and enable replication between master and slave database server.
    • MHA Nodes & Manager


      DB(Master) + MHA Manager192.168.100.200(ha-mgr01)
      DB(Slave) + MHA Node192.168.100.197(ha-db01)
      DB(Slave) + MHA Node192.168.100.198(ha-db02)


      *Make sure that you need to setup public key authentication to login as root user between those hosts without passphrase.
      • Installing MHA Node 

      $ wget http://mysql-master-ha.googlecode.com/files/MySQL-MasterHA-Node-0.50.tar.gz
      $ tar zxf MySQL-MasterHA-Node-0.50.tar.gz
      $ cd MySQL-MasterHA-Node-0.50
      $ perl Makefile.PL
      $ make
      $ sudo make install 
      
      • Installing MHA Manager

      $ for modules in ¥ DBD::mysql ¥ Config::Tiny ¥
      Log::Dispatch ¥
      Parallel::ForkManager ¥
      Time::HiRes ; do 
      sudo cpan install $modules
      done
      $ wget http://mysql-master-ha.googlecode.com/files/MySQL-MasterHA-Manager-0.50.tar.gz
      $ tar zxf MySQL-MasterHA-Manager-0.50.tar.gz
      $ cd MySQL-MasterHA-Manager-0.50
      $ perl Makefile.PL
      $ make
      $ sudo make install
      $ sudo cp -p samples/scripts/* /usr/local/bin/
      
        Next will be about automating fail-over

        iJAWS@Doorkeeper