MySQL High Availability with MHA: Online Switchover Steps and How It Works

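In day-to-day work we regularly run into situations such as a MySQL upgrade or a hardware upgrade of the master server. In these cases the write traffic has to be moved to another server. How can this be done online? The switchover should also be short and have as little impact on the business as possible.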

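MHA offers an elegant way to do this: it blocks the application for only about 0.5 to 2 seconds, and during that window the application can neither read nor write.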

 

Cluster information

Role               IP address        ServerID   Type
Master             192.168.244.10    1          Write
Candidate master   192.168.244.20    2          Read
Slave              192.168.244.30    3          Read
Monitor host       192.168.244.40               Monitors the cluster

 

For the detailed MHA setup steps and how MHA works, see my other blog post:

MySQL High Availability with MHA: Deployment and Principles

 

Online switchover steps

  1. Stop MHA monitoring

# masterha_stop --conf=/etc/masterha/app1.cnf

  2. Perform the online switchover

# /usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf \
    --master_state=alive --new_master_host=192.168.244.20 \
    --new_master_port=3306 --orig_master_is_new_slave \
    --running_updates_limit=10000

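Where:

--orig_master_is_new_slave turns the original master into a slave of the new master after the switch. It is not set by default.

--running_updates_limit defaults to 1 (second): if the replication lag (Seconds_Behind_Master), or the running time of any DML statement in the master's SHOW PROCESSLIST output, exceeds this value, the switch is not performed. The command above raises it to 10000 seconds.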
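
If you script this switchover, it helps to assemble the command line in one place so the options above are applied consistently. The Python sketch below only builds the argument list from the options used in this article and does not run anything; `build_switch_argv` is a hypothetical helper, not part of MHA. To execute it, you could pass the list to `subprocess.run`.

```python
import shlex


def build_switch_argv(conf, new_master_host, new_master_port=3306,
                      orig_master_is_new_slave=True,
                      running_updates_limit=None):
    """Build the masterha_master_switch argv for an online (alive) switch.

    Hypothetical helper: it only uses options that appear in this article.
    """
    argv = [
        "masterha_master_switch",
        f"--conf={conf}",
        "--master_state=alive",
        f"--new_master_host={new_master_host}",
        f"--new_master_port={new_master_port}",
    ]
    if orig_master_is_new_slave:
        # Restart the old master as a slave of the new master afterwards.
        argv.append("--orig_master_is_new_slave")
    if running_updates_limit is not None:
        # Seconds of slave lag / long-running DML tolerated before aborting.
        argv.append(f"--running_updates_limit={running_updates_limit}")
    return argv


if __name__ == "__main__":
    argv = build_switch_argv("/etc/masterha/app1.cnf", "192.168.244.20",
                             running_updates_limit=10000)
    # Print a shell-ready command line for review before running it.
    print(shlex.join(argv))
```

Leaving `running_updates_limit` as `None` omits the option entirely, so MHA falls back to its 1-second default.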

 

Online switchover output

Tue Apr 11 15:28:32 2017 - [info] MHA::MasterRotate version 0.56.
Tue Apr 11 15:28:32 2017 - [info] Starting online master switch..
Tue Apr 11 15:28:32 2017 - [info] 
Tue Apr 11 15:28:32 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Apr 11 15:28:32 2017 - [info] 
Tue Apr 11 15:28:32 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 11 15:28:32 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:32 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr 11 15:28:34 2017 - [info] GTID failover mode = 0
Tue Apr 11 15:28:34 2017 - [info] Current Alive Master: 192.168.244.10(192.168.244.10:3306)
Tue Apr 11 15:28:34 2017 - [info] Alive Slaves:
Tue Apr 11 15:28:34 2017 - [info]   192.168.244.20(192.168.244.20:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Tue Apr 11 15:28:34 2017 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)
Tue Apr 11 15:28:34 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr 11 15:28:34 2017 - [info]   192.168.244.30(192.168.244.30:3306)  Version=5.6.31-log (oldest major version between slaves) log-bin:enabled
Tue Apr 11 15:28:34 2017 - [info]     Replicating from 192.168.244.10(192.168.244.10:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:3306)? (YES/no): yes
Tue Apr 11 15:28:47 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Apr 11 15:28:47 2017 - [info]  ok.
Tue Apr 11 15:28:47 2017 - [info] Checking MHA is not monitoring or doing failover..
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on 192.168.244.20..
Tue Apr 11 15:28:47 2017 - [info]  ok.
Tue Apr 11 15:28:47 2017 - [info] Checking replication health on 192.168.244.30..
Tue Apr 11 15:28:47 2017 - [info]  ok.
Tue Apr 11 15:28:47 2017 - [info] 192.168.244.20 can be new master.
Tue Apr 11 15:28:47 2017 - [info] 
From:
192.168.244.10(192.168.244.10:3306) (current master)
 +--192.168.244.20(192.168.244.20:3306)
 +--192.168.244.30(192.168.244.30:3306)

To:
192.168.244.20(192.168.244.20:3306) (new master)
 +--192.168.244.30(192.168.244.30:3306)
 +--192.168.244.10(192.168.244.10:3306)

Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO): yes
Tue Apr 11 15:29:00 2017 - [info] Checking whether 192.168.244.20(192.168.244.20:3306) is ok for the new master..
Tue Apr 11 15:29:00 2017 - [info]  ok.
Tue Apr 11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Apr 11 15:29:00 2017 - [info] 192.168.244.10(192.168.244.10:3306): Resetting slave pointing to the dummy host.
Tue Apr 11 15:29:00 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Apr 11 15:29:00 2017 - [info] 
Tue Apr 11 15:29:00 2017 - [info] * Phase 2: Rejecting updates Phase..
Tue Apr 11 15:29:00 2017 - [info] 
Tue Apr 11 15:29:00 2017 - [info] Executing master ip online change script to disable write on the current master:
Tue Apr 11 15:29:00 2017 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
Tue Apr 11 15:29:00 2017 476501 Set read_only on the new master.. ok.
Tue Apr 11 15:29:00 2017 911951 Set read_only=1 on the orig master.. ok.
Tue Apr 11 15:29:00 2017 919517 Killing all application threads..
Tue Apr 11 15:29:00 2017 919552 done.
Disabling the VIP an old master: 192.168.244.10 
SIOCSIFFLAGS: Cannot assign requested address
Tue Apr 11 15:29:00 2017 - [info]  ok.
Tue Apr 11 15:29:00 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Apr 11 15:29:00 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Apr 11 15:29:00 2017 - [info]  ok.
Tue Apr 11 15:29:00 2017 - [info] Orig master binlog:pos is mysql-bin.000016:211.
Tue Apr 11 15:29:00 2017 - [info]  Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:01 2017 - [info]  master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.20(192.168.244.20:3306). Executed 0 events.
Tue Apr 11 15:29:01 2017 - [info]   done.
Tue Apr 11 15:29:01 2017 - [info] Getting new master's binlog name and position..
Tue Apr 11 15:29:01 2017 - [info]  mysql-bin.000009:211
Tue Apr 11 15:29:01 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.244.20', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=211, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Apr 11 15:29:01 2017 - [info] Executing master ip online change script to allow write on the new master:
Tue Apr 11 15:29:01 2017 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port=3306 --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port=3306 --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
Tue Apr 11 15:29:01 2017 109040 Set read_only=0 on the new master.
Enabling the VIP 192.168.244.188 on the new master: 192.168.244.20 
Tue Apr 11 15:29:01 2017 - [info]  ok.
Tue Apr 11 15:29:01 2017 - [info] 
Tue Apr 11 15:29:01 2017 - [info] * Switching slaves in parallel..
Tue Apr 11 15:29:01 2017 - [info] 
Tue Apr 11 15:29:01 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) started, pid: 17651
Tue Apr 11 15:29:01 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info] Log messages from 192.168.244.30 ...
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:01 2017 - [info]  Waiting to execute all relay logs on 192.168.244.30(192.168.244.30:3306)..
Tue Apr 11 15:29:01 2017 - [info]  master_pos_wait(mysql-bin.000016:211) completed on 192.168.244.30(192.168.244.30:3306). Executed 0 events.
Tue Apr 11 15:29:01 2017 - [info]   done.
Tue Apr 11 15:29:01 2017 - [info]  Resetting slave 192.168.244.30(192.168.244.30:3306) and starting replication from the new master 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:01 2017 - [info]  Executed CHANGE MASTER.
Tue Apr 11 15:29:01 2017 - [info]  Slave started.
Tue Apr 11 15:29:02 2017 - [info] End of log messages from 192.168.244.30 ...
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:3306) succeeded.
Tue Apr 11 15:29:02 2017 - [info] Unlocking all tables on the orig master:
Tue Apr 11 15:29:02 2017 - [info] Executing UNLOCK TABLES..
Tue Apr 11 15:29:02 2017 - [info]  ok.
Tue Apr 11 15:29:02 2017 - [info] Starting orig master as a new slave..
Tue Apr 11 15:29:02 2017 - [info]  Resetting slave 192.168.244.10(192.168.244.10:3306) and starting replication from the new master 192.168.244.20(192.168.244.20:3306)..
Tue Apr 11 15:29:02 2017 - [info]  Executed CHANGE MASTER.
Tue Apr 11 15:29:02 2017 - [info]  Slave started.
Tue Apr 11 15:29:02 2017 - [info] All new slave servers switched successfully.
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info] * Phase 5: New master cleanup phase..
Tue Apr 11 15:29:02 2017 - [info] 
Tue Apr 11 15:29:02 2017 - [info]  192.168.244.20: Resetting slave info succeeded.
Tue Apr 11 15:29:02 2017 - [info] Switching master to 192.168.244.20(192.168.244.20:3306) completed successfully.
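
Two binlog coordinates in this output drive the rest of the switch: the original master's final position (mysql-bin.000016:211), which the slaves wait for with master_pos_wait, and the new master's position (mysql-bin.000009:211), from which the other slaves resume replication. The minimal Python sketch below pulls both out of a saved log. It assumes the exact message wording of MHA 0.56 shown above, and `binlog_coordinates` is a hypothetical helper.

```python
import re

# Message formats copied from the MHA 0.56 output above (an assumption for
# other versions).
ORIG_POS = re.compile(r"Orig master binlog:pos is (\S+):(\d+)\.")
NEW_POS_HEADER = "Getting new master's binlog name and position"
COORDS_AT_END = re.compile(r"(\S+):(\d+)\s*$")


def binlog_coordinates(log_text):
    """Return ((file, pos) of the orig master, (file, pos) of the new master).

    Either element is None if the corresponding message is not found.
    """
    orig = new = None
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        m = ORIG_POS.search(line)
        if m:
            orig = (m.group(1), int(m.group(2)))
        if NEW_POS_HEADER in line and i + 1 < len(lines):
            # MHA prints the new master's coordinates on the following line.
            m = COORDS_AT_END.search(lines[i + 1])
            if m:
                new = (m.group(1), int(m.group(2)))
    return orig, new


if __name__ == "__main__":
    sample = """\
Tue Apr 11 15:29:00 2017 - [info] Orig master binlog:pos is mysql-bin.000016:211.
Tue Apr 11 15:29:01 2017 - [info] Getting new master's binlog name and position..
Tue Apr 11 15:29:01 2017 - [info]  mysql-bin.000009:211
"""
    print(binlog_coordinates(sample))
```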

 

How MHA online switchover works

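  1. Check the current configuration and the master/slave server information

    This includes reading the MHA configuration file /etc/masterha/app1.cnf and checking the health of the current slaves.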

  2. Block updates on the current master

   This is done mainly through the following steps:

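   1> Wait up to 1.5 s ($time_until_kill_threads*100ms) for current connections to disconnect.

   2> Set read_only=1 to prevent new DML operations.

   3> Wait up to 0.5 s for the DML operations in progress to finish.

   4> Kill all remaining connections.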

   5> FLUSH NO_WRITE_TO_BINLOG TABLES

   6> FLUSH TABLES WITH READ LOCK

  3. Wait for the new master to apply all of its relay logs

    Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..

  4. Set read_only to off on the new master and bring up the VIP

  5. Switch the slaves to the new master

   1> Wait for the slave (192.168.244.30) to apply the relay log received from the original master, then run CHANGE MASTER to point it at the new master.

   2> Release the lock held on the original master.

   3> Because the masterha_master_switch command line includes the --orig_master_is_new_slave option, the original master is also turned into a slave of the new master.

  6. Clean up the new master's replication information

    This mainly means running RESET SLAVE ALL to clear the previous replication settings.
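
The update-blocking step (1> to 6> above) amounts to bounded waits followed by increasingly forceful blocking. The Python sketch below models it against a hypothetical connection object with `execute(sql)` and `app_threads()` methods (neither is an MHA API) and follows the order of that list. In the actual run, the waiting, read_only and kill parts are done by the master_ip_online_change script, and the general logs further down show FLUSH NO_WRITE_TO_BINLOG TABLES being issued earlier, between the two confirmation prompts.

```python
import time


def wait_ticks(still_busy, ticks, tick_s=0.1, sleep=time.sleep):
    """Poll still_busy() up to `ticks` times, sleeping tick_s between polls.

    Returns True as soon as still_busy() is falsy, False if the budget runs out.
    """
    for _ in range(ticks):
        if not still_busy():
            return True
        sleep(tick_s)
    return not still_busy()


def reject_updates(db, ticks_before_read_only=15, ticks_before_kill=5,
                   sleep=time.sleep):
    """Block writes on the original master (hypothetical `db` interface)."""
    # 1> wait up to 15 x 100 ms = 1.5 s for application connections to leave
    wait_ticks(db.app_threads, ticks_before_read_only, sleep=sleep)
    # 2> read_only=1 stops new DML from users without SUPER
    db.execute("SET GLOBAL read_only=1")
    # 3> wait up to 5 x 100 ms = 0.5 s for in-flight DML to finish
    wait_ticks(db.app_threads, ticks_before_kill, sleep=sleep)
    # 4> kill every application connection that is still there
    for thread_id in db.app_threads():
        db.execute(f"KILL {thread_id}")
    # 5> close and flush open tables without writing this to the binlog
    db.execute("FLUSH NO_WRITE_TO_BINLOG TABLES")
    # 6> global read lock: blocks writes from everybody, including root
    db.execute("FLUSH TABLES WITH READ LOCK")


class FakeDB:
    """In-memory stand-in that records statements instead of running them."""

    def __init__(self, threads):
        self.threads = list(threads)
        self.log = []

    def app_threads(self):
        return list(self.threads)

    def execute(self, sql):
        self.log.append(sql)
        if sql.startswith("KILL "):
            self.threads.remove(int(sql.split()[1]))


if __name__ == "__main__":
    db = FakeDB([101, 102])  # two sessions that never disconnect on their own
    reject_updates(db, sleep=lambda s: None)
    print(db.log)
```

Injecting `sleep` keeps the timing logic testable without real delays; with real connections you would leave the default `time.sleep`.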

 

Conditions for an MHA online switchover

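Before performing an online switchover, MHA checks the current replication state. The switchover proceeds only if all of the following conditions are met: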

  1. The IO thread and the SQL thread are running on every slave.

  2. Seconds_Behind_Master on every slave is less than or equal to running_updates_limit, which defaults to 1 s if it is not specified explicitly.

  3. On the master, SHOW PROCESSLIST shows no DML statement that has been running longer than running_updates_limit.
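
These three checks can be written as a small predicate over SHOW SLAVE STATUS and SHOW PROCESSLIST rows. The Python sketch below illustrates the rules under stated assumptions and is not MHA's own code: `SlaveStatus`, `is_dml` and `switch_blockers` are hypothetical names, and a statement is treated as DML purely by its leading keyword, which is a simplification of whatever MHA actually does.

```python
from dataclasses import dataclass
from typing import List, Optional

DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


@dataclass
class SlaveStatus:
    host: str
    io_running: str                # Slave_IO_Running from SHOW SLAVE STATUS
    sql_running: str               # Slave_SQL_Running
    seconds_behind: Optional[int]  # Seconds_Behind_Master, None when NULL


def is_dml(info: Optional[str]) -> bool:
    """Simplified DML test: look only at the statement's first keyword."""
    words = (info or "").split(None, 1)
    return bool(words) and words[0].upper() in DML_KEYWORDS


def switch_blockers(slaves: List[SlaveStatus], processlist: List[dict],
                    running_updates_limit: int = 1) -> List[str]:
    """Return the reasons an online switch should be refused; [] means go."""
    reasons = []
    for s in slaves:
        # Condition 1: both replication threads must be running.
        if s.io_running != "Yes" or s.sql_running != "Yes":
            reasons.append(f"{s.host}: IO/SQL thread not running")
        # Condition 2: lag must not exceed running_updates_limit seconds.
        elif s.seconds_behind is None:
            reasons.append(f"{s.host}: Seconds_Behind_Master is NULL")
        elif s.seconds_behind > running_updates_limit:
            reasons.append(f"{s.host}: {s.seconds_behind}s behind master")
    # Condition 3: no DML on the master running longer than the limit.
    # Rows are dicts keyed like the SHOW PROCESSLIST columns (Id, Time, Info).
    for row in processlist:
        if is_dml(row.get("Info")) and row.get("Time", 0) > running_updates_limit:
            reasons.append(f"thread {row['Id']}: DML running {row['Time']}s")
    return reasons


if __name__ == "__main__":
    slaves = [SlaveStatus("192.168.244.20", "Yes", "Yes", 0),
              SlaveStatus("192.168.244.30", "Yes", "Yes", 0)]
    print(switch_blockers(slaves, []))
```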

 

General log on each server during the online switchover

Note: masterha_master_switch asks for confirmation twice while it runs:

  1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:3306)? (YES/no):

  2. Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO):

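Each of the logs below contains two blank gaps: the output before the first gap corresponds to what happened before the first confirmation, and the output before the second gap corresponds to what happened before the second confirmation.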

 

Original master 192.168.244.10

170412 16:52:38    23 Connect    monitor@node4 on 
                   23 Query    set autocommit=1
                   23 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:39    24 Connect    monitor@node4 on 
                   24 Query    set autocommit=1
                   24 Query    SELECT CONNECTION_ID() AS Value
                   24 Query    SET wait_timeout=86400
                   24 Query    SELECT @@global.server_id As Value
                   24 Query    SELECT VERSION() AS Value
                   24 Query    SELECT @@global.gtid_mode As Value
                   24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   24 Query    SHOW MASTER STATUS
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.slave_parallel_workers AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT @@global.read_only As Value
                   24 Query    SELECT @@global.relay_log_purge As Value


170412 16:54:06    24 Query    FLUSH NO_WRITE_TO_BINLOG TABLES
                   24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '0') AS Value
                   24 Query    SHOW PROCESSLIST


170412 16:55:51    24 Query    SHOW SLAVE STATUS
                   24 Query    CHANGE MASTER TO MASTER_HOST='dummy_host'
170412 16:55:52    24 Query    SHOW SLAVE STATUS
                   24 Query    RESET SLAVE /*!50516 ALL */
                   24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value
                   24 Quit    
                   25 Connect    monitor@node4 on 
                   25 Query    set autocommit=1
                   25 Query    SELECT CONNECTION_ID() AS Value
                   25 Query    SET sql_log_bin=0
                   25 Query    SHOW PROCESSLIST
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SET GLOBAL read_only=1
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SHOW PROCESSLIST
                   25 Query    SET sql_log_bin=1
                   25 Quit    
                   26 Connect    monitor@node4 on 
                   26 Query    set autocommit=1
                   26 Query    SELECT CONNECTION_ID() AS Value
                   26 Query    SET wait_timeout=86400
                   26 Query    FLUSH TABLES WITH READ LOCK
                   26 Query    SHOW MASTER STATUS
170412 16:55:53    26 Query    UNLOCK TABLES
                   26 Query    CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120
                   26 Query    SET GLOBAL relay_log_purge=0
                   26 Query    START SLAVE
                   27 Connect Out    repl@192.168.244.20:3306
                   26 Query    SHOW SLAVE STATUS
                   26 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   26 Quit    

 

New master 192.168.244.20

170412 16:52:38    23 Connect    monitor@node4 on 
                   23 Query    set autocommit=1
                   23 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:39    24 Connect    monitor@node4 on 
                   24 Query    set autocommit=1
                   24 Query    SELECT CONNECTION_ID() AS Value
                   24 Query    SET wait_timeout=86400
                   24 Query    SELECT @@global.server_id As Value
                   24 Query    SELECT VERSION() AS Value
                   24 Query    SELECT @@global.gtid_mode As Value
                   24 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   24 Query    SHOW MASTER STATUS
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.slave_parallel_workers AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT @@global.read_only As Value
                   24 Query    SELECT @@global.relay_log_purge As Value
                   24 Query    SELECT @@global.relay_log_info_repository AS Value
                   24 Query    SELECT @@global.datadir AS Value
                   24 Query    SELECT @@global.relay_log_info_file AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'



170412 16:54:06    24 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SHOW SLAVE STATUS


170412 16:55:52    24 Query    SHOW PROCESSLIST
                   25 Connect    monitor@node4 on 
                   25 Query    set autocommit=1
                   25 Query    SELECT CONNECTION_ID() AS Value
                   25 Query    SELECT @@global.read_only As Value
                   25 Query    SELECT @@global.read_only As Value
                   25 Quit    
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                   24 Query    STOP SLAVE SQL_THREAD
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SHOW MASTER STATUS
                   26 Connect    monitor@node4 on 
                   26 Query    set autocommit=1
                   26 Query    SELECT CONNECTION_ID() AS Value
                   26 Query    SET sql_log_bin=0
                   26 Query    SELECT @@global.read_only As Value
                   26 Query    SET GLOBAL read_only=0
                   26 Query    SET sql_log_bin=1
                   26 Quit    
                   24 Query    SELECT @@global.read_only As Value
                   27 Connect    repl@node3 on 
                   27 Query    SELECT UNIX_TIMESTAMP()
                   27 Query    SHOW VARIABLES LIKE 'SERVER_ID'
                   27 Query    SET @master_heartbeat_period= 1799999979520
                   27 Query    SET @master_binlog_checksum= @@global.binlog_checksum
                   27 Query    SELECT @master_binlog_checksum
                   27 Query    SELECT @@GLOBAL.GTID_MODE
                   27 Query    SHOW VARIABLES LIKE 'SERVER_UUID'
                   27 Query    SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715'
                   27 Binlog Dump    Log: 'mysql-bin.000010'  Pos: 120
170412 16:55:53    28 Connect    repl@node1 on 
                   28 Query    SELECT UNIX_TIMESTAMP()
                   28 Query    SHOW VARIABLES LIKE 'SERVER_ID'
                   28 Query    SET @master_heartbeat_period= 1799999979520
                   28 Query    SET @master_binlog_checksum= @@global.binlog_checksum
                   28 Query    SELECT @master_binlog_checksum
                   28 Query    SELECT @@GLOBAL.GTID_MODE
                   28 Query    SHOW VARIABLES LIKE 'SERVER_UUID'
                   24 Query    STOP SLAVE
                   28 Query    SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704'
                   28 Binlog Dump    Log: 'mysql-bin.000010'  Pos: 120
                   24 Query    SHOW SLAVE STATUS
                   24 Query    RESET SLAVE /*!50516 ALL */
                   24 Query    SHOW SLAVE STATUS
                   24 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   24 Quit    

 

Slave 192.168.244.30

170412 16:52:37    16 Connect    monitor@node4 on 
                   16 Query    set autocommit=1
                   16 Query    SELECT CONNECTION_ID() AS Value
170412 16:52:38    17 Connect    monitor@node4 on 
                   17 Query    set autocommit=1
                   17 Query    SELECT CONNECTION_ID() AS Value
                   17 Query    SET wait_timeout=86400
                   17 Query    SELECT @@global.server_id As Value
                   17 Query    SELECT VERSION() AS Value
                   17 Query    SELECT @@global.gtid_mode As Value
                   17 Query    SHOW GLOBAL VARIABLES LIKE 'log_bin'
                   17 Query    SHOW MASTER STATUS
                   17 Query    SELECT @@global.datadir AS Value
                   17 Query    SELECT @@global.slave_parallel_workers AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT @@global.read_only As Value
                   17 Query    SELECT @@global.relay_log_purge As Value
                   17 Query    SELECT @@global.relay_log_info_repository AS Value
                   17 Query    SELECT @@global.datadir AS Value
                   17 Query    SELECT @@global.relay_log_info_file AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'


170412 16:54:05    17 Query    SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '0') AS Value
                   17 Query    SHOW SLAVE STATUS
                   17 Query    SHOW SLAVE STATUS


170412 16:55:50    17 Query    SHOW SLAVE STATUS
170412 16:55:51    17 Query    SHOW SLAVE STATUS
                   17 Query    SELECT MASTER_POS_WAIT('mysql-bin.000017','120',0) AS Result
                   17 Query    STOP SLAVE SQL_THREAD
                   17 Query    SHOW SLAVE STATUS
                   17 Query    STOP SLAVE
                   17 Query    STOP SLAVE
                   17 Query    SHOW SLAVE STATUS
                   17 Query    RESET SLAVE
                   17 Query    CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS = 120
                   17 Query    SET GLOBAL relay_log_purge=0
                   17 Query    START SLAVE
                   18 Connect Out    repl@192.168.244.20:3306
                   17 Query    SHOW SLAVE STATUS
170412 16:55:52    17 Query    SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
                   17 Quit    
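
The general log of 192.168.244.30 shows the repointing sequence: wait for the old master's final position, stop replication, reset, CHANGE MASTER TO the new master's SHOW MASTER STATUS coordinates, then start replication. (The original master runs only the CHANGE MASTER TO / START SLAVE part, right after UNLOCK TABLES.) The Python sketch below rebuilds that sequence as strings, with the SHOW queries omitted. The helper names are hypothetical, the password is a placeholder, and the quoting assumes the default sql_mode (no NO_BACKSLASH_ESCAPES). The general log prints a rewritten copy of CHANGE MASTER TO with the password masked as <secret>; the statement you actually execute needs commas between the options.

```python
def sql_str(value):
    """Quote a MySQL string literal (assumes default sql_mode)."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host, port, user, password, log_file, log_pos):
    """CHANGE MASTER TO pointing at the new master's binlog coordinates."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST = {sql_str(host)}, MASTER_USER = {sql_str(user)}, "
        f"MASTER_PASSWORD = {sql_str(password)}, MASTER_PORT = {int(port)}, "
        f"MASTER_LOG_FILE = {sql_str(log_file)}, MASTER_LOG_POS = {int(log_pos)}"
    )


def repoint_slave_sql(orig_log_file, orig_log_pos, **new_master):
    """Statement order from 192.168.244.30's general log, SHOW queries omitted."""
    return [
        # Apply everything received from the old master up to its final
        # position (timeout argument 0, as in the log above).
        f"SELECT MASTER_POS_WAIT({sql_str(orig_log_file)}, {int(orig_log_pos)}, 0)",
        "STOP SLAVE SQL_THREAD",
        "STOP SLAVE",
        "RESET SLAVE",
        change_master_sql(**new_master),
        "SET GLOBAL relay_log_purge=0",
        "START SLAVE",
    ]


if __name__ == "__main__":
    for stmt in repoint_slave_sql("mysql-bin.000017", 120,
                                  host="192.168.244.20", port=3306,
                                  user="repl", password="xxx",
                                  log_file="mysql-bin.000010", log_pos=120):
        print(stmt + ";")
```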

 

