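Command | Meaning |
---|---|
list | Show the watchers' configuration |
show global info | Show information for all instance groups |
tip | Show the current running state of the system |
login | Log in to the monitor |
logout | Log out |
choose switchover GDW1 | Primary healthy: list the instances that can be switched to primary |
switchover GDW1.instance_name | Primary healthy: switch the given instance of the given group to primary |
choose takeover GDW1 | Primary failed: list the instances that can take over as primary |
takeover GDW1.instance_name | Primary failed: make the given instance of the given group the new primary |
choose takeover force GDW1 | Forced takeover: list the instances that can be switched to primary |
takeover force GDW1.instance_name | Forced takeover: make the given instance of the given group the new primary |

After the primary fails, run SELECT SF_DW_CHECK_TAKEOVER(); on the standby to check whether it can take over 【1: can take over, 0: cannot】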
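
The table reduces to a simple rule. Use switchover while the primary is healthy, use takeover once it has failed, and use takeover force as the forced variant. The minimal Python sketch below encodes that rule. `monitor_command` is a hypothetical helper: it only formats the command text from the table and does not connect to dmmonitor.

```python
from typing import Optional


def monitor_command(group: str, primary_ok: bool,
                    instance: Optional[str] = None, force: bool = False) -> str:
    """Return the dmmonitor command text for the situation in the table.

    Hypothetical helper: it only formats strings and never contacts dmmonitor.
    instance=None yields the "choose ..." form, which lists candidate instances.
    """
    if force:
        verb = "takeover force"   # forced takeover
    elif primary_ok:
        verb = "switchover"       # primary healthy: planned role swap
    else:
        verb = "takeover"         # primary failed: a standby takes over
    if instance is None:
        return f"choose {verb} {group}"
    return f"{verb} {group}.{instance}"


if __name__ == "__main__":
    print(monitor_command("GDW1", primary_ok=True))                    # choose switchover GDW1
    print(monitor_command("GDW1", primary_ok=False, instance="DM03"))  # takeover GDW1.DM03
```

Running the "choose" form first and then the concrete command mirrors the two-step pattern of the table rows.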
login
用户名:SYSDBA
密码:
[monitor] 2022-04-29 13:30:47: 登录监视器成功!
SWITCHOVER
[monitor] 2022-04-29 13:30:54: 开始切换实例DM03_B
[monitor] 2022-04-29 13:30:54: 通知守护进程DM03切换SWITCHOVER状态
[monitor] 2022-04-29 13:30:54: 守护进程(DM03)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2022-04-29 13:30:55: 切换守护进程DM03为SWITCHOVER状态成功
[monitor] 2022-04-29 13:30:55: 通知守护进程DM03_B切换SWITCHOVER状态
[monitor] 2022-04-29 13:30:55: 守护进程(DM03_B)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2022-04-29 13:30:57: 切换守护进程DM03_B为SWITCHOVER状态成功
[monitor] 2022-04-29 13:30:57: 实例DM03开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2022-04-29 13:30:57: 实例DM03执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2022-04-29 13:30:57: 实例DM03_B开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2022-04-29 13:30:57: 实例DM03_B执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2022-04-29 13:30:57: 实例DM03开始执行ALTER DATABASE MOUNT语句
[monitor] 2022-04-29 13:30:57: 实例DM03执行ALTER DATABASE MOUNT语句成功
[monitor] 2022-04-29 13:30:57: 实例DM03_B开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2022-04-29 13:30:57: 实例DM03_B执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2022-04-29 13:30:57: 实例DM03_B开始执行ALTER DATABASE MOUNT语句
[monitor] 2022-04-29 13:30:57: 实例DM03_B执行ALTER DATABASE MOUNT语句成功
[monitor] 2022-04-29 13:30:57: 实例DM03开始执行ALTER DATABASE STANDBY语句
[monitor] 2022-04-29 13:30:57: 实例DM03执行ALTER DATABASE STANDBY语句成功
[monitor] 2022-04-29 13:30:57: 实例DM03_B开始执行ALTER DATABASE PRIMARY语句
[monitor] 2022-04-29 13:30:58: 实例DM03_B执行ALTER DATABASE PRIMARY语句成功
[monitor] 2022-04-29 13:30:58: 通知实例DM03_B修改所有归档状态无效
[monitor] 2022-04-29 13:30:58: 修改所有实例归档为无效状态成功
[monitor] 2022-04-29 13:30:58: 实例DM03开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2022-04-29 13:30:58: 实例DM03执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2022-04-29 13:30:58: 实例DM03_B开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2022-04-29 13:30:59: 实例DM03_B执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2022-04-29 13:30:59: 实例DM03开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2022-04-29 13:30:59: 实例DM03执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2022-04-29 13:30:59: 实例DM03_B开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2022-04-29 13:30:59: 实例DM03_B执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2022-04-29 13:30:59: 通知守护进程DM03切换OPEN状态
[monitor] 2022-04-29 13:31:00: 守护进程(DM03)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2022-04-29 13:31:00: 切换守护进程DM03为OPEN状态成功
[monitor] 2022-04-29 13:31:00: 通知守护进程DM03_B切换OPEN状态
[monitor] 2022-04-29 13:31:01: 守护进程(DM03_B)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2022-04-29 13:31:01: 切换守护进程DM03_B为OPEN状态成功
[monitor] 2022-04-29 13:31:01: 通知组(GDW1)的守护进程执行清理操作
[monitor] 2022-04-29 13:31:01: 清理守护进程(DM03)请求成功
[monitor] 2022-04-29 13:31:01: 清理守护进程(DM03_B)请求成功
[monitor] 2022-04-29 13:31:01: 实例DM03_B切换成功
2022-04-29 13:31:01
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 13:31:01 GLOBAL VALID OPEN DM03_B OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B OPEN PRIMARY 0 0 REALTIME VALID 10119 63732 10119 63732 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 13:31:01 GLOBAL VALID OPEN DM03 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 OK DM03 OPEN STANDBY 0 0 REALTIME INVALID 10116 61285 10116 61285 NONE
DATABASE (DM03) APPLY INFO FROM (DM03_B), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[10116, 10116, 10116], (RLSN, SLSN, KLSN)[61285, 61285, 61285], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (61285)
#================================================================================#
[monitor] 2022-04-29 13:31:03: 守护进程(DM03_B)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:31:03 RECOVERY OK DM03_B OPEN PRIMARY VALID 10 63732 63733
[monitor] 2022-04-29 13:31:05: 守护进程(DM03_B)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:31:05 OPEN OK DM03_B OPEN PRIMARY VALID 10 63733 63733
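【Conclusion】
When reading the monitor's foreground log, focus mainly on these fields:
- MODE: AUTO means automatic failover; MANUAL means manual switchover
- WSTATUS (watcher status): OPEN, ISTATUS (instance status): OPEN and RSTAT (archive status): VALID together indicate a healthy node
- IMODE: the node's role in the cluster (PRIMARY or STANDBY)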
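
These checks can be applied mechanically to the short status rows the monitor prints after each state change (`WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN`). The sketch below is a hypothetical parser, not part of DM. It makes two assumptions:

- the columns appear in the order shown in the logs here;
- only WSTATUS may contain a space (as in `MON CONFIRM` or `UNIFY EP`).

For that reason, the fixed fields are read from the right-hand end of the row.

```python
from dataclasses import dataclass


@dataclass
class WatcherStatus:
    wtime: str
    wstatus: str
    inst_ok: str
    iname: str
    istatus: str
    imode: str
    rstat: str
    n_open: int
    flsn: int
    clsn: int

    @property
    def healthy(self) -> bool:
        # The combination to look for: WSTATUS OPEN + ISTATUS OPEN + RSTAT VALID.
        return self.wstatus == "OPEN" and self.istatus == "OPEN" and self.rstat == "VALID"


def parse_status_line(line: str) -> WatcherStatus:
    """Parse one monitor status row (assumed column layout, see lead-in)."""
    tokens = line.split()
    if len(tokens) < 11:
        raise ValueError(f"unexpected status line: {line!r}")
    # WTIME is "date time" (two tokens). WSTATUS may span several tokens,
    # so the eight fixed-width fields are taken from the end.
    return WatcherStatus(
        wtime=" ".join(tokens[0:2]),
        wstatus=" ".join(tokens[2:-8]),
        inst_ok=tokens[-8],
        iname=tokens[-7],
        istatus=tokens[-6],
        imode=tokens[-5],
        rstat=tokens[-4],
        n_open=int(tokens[-3]),
        flsn=int(tokens[-2]),
        clsn=int(tokens[-1]),
    )
```

For example, the row `2022-04-29 13:36:37 STARTUP OK DM03 OPEN STANDBY INVALID 10 63835 63835` parses as a STANDBY that is not yet healthy, because RSTAT is still INVALID.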
【Procedure】
Kill dmserver on the standby server (172.16.72.129, instance DM03) with kill -9, keeping dmwatcher running
root@dw1_01 172.16.72.129 13:32:04 [pwd:~]# ps -ef | grep dmserver
dmdba 13768 1 0 12:06 ? 00:00:18 /home/dmdba/dmdbms/bin/dmserver /home/dmdba/dmdata/dm03/dm.ini mount
root 14167 13981 0 13:32 pts/1 00:00:00 grep --color=auto dmserver
root@dw1_01 172.16.72.129 13:32:07 [pwd:~]# kill -9 13768
【Observed cluster state】
Watcher (DM03_B) state change [OPEN-->MON CONFIRM]
-->Watcher (DM03_B) state change [MON CONFIRM-->FAILOVER]
-->Watcher (DM03_B) state change [FAILOVER-->OPEN]
-->Watcher (DM03_B) state change [OPEN-->RECOVERY]
-->Watcher (DM03_B) state change [RECOVERY-->OPEN]
Instance DM03 [STANDBY, OPEN, ISTAT_SAME:TRUE] failed
-->Instance DM03 [STANDBY, OPEN, ISTAT_SAME:TRUE] back to normal
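
The summary above was condensed by hand from the monitor log. The same condensation can be sketched in Python by matching the monitor's `守护进程(NAME)状态切换 [OLD-->NEW]` messages. The regular expression assumes the exact wording of the Chinese log lines printed in this article, and `state_chains` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Matches the monitor's Chinese log message "守护进程(NAME)状态切换 [OLD-->NEW]".
TRANSITION = re.compile(
    r"守护进程\((?P<name>[^)]+)\)状态切换 \[(?P<old>[^\]]+?)-->(?P<new>[^\]]+)\]"
)


def state_chains(log_text: str) -> dict[str, list[str]]:
    """Collapse a monitor log into one state sequence per watcher."""
    chains: dict[str, list[str]] = defaultdict(list)
    for line in log_text.splitlines():
        m = TRANSITION.search(line)
        if m is None:
            continue  # not a watcher state-change line
        chain = chains[m["name"]]
        if not chain:
            chain.append(m["old"])  # starting state, taken from the first transition
        chain.append(m["new"])
    return dict(chains)
```

Fed the log lines from 13:36:14 to 13:36:39, this yields the DM03_B sequence listed above (OPEN, MON CONFIRM, FAILOVER, OPEN, RECOVERY, OPEN).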
#================================================================================#
[monitor] 2022-04-29 13:36:14: 守护进程(DM03_B)状态切换 [OPEN-->MON CONFIRM]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:14 MON CONFIRM OK DM03_B SUSPEND PRIMARY VALID 10 63835 63836
[monitor] 2022-04-29 13:36:15: 守护进程(DM03_B)状态切换 [MON CONFIRM-->FAILOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:15 FAILOVER OK DM03_B SUSPEND PRIMARY VALID 10 63835 63836
[monitor] 2022-04-29 13:36:15: 实例DM03[STANDBY, OPEN, ISTAT_SAME:TRUE]故障
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:15 STARTUP ERROR DM03 OPEN STANDBY VALID 10 63835 63835
[monitor] 2022-04-29 13:36:15: 守护进程(DM03)状态切换 [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:15 STARTUP ERROR DM03 OPEN STANDBY VALID 10 63835 63835
[monitor] 2022-04-29 13:36:15: [!!! 实例DM03的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例DM03执行自动接管 !!!]
[monitor] 2022-04-29 13:36:16: 守护进程(DM03_B)状态切换 [FAILOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:16 OPEN OK DM03_B OPEN PRIMARY VALID 10 63836 63836
[monitor] 2022-04-29 13:36:37: 实例DM03[STANDBY, OPEN, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:37 STARTUP OK DM03 OPEN STANDBY INVALID 10 63835 63835
[monitor] 2022-04-29 13:36:38: 守护进程(DM03_B)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:38 RECOVERY OK DM03_B OPEN PRIMARY VALID 10 63843 63843
[monitor] 2022-04-29 13:36:38: 守护进程(DM03)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:38 OPEN OK DM03 OPEN STANDBY INVALID 10 63835 63835
[monitor] 2022-04-29 13:36:39: 守护进程(DM03_B)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:36:39 OPEN OK DM03_B OPEN PRIMARY VALID 10 63843 63844
show
2022-04-29 13:45:17
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 13:45:16 GLOBAL VALID OPEN DM03_B OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B OPEN PRIMARY 0 0 REALTIME VALID 10403 64015 10403 64015 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 13:45:16 GLOBAL VALID OPEN DM03 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 OK DM03 OPEN STANDBY 0 0 REALTIME VALID 10116 64014 10116 64014 NONE
DATABASE (DM03) APPLY INFO FROM (DM03_B), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[10402, 10402, 10403], (RLSN, SLSN, KLSN)[64014, 64014, 64015], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (64014)
#================================================================================#
root@dw1_01 172.16.72.129 13:36:11 [pwd:~]# ps -ef | grep dmserver
dmdba 14170 1 1 13:36 ? 00:00:00 /home/dmdba/dmdbms/bin/dmserver /home/dmdba/dmdata/dm03/dm.ini mount
root 14260 13981 0 13:37 pts/1 00:00:00 grep --color=auto dmserver
2. kill -9 the primary
root@dw1_02 172.16.72.130 13:50:07 [pwd:/etc/firewalld/zones]# ps -ef | grep -E 'dmserver|watcher'
dmdba 1212 1 0 10:48 ? 00:00:12 /home/dmdba/dmdbms/bin/dmwatcher path=/home/dmdba/dmdata/dm03/dmwatcher.ini -noconsole
dmdba 9444 1 0 12:06 ? 00:00:23 /home/dmdba/dmdbms/bin/dmserver /home/dmdba/dmdata/dm03/dm.ini mount
root 10328 10111 0 13:50 pts/2 00:00:00 grep --color=auto -E dmserver|watcher
root@dw1_02 172.16.72.130 13:50:12 [pwd:/etc/firewalld/zones]# kill -9 9444
【Observed cluster state】
Instance DM03_B [PRIMARY, OPEN, ISTAT_SAME:TRUE] failed
-->Instance DM03_B [STANDBY, MOUNT, ISTAT_SAME:TRUE] back to normal
Watcher (DM03_B) state change [OPEN-->STARTUP]
-->Watcher (DM03_B) state change [STARTUP-->OPEN]
Watcher (DM03) state change [OPEN-->TAKEOVER]
-->Watcher (DM03) state change [TAKEOVER-->OPEN]
-->Watcher (DM03) state change [OPEN-->RECOVERY]
-->Watcher (DM03) state change [RECOVERY-->OPEN]
[monitor] 2022-04-29 13:50:26: 实例DM03_B[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:26 STARTUP ERROR DM03_B OPEN PRIMARY VALID 10 64117 64118
[monitor] 2022-04-29 13:50:26: 守护进程(DM03_B)状态切换 [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:26 STARTUP ERROR DM03_B OPEN PRIMARY VALID 10 64117 64118
[monitor] 2022-04-29 13:50:26: [!!! 实例DM03_B的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例DM03_B执行自动接管 !!!]
[monitor] 2022-04-29 13:50:26: 守护进程(DM03)状态切换 [OPEN-->TAKEOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:27 TAKEOVER OK DM03 OPEN STANDBY VALID 10 64117 64117
[monitor] 2022-04-29 13:50:29: 守护进程(DM03)状态切换 [TAKEOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:29 OPEN OK DM03 OPEN PRIMARY VALID 11 66564 66564
[monitor] 2022-04-29 13:50:49: 实例DM03_B[STANDBY, MOUNT, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:48 STARTUP OK DM03_B MOUNT STANDBY INVALID 10 64118 64118
[monitor] 2022-04-29 13:50:49: 守护进程(DM03_B)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:49 OPEN OK DM03_B OPEN STANDBY INVALID 10 64118 64118
[monitor] 2022-04-29 13:50:50: 守护进程(DM03)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:50 RECOVERY OK DM03 OPEN PRIMARY VALID 11 66570 66571
[monitor] 2022-04-29 13:50:52: 守护进程(DM03)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 13:50:52 OPEN OK DM03 OPEN PRIMARY VALID 11 66571 66572
show
2022-04-29 14:00:05
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 14:00:05 GLOBAL VALID OPEN DM03 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 OK DM03 OPEN PRIMARY 0 0 REALTIME VALID 10699 66755 10700 66756 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 14:00:05 GLOBAL VALID OPEN DM03_B OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B OPEN STANDBY 0 0 REALTIME VALID 10506 66755 10506 66755 NONE
DATABASE (DM03_B) APPLY INFO FROM (DM03), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[10699, 10699, 10700], (RLSN, SLSN, KLSN)[66755, 66755, 66756], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (66755)
#================================================================================#
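3. 【Conclusion】
When the primary or the standby crashes but the watcher (dmwatcher) on its own server keeps running, the watcher restarts dmserver and the data is recovered automatically. When the crashed node is the primary, the standby (DM03) also takes over as the new primary, and the restarted old primary (DM03_B) rejoins as a standby.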
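
In item 2 the roles actually swapped. DM03 went from STANDBY to PRIMARY, and DM03_B came back as STANDBY. This can be detected by comparing the IMODE column of the EP INFO rows in two `show` snapshots. The minimal sketch below assumes the EP INFO layout printed above: a header line followed by one whitespace-separated row, with no spaces inside values. `ep_rows` and `role_changes` are hypothetical names.

```python
EP_COLUMNS = ["INST_IP", "INST_PORT", "INST_OK", "INAME", "ISTATUS", "IMODE",
              "DSC_SEQNO", "DSC_CTL_NODE", "RTYPE", "RSTAT", "FSEQ", "FLSN",
              "CSEQ", "CLSN", "DW_STAT_FLAG"]


def ep_rows(show_text: str) -> list[dict[str, str]]:
    """Return the EP INFO rows of a `show` snapshot as column -> value dicts."""
    rows = []
    lines = show_text.splitlines()
    for header, row in zip(lines, lines[1:]):
        if header.split() == EP_COLUMNS:      # the EP INFO header line
            values = row.split()
            if len(values) == len(EP_COLUMNS):
                rows.append(dict(zip(EP_COLUMNS, values)))
    return rows


def role_changes(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Instances whose IMODE differs between two snapshots: name -> (old, new)."""
    old = {r["INAME"]: r["IMODE"] for r in ep_rows(before)}
    new = {r["INAME"]: r["IMODE"] for r in ep_rows(after)}
    return {name: (old[name], new[name])
            for name in old.keys() & new.keys() if old[name] != new[name]}
```

An empty result means that no failover or switchover happened between the two snapshots.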
03| kill -9 the standby + the standby's watcher
root@dw1_02 172.16.72.130 13:50:23 [pwd:/etc/firewalld/zones]# ps -ef | grep -E 'dmserver|watcher'
dmdba 1212 1 0 10:48 ? 00:00:13 /home/dmdba/dmdbms/bin/dmwatcher path=/home/dmdba/dmdata/dm03/dmwatcher.ini -noconsole
dmdba 10330 1 1 13:50 ? 00:00:00 /home/dmdba/dmdbms/bin/dmserver /home/dmdba/dmdata/dm03/dm.ini mount
root 10422 10111 0 13:51 pts/2 00:00:00 grep --color=auto -E dmserver|watcher
root@dw1_02 172.16.72.130 13:51:11 [pwd:/etc/firewalld/zones]# kill -9 10330 1212
The primary is not affected: it is briefly SUSPEND during FAILOVER, then returns to OPEN as PRIMARY.
[monitor] 2022-04-29 14:06:32: 守护进程(DM03)状态切换 [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:06:32 STARTUP OK DM03 SUSPEND PRIMARY VALID 11 66884 66885
[monitor] 2022-04-29 14:06:32: 守护进程(DM03)状态切换 [STARTUP-->FAILOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:06:32 FAILOVER OK DM03 SUSPEND PRIMARY VALID 11 66884 66885
[monitor] 2022-04-29 14:06:35: 守护进程(DM03)状态切换 [FAILOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:06:35 OPEN OK DM03 OPEN PRIMARY VALID 11 66885 66885
[monitor] 2022-04-29 14:06:51: [!!! 实例DM03_B的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例DM03_B执行自动接管 !!!]
[monitor] 2022-04-29 14:06:51: 接收守护进程(DM03_B)消息超时
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:06:29 ERROR OK DM03_B OPEN STANDBY INVALID 11 66883 66883
show
2022-04-29 14:07:48
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 14:07:48 GLOBAL VALID OPEN DM03 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 OK DM03 OPEN PRIMARY 0 0 REALTIME VALID 10853 66909 10853 66909 NONE
ERROR DATABASE:
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 14:06:29 GLOBAL VALID ERROR DM03_B OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B OPEN STANDBY 0 0 REALTIME INVALID 10506 66883 10506 66883 NONE
DATABASE (DM03_B) APPLY INFO FROM (DM03), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[10827, 10827, 10828], (RLSN, SLSN, KLSN)[66883, 66883, 66884], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (66883)
#================================================================================#
# Insert data on the primary
insert into test values(2);
commit;
Start the standby and check the data:
dmdba@dw1_02 172.16.72.130 14:18:26 [pwd:~/dmdbms/bin]$ ./DmServicedm03_B start
Starting DmServicedm03_B: [ OK ]
dmdba@dw1_02 172.16.72.130 14:18:47 [pwd:~/dmdbms/bin]$ ./DmWatcherServiceWatcher start
Starting DmWatcherServiceWatcher: [ OK ]
dmdba@dw1_02 172.16.72.130 14:19:11 [pwd:~/dmdbms/bin]$ !disql
disql SYSDBA/SYSDBA@172.16.72.130:5237
服务器[172.16.72.130:5237]:处于备库打开状态
登录使用时间 : 2.689(ms)
disql V8
SQL> select * from test;
行号 A
---------- -----------
1 1
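2 2 --data synchronized correctly
【Conclusion】
When the standby's processes (dmserver + dmwatcher) die, the database synchronizes the data automatically once the services are restarted: the row inserted on the primary while the standby was down is present on the standby.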
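
The CLSN column of the EP INFO rows gives a rough idea of how far the standby fell behind while it was down. In the `show` output taken at 14:07:48, the primary reports CLSN 66909 and the stopped standby reports 66883. The sketch below simply subtracts the two. Treating that difference as a lag indicator is an interpretation made here, not a documented DM metric. `clsn_gap` is a hypothetical helper.

```python
def clsn_gap(primary_row: str, standby_row: str) -> int:
    """Primary CLSN minus standby CLSN, taken from two EP INFO rows of `show`.

    EP INFO columns: INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO
    DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
    """
    p, s = primary_row.split(), standby_row.split()
    if len(p) != 15 or len(s) != 15:
        raise ValueError("expected 15 whitespace-separated EP INFO columns")
    if p[5] != "PRIMARY" or s[5] != "STANDBY":  # IMODE is the 6th column
        raise ValueError("expected a PRIMARY row followed by a STANDBY row")
    return int(p[13]) - int(s[13])              # CLSN is the 14th column
```

The same calculation on the 14:00:05 snapshot, with both nodes healthy, gives 66756 - 66755 = 1.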
【Procedure】
Shut down the primary's server (DM03, 172.16.72.129), then boot it again
[monitor] 2022-04-29 14:28:01: 守护进程(DM03_B)状态切换 [OPEN-->TAKEOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:01 TAKEOVER OK DM03_B OPEN STANDBY VALID 11 67314 67314
[monitor] 2022-04-29 14:28:01: [!!! 实例DM03的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例DM03执行自动接管 !!!]
[monitor] 2022-04-29 14:28:01: 接收守护进程(DM03)消息超时
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:27:40 ERROR OK DM03 OPEN PRIMARY VALID 11 67315 67315
[monitor] 2022-04-29 14:28:04: 守护进程(DM03_B)状态切换 [TAKEOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:04 OPEN OK DM03_B OPEN PRIMARY VALID 12 69761 69762
[monitor] 2022-04-29 14:28:21: 守护进程(DM03)状态切换 [NONE-->STARTUP]
[monitor] 2022-04-29 14:28:21: [!!! 实例DM03的守护进程配置为故障自动切换模式,但本监视器不是确认监视器,无法对实例DM03执行自动接管 !!!]
[monitor] 2022-04-29 14:28:26: 守护进程(DM03)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:26 UNIFY EP OK DM03 MOUNT PRIMARY VALID 11 67315 67315
[monitor] 2022-04-29 14:28:26: 守护进程(DM03)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:26 STARTUP OK DM03 MOUNT STANDBY INVALID 11 67315 67315
[monitor] 2022-04-29 14:28:26: 守护进程(DM03)状态切换 [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:26 UNIFY EP OK DM03 MOUNT STANDBY INVALID 11 67315 67315
[monitor] 2022-04-29 14:28:27: 守护进程(DM03)状态切换 [UNIFY EP-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:27 STARTUP OK DM03 OPEN STANDBY INVALID 11 67315 67315
[monitor] 2022-04-29 14:28:27: 守护进程(DM03)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:27 OPEN OK DM03 OPEN STANDBY INVALID 11 67315 67315
[monitor] 2022-04-29 14:28:27: 守护进程(DM03_B)状态切换 [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:27 RECOVERY OK DM03_B OPEN PRIMARY VALID 12 69769 69769
[monitor] 2022-04-29 14:28:29: 守护进程(DM03_B)状态切换 [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:28:29 OPEN OK DM03_B OPEN PRIMARY VALID 12 69770 69770
show
2022-04-29 14:29:21
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 14:29:20 GLOBAL VALID OPEN DM03_B OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B OPEN PRIMARY 0 0 REALTIME VALID 11282 69786 11283 69787 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 14:29:20 GLOBAL VALID OPEN DM03 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 OK DM03 OPEN STANDBY 0 0 REALTIME VALID 11255 69785 11255 69785 NONE
DATABASE (DM03) APPLY INFO FROM (DM03_B), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[11281, 11281, 11282], (RLSN, SLSN, KLSN)[69785, 69785, 69786], N_TSK[0], TSK_MEM_USE[1024]
REDO_LSN_ARR: (69785)
#================================================================================#
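As shown, when the server reboots, the DM processes start automatically at boot; DM03 rejoins as a standby, performs recovery, and then opens.
Check the processes: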
root@dw1_01 172.16.72.129 14:29:55 [pwd:~]# ps -ef | grep -E 'dmserver|dmwatcher'
dmdba 1273 1 0 14:28 ? 00:00:01 /home/dmdba/dmdbms/bin/dmserver path=/home/dmdba/dmdata/dm03/dm.ini -noconsole mount
dmdba 1284 1 0 14:28 ? 00:00:00 /home/dmdba/dmdbms/bin/dmwatcher path=/home/dmdba/dmdata/dm03/dmwatcher.ini -noconsole
root 8732 8670 0 14:31 pts/0 00:00:00 grep --color=auto -E dmserver|dmwatcher
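【Conclusion】
- After server A or B is shut down, dmserver and dmwatcher start automatically (by default) when it boots again
- Once started, the processes synchronize and recover data automatically
- If the server that went down hosts the primary, a failover is performed under the control of the confirmation monitor on server C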
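
The process check after the reboot (`ps -ef | grep -E 'dmserver|dmwatcher'`) can be scripted. The minimal sketch below parses `ps -ef` text and keeps only processes whose executable is named dmserver or dmwatcher, so the grep line itself is ignored. It assumes the standard eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD). `dm_processes` is a hypothetical helper.

```python
def dm_processes(ps_output: str) -> dict[str, list[int]]:
    """Map 'dmserver' and 'dmwatcher' to the PIDs found in `ps -ef` output."""
    found: dict[str, list[int]] = {"dmserver": [], "dmwatcher": []}
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8:
            continue
        pid, cmd = fields[1], fields[7]
        program = cmd.split()[0].rsplit("/", 1)[-1]  # basename of the executable
        if program in found:  # skips "grep ... dmserver|dmwatcher"
            found[program].append(int(pid))
    return found
```

An empty list for either name corresponds to the situation in section 03, where both processes on the standby were killed.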
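【Procedure】
After killing the monitor on server C (172.16.72.128), perform:
- 【Operation 1】Kill the standby's dmserver
- 【Operation 2】Run switchover from the non-confirmation monitor
- 【Operation 3】Kill the primary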
root@dmteset 172.16.72.128 13:07:33 [pwd:~]# ps -ef | grep monitor
root 2694 1 0 Apr28 ? 00:00:00 /usr/libexec/gvfs-udisks2-volume-monitor
root 2699 1 0 Apr28 ? 00:00:00 /usr/libexec/gvfs-afc-volume-monitor
root 2705 1 0 Apr28 ? 00:00:00 /usr/libexec/gvfs-gphoto2-volume-monitor
root 2711 1 0 Apr28 ? 00:00:00 /usr/libexec/gvfs-goa-volume-monitor
root 2716 1 0 Apr28 ? 00:00:00 /usr/libexec/gvfs-mtp-volume-monitor
dmdba 60147 1 0 09:25 ? 00:01:04 /dm8/dm_home/bin/dmmonitor path=/dm8/dm_home/bin/dmmonitor.ini
root 80286 74864 0 14:34 pts/3 00:00:00 grep --color=auto monitor
root@dmteset 172.16.72.128 14:34:40 [pwd:~]# kill -9 60147
【Operation 1】Kill the standby's dmserver
#================================================================================#
show
2022-04-29 14:36:22
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 14:36:21 GLOBAL VALID MON CONFIRM DM03_B OK 1 1 SUSPEND PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B SUSPEND PRIMARY 0 0 REALTIME VALID 11414 69918 11414 69919 NONE
ERROR DATABASE:
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 14:36:21 GLOBAL VALID STARTUP DM03 ERROR 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 ERROR DM03 OPEN STANDBY 0 0 REALTIME VALID 11255 69917 11255 69917 NONE
DATABASE (DM03) APPLY INFO FROM (DM03_B), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[11413, 11413, 11414], (RLSN, SLSN, KLSN)[69917, 69917, 69918], N_TSK[0], TSK_MEM_USE[512]
REDO_LSN_ARR: (69917)
#================================================================================#
[monitor] 2022-04-29 14:36:22: 实例DM03[STANDBY, OPEN, ISTAT_SAME:TRUE]恢复正常
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:36:22 OPEN OK DM03 OPEN STANDBY VALID 12 69917 69917
[monitor] 2022-04-29 14:36:22: 守护进程(DM03)状态切换 [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-29 14:36:22 OPEN OK DM03 OPEN STANDBY VALID 12 69917 69917
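【Operation 2】Run switchover from the non-confirmation monitor
SWITCHOVER
[monitor] 2022-04-29 14:37:37: 存在多个或没有PRIMARY&OPEN状态的实例或者实例当前的DSC_STATUS不是DSC_OPEN状态
The monitor rejects the command, reporting that there are multiple or no instances in PRIMARY & OPEN state, or that an instance's current DSC_STATUS is not DSC_OPEN. With the confirmation monitor gone, the switchover command cannot be executed.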
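
The refusal message names its precondition: exactly one instance in PRIMARY & OPEN state, and DSC_STATUS equal to DSC_OPEN. The sketch below mirrors only that wording, using the ISTATUS, IMODE and DSC_STATUS columns of `show`. Applying the DSC_STATUS test to every instance is an assumption. The sketch also does not model the dependency on the confirmation monitor that the author infers. `Instance` and `switchover_blocked` are hypothetical names.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    iname: str
    istatus: str     # ISTATUS column, e.g. OPEN or SUSPEND
    imode: str       # IMODE column, PRIMARY or STANDBY
    dsc_status: str  # DSC_STATUS column, e.g. DSC_OPEN


def switchover_blocked(instances: list[Instance]) -> bool:
    """True if the condition named in the monitor's error message holds (illustrative only)."""
    primaries_open = [i for i in instances
                      if i.imode == "PRIMARY" and i.istatus == "OPEN"]
    # Assumption: every instance must report DSC_OPEN.
    any_not_dsc_open = any(i.dsc_status != "DSC_OPEN" for i in instances)
    return len(primaries_open) != 1 or any_not_dsc_open
```

With DM03_B in SUSPEND, as in the 14:36:22 snapshot, no instance is PRIMARY & OPEN, so the check reports the switchover as blocked.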
【Operation 3】Kill the primary
dmdba@dw1_02 172.16.72.130 14:35:27 [pwd:~/dmdbms/bin]$ ps -ef | grep -E 'dmserver|dmwatcher'
dmdba 10501 1 0 14:18 pts/2 00:00:06 /home/dmdba/dmdbms/bin/dmserver path=/home/dmdba/dmdata/dm03/dm.ini -noconsole mount
dmdba 10619 1 0 14:18 pts/2 00:00:03 /home/dmdba/dmdbms/bin/dmwatcher path=/home/dmdba/dmdata/dm03/dmwatcher.ini -noconsole
dmdba 10670 10448 0 14:46 pts/2 00:00:00 grep --color=auto -E dmserver|dmwatcher
dmdba@dw1_02 172.16.72.130 14:46:02 [pwd:~/dmdbms/bin]$ kill -9 10501
show
2022-04-29 14:47:23
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GDW1 83765937 FALSE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.130 5437 2022-04-29 14:47:23 GLOBAL VALID OPEN DM03_B OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.130 5237 OK DM03_B OPEN PRIMARY 0 0 REALTIME VALID 11427 72375 11427 72376 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
172.16.72.129 5437 2022-04-29 14:47:23 GLOBAL VALID OPEN DM03 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
172.16.72.129 5237 OK DM03 OPEN STANDBY 0 0 REALTIME VALID 11255 72374 11255 72374 NONE
DATABASE (DM03) APPLY INFO FROM (DM03_B), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[11426, 11426, 11427], (RLSN, SLSN, KLSN)[72374, 72374, 72375], N_TSK[0], TSK_MEM_USE[1024]
REDO_LSN_ARR: (72374)
As shown, the watcher on DM03_B (172.16.72.130) still restarted dmserver, but the node kept its PRIMARY role; no failover to DM03 took place.
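【Conclusion】In automatic failover mode, after the cluster's confirmation monitor is stopped:
- Failover is impossible: the PRIMARY role cannot move to another node
- The switchover command can no longer be executed
- The watcher can still restart dmserver
Without a confirmation monitor, or with a misconfigured one:
- 1. If the primary fails, the standby cannot automatically take over as the new primary.
- 2. If both the standby instance and the standby's watcher fail, or the network between the primary and the standby fails, the primary's watcher cannot confirm the standby's state through the confirmation monitor. The primary's watcher then stays in the Confirm state and its instance stays in the Suspend state.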
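
The observations in this article can be collected into a small decision table for AUTO mode, the mode used in every experiment here. The function below is a toy summary of what was observed in these tests, not DM's internal logic. The failure labels and the function name are hypothetical.

```python
def expected_outcome(confirm_monitor_up: bool, failure: str) -> str:
    """Toy summary of this article's AUTO-mode observations (not DM's real logic).

    failure: "primary"             -> the primary's dmserver is down
             "standby_and_watcher" -> the standby's dmserver and dmwatcher are both down
    """
    if failure == "primary":
        if confirm_monitor_up:
            return "standby takes over as new primary"       # sections 2 and 04
        return "no automatic takeover"                        # operation 3
    if failure == "standby_and_watcher":
        if confirm_monitor_up:
            return "primary keeps serving (OPEN, PRIMARY)"    # section 03
        return "primary watcher stays in Confirm, instance Suspend"
    raise ValueError(f"unknown failure kind: {failure!r}")
```

Keeping the outcomes as plain strings keeps the table readable; each branch is annotated with the experiment it summarizes.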
Feel free to share; please credit the source when reposting: 内存溢出