CONETP-79534: Unsynchronized log and takeover disabled in MetroCluster IP due to network congestion
Issue
- HA interconnect errors occur and takeover is disabled:
Sun Feb 02 03:45:57 -0500 [Node-02: nvmm_error: rdma.rlib.event.error:debug]: QP wafl event error: client disconnect.
Sun Feb 02 03:45:57 -0500 [Node-02: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
Sun Feb 02 03:45:57 -0500 [Node-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Node-02 by Node-01 disabled (unsynchronized log).
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 5 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCED to NVMM_MIRROR_SYNCING_START and took 0 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_SYNCING_START is aborted because of reason NVMM_ERR_STREAM_MAP.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_error: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_OFFLINE is aborted because of reason NVMM_ABORT_SYNCING_MIRROR.
- The HA interconnect is re-established a few seconds later and takeover is enabled again:
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: iw_cm_wq: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 4 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_START to NVMM_MIRROR_CP1_START and took 26 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP1_START to NVMM_MIRROR_WAFL_INIT and took 270 msecs.
Sun Feb 02 03:46:00 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_INIT to NVMM_MIRROR_CP2_FINISH and took 20 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP2_FINISH to NVMM_MIRROR_WAFL_HEADER and took 543 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_HEADER to NVMM_MIRROR_SYNCING_OTHER and took 1 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_OTHER to NVMM_MIRROR_ONLINE and took 169 msecs.
Sun Feb 02 03:46:01 -0500 [Node-02: nvmm_mirror_sync: nvmm.mirror.onlined:debug]: params: {'mirror': 'HA_PARTNER'}
Sun Feb 02 03:46:02 -0500 [Node-02: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of Node-02 by Node-01 enabled
- Network congestion errors are reported in EMS:
Mon Feb 03 13:03:36 -0500 [Node-01: mccip_mirror_congestion_mgr_p: mcc.network.congestion:notice]: Network congestion detected. Action taken: Increased ic_timeout to 2000 msec.
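To confirm the pattern described above in a saved EMS log, the relevant events can be paired up with a small script. The following is a minimal Python sketch, not a NetApp tool. It assumes each event sits on a single line in the layout shown in the excerpts above. The helper names (`parse_ems`, `takeover_disabled_windows`, `congestion_timeouts`) and the `year` parameter are hypothetical. The year is needed only because the excerpts omit it from the timestamp.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts above:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<proc>[^:]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: "
    r"(?P<msg>.*)$"
)

def parse_ems(line, year=2000):
    """Parse one EMS line into a dict, or return None if it does not match.

    `year` is a placeholder (assumption): the excerpts carry no year, and
    only time differences within one log are used here.
    """
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(f"{year} {rec['ts']}", "%Y %a %b %d %H:%M:%S %z")
    return rec

def takeover_disabled_windows(lines, year=2000):
    """Return (node, seconds) for each disabled -> enabled takeover window."""
    open_since = {}
    windows = []
    for line in lines:
        rec = parse_ems(line, year)
        if rec is None:
            continue  # wrapped or otherwise unparseable lines are skipped
        if rec["event"] == "cf.fsm.takeoverByPartnerDisabled":
            # Keep the earliest "disabled" event per node.
            open_since.setdefault(rec["node"], rec["ts"])
        elif rec["event"] == "cf.fsm.takeoverByPartnerEnabled":
            start = open_since.pop(rec["node"], None)
            if start is not None:
                windows.append((rec["node"], (rec["ts"] - start).total_seconds()))
    return windows

def congestion_timeouts(lines, year=2000):
    """Return (node, ic_timeout in msec) for each mcc.network.congestion event."""
    found = []
    for line in lines:
        rec = parse_ems(line, year)
        if rec is not None and rec["event"] == "mcc.network.congestion":
            m = re.search(r"ic_timeout to (\d+) msec", rec["msg"])
            if m:
                found.append((rec["node"], int(m.group(1))))
    return found
```

Fed the `cf.fsm.takeoverByPartnerDisabled` and `cf.fsm.takeoverByPartnerEnabled` lines from the excerpts above, `takeover_disabled_windows` reports one window of 5 seconds for Node-02. Lines that do not match the assumed layout are ignored rather than raising an error.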