AFF A320: Frequent takeover disabled (unsynchronized log) alerts in a switched cluster
Applies to
- ONTAP 9
- AFF A320
- Switched cluster
- Takeover disabled (unsynchronized log)
Issue
- No hardware errors are found on the end-to-end connection between the cluster ports and the switch ports
- EMS log reports:
node-01:
Mon Aug 22 11:05:41 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node-02 disabled (unsynchronized log).
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_14: rdma.rlib.connected:debug]: misc:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_6: rdma.rlib.connected:debug]: wafl:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_3: rdma.rlib.connected:debug]: raid:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: ib_cm_8: rdma.rlib.connected:debug]: misc:HA:P QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-01: nvmm_helper: nvpm.state.changed:debug]: Node 1's NVPM state changed from "2" to "2".
Mon Aug 22 11:05:45 -0600 [node-01: ib_cm_0: rdma.rlib.connected:debug]: wafl:HA:P QP is now connected.
Mon Aug 22 11:05:45 -0600 [node-01: ib_cm_10: rdma.rlib.connected:debug]: raid:HA:P QP is now connected.
Mon Aug 22 11:05:46 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of node-02 enabled
node-02:
Mon Aug 22 11:05:41 -0600 [node-02: raidio_thread: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state NVMM_MIRROR_ONLINE is aborted because of reason NVMM_ERR_NO_REQS.
Mon Aug 22 11:05:41 -0600 [node-02: raidio_thread: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_NO_REQS', 'qp_name': 'RAID', 'mirror': 'HA Partner'}
Mon Aug 22 11:05:41 -0600 [node-02: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'error': 'NVMM_ERR_STREAM', 'qp_name': 'MISC', 'mirror': 'HA Partner'}
Mon Aug 22 11:05:41 -0600 [node-02: nvmm_error: rdma.rlib.event.error:debug]: QP wafl event error: client disconnect.
Mon Aug 22 11:05:41 -0600 [node-02: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
Mon Aug 22 11:05:41 -0600 [node-02: rastrace_dump: rastrace.dump.saved:debug]: A RAS trace dump for module IC instance 0 was stored in /etc/log/rastrace/IC_0_20220822_11:05:41:084487.dmp.
Mon Aug 22 11:05:41 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node-02 by node-01 disabled (unsynchronized log).
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_18: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_10: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_9: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: ib_cm_13: rdma.rlib.connected:debug]: misc:HA:A QP is now connected.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 1 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCED to NVMM_MIRROR_SYNCING_START and took 0 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_START to NVMM_MIRROR_CP1_START and took 25 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP1_START to NVMM_MIRROR_WAFL_INIT and took 464 msecs.
Mon Aug 22 11:05:43 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_INIT to NVMM_MIRROR_CP2_FINISH and took 24 msecs.
Mon Aug 22 11:05:45 -0600 [node-02: ib_cm_15: rdma.rlib.connected:debug]: wafl:HA:A QP is now connected.
Mon Aug 22 11:05:45 -0600 [node-02: ib_cm_8: rdma.rlib.connected:debug]: raid:HA:A QP is now connected.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_CP2_FINISH to NVMM_MIRROR_WAFL_HEADER and took 2339 msecs.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_WAFL_HEADER to NVMM_MIRROR_SYNCING_OTHER and took 12 msecs.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_SYNCING_OTHER to NVMM_MIRROR_ONLINE and took 288 msecs.
Mon Aug 22 11:05:46 -0600 [node-02: nvmm_mirror_sync: nvmm.mirror.onlined:debug]: params: {'mirror': 'HA_PARTNER'}
Mon Aug 22 11:05:46 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of node-02 by node-01 enabled
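- There appears to be an HA interconnect (IC) communication issue between a node and its HA partner; the IC connection is forced to disconnect and is then re-established to correct it. The nodes enter the "unsynchronized log" state to preserve data integrity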
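
To judge how frequent and how long these events are, it can help to pair each "takeover disabled" EMS event with the following "takeover enabled" event on the same node. The following is a minimal Python sketch, not a NetApp tool. It assumes the EMS text has been exported in the same line layout as the excerpt above. The function name `takeover_disabled_windows` and the `year` parameter are illustrative only; the year is not printed in the EMS lines and is taken here from the RAS trace dump filename (20220822).

```python
import re
from datetime import datetime

# Assumed line layout, copied from the excerpt above:
# "<weekday> <month> <day> <time> <utc offset> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^\w{3} (?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: "
    r"(?P<msg>.*)$"
)

# Event names as they appear in the excerpt above.
DISABLED = {"cf.fsm.takeoverOfPartnerDisabled", "cf.fsm.takeoverByPartnerDisabled"}
ENABLED = {"cf.fsm.takeoverOfPartnerEnabled", "cf.fsm.takeoverByPartnerEnabled"}


def takeover_disabled_windows(lines, year=2022):
    """Return (node, disabled_at, enabled_at, seconds) for each disable/enable pair."""
    open_since = {}  # reporting node -> time of the pending "disabled" event
    windows = []
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m:
            continue  # skip headers such as "node-01:" and anything unparsable
        # EMS lines carry no year, so the caller supplies one (assumption).
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S %z")
        node, event = m["node"], m["event"]
        if event in DISABLED:
            open_since.setdefault(node, when)
        elif event in ENABLED and node in open_since:
            start = open_since.pop(node)
            windows.append((node, start, when, (when - start).total_seconds()))
    return windows


# The four failover-monitor lines from the excerpt above.
sample = [
    "Mon Aug 22 11:05:41 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node-02 disabled (unsynchronized log).",
    "Mon Aug 22 11:05:46 -0600 [node-01: cf_main: cf.fsm.takeoverOfPartnerEnabled:notice]: Failover monitor: takeover of node-02 enabled",
    "Mon Aug 22 11:05:41 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node-02 by node-01 disabled (unsynchronized log).",
    "Mon Aug 22 11:05:46 -0600 [node-02: cf_main: cf.fsm.takeoverByPartnerEnabled:notice]: Failover monitor: takeover of node-02 by node-01 enabled",
]

for node, start, end, seconds in takeover_disabled_windows(sample):
    print(f"{node}: takeover disabled for {seconds:.0f} s starting {start.isoformat()}")
```

For the excerpt above, both nodes report a window of 5 seconds (11:05:41 to 11:05:46). Counting such windows per day over a longer export gives a rough measure of how often the interconnect is being reset.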