Failover monitor: takeover of node-01 by node-02 disabled (unsynchronized log), with CRC errors and error symbols
Applies to
- AFF C250
- ONTAP 9
- BES-53248 cluster switch
Issue
- The following alerts appear frequently in the event/EMS logs:
Mon Dec 02 01:04:26 -0500 [node-01: wafl_exempt09: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'WAFL', 'error': 'NVMM_ERR_POLL_TIMEOUT'}
Mon Dec 02 01:04:26 -0500 [node-01: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'RAID', 'error': 'NVMM_ERR_STREAM'}
Mon Dec 02 01:04:26 -0500 [node-01: mcc_cfd_rnic: mirror.stream.qp.error:debug]: params: {'mirror': 'HA Partner', 'qp_name': 'MISC', 'error': 'NVMM_ERR_STREAM'}
Mon Dec 02 01:04:26 -0500 [node-01: nvmm_error: rdma.rlib.event.error:debug]: QP wafl event error: client disconnect.
Mon Dec 02 01:04:26 -0500 [node-01: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
Mon Dec 02 01:04:26 -0500 [node-01: rastrace_dump: rastrace.dump.saved:debug]: A RAS trace dump for module IC instance 0 was stored in /etc/log/rastrace/IC_0_20241202_01:04:26:534541.dmp.
Mon Dec 02 01:04:27 -0500 [node-01: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node-01 by node-02 disabled (unsynchronized log).
Mon Dec 02 01:04:29 -0500 [node-01: nvmm_mirror_sync: nvmm.mirror.state.change:debug]: mirror of sysid 1, partner_type HA Partner, changed state from NVMM_MIRROR_LAYOUT_SYNCING to NVMM_MIRROR_LAYOUT_SYNCED and took 1 msecs.
Tue Dec 03 12:35:00 -0500 [node-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Controller failover of node-01 is not possible: unsynchronized log.
- ifstat output shows CRC errors and error symbols on the port:
-- interface e0d (86 days, 5 hours, 53 minutes, 54 seconds) --
RECEIVE
Total frames: 3080m | Frames/second: 413 | Total bytes: 21281g
Bytes/second: 2855k | Total errors: 293 | Errors/minute: 0
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 45101k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 146 | Runt frames: 0 | Fragment: 1
Long frames: 0 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 2376m | Error symbol: 146 | Bus overruns: 0
Queue drops: 0 | LRO segments: 1246m | LRO bytes: 21082g
LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
TRANSMIT
Total frames: 1438m | Frames/second: 193 | Total bytes: 169g
Bytes/second: 22739 | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 14818k
Collisions: 0 | Pause: 48 | Jumbo: 340m
Cfg Up to Downs: 2 | TSO segments: 82774 | TSO bytes: 1409m
TSO6 segments: 0 | TSO6 bytes: 0 | HW UDP cksums: 7452k
HW UDP6 cksums: 0 | HW TCP cksums: 1403m | HW TCP6 cksums: 0
Mcast v6 solicit: 5 | Lagg drops: 0 | Lagg no buffer: 0
Lagg no entries: 0
DEVICE
Mcast addresses: 6 | Rx MBuf Sz: 9216
LINK INFO
Speed: 25000M | Duplex: full | Flowcontrol: none
Media state: active | Up to downs: 20597 | HW assist: 5655
- Switch logs show receive errors on the switch port:
Port      InOctets         InUcastPkts      InMcastPkts      InBcastPkts      InDropPkts       Rx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/1       1747429110877    3619349837       7700076          257725           74546            1586
0/2       19941079199680   6150430088       7820711          256177           160171           0
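
When the same check has to be repeated across many ports, the ifstat counters shown above can be screened with a small script. The following is a minimal sketch, not a NetApp tool: it assumes the output has already been saved as text in the `Name: value | Name: value` layout shown in this article, and the function names `parse_ifstat` and `link_error_counters` are hypothetical. Counter values are kept as raw strings because some carry suffixes such as `m` or `g`.

```python
# Section headers as they appear in the ifstat output shown in this article.
SECTION_NAMES = {"RECEIVE", "TRANSMIT", "DEVICE", "LINK INFO"}

# Receive-side counters this article treats as symptoms.
WATCHED = ("CRC errors", "Error symbol")


def parse_ifstat(text):
    """Parse saved ifstat text into {section: {counter: raw_value}}.

    Counters are grouped by section because names such as "Jumbo" and
    "Pause" appear under both RECEIVE and TRANSMIT.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line in SECTION_NAMES:
            current = sections.setdefault(line, {})
            continue
        if current is None:
            continue  # skip the "-- interface ... --" banner
        for field in line.split("|"):
            name, sep, value = field.partition(":")
            if sep:
                current[name.strip()] = value.strip()
    return sections


def link_error_counters(sections):
    """Return the watched RECEIVE counters whose value is not "0"."""
    rx = sections.get("RECEIVE", {})
    return {name: rx[name] for name in WATCHED if rx.get(name, "0") != "0"}


# Abbreviated sample taken from the e0d output in this article.
sample = """\
-- interface e0d (86 days, 5 hours, 53 minutes, 54 seconds) --
RECEIVE
CRC errors: 146 | Runt frames: 0 | Fragment: 1
Jumbo: 2376m | Error symbol: 146 | Bus overruns: 0
TRANSMIT
Collisions: 0 | Pause: 48 | Jumbo: 340m
LINK INFO
Media state: active | Up to downs: 20597 | HW assist: 5655
"""

print(link_error_counters(parse_ifstat(sample)))
```

A non-empty result only indicates that the port has logged receive-side link errors since the counters were last reset; it does not by itself identify whether the cable, transceiver, or switch port is at fault.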