A800 interconnect is down with e0a/e0b Fatal parity error (0x10)
Applies to
- AFF A800
- Dual 40/100G Ethernet T62100 mezzanine card
- ONTAP 9
Issue
- After a node reboot or upgrade, the system is left in a partially recovered state and reports the following interconnect status:
RDMA Interconnect is down
- The storage failover status is:
Storage failover interconnect error. NVRAM log not synchronized. Disk inventory not exchanged
- The console log shows:
e0a/e0b:Fatal parity error (0x10)
- The ONTAP operating system and the BMC, BIOS, and T62100 firmware are up to date and running on both nodes
EMS log:
May 02 07:58:09 [node_name:netif.fatal.err:ALERT]: The network device in slot 0 encountered fatal error e0a/e0b.
May 02 07:58:09 [node_name:netif.fatal.err:ALERT]: The network device in slot 0 encountered fatal error e0a/e0b.
May 02 22:49:05 [node_name: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.
May 02 22:49:05 [node_name: kernel: netif.linkDown:info]: Ethernet e0b: Link down, check cable.
May 02 22:49:05 [node_name: intr: rlib.ifconfig.linkEvent:notice]: params: {'ifname': 'e0b', 'eventType': 'DOWN'}
May 02 22:49:05 -0800 [node_name: vifmgr: vifmgr.portdown:notice]: A link down event was received on node node_name, port e0a.
May 02 22:49:05 -0800 [node_name: nvmm_error: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA_PARTNER'}
May 02 22:49:05 -0800 [node_name: vifmgr: vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node node_name has gone down unexpectedly.
May 02 23:00:00 -0800 [node_name: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 10 minutes: link0 down
May 02 23:00:00 -0800 [node_name: statd: callhome.hainterconnect.down:alert]: Call home for HA INTERCONNECT DOWN due to link0 down.
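To check a saved EMS log for the same signature, a short script can pick out the relevant events. The following is a minimal sketch, not a NetApp tool. It assumes the log was exported as plain text in the bracketed format shown above. The function name `find_signature_events` and the choice of which event names count as the signature are illustrative assumptions drawn only from this excerpt.

```python
import re

# Event names copied from the EMS excerpt above. Treating these three as
# "the signature" is an assumption made for illustration.
SIGNATURE_EVENTS = (
    "netif.fatal.err",
    "ic.HAInterconnectDown",
    "callhome.hainterconnect.down",
)

# Matches the bracketed header, e.g. "[node_name: statd: ic.HAInterconnectDown:error]",
# capturing the dotted event name and the severity that follows it.
EVENT_RE = re.compile(r"\[[^\]]*?([A-Za-z_]\w*(?:\.\w+)+):(\w+)\]")


def find_signature_events(lines):
    """Return (event_name, severity) for each line that carries a signature event."""
    hits = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group(1) in SIGNATURE_EVENTS:
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "May 02 07:58:09 [node_name:netif.fatal.err:ALERT]: The network device in slot 0 encountered fatal error e0a/e0b.",
        "May 02 22:49:05 [node_name: kernel: netif.linkDown:info]: Ethernet e0a: Link down, check cable.",
        "May 02 23:00:00 -0800 [node_name: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 10 minutes: link0 down",
    ]
    for name, severity in find_signature_events(sample):
        print(name, severity)
```

A fatal-error event on the slot 0 device followed by HA interconnect down events is the pattern to look for. Link-down events alone, which also appear when a cable is unplugged, are deliberately left out of the filter.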