AFF A900 node goes down without a panic string or error message
Applies to
- ONTAP 9
- AFF A900
- ASA A900
- FAS9500
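Issue
- The node reboots without displaying any panic string or error message
- The partner node initiates a takeover, and the event log reports the following events: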
[Cluster-01: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic6a link is down.
[Cluster-01: cf_fastTimeout: cf.ic.heartBeatFailed:error]: HA interconnect: Heartbeat failed.
[Cluster-01: ctrl_hb_port_ic6a: ctrl.rdma.heartBeat:info]: HA interconnect: Missed heartbeat to 192.0.1.5.
[Cluster-01: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif Cluster-01_clus2 (node Cluster-01) to cluster lif Cluster-02_clus1 (node Cluster-02).
[Cluster-01: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
[Cluster-01: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
[Cluster-01: cf_takeover: ha.takeover.stateChng:debug]: params: {'old_state': 'NOT_IN_TAKEOVER', 'new_state': 'IN_CFO_TAKEOVER'}
[Cluster-01: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
- The BMC CLI command bmc status -d shows CPU Catastrophic Error as both asserted and de-asserted:
Sep 15 01:53:36 BMCxxxx root: eventfifod 47586.00981(n): 171(0xc0ab) : CPU Catastrophic Error asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47586.00981(o): 171(0x90ab) : CPU Catastrophic Error de-asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(n): 17(0xc011) : PCH Platform reset asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(s): 22(0xe016) : LPC Bus reset asserted
Sep 15 01:53:36 BMCxxxx root: eventfifod 47659.00887(s): 23(0xe017) : TPM Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 24(0xe018) : NIC0 Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 25(0xe019) : NIC1 Reset asserted
Sep 15 01:53:37 BMCxxxx root: eventfifod 47659.00887(s): 27(0xe01b) : NVME reset asserted
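When reviewing saved log excerpts, the two symptoms above (a takeover triggered by a missing heartbeat, plus a CPU Catastrophic Error that is asserted and then de-asserted in the BMC log) can be checked together. The following is a minimal Python sketch, not a NetApp tool. The function names and the regular expressions are assumptions derived only from the line shapes shown in this article, and other ONTAP or BMC releases may format these lines differently.

```python
import re

# EMS line shape as shown above: [node: process: event.name:severity]: message
EMS_RE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)

# BMC line shape as shown above:
# "<timestamp> <host> root: eventfifod <id>(<flag>): <n>(0x<code>) : <text>"
BMC_RE = re.compile(
    r"eventfifod \S+: (?P<sensor>\d+)\((?P<code>0x[0-9a-fA-F]+)\) : (?P<text>.*)$"
)


def ems_event_names(lines):
    """Return the EMS event names found in the given lines, in order."""
    names = []
    for line in lines:
        match = EMS_RE.match(line.strip())
        if match:
            names.append(match.group("event"))
    return names


def bmc_event_texts(lines):
    """Return the descriptive text of each BMC eventfifod line, in order."""
    texts = []
    for line in lines:
        match = BMC_RE.search(line.strip())
        if match:
            texts.append(match.group("text").strip())
    return texts


def matches_signature(ems_lines, bmc_lines):
    """True if both symptoms described in this article are present.

    Hypothetical helper: it only compares against the exact strings
    quoted in this article.
    """
    events = set(ems_event_names(ems_lines))
    texts = set(bmc_event_texts(bmc_lines))
    return (
        "cf.fsm.takeover.noHeartbeat" in events
        and "CPU Catastrophic Error asserted" in texts
        and "CPU Catastrophic Error de-asserted" in texts
    )
```

To use it, pass the partner node's event log lines as ems_lines and the output of bmc status -d as bmc_lines. A True result only means the excerpts match the pattern described here; it does not identify the root cause.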