CFBMC-6022: BMC FW 13.11X shuts down ONTAP due to a BMC SoC hang
Issue
Platforms running:
- BMC firmware 13.11
- BMC firmware 13.11P1
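After SP heartbeats are missed and then stop, the partner node performs a takeover.
- ONTAP has not received an IPMI heartbeat from the Service Processor (SP) in the last 600 seconds
- AutoSupport (ASUP) reports that the SP heartbeat has stopped
- The system reboots to recover the BMC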
Example EMS log:
ERROR asup.post.drop: AutoSupport message (HA Group Notification from node-01 (SP HBT MISSED) NOTICE) was not posted to NetApp. The system will drop the message.
ERROR mgmtgwd.vreport.nodesUnreachable: Vreport encountered some unreachable nodes. The report may be incomplete.
ALERT callhome.sfo.takeover: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC
ERROR cf.fsm.takeoverOfPartnerDisabled: Failover monitor: takeover of node-02 disabled (local halt in progress).
EMERGENCY monitor.shutdown.emergency: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
EMERGENCY sp.ipmi.lost.shutdown: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
ALERT callhome.sp.hbt.stopped: Call home for SP HBT STOPPED
ALERT callhome.sp.hbt.missed: Call home for SP HBT MISSED
ERROR sp.heartbeat.stopped: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
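
To check whether a saved EMS log excerpt shows the same signature, the events above can be picked out with a short script. The following is a minimal sketch, not a NetApp tool: it assumes each line has the shape "SEVERITY event.name: message" as in the excerpt above, and the function name find_sp_heartbeat_events and the chosen event set are hypothetical, illustrative choices.

```python
import re

# Event names copied from the EMS excerpt above; this set is illustrative,
# not an exhaustive or official signature for CFBMC-6022.
SP_HEARTBEAT_EVENTS = {
    "callhome.sp.hbt.missed",
    "callhome.sp.hbt.stopped",
    "sp.heartbeat.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
}

# Assumed line shape: "<SEVERITY> <event.name>: <message>".
# Real EMS output may carry timestamps or node names that need a different pattern.
LINE_RE = re.compile(
    r"^(?P<severity>[A-Z]+)\s+(?P<event>[A-Za-z0-9_.]+):\s*(?P<message>.*)$"
)


def find_sp_heartbeat_events(lines):
    """Return (severity, event, message) tuples for lines naming one of the events above."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("event") in SP_HEARTBEAT_EVENTS:
            hits.append(
                (match.group("severity"), match.group("event"), match.group("message"))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "ALERT callhome.sp.hbt.stopped: Call home for SP HBT STOPPED",
        "ERROR mgmtgwd.vreport.nodesUnreachable: Vreport encountered some unreachable nodes.",
    ]
    for severity, event, message in find_sp_heartbeat_events(sample):
        print(severity, event, message)
```

A match on these events only indicates that the log resembles the excerpt; confirming the issue still requires checking that the node runs BMC firmware 13.11 or 13.11P1.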