CFBMC-2571: ONTAP cluster node reboots unexpectedly on BMC 15.12
Issue
Applies to BMC 15.12 on AFF A250, AFF C250, ASA A250, ASA C250, or FAS500 systems.
- The ONTAP cluster node reboots unexpectedly:
[node_name: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
[node_name: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
[node_name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
- ONTAP may report hwassist errors in the event log:
[node01: cf_hwassist: cf.hwassist.missedKeepAlive:error]: HW-assisted takeover missing keep-alive messages from HA partner (node02)
[node01: cf_hwassist: cf.hwassist.recvKeepAlive:info]: hw_assist: Received hw_assist KeepAlive alert from partner(node02)
- The BMC system event log (SEL) shows that the BMC performed a software reset:
Pilot Software reset
Kernel Panic Reboot
or:
FPGA pull BMC whole reset
Pilot FPGA AC cycle
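To check whether exported ONTAP event-log or BMC SEL text contains the symptoms listed above, a minimal Python sketch follows. It only does case-insensitive substring matching against the message names and SEL strings quoted in this article. The group names (such as sp_heartbeat_shutdown) and the function match_symptoms are hypothetical illustrations, not part of any NetApp tool, and how the logs are exported is left open. A match shows only that these messages appear; it does not by itself confirm CFBMC-2571.

```python
from typing import Iterable

# Hypothetical grouping of the message names and SEL strings quoted in this article.
SYMPTOM_SIGNATURES = {
    "sp_heartbeat_shutdown": (
        "callhome.sp.hbt.stopped",
        "sp.ipmi.lost.shutdown",
        "monitor.shutdown.emergency",
    ),
    "hwassist_keepalive": (
        "cf.hwassist.missedKeepAlive",
        "cf.hwassist.recvKeepAlive",
    ),
    "bmc_software_reset": (
        "Pilot Software reset",
        "Kernel Panic Reboot",
    ),
    "bmc_fpga_reset": (
        "FPGA pull BMC whole reset",
        "Pilot FPGA AC cycle",
    ),
}


def match_symptoms(lines: Iterable[str]) -> dict[str, list[str]]:
    """Return, per symptom group, the log lines containing one of its signatures.

    Matching is a case-insensitive substring test; groups with no hits are omitted.
    """
    hits: dict[str, list[str]] = {name: [] for name in SYMPTOM_SIGNATURES}
    for line in lines:
        folded = line.casefold()
        for name, needles in SYMPTOM_SIGNATURES.items():
            if any(needle.casefold() in folded for needle in needles):
                hits[name].append(line.rstrip("\n"))
    return {name: found for name, found in hits.items() if found}


if __name__ == "__main__":
    sample = [
        "[node_name: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped ...",
        "Pilot Software reset",
        "an unrelated log line",
    ]
    for group, found in match_symptoms(sample).items():
        print(group, len(found))
```

Run against the sample, this prints one line each for sp_heartbeat_shutdown and bmc_software_reset. Substring matching was chosen over parsing because the exact export format of the event log and SEL is not specified here.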