AFF A250, C250, ASA A250, C250, FAS500f: unexpected node reboot with BMC firmware 15.7 or earlier
Applies to
- AFF A250, AFF C250
- ASA A250, ASA C250
- FAS500f
- Baseboard Management Controller (BMC) firmware 15.7 or earlier
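The "15.7 or earlier" condition is easy to get wrong when versions are compared as plain strings or as decimal numbers (for example, 15.10 would sort before 15.7). The sketch below compares versions component by component. It assumes BMC firmware versions are dot-separated integers; `parse_version` and `is_affected` are hypothetical helper names, not part of any NetApp tool.

```python
def parse_version(text):
    """Turn a version string such as "15.7" into a tuple of ints, e.g. (15, 7).

    Assumption: the version consists only of dot-separated integers.
    """
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(bmc_version, threshold="15.7"):
    """Return True if bmc_version is at or below the threshold version."""
    return parse_version(bmc_version) <= parse_version(threshold)


if __name__ == "__main__":
    for version in ("14.9", "15.7", "15.10", "16.0"):
        print(version, is_affected(version))
```

Tuple comparison treats each component as a number, so 15.10 is correctly ranked above 15.7.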
Issue
- Unexpected node halt:
[node_name: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
[node_name: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
[node_name: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
[node_name: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
[node_name: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
[node_name: mgwd: mgwd.notify.halt.result:info]: MGWD able to notify CLAM on its HA partner node that this node is undergoing a planned shutdown (reason: E). Error: -
- SP-LATEST-SYSTEM-EVENT-LOG or the system log sel command shows an IPMI cold reset with multiple Bus Correctable errors:
BMC node_name> system log sel
3e1 | 03/08/2023 | 16:09:46 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
3e2 | 03/08/2023 | 16:09:46 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
...
3f1 | OEM record f2 | IPMI cold reset
3f2 | OEM record f2 | Pilot Software reset
- Or a BMC reset through the FPGA:
1c9 | OEM record f2 | FPGA pull BMC whole reset
1ca | OEM record f2 | Pilot AC cycle
- The node's BMC may be inaccessible, even through the serial console port
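
The SEL patterns listed above can be checked mechanically when reviewing saved log output. The following is a minimal sketch, not a NetApp tool: it assumes SEL lines have the pipe-separated shapes shown in the excerpts, treats "multiple" as two or more Bus Correctable errors, and uses hypothetical helper names (`parse_sel_line`, `summarize_sel`, `matches_signature`).

```python
import re

# Reset descriptions taken from the SEL excerpts in this article.
RESET_MARKERS = ("IPMI cold reset", "FPGA pull BMC whole reset")


def parse_sel_line(line):
    """Return (record_id, description) for a SEL record line, else None.

    Assumed shapes (from the excerpts above):
      id | date | time | sensor | event | state
      id | OEM record f2 | description
    """
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 3 or not re.fullmatch(r"[0-9a-fA-F]+", fields[0]):
        return None
    if fields[1].startswith("OEM record"):
        return fields[0], fields[-1]
    if len(fields) >= 5:
        return fields[0], fields[4]
    return None


def summarize_sel(text):
    """Count Bus Correctable errors and collect reset records."""
    bus_errors = 0
    resets = []
    for line in text.splitlines():
        parsed = parse_sel_line(line)
        if parsed is None:
            continue  # prompt lines, "..." elisions, blank lines
        record_id, desc = parsed
        if desc == "Bus Correctable error":
            bus_errors += 1
        elif desc in RESET_MARKERS:
            resets.append((record_id, desc))
    return {"bus_correctable_errors": bus_errors, "resets": resets}


def matches_signature(summary):
    """True if the summary resembles either pattern described above.

    Assumption: "multiple" means at least two Bus Correctable errors.
    """
    kinds = {desc for _, desc in summary["resets"]}
    cold = "IPMI cold reset" in kinds and summary["bus_correctable_errors"] >= 2
    fpga = "FPGA pull BMC whole reset" in kinds
    return cold or fpga


SAMPLE = """BMC node_name> system log sel
3e1 | 03/08/2023 | 16:09:46 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
3e2 | 03/08/2023 | 16:09:46 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
...
3f1 | OEM record f2 | IPMI cold reset
3f2 | OEM record f2 | Pilot Software reset
"""

if __name__ == "__main__":
    summary = summarize_sel(SAMPLE)
    print(summary)
    print("matches signature:", matches_signature(summary))
```

A match only indicates that the log resembles the excerpts in this article; it does not by itself confirm the cause of a reboot.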