Unexpected controller takeover caused by SP heartbeat stop on BMC 15.13
Applies to
- AFF A250, C250, or FAS500
- BMC firmware version 15.13
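Issue
A controller underwent an unexpected automatic takeover.
The event occurred because the Service Processor (SP) heartbeat stopped on one node, which forced a reboot to recover the BMC.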
EMS log:
Fri Jan 02 23:37:50 +0800 [Node2: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Fri Jan 02 23:40:08 +0800 [Node2: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Fri Jan 02 23:50:08 +0800 [Node2: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
Fri Jan 02 23:50:28 +0800 [Node2: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.

system log sel:
9e8 | 01/02/2026 | 15:55:26 | System Event #0xff | Timestamp Clock Sync | Asserted
9e9 | 01/02/2026 | 15:55:26 | System Event | Timestamp Clock Sync | Asserted
9ea | 01/02/2026 | 15:55:26 | Battery #0x4a | State Deasserted
9eb | 01/02/2026 | 15:55:26 | Battery #0x4b | State Asserted
9ec | 01/02/2026 | 15:55:26 | Battery #0x4c | State Asserted
9ed | 01/02/2026 | 15:55:26 | Battery #0x4d | State Deasserted
9ee | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9ef | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9f0 | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9f1 | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
9f2 | 01/02/2026 | 15:55:43 | Battery #0x4a | State Deasserted
9f3 | 01/02/2026 | 15:55:43 | Battery #0x4b | State Asserted
9f4 | 01/02/2026 | 15:55:43 | Battery #0x4c | State Asserted
9f5 | 01/02/2026 | 15:55:43 | Battery #0x4d | State Deasserted
9f6 | 01/02/2026 | 15:55:43 | Battery #0x4f | State Deasserted
9f7 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9f8 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9f9 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9fa | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
9fb | 01/02/2026 | 15:55:44 | Power Supply #0x20 | Presence detected | Asserted
9fc | 01/02/2026 | 15:55:44 | Power Supply #0x25 | Presence detected | Asserted
9fd | 01/02/2026 | 15:55:44 | Power Supply #0x72 | Presence detected | Asserted
9fe | 01/02/2026 | 15:55:44 | Power Supply #0x73 | Presence detected | Asserted
9ff | 01/02/2026 | 15:55:45 | OEM record df | FPGA pull BMC whole reset
a00 | 01/02/2026 | 15:55:46 | OEM record df | Pilot FPGA AC cycle
a01 | 01/02/2026 | 15:55:51 | OEM record c0 | 000000 | 000105000000
a02 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
a03 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
a04 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
a05 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
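
When checking whether another system shows the same pattern, it can help to scan exported EMS text for the three events seen in the excerpt, in the same order. The following is a minimal sketch and not a NetApp tool. The function names, the regular expression, and the choice to treat these three events as a "signature" are illustrative assumptions. Only the event names themselves come from the log above.

```python
import re

# Event names copied from the EMS excerpt in this article.
# Treating them as an ordered signature is an assumption made for illustration.
SIGNATURE = [
    "callhome.sp.hbt.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
]

# Matches the bracketed header "[<node>: <source>: <event>:<severity>]".
EMS_PATTERN = re.compile(
    r"\[(?P<node>[^:\]]+):\s*(?P<source>[^:\]]+):\s*"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]"
)


def parse_ems_events(text):
    """Return (node, event, severity) tuples in order of appearance."""
    return [
        (m["node"], m["event"], m["severity"])
        for m in EMS_PATTERN.finditer(text)
    ]


def nodes_matching_signature(text):
    """Return nodes whose events contain SIGNATURE as an ordered subsequence."""
    events_by_node = {}
    for node, event, _severity in parse_ems_events(text):
        events_by_node.setdefault(node, []).append(event)

    matched = []
    for node, events in events_by_node.items():
        remaining = iter(events)
        # "name in remaining" consumes the iterator up to the match,
        # so the signature events must appear in this order.
        if all(name in remaining for name in SIGNATURE):
            matched.append(node)
    return matched
```

Passing the EMS lines from this article to `nodes_matching_signature` as a single string returns the nodes that logged all three events in sequence. A match only indicates a similar event pattern and does not by itself confirm the root cause.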