System shuts down because the SP heartbeat stopped on AFF A250 or FAS500f (BMC 15.4, 15.5, 15.5P1)
Applies to
- AFF A250
- FAS500f
- Baseboard Management Controller (BMC) 15.4, 15.5, 15.5P1
Issue
- The node reboots because the BMC heartbeat has stopped:
Sun Jun 13 21:45:49 +0100 [node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
Sun Jun 13 21:57:32 +0100 [node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
Sun Jun 13 21:57:32 +0100 [node-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
Sun Jun 13 22:09:09 +0100 [node-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Sun Jun 13 22:12:16 +0100 [node-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Sun Jun 13 22:22:16 +0100 [node-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
- Because of the reboot, the partner node performs a takeover:
[Node-02: cf_main: cf.fsm.takeover.on.reboot:info]: Failover monitor: One node initiated automatic takeover after detecting that its partner node is rebooting.
- In other cases, the affected node logs nothing during the event, and only the partner node shows:
Sat Jan 22 18:11:28 +0100 [node-A: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
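To check whether a node's EMS log matches either pattern above, the messages can be sorted by the event names they carry. The following is a minimal Python sketch, not a NetApp tool. It assumes every message follows the `<timestamp> [<node>: <process>: <event>:<severity>]: <message>` layout seen in the excerpts, with the timestamp optional (as in the cf_main line). The helper names `parse_line` and `summarize` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed layout, inferred from the sample messages in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
# The timestamp is optional because one excerpt omits it.
EMS_LINE = re.compile(
    r"^(?:(?P<timestamp>.+?) )?\[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<message>.*)$"
)

# Event names copied from the log excerpts above (not an exhaustive list).
SP_HEARTBEAT_EVENTS = {
    "sp.heartbeat.stopped",
    "callhome.sp.hbt.missed",
    "callhome.sp.hbt.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
}
PARTNER_TAKEOVER_EVENTS = {
    "cf.fsm.takeover.on.reboot",
    "cf.fsm.takeover.noHeartbeat",
}


class EmsEvent(NamedTuple):
    timestamp: Optional[str]
    node: str
    process: str
    event: str
    severity: str
    message: str


def parse_line(line: str) -> Optional[EmsEvent]:
    """Split one EMS line into its fields; return None if it does not match."""
    m = EMS_LINE.match(line.strip())
    if m is None:
        return None
    return EmsEvent(**m.groupdict())


def summarize(lines: Iterable[str]) -> Tuple[List[EmsEvent], List[EmsEvent]]:
    """Return (SP/BMC heartbeat events, partner takeover events) found in lines."""
    sp_events: List[EmsEvent] = []
    takeover_events: List[EmsEvent] = []
    for line in lines:
        ev = parse_line(line)
        if ev is None:
            continue  # skip lines that are not in the assumed EMS layout
        if ev.event in SP_HEARTBEAT_EVENTS:
            sp_events.append(ev)
        elif ev.event in PARTNER_TAKEOVER_EVENTS:
            takeover_events.append(ev)
    return sp_events, takeover_events


if __name__ == "__main__":
    sample = [
        "Sun Jun 13 22:12:16 +0100 [node-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: "
        "SP heartbeat stopped and cannot be recovered.",
        "[Node-02: cf_main: cf.fsm.takeover.on.reboot:info]: Failover monitor: "
        "One node initiated automatic takeover.",
    ]
    sp, takeover = summarize(sample)
    for ev in sp + takeover:
        print(ev.node, ev.event, ev.severity)
```

Messages from the rebooted node land in the first list and cf.fsm.takeover.* messages from the partner land in the second. The second pattern described above would show up as takeover events on the partner with no SP heartbeat events recorded on the affected node.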