System shutdown due to SP heartbeat stopped on AFF A250 or FAS500f (BMC 15.3 and earlier)
Applies to
- AFF A250
- FAS500f
- Baseboard Management Controller (BMC) 15.1P1 and 15.3
Issue
- The node reboots because the BMC heartbeat has stopped:
21:45:49 +0100 [node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
21:57:32 +0100 [node-01: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
21:57:32 +0100 [node-01: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
22:09:09 +0100 [node-01: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
22:12:16 +0100 [node-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
22:22:16 +0100 [node-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
- Because of the reboot, the partner node performs a takeover:
[Node-02: cf_main: cf.fsm.takeover.on.reboot:info]: Failover monitor: One node initiated automatic takeover after detecting that its partner node is rebooting.
- In some cases, the affected node logs nothing during the event, and only the partner node reports the takeover:
18:11:28 +0100 [node-A: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
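
To check a saved log export for this signature, something like the following Python sketch may help. It is not a NetApp tool. The event names are copied from the log excerpts above, while the function names (`find_events`, `summarize`), the grouping of events, and the summary wording are assumptions made for illustration only.

```python
import re

# Event names copied from the log excerpts above.
# Grouping them into two tuples is an assumption of this sketch.
SP_EVENTS = (
    "sp.heartbeat.stopped",
    "callhome.sp.hbt.missed",
    "callhome.sp.hbt.stopped",
    "sp.ipmi.lost.shutdown",
    "monitor.shutdown.emergency",
)
TAKEOVER_EVENTS = (
    "cf.fsm.takeover.on.reboot",
    "cf.fsm.takeover.noHeartbeat",
)

# Matches "[node: source: event.name:severity]" as it appears in the excerpts.
EMS_PATTERN = re.compile(r"\[([^:\]]+): ([^:\]]+): ([\w.]+):(\w+)\]")


def find_events(log_text):
    """Return (node, event, severity) tuples for the events listed above."""
    hits = []
    for line in log_text.splitlines():
        match = EMS_PATTERN.search(line)
        if not match:
            continue
        node, _source, event, severity = match.groups()
        if event in SP_EVENTS or event in TAKEOVER_EVENTS:
            hits.append((node, event, severity))
    return hits


def summarize(log_text):
    """Give a rough, non-authoritative hint about which pattern the log shows."""
    events = {event for _node, event, _severity in find_events(log_text)}
    if "sp.ipmi.lost.shutdown" in events or "monitor.shutdown.emergency" in events:
        return "node logged an SP heartbeat shutdown"
    if "cf.fsm.takeover.noHeartbeat" in events:
        return "partner logged a takeover without heartbeat (possible match)"
    return "no matching events"
```

A takeover without heartbeat can have other causes, so the second result is only a hint that this issue might apply, not a diagnosis.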