
Unexpected controller takeover caused by a stopped SP heartbeat with BMC 15.13

Category:
fas-systems
Specialty:
hw

Applies to

  • AFF A250, C250, or FAS500
  • BMC firmware version 15.13

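Issue

A controller experienced an unexpected automatic takeover.
The event was caused by the Service Processor (SP) heartbeat stopping on one node, which forced a reboot to recover the BMC.
EMS log: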
Fri Jan 02 23:37:50 +0800 [Node2: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Fri Jan 02 23:40:08 +0800 [Node2: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Fri Jan 02 23:50:08 +0800 [Node2: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
Fri Jan 02 23:50:28 +0800 [Node2: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.
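
The timing in the EMS messages above can be checked with a short script. The following is a minimal sketch, not a NetApp tool: the regular expression is inferred only from the four lines shown here, the helper names `parse_ems` and `seconds_between` are hypothetical, and the year has to be supplied by the caller because EMS lines do not carry one (2026 is assumed here to match the SEL dates in this article).

```python
import re
from datetime import datetime

# Layout inferred from the four EMS lines in this article (an assumption):
# "<weekday> <month> <day> <time> <utc offset> [<node>: <source>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^\w{3} (?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<source>[^:]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)


def parse_ems(line, year):
    """Return a dict for one EMS line, or None if the line does not match."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    # EMS lines carry no year, so the caller supplies it.
    when = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S %z")
    return {**match.groupdict(), "time": when}


def seconds_between(events, first, second):
    """Seconds from the first occurrence of event `first` to that of `second`."""
    times = {}
    for entry in events:
        times.setdefault(entry["event"], entry["time"])
    return (times[second] - times[first]).total_seconds()


SAMPLE = """\
Fri Jan 02 23:37:50 +0800 [Node2: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
Fri Jan 02 23:40:08 +0800 [Node2: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Fri Jan 02 23:50:08 +0800 [Node2: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
Fri Jan 02 23:50:28 +0800 [Node2: shutdown_thread0: ha.localNodeShutDown:notice]: Shutdown of the local node has been initiated with inhibit_takeover set to FALSE.
"""

# Year 2026 is an assumption taken from the SEL dates in this article.
events = [e for e in (parse_ems(line, 2026) for line in SAMPLE.splitlines()) if e]

for entry in events:
    print(entry["time"].isoformat(), entry["node"], entry["severity"], entry["event"])

grace = seconds_between(events, "sp.ipmi.lost.shutdown", "monitor.shutdown.emergency")
print("seconds from shutdown warning to emergency shutdown:", grace)
```

For the excerpt above, the gap between `sp.ipmi.lost.shutdown` and `monitor.shutdown.emergency` is 600 seconds, which matches the 10-minute warning in the message text.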
sp system log sel
 9e8 | 01/02/2026 | 15:55:26 | System Event #0xff | Timestamp Clock Sync | Asserted
 9e9 | 01/02/2026 | 15:55:26 | System Event | Timestamp Clock Sync | Asserted
 9ea | 01/02/2026 | 15:55:26 | Battery #0x4a | State Deasserted
 9eb | 01/02/2026 | 15:55:26 | Battery #0x4b | State Asserted
 9ec | 01/02/2026 | 15:55:26 | Battery #0x4c | State Asserted
 9ed | 01/02/2026 | 15:55:26 | Battery #0x4d | State Deasserted
 9ee | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
 9ef | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
 9f0 | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
 9f1 | 01/02/2026 | 15:55:26 | Other FRU #0x50 |
 9f2 | 01/02/2026 | 15:55:43 | Battery #0x4a | State Deasserted
 9f3 | 01/02/2026 | 15:55:43 | Battery #0x4b | State Asserted
 9f4 | 01/02/2026 | 15:55:43 | Battery #0x4c | State Asserted
 9f5 | 01/02/2026 | 15:55:43 | Battery #0x4d | State Deasserted
 9f6 | 01/02/2026 | 15:55:43 | Battery #0x4f | State Deasserted
 9f7 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
 9f8 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
 9f9 | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
 9fa | 01/02/2026 | 15:55:43 | Other FRU #0x50 |
 9fb | 01/02/2026 | 15:55:44 | Power Supply #0x20 | Presence detected | Asserted
 9fc | 01/02/2026 | 15:55:44 | Power Supply #0x25 | Presence detected | Asserted
 9fd | 01/02/2026 | 15:55:44 | Power Supply #0x72 | Presence detected | Asserted
 9fe | 01/02/2026 | 15:55:44 | Power Supply #0x73 | Presence detected | Asserted
 9ff | 01/02/2026 | 15:55:45 | OEM record df | FPGA pull BMC whole reset
 a00 | 01/02/2026 | 15:55:46 | OEM record df | Pilot FPGA AC cycle
 a01 | 01/02/2026 | 15:55:51 | OEM record c0 | 000000 | 000105000000
 a02 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
 a03 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
 a04 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
 a05 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
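
To pick the OEM records out of a long SEL dump like the one above, the pipe-delimited lines can be split into fields. This is a minimal sketch under stated assumptions: the column meaning (hex record ID, date, time, sensor, then free-form details) is inferred from the excerpt alone, and `parse_sel` and `oem_records` are hypothetical helper names, not part of any SP or BMC tooling.

```python
def parse_sel(line):
    """Split one pipe-delimited SEL line into a dict, or return None."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4:
        return None
    try:
        # Record IDs in the excerpt look hexadecimal (9e8 ... a05); this is an assumption.
        record_id = int(fields[0], 16)
    except ValueError:
        return None
    return {
        "id": record_id,
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        # Trailing empty fields (as in the "Other FRU #0x50 |" lines) are dropped.
        "details": [field for field in fields[4:] if field],
    }


def oem_records(records):
    """Keep only the records whose sensor column starts with 'OEM record'."""
    return [r for r in records if r["sensor"].startswith("OEM record")]


SAMPLE = """\
 9fe | 01/02/2026 | 15:55:44 | Power Supply #0x73 | Presence detected | Asserted
 9ff | 01/02/2026 | 15:55:45 | OEM record df | FPGA pull BMC whole reset
 a00 | 01/02/2026 | 15:55:46 | OEM record df | Pilot FPGA AC cycle
 a01 | 01/02/2026 | 15:55:51 | OEM record c0 | 000000 | 000105000000
 a02 | 01/02/2026 | 15:55:59 | Critical Interrupt #0x31 | Bus Correctable error | Asserted
"""

records = [r for r in (parse_sel(line) for line in SAMPLE.splitlines()) if r]
oem = oem_records(records)

for record in oem:
    print(f"{record['id']:x}", record["date"], record["time"], record["sensor"], record["details"])
```

Run against the full SEL output, the same filter would list the `FPGA pull BMC whole reset` and `Pilot FPGA AC cycle` records together with their timestamps, which makes them easier to line up against the EMS events.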
