System fails to boot and displays "Failed to recover SP"
Applies to
- AFF A220/FAS27x0/FASC190 AFF
- AFF A200/FAS26x0
- FAS80
- AFF A300/FAS8200
- AFF A700/FAS9000
Issue
- The storage system reboots (for example, during an ONTAP upgrade) but fails to boot and halts at the LOADER prompt. Example:
...
Waiting for SP ...
SP failure. Resetting SP from primary FW. This can take a few minutes
Waiting for SP ...
SP failure. Resetting SP from backup FW. This can take a few minutes
Waiting for SP ...
Failed to recover SP
IPMI PCI Slot Control failed.
IPMI PCI Slot Configuration failed.
Configuring Devices ...
IPMI:Get controller FRU inventory:failed
IPMI:Get midplane FRU 0 inventory:failed
IPMI: Get NVRAM FRU inventory:failed
BIOS POST Failure(s) detected: SP IPMI failure. Abort AUTOBOOT
LOADER-A>
BIOS POST Failure(s) detected: Failed to get FRU data. Abort AUTOBOOT
- The Service Processor (SP) event log reports similar failure messages. Example:
Record 1287: Tue Apr 14 14:34:05.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout - retrying
Record 1288: Tue Apr 14 14:34:10.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout
Record 1289: Tue Apr 14 14:34:13.000000 2020 [SysFW.notice]: Failed to recover SP
Record 1290: Tue Apr 14 14:34:13.000000 2020 [SysFW.critical]: IPMI:Read midplane FRU common header:failed
Record 1291: Sun Jan 01 00:02:58.340000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1292: Tue Apr 14 14:34:14.000000 2020 [SysFW.critical]: IPMI PCI Slot Control failed.
Record 1293: Sun Jan 01 00:02:59.310000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1294: Tue Apr 14 14:34:18.000000 2020 [CFE.notice]: Loader time adjust: Set BMC time. Old time: Sun Jan 1 00:03:03 2017. New time: Tue Apr 14 14:34:18 2020.
Record 1295: Tue Apr 14 14:34:18.000000 2020 [Boot Loader.notice]: Received time sync
Record 1296: Tue Apr 14 14:34:20.000000 2020 [Boot Loader.critical]: Abort Autoboot due to BIOS POST failure.
Record 1297: Tue Apr 14 14:34:20.280000 2020 [Trap Event.critical]: hwassist post_error (26)
Record 1298: Tue Apr 14 14:34:24.020000 2020 [IPMI.notice]: 001c | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
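- The issue persists after the controller is reseated, even with no cables connected to e0M/SP. Disconnecting e0M/SP rules out the excessive SP traffic problem described in "Node down and failed to boot due to 'SP-IPMI Failure'".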
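When several nodes need to be checked, a small script can scan a saved copy of the SP event log for the messages shown above. The sketch below is a hypothetical helper, not a NetApp tool. It assumes the log has been copied as plain text with one record per line in the "Record N: timestamp [Source.severity]: message" layout shown above. The signature strings are taken from the example records, and the names `parse_records`, `find_signatures`, and `matches_issue` are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical parser for SP event log lines in the layout shown above, e.g.
# "Record 1289: Tue Apr 14 14:34:13.000000 2020 [SysFW.notice]: Failed to recover SP"
RECORD_RE = re.compile(
    r"^Record\s+(?P<num>\d+):\s+(?P<ts>.+?)\s+"
    r"\[(?P<source>[^\]]+)\.(?P<severity>\w+)\]:\s*(?P<msg>.*)$"
)

# Substrings copied from the example records in this article (assumption:
# they are representative of this failure, not an exhaustive list).
SIGNATURES = (
    "Failed to recover SP",
    "IPMI PCI Slot Control failed",
    "midplane FRU common header:failed",
    "Abort Autoboot due to BIOS POST failure",
)


@dataclass
class Record:
    num: int
    timestamp: str
    source: str
    severity: str
    message: str


def parse_records(text: str) -> list[Record]:
    """Parse every 'Record N: ...' line; all other lines are ignored."""
    records = []
    for line in text.splitlines():
        m = RECORD_RE.match(line.strip())
        if m:
            records.append(Record(int(m["num"]), m["ts"], m["source"],
                                  m["severity"], m["msg"]))
    return records


def find_signatures(records: list[Record]) -> list[Record]:
    """Return the records whose message contains any known signature."""
    return [r for r in records if any(s in r.message for s in SIGNATURES)]


def matches_issue(text: str) -> bool:
    """True if the log contains the key 'Failed to recover SP' record."""
    hits = find_signatures(parse_records(text))
    return any("Failed to recover SP" in r.message for r in hits)


SAMPLE = """\
Record 1287: Tue Apr 14 14:34:05.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout - retrying
Record 1288: Tue Apr 14 14:34:10.000000 2020 [SysFW.notice]: IPMI:Read midplane FRU common header:timeout
Record 1289: Tue Apr 14 14:34:13.000000 2020 [SysFW.notice]: Failed to recover SP
Record 1290: Tue Apr 14 14:34:13.000000 2020 [SysFW.critical]: IPMI:Read midplane FRU common header:failed
Record 1291: Sun Jan 01 00:02:58.340000 2017 [Trap Event.critical]: hwassist post_error (26)
Record 1292: Tue Apr 14 14:34:14.000000 2020 [SysFW.critical]: IPMI PCI Slot Control failed.
Record 1296: Tue Apr 14 14:34:20.000000 2020 [Boot Loader.critical]: Abort Autoboot due to BIOS POST failure.
"""

if __name__ == "__main__":
    for rec in find_signatures(parse_records(SAMPLE)):
        print(rec.num, rec.severity, rec.message)
    print("Matches 'Failed to recover SP' issue:", matches_issue(SAMPLE))
```

The script only uses plain substring checks on each message, so it will still work if the timestamps are wrong, as with the 2017 entries above. A match only shows that the log looks like this issue. It does not identify the root cause.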