System fails to boot and displays "Resetting SP from primary FW" or "SP IPMI failure"
Applies to
- FAS2620, FAS2650
- FAS2720, FAS2750
- AFF C190
- AFF A150, A220
- AFF A300 / FAS8200
- AFF A900
Issue
- The node goes down, and the EMS log shows SP HBT MISSED or SP HBT STOPPED.
[nodename: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed: Sysfan1 F1, Sysfan1 F2, Sysfan2 F1, Sysfan2 F2. Power Supply Status Critical: PSU1.
[nodename: monitor: monitor.globalStatus.critical:EMERGENCY]: Power Supply Status Critical: PSU1.
[nodename: cphmd: hm.alert.cleared:notice]: Alert Id = CriticalFruMultiFaultAlert , Alerting Resource = XXXXXXXXXXXX cleared by monitor chassis
[Nodename: spsm_listener: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
[Nodename: spsm_listener: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
- Or the Service Processor (SP) reports SP load is high errors in the console log, and the node goes down.
[SP.notice]: SP load is high: 3.12 2.59 2.02
[SP.notice]: SP load is high: 3.54 2.90 2.21
[IPMI.notice]: e601 | 02 | EVT: 0301ffff | Attn_Sensor1 | Assertion Event, "State Asserted"
[SP.emergency]: SP reset initiated by storage controller
[IPMI.notice]: e701 | c0 | OEM: ffff70005000 | ManufId: 150300 | SP Reset Externally
[IPMI.notice]: e801 | c0 | OEM: fcff70000000 | ManufId: 150300 | POS Register: Unexpected Reset
- The node fails to boot from the SP:
Warning: Unable to list entries on node-01. RPC: Couldn't make connection [from mgwd on
node "Node-02" (VSID: -1) to mgwd at xxx.xxx.xxx.xxx]
Error: command failed: RPC: Couldn't make connection [from mgwd on node "Node-02" (VSID: -1) to
mgwd at xxx.xxx.xxx.xxx]
- The node fails to boot from the LOADER with the following error:
LOADER-A> boot_ontap
Loading X86_64/freebsd・・・
Loading X86_64/freebsd・・・
Starting program at ・・・
NetApp Data ONTAP 9.3P4
***************************************
This platform is not supported in this release.
The system will now halt
***************************************
BIOS Version: 11.1
Portions Copyright (C) 2014-2017 NetApp, Inc. All Rights Reserved.
Initializing System Memory ...
Loading Device Drivers ...
Waiting for SP ...
SP failure. Resetting SP from primary FW. This can take a few minutes
- OR -
Failed to recover SP
IPMI:Get controller FRU inventory:failed
IPMI:Get midplane FRU 0 inventory:failed
Configuring Devices ...
IPMI PCI Slot Control failed.
CPU = 1 Processor(s) Detected.
Intel(R) Xeon(R) CPU D-1587 @ 1.70GHz (CPU 0)
CPUID: 0x00050664. Cores per Processor = 16
131072 MB System RAM Installed.
SATA (AHCI) Device: SV9MST6D120GLM41NP
Boot Loader version 6.0.10
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2020 NetApp, Inc. All Rights Reserved.
BIOS POST Failure(s) detected: SP IPMI failure. Abort AUTOBOOT
- Even after the motherboard is replaced, the down controller still fails to boot and shows the same error.
- SP events all messages:
Record 231: Sun Aug 1 00:25:04 2021 [SysFW.notice]: Failed to recover SP
Record 232: Sun Aug 1 00:25:04 2021 [SysFW.critical]: IPMI:Get controller FRU inventory:failed
Record 233: Sun Aug 1 00:25:04 2021 [SysFW.notice]: IPMI:Get midplane FRU 0 inventory:failed
Record 234: Thu Jan 1 00:05:00 1970 [Trap Event.critical]: hwassist post_error (26)
- SP events all messages logged on the partner node:
Sat Oct 15 13:05:38 2016 [Agent.notice]: Local Serial Exchange Error Internal MLER[4] asserted
Mon Oct 17 08:38:52 2016 [Agent.notice]: Local Invalid Serial Exchange Bus Internal MLER[5] asserted
Thu Jan 01 00:00:36 1970 [Agent.notice]: Midplane I2C Local Buffers Not Ready Internal MLER[6] de-asserted
Mon Oct 17 08:52:11 2016 [Agent.notice]: Midplane Local Grant Timeout Internal MLER[2] asserted
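
When triaging a saved console or EMS log, it can help to check quickly which of the symptoms above are present. The following is a minimal sketch, not a NetApp tool: the signature strings are copied from the excerpts in this article, while the grouping labels (`sp_heartbeat`, `sp_load`, and so on) and the function names `scan_log` and `load_averages` are hypothetical and chosen only for illustration.

```python
import re

# Signature strings are taken verbatim from the log excerpts in this article.
# The dictionary keys are illustrative labels, not NetApp terminology.
SIGNATURES = {
    "sp_heartbeat": ("SP HBT MISSED", "SP HBT STOPPED"),
    "sp_load": ("SP load is high",),
    "sp_reset": ("Resetting SP from primary FW", "Failed to recover SP"),
    "ipmi_failure": (
        "SP IPMI failure",
        "IPMI PCI Slot Control failed",
        "inventory:failed",
    ),
}

# Matches lines such as "[SP.notice]: SP load is high: 3.12 2.59 2.02".
LOAD_RE = re.compile(r"SP load is high:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)")


def scan_log(text):
    """Return {label: [matching lines]} for every signature found in text."""
    hits = {}
    for line in text.splitlines():
        for label, needles in SIGNATURES.items():
            if any(needle in line for needle in needles):
                hits.setdefault(label, []).append(line.strip())
    return hits


def load_averages(text):
    """Extract the three figures from each 'SP load is high' line as floats."""
    return [
        tuple(float(value) for value in match.groups())
        for match in LOAD_RE.finditer(text)
    ]
```

Calling `scan_log` on the text of a captured log returns only the labels whose signatures appear, so an empty dictionary means none of the symptoms listed here were found in that file. It does not by itself confirm or rule out the underlying fault.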