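E-Series BMC unresponsive, which can trigger false-positive hardware alerts
Applies to
- NetApp E-Series
- SANtricity OS 11.70.1R1 or later, but earlier than 11.70.4
- NetApp EF300 and EF600
Issue
- The controller's BMC (Baseboard Management Controller) is unresponsive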
- The MEL (Major Event Log) reports the following:
A:10/29/21, 12:35:33 PM (12:35:33) 2800 2868 The controller's BMC was unresponsive and the recovery process successfully recovered the BMC - Shelf 99, Bay A
A:10/29/21, 12:34:31 PM (12:34:31) 2799 2867 The controller's BMC is unresponsive - Shelf 99, Bay A
- The Major Event Log may also report false-positive hardware alerts, for example:
A:11/22/21, 11:16:25 AM (11:16:25) 1676 280b Controller shelf component failed - Shelf 99, Controller 1, Fan canister 5, Bay 1 <--CRITICAL
- The E-Series support bundle and AutoSupport (DOM0-BMC-LOogs-%.7z) contain the following BMC events (sp_system_event_log.txt), which indicate that a watchdog timeout reset was triggered:
740 | 01/01/2000 | 00:00:30 | Power Supply #0x72 | Presence detected | Asserted
741 | 01/01/2000 | 00:00:30 | Power Supply #0x73 | Presence detected | Asserted
742 | OEM record f2 | Watchdog1 Timeout
743 | OEM record f2 | Pilot Software reset
744 | 01/01/2000 | 00:00:36 | Battery #0x4f | State Deasserted
745 | 01/01/2000 | 00:00:38 | System Event #0xff | Timestamp Clock Sync | Asserted
746 | 11/16/2022 | 19:37:07 | System Event #0xff | Timestamp Clock Sync | Asserted
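
The BMC event excerpt above can be checked for the watchdog signature with a short script. The following is a minimal sketch, assuming that sp_system_event_log.txt has already been extracted from the support bundle and that its records are pipe-delimited exactly as in the excerpt. The names `SelRecord`, `parse_sel_line`, and `find_watchdog_resets` are hypothetical helpers written for illustration; they are not part of any NetApp tool.

```python
from typing import Iterable, List, NamedTuple, Optional


class SelRecord(NamedTuple):
    record_id: str
    timestamp: Optional[str]  # None for OEM records, which carry no date/time
    fields: List[str]


def parse_sel_line(line: str) -> Optional[SelRecord]:
    """Split one pipe-delimited event log line into a record (assumed layout)."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 2 or not parts[0]:
        return None  # blank or malformed line
    # Timestamped records have a date in column 2 and a time in column 3.
    if len(parts) >= 4 and parts[1].count("/") == 2 and parts[2].count(":") == 2:
        return SelRecord(parts[0], f"{parts[1]} {parts[2]}", parts[3:])
    return SelRecord(parts[0], None, parts[1:])


def find_watchdog_resets(lines: Iterable[str]) -> List[SelRecord]:
    """Return records whose text mentions a watchdog timeout."""
    hits = []
    for line in lines:
        rec = parse_sel_line(line)
        if rec is None:
            continue
        if any("watchdog" in f.lower() and "timeout" in f.lower() for f in rec.fields):
            hits.append(rec)
    return hits


# Sample lines copied from the excerpt in this article.
SAMPLE = """\
741 | 01/01/2000 | 00:00:30 | Power Supply #0x73 | Presence detected | Asserted
742 | OEM record f2 | Watchdog1 Timeout
743 | OEM record f2 | Pilot Software reset
"""

if __name__ == "__main__":
    for rec in find_watchdog_resets(SAMPLE.splitlines()):
        print(rec.record_id, " | ".join(rec.fields))
```

To scan a real log, pass the lines of the extracted file to `find_watchdog_resets` instead of `SAMPLE`. A match only indicates that the BMC logged a watchdog timeout; it should be correlated with the MEL entries shown earlier before a hardware alert is treated as a false positive.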