E-Series BMC is unresponsive and may trigger false hardware alerts
Applies to
- NetApp E-Series
- SANtricity OS versions 11.70.1R1 through 11.70.4 (BMC firmware earlier than 14.10)
- NetApp EF300 and EF600
Issue
- The MEL (Major Event Log) reports that the controller's BMC (Baseboard Management Controller) is unresponsive:
A:10/29/21, 12:35:33 PM (12:35:33) 2800 2868 The controller's BMC was unresponsive and the recovery process successfully recovered the BMC - Shelf 99, Bay A
A:10/29/21, 12:34:31 PM (12:34:31) 2799 2867 The controller's BMC is unresponsive - Shelf 99, Bay A
- The Major Event Log may also report false hardware alerts, for example:
A:11/22/21, 11:16:25 AM (11:16:25) 1676 280b Controller shelf component failed - Shelf 99, Controller 1, Fan canister 5, Bay 1 <--CRITICAL
- The E-Series support bundle and AutoSupport (DOm0-wathcdog BMC logs-%.7z) contain the following BMC events (SP_system_event_log.txt), which indicate that a watchdog timeout reset was triggered:
740 | 01/01/2000 | 00:00:30 | Power Supply #0x72 | Presence detected | Asserted
741 | 01/01/2000 | 00:00:30 | Power Supply #0x73 | Presence detected | Asserted
742 | OEM record f2 | Watchdog1 Timeout
743 | OEM record f2 | Pilot Software reset
744 | 01/01/2000 | 00:00:36 | Battery #0x4f | State Deasserted
745 | 01/01/2000 | 00:00:38 | System Event #0xff | Timestamp Clock Sync | Asserted
746 | 11/16/2022 | 19:37:07 | System Event #0xff | Timestamp Clock Sync | Asserted
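To check an extracted SP_system_event_log.txt for the same signature, a short script along the following lines may help. This is a minimal sketch, not a NetApp tool: the function name `find_watchdog_resets` and the marker list are hypothetical, and it assumes the log keeps the pipe-separated layout shown above, with the record ID in the first field and the event description in the last.

```python
# Hypothetical helper: scan BMC system event log lines for watchdog reset entries.
# Assumption: fields are separated by "|", the first field is the record ID,
# and the last field holds the event description (as in the excerpt above).
WATCHDOG_MARKERS = ("Watchdog1 Timeout", "Pilot Software reset")


def find_watchdog_resets(lines):
    """Return (record_id, description) tuples for entries matching a marker."""
    hits = []
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue  # not an event record (blank line, header, etc.)
        record_id, description = fields[0], fields[-1]
        if any(marker.lower() in description.lower() for marker in WATCHDOG_MARKERS):
            hits.append((record_id, description))
    return hits


if __name__ == "__main__":
    # Assumed file name, taken from the article; adjust the path as needed.
    with open("SP_system_event_log.txt", encoding="utf-8", errors="replace") as log:
        for record_id, description in find_watchdog_resets(log):
            print(f"{record_id}: {description}")
```

Record IDs are kept as strings because the sketch does not assume whether the log prints them in decimal or hexadecimal. A match only shows that the log contains watchdog reset entries; it does not by itself confirm that a given hardware alert was false.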