One node reports multiple fan failures
Applies to
- FAS2650
- FAS2750
- FAS2720
- ONTAP 9
- Service Processor (SP)
- Baseboard Management Controller (BMC)
Issue
- One node in an HA pair reports multiple fan failures in the event log:
[Node-02: dsa_worker2: ses.status.temperatureWarning:alert]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 22 C (71 F). This module is on the rear of the shelf at the top left, on shelf module A.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: dsa_worker1: ses.status.temperatureInfo:info]: DS224-12 (S/N SHFGDXXXX000045) shelf 0 on channel 0b temperature information for Temperature sensor 12: normal status.
[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
[Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module B Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Module A Expander Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 3 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 2 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.
[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.
[Node-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Chassis temperature is too high..
[Node-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
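- The partner node does not trigger any of these alerts.
- All power supplies blink green, and the amber LED on the front of the node is lit.
- On the node reporting the errors, the PSU and fan sensors appear as follows: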
Sensor Name    State    Current Reading    Critical Low    Warning Low    Warning High    Critical High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count MULTI_FAILED
Chassis is Under Temp invalid --
Chassis is Over Temp YES
PSU1 INFO FAILED
PSU1 INFO FRU_AVAIL
PSU1 FRU MULTIFAULT
PSU2 FRU MULTIFAULT
Module B Expander Temp failed -- C 0 C 5 C 80 C 90 C
Module A Expander Temp failed -- C 0 C 5 C 80 C 90 C
Midplane 4 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 3 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 2 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 1 Temp failed -- C 0 C 5 C 47 C 52 C
Ambient Temp failed -- C 0 C 5 C 47 C 52 C
Internal Shelf not_available --
CPU0 Temp Margin init_failed -- C -- -- 0 C -1 C
PSU1 Present PRESENT
PSU1 5V not_available -- mV -- -- -- --
PSU1 12V not_available -- mV -- -- -- --
PSU1 5V Curr not_available -- mA -- -- -- --
PSU1 12V Curr not_available -- mA -- -- -- --
PSU1 Fan 1 not_available -- RPM -- -- -- --
PSU1 Fan 2 not_available -- RPM -- -- -- --
PSU1 Inlet Temp not_available -- C 0 C 5 C 57 C 62 C
PSU1 Hotspot Temp not_available -- C 0 C 5 C 90 C 100 C
PSU2 Present PRESENT
PSU2 5V not_available -- mV -- -- -- --
PSU2 12V not_available -- mV -- -- -- --
PSU2 5V Curr not_available -- mA -- -- -- --
PSU2 12V Curr not_available -- mA -- -- -- --
PSU2 Fan 1 not_available -- RPM -- -- -- --
PSU2 Fan 2 not_available -- RPM -- -- -- --
PSU2 Inlet Temp not_available -- C 0 C 5 C 57 C 62 C
PSU2 Hotspot Temp not_available -- C 0 C 5 C 90 C 100 C
PSU_FAN not_available --
Module B Expander Temp failed -- C 0 C 5 C 80 C 90 C
Module A Expander Temp failed -- C 0 C 5 C 80 C 90 C
Midplane 4 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 3 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 2 Temp failed -- C 0 C 5 C 47 C 52 C
Midplane 1 Temp failed -- C 0 C 5 C 47 C 52 C
Ambient Temp failed -- C 0 C 5 C 47 C 50 C
Internal Shelf not_available
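
When comparing the two nodes of the HA pair, it can help to tally the event-log entries by node, event name, and severity. The following is a minimal Python sketch, not a NetApp tool. It assumes every line follows the `[node: process: event.name:severity]: message` layout seen in the excerpts in the issue description. The names `EMS_LINE`, `parse_ems_line`, and `count_events` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this article:
#   [node: process: event.name:severity]: message
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<message>.*)$"
)


def parse_ems_line(line):
    """Return the fields of one event-log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_events(lines):
    """Count (node, event name, severity) triples across the lines that parse."""
    counts = Counter()
    for line in lines:
        fields = parse_ems_line(line)
        if fields:
            counts[(fields["node"], fields["event"], fields["severity"])] += 1
    return counts


# A few lines copied from the event log shown in this article.
sample = [
    "[Node-02: env_mgr: monitor.fan.ok:notice]: All fans are OK.",
    "[Node-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating",
    "[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Ambient Temp) is not readable.",
    "[Node-02: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.",
]

for (node, event, severity), n in sorted(count_events(sample).items()):
    print(f"{node} {event} [{severity}] x{n}")
```

Running the same tally over the log from each node would show, per the symptoms above, fan and temperature events on one node and none on its partner. Lines that do not match the assumed layout are skipped rather than guessed at.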