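Multiple fan failure alerts reported on a node due to incorrect sensor readings
Applies to
- ONTAP 9
- AFF/FAS systems
Issue
- The following errors are reported on only one node of the HA pair: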
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 4 Temp) is not readable.
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 3 Temp) is not readable.
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 2 Temp) is not readable.
[Node-01: env_mgr: monitor.temp.unreadable:error]: The controller temperature (Midplane 1 Temp) is not readable.
[Node-01: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
[Node-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Chassis temperature is too high..
[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The SP on the affected node reports incorrect sensor values, as shown below:
Sensor Name              State         Current  Critical  Warning  Warning  Critical
                                       Reading  Low       Low      High     High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count       MULTI_FAILED
Chassis is Under Temp    invalid       --
Chassis is Over Temp     YES
PSU2 Bad                 invalid       --
PSU1 Bad                 invalid       --
PSU2                     invalid       --
PSU1                     invalid       --
PSU2 ON                  invalid       --
PSU1 ON                  invalid       --
PSU1 INFO                FAILED
PSU1 INFO                FAILED
PSU1 FRU                 MULTIFAULT
PSU2 FRU                 MULTIFAULT
Partner Status           failed        --
Module B Expander Temp   init_failed   -- C     --        --       --       --
Module A Expander Temp   init_failed   -- C     --        --       --       --
Midplane 4 Temp          failed        -- C     0 C       5 C      47 C     52 C
Midplane 3 Temp          failed        -- C     0 C       5 C      47 C     52 C
Midplane 2 Temp          failed        -- C     0 C       5 C      47 C     52 C
Midplane 1 Temp          failed        -- C     0 C       5 C      47 C     52 C
Ambient Temp             init_failed   -- C     --        --       --       --
Internal Shelf           failed        --
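- The SP/BMC firmware is already at the latest version.
- The average traffic load on the SP of the affected node is normal.
- The issue persists even after the cable connected to the management port is unplugged.
- The sensor values do not change even after the motherboard is reseated.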
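
When checking whether the errors really come from only one node of the HA pair, it can help to group the saved event log lines by node and event name. The following Python sketch is one way to do that. It assumes the lines have the shape `[node: source: event:severity]: message`, which is inferred from the sample messages in this article rather than from a formal specification. The function names `parse_ems_line`, `count_events_by_node`, and `nodes_reporting` are hypothetical helpers, not part of any ONTAP tool.

```python
import re
from collections import Counter

# Pattern for lines shaped like "[node: source: event.name:severity]: message".
# This shape is an assumption inferred from the sample messages in this article.
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+):\s*(?P<source>[^:\]]+):\s*"
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:\s*(?P<message>.*)$"
)


def parse_ems_line(line):
    """Return a dict of fields for one line, or None if the line does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_events_by_node(lines):
    """Count occurrences of each (node, event) pair among matching lines."""
    counts = Counter()
    for line in lines:
        fields = parse_ems_line(line)
        if fields:
            counts[(fields["node"], fields["event"])] += 1
    return counts


def nodes_reporting(lines):
    """Return the sorted list of node names that appear in matching lines."""
    return sorted({node for node, _event in count_events_by_node(lines)})


if __name__ == "__main__":
    sample = [
        "[Node-01: env_mgr: monitor.temp.unreadable:error]: "
        "The controller temperature (Midplane 4 Temp) is not readable.",
        "[Node-01: env_mgr: monitor.temp.unreadable:error]: "
        "The controller temperature (Midplane 3 Temp) is not readable.",
        "[Node-01: env_mgr: callhome.c.fan.fru.fault:error]: "
        "Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed",
    ]
    for (node, event), total in sorted(count_events_by_node(sample).items()):
        print(node, event, total)
    print("Nodes reporting:", nodes_reporting(sample))
```

If `nodes_reporting` returns a single node name for a log that covers both controllers, that matches the symptom described above, where only one node of the HA pair reports the errors.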