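Multiple fan failure alerts and system noise after replacing a PSU module
Applies to
- AFF and FAS systems
- ONTAP 9
- Disk shelves
Issue
- Both nodes frequently report the following alerts in the event log: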
[Node-01: statd: monitor.shelf.fault:debug]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[Node-01: statd: monitor.fan.failed:debug]: Multiple fans has failed.
[Node-01: env_mgr: monitor.fan.warning:debug]: multiple fans have failed. Replace it to avoid overheating
[Node-01: env_mgr: callhome.c.fan.fru.fault:debug]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The output of storage show fault shows that one power supply is not detected:
::> system node run -node * -command storage show fault
Enclosure Status: unrecoverable
Channel: 0a
Shelf: 0
Shelf Type: DS224-12
Product Serial Number: 952240001855
Module Type: IOM12E
Power Supplies:
Element Status Status Bytes Status Descriptions
1: OK 01,00,00,20 RQSTED ON
2: NOT INSTALLED 05,00,00,20
Fans:
Element Status Status Bytes Status Descriptions
1: OK 01,02,EC,26
2: OK 01,02,EC,26
3: NOT INSTALLED 05,00,00,20
4: NOT INSTALLED 05,00,00,20
Input Power Monitor:
Element Status Status Bytes Status Descriptions
1: OK 01,00,29,07
2: NOT INSTALLED 05,00,00,00
Power Crest Factor:
Element Status Status Bytes Status Descriptions
1: OK 01,00,29,07
2: NOT INSTALLED 05,00,00,00
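When several shelves are attached, the non-OK elements in output like the one above can be picked out programmatically. The following is a minimal sketch, assuming the output has been captured as plain text in exactly the layout shown (a section header ending in a colon, then lines of the form "N: STATUS xx,xx,xx,xx ..."). The function name non_ok_elements is hypothetical and is not part of ONTAP.

```python
import re

# Hypothetical helper (not a NetApp tool): matches element lines such as
# "2: NOT INSTALLED 05,00,00,20" -> number, status text, four status bytes.
ELEMENT_RE = re.compile(r"^\s*(\d+):\s+(.+?)\s+([0-9A-F]{2}(?:,[0-9A-F]{2}){3})\b")


def non_ok_elements(text: str) -> list[tuple[str, int, str]]:
    """Return (section, element number, status) for every element not reported as OK.

    Assumes the plain-text layout of `storage show fault` shown in this article.
    """
    section = None
    findings = []
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers look like "Power Supplies:" or "Fans:".
        if stripped.endswith(":") and not stripped[0].isdigit():
            section = stripped[:-1]
            continue
        match = ELEMENT_RE.match(line)
        if match and section is not None:
            number, status = int(match.group(1)), match.group(2)
            if status != "OK":
                findings.append((section, number, status))
    return findings


sample = """Power Supplies:
Element Status Status Bytes Status Descriptions
1: OK 01,00,00,20 RQSTED ON
2: NOT INSTALLED 05,00,00,20
Fans:
Element Status Status Bytes Status Descriptions
1: OK 01,02,EC,26
2: OK 01,02,EC,26
3: NOT INSTALLED 05,00,00,20
4: NOT INSTALLED 05,00,00,20
"""

for section, number, status in non_ok_elements(sample):
    print(f"{section} element {number}: {status}")
```

With this sample, the missing power supply (element 2) and the two fans it carries (elements 3 and 4) are reported together. This is consistent with the fan alerts in the event log being caused by the undetected PSU rather than by separate fan faults.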
- Even after the PSU is replaced, the SP sensors still do not report readings for it:
Sensor Name State Current Critical Warning Warning Critical
Reading Low Low High High
-------------------------------------------------------------------------------------------------
SNMP Bad Fan Count MULTI_FAILED
Chassis is Under Temp NO
Chassis is Over Temp NO
PSU2 Bad invalid --
PSU1 Bad FALSE
PSU2 invalid --
PSU1 GOOD
PSU2 ON ON
PSU1 ON ON
PSU1 INFO FRU_AVAIL
PSU1 INFO FRU_AVAIL
PSU1 FRU GOOD
PSU2 FRU MULTIFAULT
Partner Status A_SIDE_PRESENT
PSU1 Present PRESENT
PSU2 Present not_available --
PSU2 5V not_available -- mV -- -- -- --
PSU2 12V not_available -- mV -- -- -- --
PSU2 5V Curr not_available -- mA -- -- -- --
PSU2 12V Curr not_available -- mA -- -- -- --
PSU2 Fan 1 not_available -- RPM -- -- -- --
PSU2 Fan 2 not_available -- RPM -- -- -- --
PSU2 Inlet Temp not_available -- C 0 C 5 C 57 C 62 C
PSU2 Hotspot Temp not_available -- C 0 C 5 C 90 C 100 C
PSU_FAN FAIL_2
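The sensor listing above is easier to scan when only the suspicious rows are shown. The following is a rough heuristic sketch, not a NetApp tool. It assumes the listing has been saved as plain text and that a row is worth flagging when one of its tokens is one of the state values seen in this article (not_available, invalid, MULTIFAULT, MULTI_FAILED) or starts with FAIL. Both the suspect set and the function name suspect_sensors are assumptions.

```python
# Hypothetical heuristic: state values taken from the sensor output in this
# article; other firmware versions may report different strings.
SUSPECT_STATES = {"not_available", "invalid", "MULTIFAULT", "MULTI_FAILED"}


def suspect_sensors(text: str) -> list[str]:
    """Return sensor rows containing a state that suggests a missing or failed part."""
    flagged = []
    for line in text.splitlines():
        tokens = line.split()
        # "FAIL_2" matches the FAIL prefix; "FALSE" does not.
        if any(t in SUSPECT_STATES or t.startswith("FAIL") for t in tokens):
            flagged.append(line.strip())
    return flagged


sample = """SNMP Bad Fan Count MULTI_FAILED
Chassis is Over Temp NO
PSU2 Bad invalid --
PSU1 Bad FALSE
PSU1 FRU GOOD
PSU2 FRU MULTIFAULT
PSU2 Present not_available --
PSU_FAN FAIL_2
"""

for row in suspect_sensors(sample):
    print(row)
```

In this sample every flagged row except the chassis-level fan count and PSU_FAN entries refers to PSU2, which matches the storage show fault output reporting power supply 2 as NOT INSTALLED.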
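- Because the fans of one PSU are not detected, the fans in the other PSU spin faster and generate noise.
- The SP/BMC is already running the latest firmware version.
- Rebooting the SP/BMC does not stop the alerts.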
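- The e0M port is not experiencing the high traffic described in the KB article Chassis fan fru failed: Multiple fans have failed even after upgrading SP/BMC.
- The issue persists even after performing a takeover/giveback of each node in turn.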