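Multiple fans failed alerts and system noise after replacing a PSU module
Applies to
- AFF and FAS systems
- ONTAP 9
- Disk shelves
Issue
- Both nodes frequently report the following alerts in the event log: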
[Node-01: statd: monitor.shelf.fault:debug]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
 [Node-01: statd: monitor.fan.failed:debug]: Multiple fans has failed.
 [Node-01: env_mgr: monitor.fan.warning:debug]: multiple fans have failed. Replace it to avoid overheating
 [Node-01: env_mgr: callhome.c.fan.fru.fault:debug]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
- The output of storage show fault shows that one power supply is not detected:
::> system node run -node * -command storage show fault
Enclosure Status: unrecoverable
 Channel: 0a
 Shelf: 0
 Shelf Type: DS224-12
 Product Serial Number: 952240001855
 Module Type: IOM12E
Power Supplies:
 Element Status      Status Bytes  Status Descriptions
   1: OK         01,00,00,20   RQSTED ON
  2: NOT INSTALLED    05,00,00,20  
Fans:
 Element Status      Status Bytes  Status Descriptions
   1: OK         01,02,EC,26   
   2: OK         01,02,EC,26   
  3: NOT INSTALLED    05,00,00,20  
  4: NOT INSTALLED    05,00,00,20  
Input Power Monitor:
 Element Status      Status Bytes  Status Descriptions
   1: OK         01,00,29,07   
  2: NOT INSTALLED    05,00,00,00  
Power Crest Factor:
 Element Status      Status Bytes  Status Descriptions
   1: OK         01,00,29,07   
  2: NOT INSTALLED    05,00,00,00
- Even after the PSU is replaced, the SP sensors do not report readings for it:
Sensor Name        State      Current   Critical    Warning    Warning   Critical
                      Reading     Low      Low      High     High
 -------------------------------------------------------------------------------------------------
 SNMP Bad Fan Count            MULTI_FAILED
 Chassis is Under Temp             NO
 Chassis is Over Temp             NO
 PSU2 Bad          invalid       --
 PSU1 Bad                  FALSE
 PSU2            invalid       --
 PSU1                    GOOD
 PSU2 ON                    ON
 PSU1 ON                    ON
 PSU1 INFO                 FRU_AVAIL
 PSU1 INFO                 FRU_AVAIL
 PSU1 FRU                  GOOD
 PSU2 FRU                 MULTIFAULT
 Partner Status              A_SIDE_PRESENT
 PSU1 Present               PRESENT  
 PSU2 Present        not_available    --
 PSU2 5V          not_available    -- mV     --      --      --      --     
 PSU2 12V          not_available    -- mV     --      --      --      --     
 PSU2 5V Curr        not_available    -- mA     --      --      --      --     
 PSU2 12V Curr       not_available    -- mA     --      --      --      --     
 PSU2 Fan 1         not_available    -- RPM    --      --      --      --     
 PSU2 Fan 2         not_available    -- RPM    --      --      --      --     
 PSU2 Inlet Temp      not_available    -- C      0 C      5 C     57 C     62 C    
 PSU2 Hotspot Temp     not_available    -- C      0 C      5 C     90 C     100 C    
 PSU_FAN                  FAIL_2
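- Because one PSU fan is not detected, the other PSU fan spins faster and makes noise.
- The SP/BMC is already on the latest firmware version.
- Rebooting the SP/BMC does not stop the alerts.
- The e0M port is not subject to high traffic, as described in the knowledge base article Chassis fan fru failed: Multiple fans have failed even after upgrading the SP/BMC.
- The issue persists even after performing a takeover/giveback of each node, one at a time.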
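
When the storage show fault output is long, it can help to list only the elements whose status is not OK. The following is a minimal Python sketch of that idea, not a NetApp tool. The function name find_non_ok_elements and the regular expression are assumptions derived solely from the layout of the sample output shown in this article; other shelf types or ONTAP releases may format the output differently.

```python
import re

# Assumed row layout, taken from the sample output in this article:
#   "<element>: <status> <four comma-separated hex bytes> [description]"
ROW = re.compile(
    r"^\s*(\d+):\s+(.*?)\s+((?:[0-9A-Fa-f]{2},){3}[0-9A-Fa-f]{2})\b"
)


def find_non_ok_elements(text):
    """Return (section, element, status) tuples for rows whose status is not OK."""
    findings = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        # Section headers in the sample end with a colon, e.g. "Fans:"
        if stripped.endswith(":") and not stripped[0].isdigit():
            section = stripped[:-1]
            continue
        match = ROW.match(line)
        if match and section:
            element = int(match.group(1))
            status = match.group(2).strip()
            if status != "OK":
                findings.append((section, element, status))
    return findings


# Excerpt copied from the storage show fault output in this article
SAMPLE = """\
Power Supplies:
 Element Status      Status Bytes  Status Descriptions
   1: OK         01,00,00,20   RQSTED ON
  2: NOT INSTALLED    05,00,00,20
Fans:
 Element Status      Status Bytes  Status Descriptions
   1: OK         01,02,EC,26
   2: OK         01,02,EC,26
  3: NOT INSTALLED    05,00,00,20
  4: NOT INSTALLED    05,00,00,20
"""

if __name__ == "__main__":
    for section, element, status in find_non_ok_elements(SAMPLE):
        print(f"{section} element {element}: {status}")
```

In the sample, the flagged rows are power supply 2 and fans 3 and 4, which is consistent with the second PSU and its two fans not being detected.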