"Chassis Fan FRU Failed: Multiple fans have failed"
Applies to
- ONTAP 9
- FAS/AFF systems
Issue
- The amber LED is lit
- EMS reports multiple fan and PSU errors from both nodes:
Fri May 20 20:34:35 +1000 [hocnas-01: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
Fri May 20 20:34:38 +1000 [hocnas-01: dsa_worker1: callhome.shlf.fan:EMERGENCY]: Call home for SHELF COOLING UNIT FAILED
Fri May 20 20:35:00 +1000 [hocnas-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Multiple fans has failed. Disk shelf fault.
Fri May 20 20:35:33 +1000 [hocnas-01: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
Fri May 20 20:41:01 +1000 [hocnas-01: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
Fri May 20 20:41:31 +1000 [hocnas-01: env_mgr: monitor.fan.ok:notice]: All fans are OK.
Fri May 20 20:44:11 +1000 [hocnas-01: cphmd: hm.alert.raised:alert]: Alert Id = CriticalFruMultiFaultAlert , Alerting Resource = PSQ094174100430 raised by monitor chassis
Fri May 20 20:44:11 +1000 [hocnas-01: cphmd: hm.alert.raised:alert]: Alert Id = CriticalFruMultiFaultAlert , Alerting Resource = PSQ094174101084 raised by monitor chassis
Sat May 21 15:00:00 +1000 [hocnas-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
Sat May 21 15:00:00 +1000 [hocnas-02: statd: monitor.fan.failed:alert]: Multiple fans has failed.
Sat May 21 15:41:42 +1000 [hocnas-02: env_mgr: monitor.fan.warning:notice]: multiple fans have failed. Replace it to avoid overheating
Sat May 21 15:42:12 +1000 [hocnas-02: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: Multiple fans have failed
Sat May 21 15:47:44 +1000 [hocnas-02: dsa_worker1: callhome.shlf.fan:EMERGENCY]: Call home for SHELF COOLING UNIT FAILED
Or
[ses.status.fanError:EMERGENCY]: (EMS parameters: prodChannel="DS224-12 (S/N *****) shelf 0 on channel 0b" typeText="Cooling element" fanNumber="3" errorMsg="critical status" errorText="; fan is off" locationText="This module is on the rear of the shelf on the lower right power supply")
[callhome.shlf.fan:EMERGENCY]: (EMS parameters: subject="SHELF COOLING UNIT FAILED")
[ses.status.fanError:EMERGENCY]: (EMS parameters: prodChannel="DS224-12 (S/N *****) shelf 0 on channel 0b" typeText="Cooling element" fanNumber="4" errorMsg="critical status" errorText="; fan is off" locationText="This module is on the rear of the shelf on the lower right power supply")
[monitor.globalStatus.critical:EMERGENCY]: (EMS parameters: problem="Multiple fans has failed. ")
- The SP on node 2 reports the PSU_Present sensors as Absent:
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
----------------- ------------ ------------ ------------ ----------- ----------- ----------- -----------
PSU1_Present | 0x0 | discrete | Absent | na | na | na | na
PSU1_Fault | na | discrete | na | na | na | na | na
PSU2_Present | 0x0 | discrete | Absent | na | na | na | na
PSU2_Fault | na | discrete | na | na | na | na | na
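
When checking several nodes, it can help to scan saved SP sensor output for this symptom instead of reading each table by eye. The following is a minimal Python sketch, not a NetApp tool: it assumes the sensor output has been captured as text in the pipe-separated layout shown above, and the function name `absent_sensors` is hypothetical.

```python
def absent_sensors(sp_output: str) -> list[str]:
    """Return the names of sensors whose Status column reads 'Absent'."""
    absent = []
    for line in sp_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows have 8 pipe-separated columns (assumed layout);
        # this skips the header row and the dashed separator line.
        if len(fields) != 8 or fields[0] == "Sensor Name":
            continue
        name, status = fields[0], fields[3]
        if status == "Absent":
            absent.append(name)
    return absent


sample = """\
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
----------------- ------------ ------------ ------------ ----------- ----------- ----------- -----------
PSU1_Present | 0x0 | discrete | Absent | na | na | na | na
PSU1_Fault | na | discrete | na | na | na | na | na
PSU2_Present | 0x0 | discrete | Absent | na | na | na | na
PSU2_Fault | na | discrete | na | na | na | na | na
"""

print(absent_sensors(sample))  # ['PSU1_Present', 'PSU2_Present']
```

For the table above, the sketch flags both PSU1_Present and PSU2_Present, matching the symptom described for node 2.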