ONTAP reports multiple SHELF_FAULT and SHELF COOLING UNIT FAILED
Applies to
- ONTAP 9
- DS460C (DS460-12), NS224 NSM100
Issue
- Multiple SHELF_FAULT and SHELF COOLING UNIT FAILED notifications appear in AutoSupport messages:
HA Group Notification (SHELF_FAULT) ERROR.
HA Group Notification (SHELF COOLING UNIT FAILED) EMERGENCY
- The shelf fault and shelf cooling unit failure errors clear after a short time and the status returns to normal.
Example:
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 1: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 2: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 3: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 4: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 5: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 6: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 7: normal status.
[?] Mon Feb 20 19:47:52 +0800 [n19911002-01: dsa_worker5: ses.status.fanInfo:info]: DS460-12 (S/N xxxx) shelf 20 on channel 0a cooling fan information for Cooling element 8: normal status.
[?] Mon Feb 20 19:48:00 +0800 [n19911002-01: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
[?] Tue Feb 21 10:36:51 +0800 [n19911002-01: statd: monitor.shelf.fault.ok:notice]: Fault previously reported on disk storage shelf attached to channel 0a has been corrected.
[?] Tue Feb 21 10:37:00 +0800 [n19911002-01: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
- Some or all cooling units are reported as failed in the environment output:
Cooling Unit installed element list: 1, 2, 3, 4, 5, 6, 7, 8; with error: 1, 2, 3, 4, 5, 6, 7, 8
- If the shelf holds disks that belong to an aggregate with an active file system, the system may panic with a fatal multi-disk error:
Dec 05 14:13:56 [NODE-01:mgr.boot.reason_abnormal:EMERGENCY]: System rebooted after a panic. PANIC : aggr aggr_name: raid volfsm, fatal multi-disk error..
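To check whether every installed cooling unit is flagged, the "Cooling Unit installed element list" line from the environment output shown earlier can be split into its two lists. This is a rough Python sketch under stated assumptions: the function names are hypothetical, and the pattern covers only the line format shown in this article.

```python
import re

# Pattern inferred from the single environment output line in this article.
# Other output formats are not covered; treat this as an assumption.
COOLING = re.compile(
    r"Cooling Unit installed element list:\s*(?P<installed>[\d,\s]*?)\s*;"
    r"\s*with error:\s*(?P<errors>[\d,\s]*)"
)


def _to_ints(text):
    """Turn a comma-separated string such as '1, 2, 3' into [1, 2, 3]."""
    return [int(token) for token in text.split(",") if token.strip()]


def parse_cooling_units(line):
    """Return (installed, errors) element lists, or None if the line does not match."""
    match = COOLING.search(line)
    if match is None:
        return None
    return _to_ints(match.group("installed")), _to_ints(match.group("errors"))


def all_units_in_error(line):
    """True when every installed cooling element also appears in the error list."""
    parsed = parse_cooling_units(line)
    if parsed is None:
        return False
    installed, errors = parsed
    return bool(installed) and set(installed) <= set(errors)
```

For the sample line in this article, both lists contain elements 1 through 8, so all_units_in_error returns True.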