ShelfPSUFailure_Alert reported by the health monitor
Applies to
- FAS/AFA systems
- Disk shelves
- Power supply unit (PSU)
- System health monitor schm:ShelfPSUFailure_Alert
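Issue
- The system occasionally sends a call-home message for a power supply problem.
- The following alerts are reported in the event log: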
[Node-02: schmd: hm.alert.raised:alert]: Alert Id = ShelfPSUFailure_Alert , Alerting Resource = 16350XXXXXXXX448 raised by monitor system-connect
[Node-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
- The output of the storage show fault command shows:
Enclosure Status: unrecoverable
Channel: 0b
Shelf: 11
Shelf Type: DS224-12
Product Serial Number: SHFFGXXXXXXXXX
Module Type: IOM12
Power Supplies:
Element  Status    Status Bytes  Status Descriptions
1:       CRITICAL  02,00,00,F3   DC FAIL, AC FAIL, OFF, RQSTED ON, FAIL
2:       OK        01,00,00,20   RQSTED ON
- The shelf log contains fault entries for the PSU:
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B004F; M0; ENC_MGT; power_manager; 02; HAL indicates PSU FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0050; M0; ENC_MGT; power_manager; 02; HAL indicates PSU TURNED OFF fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.196); 030B0053; M0; ENC_MGT; power_manager; 02; HAL indicates PSU AC FAILURE fault on PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 PCM FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting FAIL NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 TURNED OFF Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0072; M0; ENC_MGT; power_manager; 02; Setting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B006F; M0; ENC_MGT; power_manager; 02; PCM 1 AC FAILURE Fault Detected
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B0070; M0; ENC_MGT; power_manager; 02; Re-asserting AC MISSING NON REDUNDANT alarm for PCM 1
Tue Mar 5 02:05:34 2024 ( 730+01:52:26.227); 030B005D; M0; ENC_MGT; power_manager; 04; PCM 1 faults indicate loss of local fan power
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0060; M0; ENC_MGT; power_manager; 04; PCM 1 local fan power restored
Tue Mar 5 02:05:39 2024 ( 730+01:52:31.233); 030B0084; M0; ENC_MGT; power_manager; 02; Clearing PSU AC Missing (non-redundant) alarm
- The issue persists even after the PSU is reseated.
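
When checking several shelves, it can help to pull the faulted power supply elements out of saved storage show fault output with a script. The following Python code is a minimal sketch, not a NetApp tool. The names parse_psu_rows and faulted_elements are hypothetical, and the row layout (element number, status, comma-separated status bytes, comma-separated descriptions) is assumed only from the sample output above.

```python
import re
from typing import List, NamedTuple


class PsuElement(NamedTuple):
    element: int             # element number, e.g. 1
    status: str              # e.g. "CRITICAL" or "OK"
    status_bytes: List[str]  # e.g. ["02", "00", "00", "F3"]
    descriptions: List[str]  # e.g. ["DC FAIL", "AC FAIL", ...]


# Assumed row layout, taken from the sample output only:
#   "<n>: <STATUS> <hh,hh,hh,hh> <description>, <description>, ..."
ROW_RE = re.compile(
    r"^\s*(\d+):\s+(.+?)\s+((?:[0-9A-Fa-f]{2},)+[0-9A-Fa-f]{2})\s*(.*)$"
)


def parse_psu_rows(text: str) -> List[PsuElement]:
    """Return every line of `text` that looks like a power supply element row."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # headers and other fields such as "Channel: 0b" are skipped
        number, status, raw_bytes, raw_desc = match.groups()
        rows.append(
            PsuElement(
                element=int(number),
                status=status.strip(),
                status_bytes=raw_bytes.split(","),
                descriptions=[d.strip() for d in raw_desc.split(",") if d.strip()],
            )
        )
    return rows


def faulted_elements(rows: List[PsuElement]) -> List[PsuElement]:
    """Keep only the elements whose status is not OK."""
    return [row for row in rows if row.status.upper() != "OK"]


if __name__ == "__main__":
    sample = """\
Power Supplies:
Element Status Status Bytes Status Descriptions
1: CRITICAL 02,00,00,F3 DC FAIL, AC FAIL, OFF, RQSTED ON, FAIL
2: OK 01,00,00,20 RQSTED ON
"""
    for psu in faulted_elements(parse_psu_rows(sample)):
        print(f"PSU {psu.element}: {psu.status} ({', '.join(psu.descriptions)})")
```

The sketch only filters text that has already been collected. It does not query the storage system, and the meaning of the individual status bytes is not decoded because the article does not define them.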