CFSHELF-1826: ses.status.temperatureError (...) temperature error for Temperature sensor 1 in disk shelf
Issue
- The shelf attention LED on the OPS front panel is illuminated.
- The AutoSupport environment output reports a failure for that sensor ([1]). Example:
Channel: 0a
Shelf: 0
SES device path: local access: 0a.00.99
Module type: IOM12; monitoring is active
Shelf status: critical condition
...
Temperature Sensor installed element list: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11; with error: 1
Shelf temperatures by element:
[1] 128 C (262 F) (ambient) Overtemperature failure!
[2] 20 C (68 F) Normal temperature range
...
[11] 31 C (87 F) Normal temperature range
- Example ONTAP event messages:
::> event log show -event *shelf*
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
1/2/2025 10:35:00 node_name EMERGENCY monitor.globalStatus.critical: Disk shelf fault.
1/2/2025 10:34:28 node_name ALERT monitor.shelf.fault: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
1/2/2025 10:34:19 node_name ERROR ses.status.temperatureError: DS224-12 (S/N SHFHU2048000395) shelf 0 on channel 0c temperature error for Temperature sensor 1: critical status; overtemperature failure. Current temperature: 128 C (262 F). This module is on the front of the shelf on the left, on the OPS panel.
1/2/2025 10:33:52 node_name DEBUG stackmon.shelf.discovery.complete: One or more shelves have been discovered.
...
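
When reviewing many AutoSupport environment outputs, the "Shelf temperatures by element" lines shown above can be scanned programmatically. The following is a minimal Python sketch, not a NetApp tool. It assumes the lines follow the same layout as the example in this article. The names `parse_shelf_temperatures` and `flag_abnormal` are hypothetical helpers introduced for illustration only.

```python
import re

# Assumed layout, taken from the example output in this article:
#   [1] 128 C (262 F) (ambient) Overtemperature failure!
#   [2] 20 C (68 F) Normal temperature range
ELEMENT_RE = re.compile(
    r"^\[(?P<element>\d+)\]\s+"          # sensor element number
    r"(?P<celsius>-?\d+)\s*C\s+"         # reading in Celsius
    r"\((?P<fahrenheit>-?\d+)\s*F\)\s+"  # reading in Fahrenheit
    r"(?:\((?P<location>[^)]*)\)\s+)?"   # optional location, e.g. (ambient)
    r"(?P<status>.*)$"                   # status text
)

SAMPLE = """\
Shelf temperatures by element:
[1] 128 C (262 F) (ambient) Overtemperature failure!
[2] 20 C (68 F) Normal temperature range
...
[11] 31 C (87 F) Normal temperature range
"""


def parse_shelf_temperatures(text):
    """Return one dict per temperature element line found in text."""
    readings = []
    for line in text.splitlines():
        match = ELEMENT_RE.match(line.strip())
        if match is None:
            continue  # skip headers, elisions, and unrelated lines
        readings.append({
            "element": int(match.group("element")),
            "celsius": int(match.group("celsius")),
            "fahrenheit": int(match.group("fahrenheit")),
            "location": match.group("location"),
            "status": match.group("status").strip(),
        })
    return readings


def flag_abnormal(readings):
    """Return readings whose status is not 'Normal temperature range'."""
    return [r for r in readings if r["status"] != "Normal temperature range"]


if __name__ == "__main__":
    for reading in flag_abnormal(parse_shelf_temperatures(SAMPLE)):
        print(reading)
```

The sketch only reports which elements carry a non-normal status. It does not determine the cause of the reading.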