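Chassis Over Temperature Shutdown - AutoSupport Message
Applies to
- AFF systems
- ASA systems
- FAS systems
- ONTAP 9
- CHOTSD: HA Group Notification from <node> (CHASSIS OVER TEMPERATURE SHUTDOWN) EMERGENCY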
- callhome.chassis.hitemp
- callhome.chassis.overtemp
Event summary
[node02: env_mgr: callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
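The message above is displayed when the chassis temperature is too high.
- The message appears before the system shuts down and indicates a potential environmental problem or a hardware fault, such as a failed fan or a failed temperature sensor.
- The system should be located in a data center whose ambient temperature is within the system's operating range. See Hardware Universe for the platform-specific requirements.
Validation
Event logs
- Run: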
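The operating-range check described above can be sketched as a simple comparison. This is a minimal illustration, not a NetApp tool: the function name `ambient_within_range` is hypothetical, and the limits must be taken from Hardware Universe for the specific platform. The numbers in the example call are placeholders only.

```python
def ambient_within_range(ambient_c: float, min_c: float, max_c: float) -> bool:
    """Return True if the measured ambient temperature (in Celsius) lies
    inside the platform's operating range, limits included."""
    if min_c > max_c:
        raise ValueError("min_c must not be greater than max_c")
    return min_c <= ambient_c <= max_c


# Placeholder limits for illustration only; look up the real values
# for your platform in Hardware Universe.
EXAMPLE_MIN_C = 10.0
EXAMPLE_MAX_C = 35.0

if __name__ == "__main__":
    measured_c = 47.0  # e.g. a reading taken in the data center
    if not ambient_within_range(measured_c, EXAMPLE_MIN_C, EXAMPLE_MAX_C):
        print(f"{measured_c} C is outside the example operating range")
```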
event log show -severity * -message-name *temperature*
[node01: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 4 Temp is warning high (47 C).
[node01: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 3 Temp is warning high (47 C).
[node01: env_mgr: callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
[node01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Temperature critical)
[node01: env_mgr: monitor.shutdown.chassisOverTemp:EMERGENCY]: Chassis temperature is too hot: Ambient temperature is warning high. System will be shutdown in 2 minutes
[node01: env_mgr: monitor.chassisTemperature.ok:notice]: Chassis temperature is ok.
[node02: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 4 Temp is warning high (47 C).
[node02: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 3 Temp is warning high (47 C).
[node01: dsa_worker3: callhome.shlf.overtemp:error]: Call home for SHELF OVER TEMPERATURE 43 C (109 F)
[node01: dsa_worker3: ses.status.temperatureError:critical]: DS4246 (S/N SHJMS000000011A) shelf 0 on channel 0a temperature error for Temperature sensor 1: critical status; overtemperature failure. Current temperature: 43 C (109 F). This module is on the front of the shelf on the left, on the OPS panel.
[node01: env_mgr: monitor.shutdown.chassisOverTemp:critical]: Chassis temperature is too hot: System will be shutdown in 2 minutes
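When the log output has been saved to a text file, entries like the ones above can be triaged with a short script. The following is a minimal sketch, assuming every entry sits on its own line and follows the `[node: source: event.name:severity]: message` layout seen in this article. The names `EmsEvent`, `parse_ems_line`, and `nodes_with_shutdown` are hypothetical helpers, not part of ONTAP.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the sample output: "[node: source: event:severity]: message"
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<source>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<message>.*)"
)
# Celsius reading inside a message, such as "(47 C)" or "43 C (109 F)"
CELSIUS = re.compile(r"(-?\d+) C\b")


class EmsEvent(NamedTuple):
    node: str
    source: str
    event: str
    severity: str
    message: str
    celsius: Optional[int]  # first Celsius reading in the message, if any


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    reading = CELSIUS.search(match["message"])
    return EmsEvent(
        match["node"],
        match["source"],
        match["event"],
        match["severity"],
        match["message"],
        int(reading.group(1)) if reading else None,
    )


def nodes_with_shutdown(events: Iterable[EmsEvent]) -> List[str]:
    """Return the sorted names of nodes that logged a monitor.shutdown.* event."""
    return sorted({e.node for e in events if e.event.startswith("monitor.shutdown.")})


if __name__ == "__main__":
    sample = [
        "[node01: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis "
        "temperature is too warm: Midplane 4 Temp is warning high (47 C).",
        "[node01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency "
        "shutdown: Environmental Reason Shutdown (Temperature critical)",
    ]
    events = [e for e in map(parse_ems_line, sample) if e is not None]
    for e in events:
        print(e.node, e.event, e.severity, e.celsius)
    print("Nodes that shut down:", nodes_with_shutdown(events))
```

Grouping by node makes it easier to see whether both HA partners reported high temperatures (which points toward an ambient problem) or only one did.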
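Resolution
- Check the data center temperature. If the cooling system is working properly:
- See Hardware Universe for the platform-specific requirements
- As a best practice, upgrade the SP or BMC firmware to the latest version, because the alert can be caused by a firmware issue: Steps to update the ONTAP Service Processor (SP) or Baseboard Management Controller (BMC)
- Follow How to boot a node after a power outage or high-temperature shutdown in ONTAP
- If the issue persists and the data center temperature has been confirmed to be within the acceptable operating range, contact Support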