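Chassis Over Temperature Shutdown - AutoSupport Message
Applies to
- AFF systems
- ASA systems
- FAS systems
- ONTAP 9
- CHOTSD: HA Group Notification from <node> (CHASSIS OVER TEMPERATURE SHUTDOWN) EMERGENCY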
- callhome.chassis.hitemp
- callhome.chassis.overtemp
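Event summary
[node02: env_mgr: callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
The message above is logged when the chassis overheats.
- The message appears before the system shuts down and indicates a potential environmental problem or hardware fault in the system, such as a failed fan or a faulty temperature sensor.
- The system should be located in a data center where the ambient temperature is within the system's operating range. See Hardware Universe for platform-specific requirements.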
Validation
Event log
- Run
event log show -severity * -message-name *temperature*
[node01: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 4 Temp is warning high (47 C).
[node01: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 3 Temp is warning high (47 C).
[node01: env_mgr: callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
[node01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Temperature critical)
[node01: env_mgr: monitor.shutdown.chassisOverTemp:EMERGENCY]: Chassis temperature is too hot: Ambient temperature is warning high. System will be shutdown in 2 minutes
[node01: env_mgr: monitor.chassisTemperature.ok:notice]: Chassis temperature is ok.
[node02: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 4 Temp is warning high (47 C).
[node02: env_mgr: monitor.chassisTemperature.warm:alert]: Chassis temperature is too warm: Midplane 3 Temp is warning high (47 C).
[node01: dsa_worker3: callhome.shlf.overtemp:error]: Call home for SHELF OVER TEMPERATURE 43 C (109 F)
[node01: dsa_worker3: ses.status.temperatureError:critical]: DS4246 (S/N SHJMS000000011A) shelf 0 on channel 0a temperature error for Temperature sensor 1: critical status; overtemperature failure. Current temperature: 43 C (109 F). This module is on the front of the shelf on the left, on the OPS panel.
[node01: env_mgr: monitor.shutdown.chassisOverTemp:critical]: Chassis temperature is too hot: System will be shutdown in 2 minutes
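When reviewing a long event log, it can help to pull the node, event name, severity, and temperature reading out of each entry. The following is a minimal Python sketch, assuming every entry follows the "[node: process: event:severity]: message" shape seen in the sample output above. The names `EmsEvent` and `parse_ems_line` are hypothetical helpers for illustration and are not part of ONTAP.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape: "[node: process: event.name:severity]: message"
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]: (?P<message>.*)"
)
# First Celsius reading in the message, such as "(47 C)" or "43 C (109 F)"
TEMP_C = re.compile(r"(\d+) C\b")


class EmsEvent(NamedTuple):
    node: str
    process: str
    event: str
    severity: str
    message: str
    temp_c: Optional[int]  # None when the message carries no reading


def parse_ems_line(line: str) -> Optional[EmsEvent]:
    """Parse one log entry; return None if it does not match the assumed shape."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    temp = TEMP_C.search(match["message"])
    return EmsEvent(
        node=match["node"],
        process=match["process"],
        event=match["event"],
        severity=match["severity"],
        message=match["message"],
        temp_c=int(temp.group(1)) if temp else None,
    )


if __name__ == "__main__":
    sample = (
        "[node01: env_mgr: monitor.chassisTemperature.warm:alert]: "
        "Chassis temperature is too warm: Midplane 4 Temp is warning high (47 C)."
    )
    print(parse_ems_line(sample))
```

Entries that do not match the assumed shape return None, so they can be skipped or inspected manually.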
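Resolution
- Check the data center temperature. Is the cooling system working properly?
- Check Hardware Universe for platform-specific requirements.
- Check for obstructions that block airflow.
- As a best practice, upgrade the SP/BMC firmware to the latest version, because the alert can be caused by a firmware issue: Steps to update the Service Processor (SP) or Baseboard Management Controller (BMC) for ONTAP
- Follow How to boot a node after a power outage or high temperature shutdown in ONTAP.
- If the problem persists and the data center temperature has been verified to be within the acceptable operating range, contact Support.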
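The ambient temperature check can be expressed as a simple range comparison. The sketch below is illustrative only: `ambient_status` is a hypothetical helper, and the 10 C to 40 C limits and the 5 C margin in the example are placeholders, not values for any real platform. Take the actual operating range from Hardware Universe.

```python
def ambient_status(temp_c: float, min_c: float, max_c: float,
                   margin_c: float = 5.0) -> str:
    """Classify an ambient reading against a platform operating range.

    min_c and max_c must come from the platform specification;
    margin_c is an arbitrary early-warning band below the upper limit.
    """
    if temp_c < min_c or temp_c > max_c:
        return "out of range"
    if temp_c > max_c - margin_c:
        return "near upper limit"
    return "ok"


if __name__ == "__main__":
    # Placeholder limits, for illustration only.
    for reading in (25, 37, 43):
        print(reading, ambient_status(reading, min_c=10, max_c=40))
```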