AFF A700s reboots because multiple DIMMs report a warning-low temperature state
Applies to
- AFF A700s
- BMC firmware versions 1.89 and 1.91
Issue
- The node reboots after DIMMs repeatedly report temperature readings below the warning-low threshold:
Wed Apr 20 19:19:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm G0 Temp is warning low (16 C).
Wed Apr 20 19:25:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm A1 Temp is warning low (16 C).
Wed Apr 20 19:25:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm G1 Temp is warning low (16 C).
Wed Apr 20 19:26:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm A0 Temp is warning low (16 C).
Wed Apr 20 19:27:38 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm B0 Temp is warning low (16 C).
Wed Apr 20 19:45:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm H0 Temp is warning low (16 C).
Wed Apr 20 19:59:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: Chassis temperature is too cool: Dimm B1 Temp is warning low (16 C).
- The node then panics with an under-temperature condition:
Sun May 08 15:56:17 -0700 [node1: env_mgr: callhome.chassis.undertemp:EMERGENCY]: Call home for CHASSIS UNDER TEMPERATURE SHUTDOWN
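- The panic reported through ASUP and System Manager is logged as over temperature.
- When the system sensors on the node are checked, all other sensors report temperatures in the same range as the DIMMs.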
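
To check whether a saved EMS log shows the same pattern, the messages above can be tallied with a short script. The following is a minimal sketch, not a NetApp tool: the function name `summarize` and the regular expression are assumptions based only on the message format quoted in this article, and other ONTAP releases may format these events differently.

```python
import re
from collections import Counter

# Assumed pattern, derived from the DIMM "warning low" messages quoted above.
DIMM_COOL = re.compile(
    r"\[(?P<node>[^:\]]+): env_mgr: monitor\.chassisTemperature\.cool:alert\]: "
    r".*?Dimm (?P<dimm>\w+) Temp is warning low \((?P<temp>-?\d+) C\)"
)
# Event name of the under-temperature shutdown call home quoted above.
UNDERTEMP_EVENT = "callhome.chassis.undertemp"


def summarize(lines):
    """Return (alerts per DIMM, lowest temperature seen, shutdown event seen)."""
    counts = Counter()
    lowest = None
    shutdown = False
    for line in lines:
        match = DIMM_COOL.search(line)
        if match:
            counts[match.group("dimm")] += 1
            temp = int(match.group("temp"))
            lowest = temp if lowest is None else min(lowest, temp)
        elif UNDERTEMP_EVENT in line:
            shutdown = True
    return counts, lowest, shutdown


if __name__ == "__main__":
    # Sample lines copied from the log excerpts in this article.
    sample = [
        "Wed Apr 20 19:19:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: "
        "Chassis temperature is too cool: Dimm G0 Temp is warning low (16 C).",
        "Wed Apr 20 19:25:39 -0700 [node1: env_mgr: monitor.chassisTemperature.cool:alert]: "
        "Chassis temperature is too cool: Dimm A1 Temp is warning low (16 C).",
        "Sun May 08 15:56:17 -0700 [node1: env_mgr: callhome.chassis.undertemp:EMERGENCY]: "
        "Call home for CHASSIS UNDER TEMPERATURE SHUTDOWN",
    ]
    counts, lowest, shutdown = summarize(sample)
    print("Alerts per DIMM:", dict(counts))
    print("Lowest DIMM temperature (C):", lowest)
    print("Under-temperature shutdown logged:", shutdown)
```

Several different DIMMs reporting the same low reading, followed by a `callhome.chassis.undertemp` event, corresponds to the sequence described in this article.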