Node reboots due to Environmental Reason Shutdown (Temperature critical) caused by an SP firmware image reset
Applies to
- FAS
- AFF
Issue
- The node goes down and logs the following events:
Wed May 25 00:48:24 UTC [pv35p45im-filerm45003: env_mgr: monitor.temp.unreadable:info]: The controller temperature (CPU1 Temp Margin) is not readable.
Wed May 25 00:48:24 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PVCCP CPU0 in the controller module is not readable.
Wed May 25 00:51:16 UTC [pv35p45im-filerm45003: spsm_listener: sp.heartbeat.stopped:warning]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 20 seconds.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: callhome.c.fan.fru.fault:error]: Call home for CHASSIS FAN FRU FAILED: SysFan3 F1
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.chassisFan.stop:alert]: Chassis fan contains at least one stopped fan: SysFan3 F1 (failed)
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.temp.unreadable:info]: The controller temperature (In Flow Temp) is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.chassisTemperature.state.unknown:warning]: Chassis temperature state is unknown: Multiple Temp sensors are unreadable. System will be shutdown in 2 minutes.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PCH Hot in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P5V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P3.3V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P1.8V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor P0.9V STBY in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PVDDQ DDR3 AB in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.power.unreadable:info]: A power sensor PVTT DDR3 AB in the controller module is not readable.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Multiple chassis fans have failed.
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: env_mgr: callhome.fans.failed:EMERGENCY]: Call home for MULTIPLE FAN FAILURE
Wed May 25 00:51:19 UTC [pv35p45im-filerm45003: statd: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (Temperature critical)
- The node reboots and enters the waiting for giveback state.
- The Service Processor status shows as unknown during the reboot.
Node>sysconfig -a
Service Processor
Status: Unknown
IPMI: unknown
PKT: unknown
- An SP reset event may be seen in the SP logs.
sp>events all
SP recovered successfully after a reset from primary FW image.
- The Service Processor is observed to have rebooted automatically during the node reboot.
sp>sp uptime
02:17:43 up 2:27, load average: 1.29, 1.15, 1.10
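The symptoms above can be checked mechanically when triaging saved logs. The following is a minimal Python sketch, assuming the EMS lines and the `sp uptime` output have been captured as plain text. The function names (`parse_ems`, `matches_sp_reset_signature`, `sp_uptime_minutes`) and the "at least two unreadable sensors" threshold are illustrative assumptions, not part of any ONTAP or SP tooling.

```python
import re

# Matches EMS lines of the form shown in this article, e.g.
# "Wed May 25 00:51:16 UTC [node: spsm_listener: sp.heartbeat.stopped:warning]: text"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \w+ "
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<sev>[^\]]+)\]: (?P<msg>.*)$"
)

# Matches the uptime part of "HH:MM:SS up 2:27, load average: ...",
# and also the "up 3 days, 1:05" and "up 5 min" variants.
UPTIME = re.compile(
    r"up\s+(?:(?P<days>\d+)\s+days?,\s+)?"
    r"(?:(?P<h>\d+):(?P<m>\d{2})|(?P<min>\d+)\s+min)"
)


def parse_ems(lines):
    """Return one dict per recognizable EMS line, in input order."""
    events = []
    for line in lines:
        match = EMS_LINE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


def matches_sp_reset_signature(events, min_unreadable=2):
    """Heuristic: the SP heartbeat stops, several sensors become
    unreadable, and an emergency shutdown follows (order taken from
    the position of the lines, not from the timestamps)."""
    names = [e["event"] for e in events]
    if "sp.heartbeat.stopped" not in names:
        return False
    if "monitor.shutdown.emergency" not in names:
        return False
    heartbeat_idx = names.index("sp.heartbeat.stopped")
    shutdown_idx = names.index("monitor.shutdown.emergency")
    unreadable = sum(1 for n in names if n.endswith(".unreadable"))
    return heartbeat_idx < shutdown_idx and unreadable >= min_unreadable


def sp_uptime_minutes(text):
    """Convert the 'up ...' field of `sp uptime` output to minutes,
    or return None if the field is not recognized."""
    match = UPTIME.search(text)
    if not match:
        return None
    days = int(match.group("days") or 0)
    if match.group("min") is not None:
        minutes = int(match.group("min"))
    else:
        minutes = int(match.group("h")) * 60 + int(match.group("m"))
    return days * 24 * 60 + minutes
```

If the SP uptime returned by `sp_uptime_minutes` is roughly equal to the time elapsed since the emergency shutdown event, that is consistent with the SP having reset at the time of the node outage. The signature check is only a hint for triage and does not replace a review of the full SP event log.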