Disk shelf fault reported during BMC upgrade
Applies to
- BMC (Baseboard Management Controller)
Issue
- While the BMC reboots during a firmware upgrade, the EMS log reports "Disk shelf fault", and the fault then clears automatically.
[?] Tue Mar 18 11:41:06 +0900 [Node-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP Firmware network update has been successfully scheduled from 11.7 to 11.11'}
[?] Tue Mar 18 11:41:18 +0900 [Node-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP Firmware network update from 11.7 to 11.11 has been triggered.'}
[?] Tue Mar 18 11:41:49 +0900 [Node-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP firmware image has been successfully transferred to SP using network interface.'}
[?] Tue Mar 18 11:43:44 +0900 [Node-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP Firmware update completed.'}
[?] Tue Mar 18 11:43:44 +0900 [Node-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'SP triggering auto-reboot to complete firmware update. Console connection through the SP will now be dropped.'}
[?] Tue Mar 18 11:44:49 +0900 [Node-01: dsa_worker2: ses.status.temperatureWarning:alert]: DS212-12 (S/N SHFNC2208000137) shelf 0 on channel 0b temperature warning for Temperature sensor 11: non-critical status; undertemperature warning. Current temperature: 0 C (32 F). This module is on the rear of the shelf at the top left, on shelf module A.
[?] Tue Mar 18 11:44:58 +0900 [Node-01: statd: monitor.shelf.warning:error]: Fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
[?] Tue Mar 18 11:44:59 +0900 [Node-01: spsm_listener: sp.update.status:debug]: params: {'reason': 'sp_startup_notify_servprocd: SP startup handler has been called. '}
[?] Tue Mar 18 11:45:00 +0900 [Node-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Disk shelf fault.
[?] Tue Mar 18 11:45:59 +0900 [Node-01: servprocd: sp.servprocd.upd.evts:debug]: params: {'reason': 'PostUpdate Check : SP firmware post-update check has PASSED '}
[?] Tue Mar 18 11:46:00 +0900 [Node-01: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
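To confirm that a fault of this kind cleared on its own, it can help to measure the interval between the monitor.globalStatus.critical event and the following monitor.globalStatus.ok event. The Python sketch below is one possible way to do that, not a NetApp tool. It assumes the line layout seen in the excerpt above ("[?] timestamp [node: source: event:severity]: message"). The function names parse_ems_line and fault_window_seconds are hypothetical, and because the timestamps carry no year, the caller has to supply one.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above:
# "[?] <weekday> <month> <day> <time> <utc offset> [<node>: <source>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^\[\?\]\s+\w{3}\s+"
    r"(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+[+-]\d{4})\s+"
    r"\[(?P<node>[^:\]]+): (?P<source>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[^\]]+)\]:\s*"
    r"(?P<message>.*)$"
)


def parse_ems_line(line, year):
    """Parse one EMS line into a dict, or return None if it does not match.

    The log timestamps have no year, so `year` is supplied by the caller.
    """
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # The weekday is skipped by the regex; only month, day, time and offset are parsed.
    record["time"] = datetime.strptime(
        f"{year} {record['ts']}", "%Y %b %d %H:%M:%S %z"
    )
    return record


def fault_window_seconds(lines, year):
    """Return seconds between the first global-status critical event and the
    next global-status ok event, or None if either one is missing."""
    fault_time = None
    for line in lines:
        record = parse_ems_line(line, year)
        if record is None:
            continue
        if fault_time is None and record["event"] == "monitor.globalStatus.critical":
            fault_time = record["time"]
        elif fault_time is not None and record["event"] == "monitor.globalStatus.ok":
            return (record["time"] - fault_time).total_seconds()
    return None
```

Applied to the excerpt above, the critical event at 11:45:00 and the ok event at 11:46:00 give a window of 60 seconds, which falls between the SP auto-reboot at 11:43:44 and shortly after the post-update check at 11:45:59.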