Unexpected node reboot reporting "Multiple bit error address 8xxxxxxx"
Applies to
AFF A250
Issue
- Example BMC system log output from the node that rebooted unexpectedly:
2021-04-30T02:51:28.812572 00:00 localhost kernel: kernel - - [581600.980000] Multiple bit error address 8c96a038 -
...
2021-04-30T02:51:29.852344 00:00 localhost kernel: kernel - - [581602.010000] Multiple bit error address 8c96a038 -
2021-04-30T02:51:35.079281 00:00 localhost shutdown[2271]: shutdown 2271 - shutting down for system reboot -
- EMS messages from the partner node, for example:
Fri Apr 30 05:14:25 +0200 [node_name2: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
Fri Apr 30 05:14:25 +0200 [node_name2: spmgrd: callhome.sp.hbt.missed:notice]: Call home for SP HBT MISSED
Fri Apr 30 05:25:58 +0200 [node_name2: spmgrd: callhome.sp.hbt.stopped:alert]: Call home for SP HBT STOPPED
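To check exported logs for both symptoms above at once, a small script can help. This is a minimal sketch, not a NetApp tool. It assumes the BMC system log and the EMS log have already been exported as plain-text lines. The helper name `scan_logs` and its regular expressions are hypothetical and are based only on the sample messages in this article, so real log formats may differ.

```python
import re

# Hypothetical patterns, derived only from the sample messages in this article.
MBE_RE = re.compile(r"Multiple bit error address ([0-9a-fA-F]+)")
REBOOT_RE = re.compile(r"shutting down for system reboot")
SP_HBT_RE = re.compile(
    r"\b(sp\.heartbeat\.stopped|callhome\.sp\.hbt\.(?:missed|stopped))\b"
)


def scan_logs(lines):
    """Summarize multi-bit-error, reboot, and SP heartbeat events in log lines."""
    addresses = set()      # distinct reported error addresses (lowercase hex)
    reboot_seen = False    # True if a "system reboot" shutdown line is present
    sp_events = []         # SP heartbeat EMS event names, in log order
    for line in lines:
        mbe = MBE_RE.search(line)
        if mbe:
            addresses.add(mbe.group(1).lower())
        if REBOOT_RE.search(line):
            reboot_seen = True
        hbt = SP_HBT_RE.search(line)
        if hbt:
            sp_events.append(hbt.group(1))
    return {
        "mbe_addresses": sorted(addresses),
        "reboot_seen": reboot_seen,
        "sp_heartbeat_events": sp_events,
    }


if __name__ == "__main__":
    # Sample lines copied from the log excerpts above.
    sample = [
        "2021-04-30T02:51:28.812572 00:00 localhost kernel: kernel - - [581600.980000] Multiple bit error address 8c96a038 -",
        "2021-04-30T02:51:35.079281 00:00 localhost shutdown[2271]: shutdown 2271 - shutting down for system reboot -",
        "Fri Apr 30 05:14:25 +0200 [node_name2: spmgrd: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.",
    ]
    print(scan_logs(sample))
```

The script only collects matches and does not diagnose the fault. Deciding whether a node hit this issue still depends on reviewing the full logs.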