AFF A800 unexpected reboot with ECC error at DIMM-XX
Applies to
AFF A800
Issue
- An automatic case was created because a node rebooted unexpectedly and its partner performed an automatic takeover. Example:
CLTFLT:HA Group Notification from node_name (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - Communiction Error) ALERT
- The node panicked because of a system memory error:
ALERT sk.panic: Panic String: ECC error at DIMM-6: CE-04-2011-42F4C628,ADDR 0xe8a61ae40,(Node(1), Memory controller(0), CH(0), DIMM(1), Rank(5), Bank Group(3), Bank(0x1), Row(0x2b2d), Col(0x190)) Uncorrectable Machine Check Error at CPU31. SKL_IMC0 Error: STATUS<0xfe00a84001010090>(VALID,OVERFLOW,UC,EN,MISCV,ADDRV,PCC,CORR_ERR_STATUS(0),CORR_ERR_CNT(0x2a1),OTHER_INFO(0),MscodDdrType(0x1),MscodDataRdErr,MCACOD(0x90))MISC<0x200002c120002086>(DataErrorChunk(0x2),McCmdChnl(0),McCmdMemRegion(0),McCmdOpcode(0x2),McCmdVld,SmiAD,SmiMsgClass(0),SmiOpcode(0x2),TrkId(0x100),Error_Type(0x4),ADDRMODE(0x2),ADDRLSB(0x6))ADDR<0x0000000e8a61ae40>(HIPHYADDR(0xe),LOPHYADDR(0x22986b9))(Node(1), Memory controller(0), CH(0), DIMM(1), Rank(5), Bank Group(3), Bank(0x1), Row(0x2b2d), Col(0x190), ADDR<0x0000000e8a61ae40>(HIPHYADDR(0xe),LOPHYADDR(0x22986b9))MISC<0x200002c120002086>(DataErrorChunk(0x2),McCmdChnl(0),McCmdMemRegion(0),McCmdOpcode(0x2),McCmdVld,SmiAD,SmiMsgClass(0),SmiOpcode(0x2),TrkId(0x100),Error_Type(0x4),ADDRMODE(0x2),ADDRLSB(0x6)). in process id...
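The fields most useful for triage in a panic string like the one above are the DIMM slot, the failing physical address, and the flags in the STATUS value. The following is a minimal sketch of how they could be extracted, not a NetApp tool. The function names (`decode_status`, `parse_panic`) and the regular expressions are hypothetical, and the bit positions are assumed to follow the architectural IA32_MCi_STATUS layout (flags in bits 63 to 57, corrected error count in bits 52 to 38, MCA error code in bits 15 to 0).

```python
import re

# Assumed architectural flag bits of IA32_MCi_STATUS (highest bit first).
FLAG_BITS = [
    (63, "VALID"), (62, "OVERFLOW"), (61, "UC"), (60, "EN"),
    (59, "MISCV"), (58, "ADDRV"), (57, "PCC"),
]

def decode_status(status):
    """Split a 64-bit machine-check STATUS value into the fields used here."""
    return {
        "flags": [name for bit, name in FLAG_BITS if (status >> bit) & 1],
        "corr_err_cnt": (status >> 38) & 0x7FFF,  # bits 52:38
        "mcacod": status & 0xFFFF,                # bits 15:0
    }

def parse_panic(text):
    """Extract DIMM slot, physical address, and decoded STATUS from a panic string.

    Hypothetical helper: the patterns match only the message format shown
    in this article. Fields that are not found are returned as None.
    """
    dimm = re.search(r"ECC error at (DIMM-\d+)", text)
    addr = re.search(r"ADDR (0x[0-9a-fA-F]+)", text)
    status = re.search(r"STATUS<(0x[0-9a-fA-F]+)>", text)
    return {
        "dimm": dimm.group(1) if dimm else None,
        "address": int(addr.group(1), 16) if addr else None,
        "status": decode_status(int(status.group(1), 16)) if status else None,
    }

# Shortened excerpt of the panic string from this article.
SAMPLE = (
    "Panic String: ECC error at DIMM-6: CE-04-2011-42F4C628,"
    "ADDR 0xe8a61ae40,(Node(1), Memory controller(0), CH(0), DIMM(1)) "
    "Uncorrectable Machine Check Error at CPU31. "
    "SKL_IMC0 Error: STATUS<0xfe00a84001010090>"
)

if __name__ == "__main__":
    result = parse_panic(SAMPLE)
    print(result["dimm"], hex(result["address"]))
    print(result["status"]["flags"])
    print(hex(result["status"]["corr_err_cnt"]), hex(result["status"]["mcacod"]))
```

Run against the excerpt, the decoded values agree with what the log itself prints: the UC and PCC flags are set, the corrected error count is 0x2a1, and the MCA error code is 0x90. That agreement is a quick check that the assumed bit layout matches this message.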