AFF-A900 node panics due to DIMM ECC error
Applies to
- AFF-A900
- ONTAP 9
Issue
- The AFF-A900 node panics unexpectedly with the following message:
ECC error at DIMM-1: 2306-17A288DB,ADDR 0xfeb4079840,(Node(1), Memory controller(2), CH(5), DIMM(0), Rank(1), Bank Group(3), Bank(0x2), Row(0x3f8d0), Col(0x328)) ICL_IMC2_C1 Error: STATUS<0xbe06894200a00091>(VALID,UC,EN,MISCV,ADDRV,PCC,CORR_ERR_STAT(0),CORR_ERR_CNT(0x1a25),OTHER_INFO(0x2),MSCOD(0xa0),MCACOD(0x91)),MISC<0x09000b1fc6819486>(EXTRA_ERR_INFO(0x480058fe340ca),ADDR_MODE(0x2),REC_ERR_LSB(0x6)),ADDR<0x000000feb4079840>(ADDRESS(0xfeb4079840))Node(1), Memory controller(2)
CH(1), DIMM(0), Rank(1), Bank Group(3), Bank(0x2), Row(0x3f8d0), Col(0x328)
- During boot, the console reports that the PPR memory test passed, but a "mapped out" error is still detected afterwards:
PPR: Sequence PASS.
Initializing System Memory ...
DIMM:1 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x30 / 0x2D
or
DIMM:1 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x31 / 0x2C
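
When reviewing a captured console log, it can help to pull out every "mapped out" line together with its DIMM number and error codes. The following is a minimal Python sketch, not a NetApp tool: the function name find_mapped_out_dimms is hypothetical, and the pattern assumes the line format is exactly as in the two examples above, which may differ on other BIOS versions.

```python
import re

# Pattern derived from the two console lines quoted in this article.
# Assumption: other BIOS versions may word this line differently.
MAPPED_OUT = re.compile(
    r"DIMM:(?P<dimm>\d+) mapped out\..*?"
    r"Major / Minor Error Code: "
    r"(?P<major>0x[0-9A-Fa-f]+) / (?P<minor>0x[0-9A-Fa-f]+)"
)


def find_mapped_out_dimms(console_text):
    """Return a list of (dimm, major_code, minor_code) tuples,
    one per 'mapped out' line found in the console text."""
    results = []
    for line in console_text.splitlines():
        match = MAPPED_OUT.search(line)
        if match:
            results.append((
                int(match.group("dimm")),
                int(match.group("major"), 16),
                int(match.group("minor"), 16),
            ))
    return results


# Sample input copied from the console output shown above.
sample = """PPR: Sequence PASS.
Initializing System Memory ...
DIMM:1 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x30 / 0x2D
"""

for dimm, major, minor in find_mapped_out_dimms(sample):
    print(f"DIMM {dimm}: major=0x{major:02X} minor=0x{minor:02X}")
```

The sketch only extracts the values; it does not interpret the major/minor codes, whose meaning is not described in this article.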