Node panics with an ECC error caused by a failed DIMM
Applies to
- ONTAP 9
- FAS systems
- AFF systems
Issue
The node fails to boot and panics with the following panic string:
PANIC: ECC error at DIMM-2: CE-03-2040-176B3357,ADDR 0x558b31e40,(Node(0), Memory controller(0), CH(1), DIMM(0), Rank(0), Bank Group(3), Bank(0x3), Row(0x9633), Col(0xf8)) Uncorrectable Machine Check Error at CPU9. BDWL_HA0 Error: STATUS<0xbe00000000010091>(Val,UnCor,Enable,MiscV,AddrV,PCC,CorrSts(0),CorrCnt(0),ExtErr(0x1),ErrCode(Channel 1, Read)ErrCode(0x91))MISC<0x000000044056d686>(HaDbBank(0),PE(0),ReqOpcode(0x22),RNID(0),RTID(0x2b),HTID(0x6b))ADDR<0x0000000558b31e40>((0x558b31e40)). in process idle: cpu9 on release 9.7P10 (C) on Sun Nov 13 00:57:56 IST 2022
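The panic string packs the failing slot label (DIMM-2), the memory coordinates (controller, channel, DIMM, rank, bank, row, column) and a raw machine-check STATUS register into one line. The sketch below pulls those fields apart. It is a minimal illustration, not a NetApp tool: the `parse_panic` helper and its regular expressions are hypothetical and assume the field layout seen in this single sample. The STATUS decoding follows Intel's architectural IA32_MCi_STATUS layout (flag bits 63 to 57, MCA error code in bits 15:0, memory-controller format `0000 0000 1MMM CCCC`), and it agrees with the decode ONTAP printed in the message itself (Val, UnCor, Enable, MiscV, AddrV, PCC; Channel 1, Read).

```python
import re

# Panic string copied from this article (only the fields this sketch decodes).
PANIC = (
    "PANIC: ECC error at DIMM-2: CE-03-2040-176B3357,ADDR 0x558b31e40,"
    "(Node(0), Memory controller(0), CH(1), DIMM(0), Rank(0), Bank Group(3), "
    "Bank(0x3), Row(0x9633), Col(0xf8)) Uncorrectable Machine Check Error at CPU9. "
    "BDWL_HA0 Error: STATUS<0xbe00000000010091>"
)

# Location fields as they appear in the panic string (assumed from this one sample).
FIELDS = ["Node", "Memory controller", "CH", "DIMM", "Rank",
          "Bank Group", "Bank", "Row", "Col"]

# Architectural IA32_MCi_STATUS flag bits (Intel SDM, Vol. 3B).
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Transaction type (MMM) of a memory-controller MCA error code 0000 0000 1MMM CCCC.
MMM = {0: "Generic", 1: "Read", 2: "Write", 3: "Address/Command", 4: "Scrub"}


def parse_panic(text):
    """Hypothetical helper: extract the DIMM location and decode MCi_STATUS."""
    slot = re.search(r"ECC error at (DIMM-\d+)", text).group(1)
    location = {}
    for name in FIELDS:
        # \b plus the literal "(" keeps "Bank" from matching "Bank Group(" and
        # keeps "DIMM" from matching the "DIMM-2" slot label.
        m = re.search(r"\b" + re.escape(name) + r"\((0x[0-9a-fA-F]+|\d+)\)", text)
        if m:
            location[name] = int(m.group(1), 0)  # base 0 accepts "3" and "0x3"
    status = int(re.search(r"STATUS<(0x[0-9a-fA-F]+)>", text).group(1), 16)
    flags = [flag for bit, flag in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in bits 15:0
    result = {"slot": slot, "location": location, "flags": flags,
              "uncorrected": "UC" in flags}
    if (code & 0xFF80) == 0x0080:  # memory-controller error format
        result["access"] = MMM.get((code >> 4) & 0x7, "Reserved")
        result["channel"] = code & 0xF  # 0xF would mean "channel not specified"
    return result


if __name__ == "__main__":
    info = parse_panic(PANIC)
    print(info["slot"], info["location"], info["flags"],
          info.get("access"), info.get("channel"))
```

The UC flag (bit 61) together with PCC (bit 57) is what marks this as an uncorrectable error that the processor could not contain, which is consistent with the node panicking rather than logging a corrected error and continuing.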