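How to troubleshoot PCI/NMI, UMCE, and nested machine check exception panics
Applies to
- PCI Non-Maskable Interrupt (NMI) panics
- PCI Uncorrectable Machine Check Exception (UMCE) panics
- Non-PCI Uncorrectable Machine Check Exception (UMCE) panics
- Nested machine check exception panics
- AFF systems
- FAS systems
Issue
This article describes how to troubleshoot the following types of panics: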
- PCI/NMI
PANIC: PCI Error NMI from device(s):PCI Device 111d:806c in slot 2 on Controller, Qlogic FC 8G adapter in slot 2 on Controller, Qlogic FC 8G adapter in slot 2 on Controller. in process idle on release 8.3 (C) on Fri Sep 18 13:27:47 MDT 2015
- PCI UMCE
- Indicates that an unrecoverable problem was detected on the PCI bus.
PANIC: Uncorrectable Machine Check Error at CPU30. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x00000000ae000000>(UCR_BUS_LOG(174),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(174,0,0):ErrSrcID(CorrSrc(0),UCorrSrc(0xb100)), PLX PCIE 8749 switch on Controller, PCI Device 1425:600d in slot 1 on Controller, PCI Device 1425:600d in slot 1 on Controller, PCI Device 1425:600d in slot 1 on Controller, PCI Device 1425:600d in slot 1 on Controller, T62100-CR Dual 40/100G NIC in slot 1 on Controller, PCI Device 1425:650d in slot 1 on Controller, PCI Device 1425:660d in slot 1 on Controller. in process idle: cpu30
- Non-PCI UMCE
- Indicates an unrecoverable error involving system memory or the CPU cache.
PANIC: Uncorrectable Machine Check Error at CPU0. MC0 Error: STATUS<0xb200000430000800>(Val,UnCor,Enable,PCC,ErrCode(Src,NTO,Gen,Mem,L0)). MC5 Error: STATUS<0xf2000010c4300e0f>(Val,OverF,UnCor,Enable,PCC,ErrCode(Gen,NTO,Gen,Gen,Gen)); Uncorrectable error at DIMM-1, Channel 0, Serial: BA-00-1131-00098398!69002460-I01-NTA-T1?!, FERR(0x400), NERR(0x402), MERR M10Err, Rank 3, Bank 6, CAS 0x1e8, RAS 0x1bcf Uncorrectable error at DIMM-1, Channel 0, Serial: BA-00-1131-00098398!69002460-I01-NTA-T1?!, MERR M10Err, Rank 3, Bank 6, CAS 0x1e8, RAS 0x1bc.
- Nested machine check
PANIC: nested machine check exception detected on CPU #, no coredump will be generated.
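The four panic types above can be told apart by the text of the panic string. The following is a minimal sketch of such a triage step, assuming the sample messages in this article are representative. The function name `classify_panic` and the match patterns are hypothetical, inferred only from those samples, and do not represent an official message grammar.

```python
import re

# Patterns are assumptions derived solely from the sample panic strings
# in this article. Order matters: the PCI UMCE check must run before the
# generic UMCE check, because both contain the same leading phrase.
_PATTERNS = [
    ("Nested machine check",
     re.compile(r"nested machine check exception", re.IGNORECASE)),
    ("PCI/NMI",
     re.compile(r"PCI Error NMI from device")),
    ("PCI UMCE",
     re.compile(r"Uncorrectable Machine Check Error.*Machine Check from device",
                re.DOTALL)),
    ("Non-PCI UMCE",
     re.compile(r"Uncorrectable Machine Check Error")),
]


def classify_panic(message: str) -> str:
    """Return the panic category for a panic string, or "Unknown"."""
    for label, pattern in _PATTERNS:
        if pattern.search(message):
            return label
    return "Unknown"


if __name__ == "__main__":
    sample = ("PANIC: nested machine check exception detected on CPU #, "
              "no coredump will be generated.")
    print(classify_panic(sample))
```

A message that matches none of the patterns is reported as "Unknown" rather than guessed, so an unfamiliar panic string is not forced into one of the four categories.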