AFF A250: Uncorrectable error detected at PCIE:Bus:100
Applies to
AFF A250
Issue
- The node halts with the following message:
Mar 26 11:00:00 [node_name:monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk in RAID group "/aggregate/plex0/rg0" are broken. Halting system now.
- The node enters a boot loop with the following error:
Uncorrectable error detected at PCIE:Bus:100 Dev:0 Fun:0 for 2 time(s)!!!!
!!!!Machine Check MC-Bank:6 - Status: 0xBB80000000000E0B, ADDR: 0x0000000000000000, MISC: 0x0000000064000000 !!!!
!!!! X64 Exception Type - 12(#MC - Machine-Check) CPU Apic ID - 00000002 !!!!
RIP - 0000000077B708DE, CS - 0000000000000038, RFLAGS - 0000000000000002
RAX - 0000000000000000, RCX - 0000000077B12500, RDX - 0000000000000005
RBX - 0000000077B62300, RSP - 0000000077B31A40, RBP - 0000000000000001
RSI - 000000000003E2B4, RDI - 0000000077B52800
R8 - 0000000000000005, R9 - 0000000000000001, R10 - 0000000000000000
R11 - 0000000077B12500, R12 - 0000000000000000, R13 - 0000000000000000
R14 - 0000000000000000, R15 - 0000000000000000
DS - 0000000000000020, ES - 0000000000000020, FS - 0000000000000020
GS - 0000000000000020, SS - 0000000000000020
CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000077B14000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 0000000077B29128 000000000000004F, LDTR - 0000000000000000
IDTR - 0000000077B88300 00000000000001FF, TR - 0000000000000040
FXSAVE_STATE - 0000000077B316A0
!!!! Find PE image f:\jb\bddo0\Build\YubaCity\DEBUG_MYTOOLS\X64\PurleySktPkg\Override\IA32FamilyCpuPkg\PiSmmCpuDxeSmm\PiSmmCpuDxeSmm\DEBUG\PiSmmCpuDxeSmm.pdb (ImageBase=0000000077B68000, EntryPoint=0000000077B68340) !!!!
Copyright(c) 2020 American Megatrends, Inc.
- PCIe link errors are logged:
Tue Mar 30 09:53:23 +0000 [node_name: kernel: nvme.link.error:error]: PCIe link initialization error for NVMeSSD in slot 4.
Mon Mar 22 19:44:01 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMeSSD in slot 4 due to excessive errors.
Thu Mar 25 00:07:05 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 2 due to excessive errors.
Thu Mar 25 10:59:12 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 1 due to excessive errors.
Tue Mar 30 11:45:14 +0000 [node_name: SKL cerror: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO2: RPT(100,0,0): RPT(100,0,0): SecStatus(RcvMstAbt); PLX PCIE 9797 switch on Controller, Br[9797](102,4,0): RcvErr(P7(255)), Br[9797](102,7,0): BadTLP(8), BadDLLP(3470); '}
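
When triaging excerpts like the ones above, it can help to decode the machine-check status word and to tally the NVMe link events per slot. The following is a minimal sketch, not a NetApp diagnostic tool. The function names `decode_mc_status` and `count_link_errors` are hypothetical. The flag positions and the bus/interconnect code pattern are taken from the architectural IA32_MCi_STATUS layout in the Intel SDM, on the assumption that the BIOS prints that register unmodified. The regular expression assumes the EMS message wording shown in this article.

```python
import re
from collections import Counter

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM, machine-check architecture).
# Assumption: the "Status:" value in the BIOS dump is this register, printed as-is.
STATUS_FLAGS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
                "MISCV": 59, "ADDRV": 58, "PCC": 57}


def decode_mc_status(status: int) -> dict:
    """Split a machine-check status value into flag bits and the MCA error code."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    mca_code = status & 0xFFFF  # bits 15:0 hold the MCA error code
    decoded["mca_error_code"] = mca_code
    # Compound code 000F 1PPT RRRR IILL marks a bus/interconnect error.
    decoded["bus_interconnect"] = (mca_code & 0xE800) == 0x0800
    # II (bits 3:2) == 0b10 means the failing transaction was an I/O transaction.
    decoded["io_transaction"] = (
        decoded["bus_interconnect"] and ((mca_code >> 2) & 0b11) == 0b10
    )
    return decoded


# Matches nvme.link.error and nvme.link.disabled.error events and captures the slot.
LINK_EVENT = re.compile(r"nvme\.link\.(?:disabled\.)?error:error\]: .*? in slot (\d+)")


def count_link_errors(lines) -> Counter:
    """Count NVMe PCIe link events per drive slot in an iterable of EMS log lines."""
    counts = Counter()
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Status value copied from the machine-check line in this article.
    for key, value in decode_mc_status(0xBB80000000000E0B).items():
        print(key, hex(value) if key == "mca_error_code" else value)
```

For the status value in this article, the sketch reports an uncorrected error (UC set) with no valid address (ADDRV clear) and an error code in the bus/interconnect I/O class. That is consistent with a PCIe fault, but it does not identify the failing component on its own. Several slots reporting link errors together, as in the excerpt above, is the kind of pattern the per-slot tally is meant to surface.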