AFF A250: Uncorrectable error detected at PCIE:Bus:100

Applies to

AFF A250

Issue

  • The node halted with the following message:
Mar 26 11:00:00 [node_name:monitor.shutdown.brokenDisk:EMERGENCY]: data disk,parity disk in RAID group "/aggregate/plex0/rg0" are broken. Halting system now.
 
  • Boot loop errors:
Uncorrectable error detected at PCIE:Bus:100 Dev:0 Fun:0 for 2 time(s)!!!!
!!!!Machine Check MC-Bank:6 - Status: 0xBB80000000000E0B, ADDR: 0x0000000000000000, MISC: 0x0000000064000000 !!!!
!!!! X64 Exception Type - 12(#MC - Machine-Check) CPU Apic ID - 00000002 !!!!
RIP - 0000000077B708DE, CS - 0000000000000038, RFLAGS - 0000000000000002
RAX - 0000000000000000, RCX - 0000000077B12500, RDX - 0000000000000005
RBX - 0000000077B62300, RSP - 0000000077B31A40, RBP - 0000000000000001
RSI - 000000000003E2B4, RDI - 0000000077B52800
R8 - 0000000000000005, R9 - 0000000000000001, R10 - 0000000000000000
R11 - 0000000077B12500, R12 - 0000000000000000, R13 - 0000000000000000
R14 - 0000000000000000, R15 - 0000000000000000
DS - 0000000000000020, ES - 0000000000000020, FS - 0000000000000020
GS - 0000000000000020, SS - 0000000000000020
CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000077B14000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 0000000077B29128 000000000000004F, LDTR - 0000000000000000
IDTR - 0000000077B88300 00000000000001FF, TR - 0000000000000040
FXSAVE_STATE - 0000000077B316A0
!!!! Find PE image f:\jb\bddo0\Build\YubaCity\DEBUG_MYTOOLS\X64\PurleySktPkg\Override\IA32FamilyCpuPkg\PiSmmCpuDxeSmm\PiSmmCpuDxeSmm\DEBUG\PiSmmCpuDxeSmm.pdb (ImageBase=0000000077B68000, EntryPoint=0000000077B68340) !!!!
Copyright(c) 2020 American Megatrends, Inc.
 
  • PCIe link errors:
Tue Mar 30 09:53:23 +0000 [node_name: kernel: nvme.link.error:error]: PCIe link initialization error for NVMe SSD in slot 4.
Mon Mar 22 19:44:01 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 4 due to excessive errors.
Thu Mar 25 00:07:05 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 2 due to excessive errors.
Thu Mar 25 10:59:12 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 1 due to excessive errors.
 
 
Tue Mar 30 11:45:14 +0000 [node_name: SKL cerror: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO2: RPT(100,0,0): RPT(100,0,0): SecStatus(RcvMstAbt); PLX PCIE 9797 switch on Controller, Br[9797](102,4,0): RcvErr(P7(255)), Br[9797](102,7,0): BadTLP(8), BadDLLP(3470); '}
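When the EMS log is long, the nvme.link.* events listed above can be tallied per slot to see which NVMe SSD slots are reporting link problems. The following Python sketch is only an illustration: it assumes the log lines have already been exported as plain text in the format shown above, and the names `NVME_LINK_RE`, `nvme_link_errors_by_slot` and `affected_slots` are hypothetical, not part of any ONTAP tool or command.

```python
import re
from collections import Counter

# Hypothetical pattern: captures the EMS event name (nvme.link.error or
# nvme.link.disabled.error) and the NVMe slot number from lines formatted
# like the log excerpt in this article.
NVME_LINK_RE = re.compile(
    r"(?P<event>nvme\.link\.(?:disabled\.)?error):\w+\]:.*?\bslot (?P<slot>\d+)"
)


def nvme_link_errors_by_slot(lines):
    """Count NVMe PCIe link events, keyed by (event name, slot number)."""
    counts = Counter()
    for line in lines:
        match = NVME_LINK_RE.search(line)
        if match:
            counts[(match.group("event"), int(match.group("slot")))] += 1
    return counts


def affected_slots(counts):
    """Return the sorted list of slots that logged at least one link event."""
    return sorted({slot for (_event, slot) in counts})


# Sample input copied from the log excerpt above.
SAMPLE_LINES = [
    "Tue Mar 30 09:53:23 +0000 [node_name: kernel: nvme.link.error:error]: PCIe link initialization error for NVMe SSD in slot 4.",
    "Mon Mar 22 19:44:01 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 4 due to excessive errors.",
    "Thu Mar 25 00:07:05 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 2 due to excessive errors.",
    "Thu Mar 25 10:59:12 +0000 [node_name: kernel: nvme.link.disabled.error:error]: PCIe link disabled for NVMe SSD in slot 1 due to excessive errors.",
]

counts = nvme_link_errors_by_slot(SAMPLE_LINES)
for (event, slot), n in sorted(counts.items()):
    print(f"slot {slot}: {event} x{n}")
print("affected slots:", affected_slots(counts))
```

For the four sample lines, slots 1, 2 and 4 are reported, with slot 4 logging both an initialization error and a link-disabled event.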
 

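The `Status` value in the machine-check line of the boot loop output (0xBB80000000000E0B for MC-Bank 6) can be read against the architectural IA32_MCi_STATUS bit layout documented in the Intel SDM. This Python sketch decodes only the architecturally defined flag bits and the two 16-bit error-code fields; `decode_mci_status` and `FLAG_BITS` are hypothetical names, and any further interpretation of bank 6 or of the MCA error code 0x0E0B is model specific and not attempted here.

```python
# Architectural IA32_MCi_STATUS flag bits (Intel SDM Vol. 3B, Machine-Check Architecture).
FLAG_BITS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # overflow: a previous error was lost
    "UC": 61,     # error was not corrected
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # IA32_MCi_MISC holds valid data
    "ADDRV": 58,  # IA32_MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["MCA_CODE"] = status & 0xFFFF             # bits 15:0
    fields["MODEL_CODE"] = (status >> 16) & 0xFFFF   # bits 31:16, model specific
    return fields


decoded = decode_mci_status(0xBB80000000000E0B)
for name, value in decoded.items():
    print(f"{name}: {value:#06x}" if name.endswith("CODE") else f"{name}: {value}")
```

For the logged value, VAL, UC, EN, MISCV and PCC are set while OVER and ADDRV are clear: the error is uncorrected and the processor context is flagged as corrupt, and the cleared ADDRV bit is consistent with the all-zero ADDR field in the log.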
 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.