PCIe stealth errors on an X91153A card cause a node reboot
Applies to
- AFF A900
- ONTAP 9.12.1P4
- X91153A Ethernet storage controller
Issue
- The node reboots, and neither the EMS log nor the SP log shows a clear cause
- The logs contain repeated EMS errors such as the following:
Fri Oct 20 22:45:27 -0400 [cluster1-01: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(135,2,0): Microchip PCI-E Switch on Controller, Microchip PCI-E Switch in slot 11 on Controller, Br[4000](137,0,0): DevStatus(Corr), CorrErr(Rcvr,RpTim); Br[4036](139,0,0) in slot 11: DevStatus(Corr), CorrErr(RpTim); '}
Fri Oct 20 22:47:27 -0400 [cluster1-01: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(135,2,0): Microchip PCI-E Switch on Controller, Br[4000](137,0,0): DevStatus(Corr), CorrErr(Rcvr); '}
- Checking sysconfig -ac for the card called out in the errors (slot 11 in this example) points to the X91153A card:
sysconfig: slot 11 OK: X91153A: 2p 40G/100G RoCE QSFP28
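
The check described above can be sketched as a small script: find the slot numbers named in pcie.stealth.errors events, then look up which card the sysconfig output reports for that slot. This is a rough illustration only, not a NetApp tool. The function names and regular expressions are assumptions based solely on the two log formats shown in this article, and other ONTAP releases may format these lines differently.

```python
import re
from collections import Counter

# Sample lines copied verbatim from the excerpts in this article.
EMS_LINES = [
    "Fri Oct 20 22:45:27 -0400 [cluster1-01: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(135,2,0): Microchip PCI-E Switch on Controller, Microchip PCI-E Switch in slot 11 on Controller, Br[4000](137,0,0): DevStatus(Corr), CorrErr(Rcvr,RpTim); Br[4036](139,0,0) in slot 11: DevStatus(Corr), CorrErr(RpTim); '}",
    "Fri Oct 20 22:47:27 -0400 [cluster1-01: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(135,2,0): Microchip PCI-E Switch on Controller, Br[4000](137,0,0): DevStatus(Corr), CorrErr(Rcvr); '}",
]
SYSCONFIG_LINES = [
    "sysconfig: slot 11 OK: X91153A: 2p 40G/100G RoCE QSFP28",
]

# Assumed patterns, derived only from the sample lines above.
SLOT_RE = re.compile(r"in slot (\d+)")
SYSCONFIG_RE = re.compile(r"slot\s+(\d+)\s+\w+:\s*([^:\s]+):")


def count_stealth_errors_by_slot(lines):
    """Count pcie.stealth.errors events per slot (one count per event per slot)."""
    counts = Counter()
    for line in lines:
        if "pcie.stealth.errors" not in line:
            continue
        # A single event can name the same slot more than once; count it once.
        for slot in set(SLOT_RE.findall(line)):
            counts[int(slot)] += 1
    return counts


def card_in_slot(sysconfig_lines, slot):
    """Return the card model reported for the given slot, or None if not found."""
    for line in sysconfig_lines:
        match = SYSCONFIG_RE.search(line)
        if match and int(match.group(1)) == slot:
            return match.group(2)
    return None


if __name__ == "__main__":
    for slot, events in count_stealth_errors_by_slot(EMS_LINES).items():
        print(f"slot {slot}: {events} event(s), card: {card_in_slot(SYSCONFIG_LINES, slot)}")
```

Events that do not name a slot, such as the second sample event, are not attributed to any card by this sketch, so the per-slot count is a lower bound on how often the card was involved.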