
Panic on the PLX PCIe 8732 switch on IO Expansion

Category:
fas-systems
Specialty:
hw

Applies to

  • ONTAP 9
  • FAS8080

Issue

  • The FAS8080 system panics with an NMI, and the following message appears in the Service Processor (SP) "system log" output.

Example:

PANIC: PCI Error NMI from device(s):ErrSrcID(CorrSrc(0),UCorrSrc(0x8010)), RPT(128,2,0):PLX PCIE 8732 switch on IO Expansion, PLX PCIE 8732 switch on IO Expansion, ErrSrcID(CorrSrc(0),UCorrSrc(0x8018)), RPT(128,3,0):PLX PCIE 8732 switch on IO Expansion, PLX PCIE 8732 switch on IO Expansion.  in SK process wafl_exempt18 on release 9.3P18 (C)
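
The UCorrSrc values in the panic string appear to follow the standard PCIe requester ID layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0). Decoding them under that assumption reproduces the RPT(bus,device,function) tuples printed in the same message. The following is a minimal Python sketch; `decode_bdf` is a hypothetical helper name used for illustration and is not part of ONTAP or the SP.

```python
def decode_bdf(source_id: int) -> tuple[int, int, int]:
    """Split a 16-bit PCIe requester ID into (bus, device, function).

    Assumption: the ID uses the standard PCIe layout
    (bus = bits 15:8, device = bits 7:3, function = bits 2:0).
    """
    if not 0 <= source_id <= 0xFFFF:
        raise ValueError("requester ID must fit in 16 bits")
    bus = (source_id >> 8) & 0xFF       # upper byte
    device = (source_id >> 3) & 0x1F    # next five bits
    function = source_id & 0x07         # lowest three bits
    return bus, device, function


# The two UCorrSrc values from the sample panic string.
for raw in (0x8010, 0x8018):
    print(hex(raw), decode_bdf(raw))
```

With the sample values, 0x8010 decodes to (128, 2, 0) and 0x8018 to (128, 3, 0), which matches RPT(128,2,0) and RPT(128,3,0) in the panic message.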

  • The SP console "events all" output shows the following errors.

Example:

Record 28: Mon Jul 06 14:40:29.620402 2020 [Agent.notice]: 990.526: 160 : IOXM Fan_B1 Present de-asserted
Record 29: Mon Jul 06 14:40:29.634230 2020 [Agent.notice]: 990.526: 161 : IOXM Fan_B2 Present de-asserted
Record 30: Mon Jul 06 14:40:29.634821 2020 [Agent.notice]: 990.526: 162 : IOXM Fan_B3 Present de-asserted
Record 31: Mon Jul 06 14:40:29.691246 2020 [Agent.notice]: 062.792: 43 : CPU 1 Correctable Error 3 asserted
Record 32: Mon Jul 06 14:40:29.691429 2020 [Agent.notice]: 062.792: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 33: Mon Jul 06 14:40:29.742088 2020 [Agent.notice]: 113.649: 42 : CPU 1 Correctable Error 2 asserted
Record 34: Mon Jul 06 14:40:29.814207 2020 [Agent.notice]: 185.883: 42 : CPU 1 Correctable Error 2 de-asserted
Record 35: Mon Jul 06 14:40:29.814382 2020 [Agent.notice]: 185.883: 43 : CPU 1 Correctable Error 3 de-asserted
Record 36: Mon Jul 06 14:40:29.814523 2020 [Agent.notice]: 185.889: 29 : Non-maskable Interrupt from PCH to CPU de-asserted
Record 37: Mon Jul 06 14:40:30.041525 2020 [IPMI.warning]: Error while reading sensor number : 44
Record 38: Mon Jul 06 14:40:30.053568 2020 [IPMI.notice]: 0202 | c0 | OEM: f9ff7020ff2c | ManufId: 150300 | Undefined
Record 39: Mon Jul 06 14:40:30.917594 2020 [IPMI.warning]: Error while reading sensor number : 45
Record 40: Mon Jul 06 14:40:30.931970 2020 [IPMI.notice]: 0302 | c0 | OEM: f9ff7020ff2d | ManufId: 150300 | Undefined
Record 41: Mon Jul 06 14:40:32.325609 2020 [IPMI.warning]: Error while reading sensor number : 189
Record 42: Mon Jul 06 14:40:32.342545 2020 [IPMI.notice]: 0402 | c0 | OEM: f9ff7020ffbd | ManufId: 150300 | Undefined
Record 43: Mon Jul 06 14:40:32.749519 2020 [IPMI.warning]: Error while reading sensor number : 190
Record 44: Mon Jul 06 14:40:32.757665 2020 [IPMI.notice]: 0502 | c0 | OEM: f9ff7020ffbe | ManufId: 150300 | Undefined
Record 45: Mon Jul 06 14:40:32.778577 2020 [IPMI.notice]: 0602 | 02 | EVT: 6f01ffff | IOfan1_Present | Assertion Event, "Absent"
Record 46: Mon Jul 06 14:40:32.793519 2020 [IPMI.notice]: 0702 | 02 | EVT: 6f01ffff | IOfan2_Present | Assertion Event, "Absent"
Record 47: Mon Jul 06 14:40:32.809474 2020 [IPMI.notice]: 0802 | 02 | EVT: 6f01ffff | IOfan3_Present | Assertion Event, "Absent"
Record 48: Mon Jul 06 14:40:32.898375 2020 [Agent.notice]: 269.611: 14 : Attention LED (at Midplane) asserted
Record 49: Mon Jul 06 14:40:35.640291 2020 [Agent.notice]: 005.571: 42 : CPU 1 Correctable Error 2 asserted
Record 50: Mon Jul 06 14:40:35.640474 2020 [Agent.notice]: 005.571: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 51: Mon Jul 06 14:40:36.061511 2020 [IPMI.warning]: Error while reading sensor number : 14
Record 52: Mon Jul 06 14:40:36.067951 2020 [IPMI.notice]: 0902 | c0 | OEM: f9ff7020ff0e | ManufId: 150300 | Undefined
Record 53: Mon Jul 06 14:40:36.477533 2020 [IPMI.warning]: Error while reading sensor number : 15
Record 54: Mon Jul 06 14:40:36.484003 2020 [IPMI.notice]: 0a02 | c0 | OEM: f9ff7020ff0f | ManufId: 150300 | Undefined
Record 55: Mon Jul 06 14:40:39.797520 2020 [IPMI.warning]: Error while reading sensor number : 42
Record 56: Mon Jul 06 14:40:39.803964 2020 [IPMI.notice]: 0b02 | c0 | OEM: f9ff7020ff2a | ManufId: 150300 | Undefined
Record 57: Mon Jul 06 14:40:40.213512 2020 [IPMI.warning]: Error while reading sensor number : 43
Record 58: Mon Jul 06 14:40:40.220325 2020 [IPMI.notice]: 0c02 | c0 | OEM: f9ff7020ff2b | ManufId: 150300 | Undefined
Record 59: Mon Jul 06 14:40:29.000000 2020 [Controller.notice]: Appliance panic. See logs for cause of panic.
Record 60: Mon Jul 06 14:40:55.945497 2020 [IPMI.notice]: 0d02 | 02 | EVT: 6f406fff | Sensor 255 | Assertion Event, "Storage OS stop/shutdown"
Record 61: Mon Jul 06 14:40:56.226475 2020 [Agent.notice]: 597.574: 11 : Controller Attention LED asserted
Record 62: Mon Jul 06 14:40:56.830983 2020 [Agent.notice]: 202.552: 49 : PCH Platform Reset asserted
Record 63: Mon Jul 06 14:40:56.831147 2020 [Agent.notice]: 202.552: 29 : Non-maskable Interrupt from PCH to CPU de-asserted
Record 64: Mon Jul 06 14:40:56.831291 2020 [Agent.notice]: 202.612: 63 : BIOS Complete from PCH de-asserted
Record 65: Mon Jul 06 14:40:56.831884 2020 [Agent.notice]: 203.592: 42 : CPU 1 Correctable Error 2 de-asserted
Record 66: Mon Jul 06 14:40:56.839826 2020 [Agent.notice]: 211.285: 49 : PCH Platform Reset de-asserted
Record 67: Mon Jul 06 14:40:56.908653 2020 [SP.critical]: Filer Reboots
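
When the event log is long, it can help to split each record into fields and filter on severity or keyword. The sketch below is one possible approach in Python. The regular expression is inferred only from the sample records above, so other SP firmware versions may format lines differently, and `parse_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Layout inferred from the sample: "Record N: <timestamp> [Source.level]: message"
RECORD_RE = re.compile(
    r"^Record (?P<num>\d+): "
    r"(?P<time>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d+ \d{4}) "
    r"\[(?P<source>\w+)\.(?P<level>\w+)\]: (?P<message>.*)$"
)


def parse_events(text):
    """Return one dict per line that matches the record layout above."""
    events = []
    for line in text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# A few records copied from the example output.
SAMPLE = """\
Record 28: Mon Jul 06 14:40:29.620402 2020 [Agent.notice]: 990.526: 160 : IOXM Fan_B1 Present de-asserted
Record 32: Mon Jul 06 14:40:29.691429 2020 [Agent.notice]: 062.792: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 37: Mon Jul 06 14:40:30.041525 2020 [IPMI.warning]: Error while reading sensor number : 44
Record 67: Mon Jul 06 14:40:56.908653 2020 [SP.critical]: Filer Reboots
"""

events = parse_events(SAMPLE)
print(Counter(e["level"] for e in events))

# Show anything above "notice", plus every record that mentions the IOXM.
for e in events:
    if e["level"] != "notice" or "IOXM" in e["message"]:
        print(e["num"], e["level"], e["message"])
```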

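  • The SP console "system sensors" output cannot show the status of the IOXM: the IOXM sensors read "na" and the IOXM fans are reported as Absent.

Example: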

Sensor Name      | Current    | Unit       | Status     | LCR       | LNC       | UNC       | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
IO_InFlow_Temp   | na         | degrees C  | na         | 0.000     | 10.000    | 53.000    | 63.000
IO_OutFlow_Temp  | na         | degrees C  | na         | 0.000     | 10.000    | 62.000    | 72.000
IO_Riser_R_Temp  | na         | degrees C  | na         | 0.000     | 10.000    | 54.000    | 64.000
IO_Riser_L_Temp  | na         | degrees C  | na         | 0.000     | 10.000    | 53.000    | 63.000
IO_12V           | na         | Volts      | na         | 0.000     | 0.000     | 32.130    | 32.130
IO_12V_Curr      | na         | Amps       | na         | 0.000     | 0.000     | 63.750    | 63.750
IO_STDBY_12V     | na         | Volts      | na         | 0.000     | 0.000     | 32.130    | 32.130
IO_STDBY_12V_Cur | na         | Amps       | na         | na        | na        | 3.188     | 3.188
IOfan1_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan1_Fault     | na         | discrete   | na         | na        | na        | na        | na
IOfan1_F1_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan1_F2_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan2_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan2_Fault     | na         | discrete   | na         | na        | na        | na        | na
IOfan2_F1_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan2_F2_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan3_Present   | 0x0        | discrete   | Absent     | na        | na        | na        | na
IOfan3_Fault     | na         | discrete   | na         | na        | na        | na        | na
IOfan3_F1_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
IOfan3_F2_Speed  | na         | RPM        | na         | 1950.000  | 2040.000  | na        | na
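
To pick out the sensors that the SP cannot read, the pipe-delimited table can be parsed and filtered on the Status column. The following Python sketch assumes the eight-column layout shown in the example; `parse_sensors` is a hypothetical helper name, and treating "na" and "Absent" as the statuses of interest is an assumption based on this sample only.

```python
def parse_sensors(text):
    """Parse pipe-delimited "system sensors" rows into dicts.

    Lines that do not split into exactly eight cells (such as the
    dashed separator) and the header row are skipped.
    """
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 8 or cells[0] == "Sensor Name":
            continue
        rows.append({
            "name": cells[0],
            "current": cells[1],
            "unit": cells[2],
            "status": cells[3],
        })
    return rows


# A few rows copied from the example output.
SAMPLE = """\
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
------------+---------+------+--------+-----+-----+-----+----
IO_InFlow_Temp | na | degrees C | na | 0.000 | 10.000 | 53.000 | 63.000
IO_12V | na | Volts | na | 0.000 | 0.000 | 32.130 | 32.130
IOfan1_Present | 0x0 | discrete | Absent | na | na | na | na
"""

sensors = parse_sensors(SAMPLE)
unreadable = [s["name"] for s in sensors if s["status"] in ("na", "Absent")]
print(unreadable)
```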

 

