AFF A250/C250 HIC2 Temp0 failure

Applies to

  • AFF A250
  • AFF C250
  • X1152

Issue

  • After an ONTAP upgrade or during normal operation, the node reports that the chassis temperature is too high:
[Node-01:monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
  • The node may panic and reboot to the "Waiting for giveback" state while also reporting NIC sensor errors:

PANIC: Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x0000000064000000>(UCR_BUS_LOG(100),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(100,0,0):ErrSrcID(CorrSrc(0x6670),UCorrSrc(0x66a0)), PLX PCIE 9797 switch on Controller, Br[9797](102,20,0): Link down, PLX PCIE 9797 switch on Controller, Br[9797](102,21,0): Link down. ,.  in process idle: cpu10 on release 9.13.1P6 (C)

Waiting for giveback...(Press Ctrl-C to abort wait)
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp0) is not readable.
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp1) is not readable.
Jul 04 10:26:12 [node1:callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
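
The STATUS value in the panic string above can be cross-checked by hand. The following is a minimal sketch, assuming the value follows Intel's architectural machine check status register layout (IA32_MCi_STATUS); the bit positions come from that layout and not from this article, and the function name decode_mc_status is hypothetical.

```python
# Bit positions follow Intel's architectural IA32_MCi_STATUS layout.
# That the panic string uses this layout is an assumption of this sketch.
STATUS_FLAGS = [
    (63, "VALID"),
    (62, "OVER"),
    (61, "UC"),
    (60, "EN"),
    (59, "MISCV"),
    (58, "ADDRV"),
    (57, "PCC"),
    (56, "S"),
    (55, "AR"),
]


def decode_mc_status(status: int) -> dict:
    """Split a 64-bit machine check status value into flags and code fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "corr_err_cnt": (status >> 38) & 0x7FFF,  # bits 52:38
        "mscod": (status >> 16) & 0xFFFF,         # bits 31:16
        "mcacod": status & 0xFFFF,                # bits 15:0
    }


if __name__ == "__main__":
    decoded = decode_mc_status(0xBB80000000000E0B)
    print(decoded["flags"])
    print(hex(decoded["mcacod"]))
```

For the value 0xbb80000000000e0b, the decoded flags and the MCACOD field agree with what the panic string prints (VALID, UC, EN, MISCV, PCC, S, AR and MCACOD 0xe0b), which suggests the assumed layout fits this message.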

  • SP-LATEST-IPMI shows the unreadable sensors:

Working card:

HIC1_TEMP0       | 55.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC1_TEMP1       | 57.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000

 
 
Impaired card with TEMP0 failing:
HIC2_TEMP0       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | 53.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
 
Impaired card with TEMP1 failing:
HIC2_TEMP0       | 52.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
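
When many sensors are listed, the unreadable ones can be picked out mechanically. This is a minimal sketch, assuming the pipe-delimited layout shown in the samples above (name, reading, unit, status, then thresholds); the function name unreadable_sensors is hypothetical and is not part of any NetApp tool.

```python
from typing import List


def unreadable_sensors(table_text: str) -> List[str]:
    """Return the names of sensors whose reading or status column is 'na'."""
    names = []
    for line in table_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not a sensor row
        name, reading, _unit, status = fields[:4]
        if reading == "na" or status == "na":
            names.append(name)
    return names


# Rows copied from the sample output in this article.
SAMPLE = """\
HIC1_TEMP0       | 55.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP0       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | 53.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
"""

if __name__ == "__main__":
    print(unreadable_sensors(SAMPLE))
```
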
  • The SP event log shows a degraded link for the NIC in slot 2 (in this example, x8 width instead of the expected x16):
617 | 02/13/2024 | 19:42:41 | Temperature #0x10 | Lower Non-recoverable going low
618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0
- Expected GEN1, actual GEN1
- Expected x16, actual x8
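
The "Expected ..., actual ..." pairs in the event log can be compared automatically to spot a degraded link. This is a minimal sketch, assuming the line format shown in the sample above; the function name link_mismatches is hypothetical.

```python
import re
from typing import List, Tuple

# Matches lines such as "- Expected x16, actual x8".
EXPECTED_ACTUAL = re.compile(r"Expected\s+([^,\s]+),\s+actual\s+(\S+)")


def link_mismatches(log_text: str) -> List[Tuple[str, str]]:
    """Return (expected, actual) pairs where the trained link differs."""
    pairs = EXPECTED_ACTUAL.findall(log_text)
    return [(expected, actual) for expected, actual in pairs if expected != actual]


# Lines copied from the sample SP event log in this article.
SAMPLE = """\
618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0
- Expected GEN1, actual GEN1
- Expected x16, actual x8
"""

if __name__ == "__main__":
    print(link_mismatches(SAMPLE))
```

In the sample, the link generation matches and only the width differs, so the single reported pair is the x16/x8 width mismatch.
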
  • Ports are missing from the NIC in the SYSCONFIG-A output (only e2a and e2b are listed for the quad-port card):

slot 2: Quad 10G/25G Ethernet Controller CX5-Mezz
  e2a MAC Address:    d0:39:ea:52:c8:5f (auto-unknown-fd-down)
  e2b MAC Address:    d0:39:ea:52:c8:60 (auto-unknown-fd-down)
  Device Type:        CX5 PSID(NAP0000000014)
  Firmware Version:   16.26.4012
  Part Number:        111-04587
  Hardware Revision:  B0
  Serial Number:      032249003452
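
The missing ports can also be detected by counting the port lines in the output. This is a minimal sketch, assuming the line format shown above and assuming that a card described as "Quad" should list four ports; both the expected count and the function name listed_ports are assumptions, not documented behavior.

```python
import re
from typing import List

# Matches port lines such as "  e2a MAC Address:    d0:39:ea:52:c8:5f ...".
PORT_LINE = re.compile(r"^\s*(e\d+[a-z])\s+MAC Address:", re.MULTILINE)

EXPECTED_PORTS = 4  # assumption: inferred from "Quad" in the card description


def listed_ports(sysconfig_text: str) -> List[str]:
    """Return the port names that appear in the sysconfig excerpt."""
    return PORT_LINE.findall(sysconfig_text)


# Excerpt copied from the sample output in this article.
SAMPLE = """\
slot 2: Quad 10G/25G Ethernet Controller CX5-Mezz
  e2a MAC Address:    d0:39:ea:52:c8:5f (auto-unknown-fd-down)
  e2b MAC Address:    d0:39:ea:52:c8:60 (auto-unknown-fd-down)
  Device Type:        CX5 PSID(NAP0000000014)
"""

if __name__ == "__main__":
    ports = listed_ports(SAMPLE)
    print(ports, "missing:", EXPECTED_PORTS - len(ports))
```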
