
AFF A250/C250 HIC2 Temp0 failure

Category: aff-series
Specialty: hw

Applies to

  • AFF A250
  • AFF C250
  • X1152

Issue

  • After an ONTAP upgrade or during normal operation, the node reports that the chassis temperature is too high:
[Node-01:monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
  • The node might panic and reboot into the waiting for giveback state while also reporting NIC sensor errors:

PANIC: Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x0000000064000000>(UCR_BUS_LOG(100),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(100,0,0):ErrSrcID(CorrSrc(0x6670),UCorrSrc(0x66a0)), PLX PCIE 9797 switch on Controller, Br[9797](102,20,0): Link down, PLX PCIE 9797 switch on Controller, Br[9797](102,21,0): Link down. ,.  in process idle: cpu10 on release 9.13.1P6 (C)

Waiting for giveback...(Press Ctrl-C to abort wait)
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp0) is not readable.
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp1) is not readable.
Jul 04 10:26:12 [node1:callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
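
When several nodes or log files need to be checked for this symptom, the unreadable sensors can be pulled out of the EMS output with a short script. The following is a minimal sketch, not a NetApp tool: the function name `unreadable_sensors` and the regular expression are assumptions based only on the message format shown above.

```python
import re

# Assumed line format, taken from the sample EMS output in this article:
# Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp0) is not readable.
UNREADABLE = re.compile(
    r"\[(?P<node>[^:\]]+):monitor\.temp\.unreadable:[^\]]+\]:.*?\((?P<sensor>[^)]+)\)"
)

def unreadable_sensors(lines):
    """Return (node, sensor) pairs for each monitor.temp.unreadable event."""
    found = []
    for line in lines:
        match = UNREADABLE.search(line)
        if match:
            found.append((match.group("node"), match.group("sensor")))
    return found

sample = [
    "Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp0) is not readable.",
    "Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp1) is not readable.",
    "Jul 04 10:26:12 [node1:callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE",
]

for node, sensor in unreadable_sensors(sample):
    print(f"{node}: {sensor} is not readable")
```

Only `monitor.temp.unreadable` events are matched; the call home line is deliberately ignored because it does not name a sensor.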

  • PLATFORM-SENSORS.XML shows the affected sensor as not readable:

Working card:

HIC1_TEMP0       | 55.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC1_TEMP1       | 57.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000

 
 
Impaired card with the failure indicated on Temp0:
HIC2_TEMP0       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | 53.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
 
Impaired card with the failure indicated on Temp1:
HIC2_TEMP0       | 52.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
HIC2_TEMP1       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000
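
The working and impaired rows differ only in the reading and status columns, so the check can be automated. The sketch below assumes the pipe-delimited layout shown above, with the sensor name, reading, unit, and status as the first four fields; the helper names `parse_sensor_row` and `unreadable_rows` are hypothetical.

```python
def parse_sensor_row(row):
    """Split one pipe-delimited sensor row into its first four fields.

    Assumes the column order seen in this article: name, reading, unit, status.
    Returns None for lines that do not have at least four fields.
    """
    fields = [field.strip() for field in row.split("|")]
    if len(fields) < 4:
        return None
    name, reading, unit, status = fields[:4]
    return {"name": name, "reading": reading, "unit": unit, "status": status}

def unreadable_rows(rows):
    """Return the names of sensors whose reading or status is 'na'."""
    names = []
    for row in rows:
        parsed = parse_sensor_row(row)
        if parsed and "na" in (parsed["reading"], parsed["status"]):
            names.append(parsed["name"])
    return names

rows = [
    "HIC1_TEMP0       | 55.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000",
    "HIC2_TEMP0       | na         | degrees C  | na    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000",
    "HIC2_TEMP1       | 53.000     | degrees C  | ok    | 1.000     | 3.000     | 5.000     | 101.000   | 103.000   | 105.000",
]

print(unreadable_rows(rows))
```

Blank lines and label lines such as "Working card:" contain no pipe separators and are skipped.
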
  • The SP event log shows the NIC in slot 2 failing to train at its maximum link speed/width:
617 | 02/13/2024 | 19:42:41 | Temperature #0x10 | Lower Non-recoverable going low
618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0
- Expected GEN1, actual GEN1
- Expected x16, actual x8
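
In this sample the link speed matches (GEN1 expected, GEN1 actual) and only the width is reduced (x16 expected, x8 actual). The following sketch compares the expected and actual values in such lines; the `Expected ..., actual ...` pattern is taken from the excerpt above, and the function name `link_mismatches` is an assumption for illustration.

```python
import re

# Assumed line format, from the SP event log excerpt in this article:
# "- Expected x16, actual x8"
PAIR = re.compile(r"Expected\s+(\S+?),\s*actual\s+(\S+)", re.IGNORECASE)

def link_mismatches(lines):
    """Return (expected, actual) pairs where the two values differ."""
    mismatches = []
    for line in lines:
        match = PAIR.search(line)
        if match and match.group(1) != match.group(2):
            mismatches.append((match.group(1), match.group(2)))
    return mismatches

sp_log = [
    "618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0",
    "- Expected GEN1, actual GEN1",
    "- Expected x16, actual x8",
]

for expected, actual in link_mismatches(sp_log):
    print(f"link degraded: expected {expected}, actual {actual}")
```
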
  • The NIC is missing ports in SYSCONFIG-A:

slot 2: Quad 10G/25G Ethernet Controller CX5-Mezz
  e2a MAC Address:    d0:39:ea:52:c8:5f (auto-unknown-fd-down)
  e2b MAC Address:    d0:39:ea:52:c8:60 (auto-unknown-fd-down)
  Device Type:        CX5 PSID(NAP0000000014)
  Firmware Version:   16.26.4012
  Part Number:        111-04587
  Hardware Revision:  B0
  Serial Number:      032249003452

