AFF A250/C250 HIC2 Temp0 failure
Applies to
- AFF A250
- AFF C250
- X1152
Issue
- After an ONTAP upgrade or during normal operation, the node reports that the chassis temperature is too high
[Node-01:monitor.globalStatus.critical:EMERGENCY]: Chassis temperature is too high..
- The node may panic and remain in the waiting for giveback state while also reporting NIC sensor errors.
PANIC: Uncorrectable Machine Check Error at CPU10. SKL_IIO Error: STATUS<0xbb80000000000e0b>(VALID,UC,EN,MISCV,PCC,S,AR,CORR_ERR_STATUS(0),CORR_ERR_CNT(0),MSCOD(0),MCACOD(0xe0b))MISC<0x0000000064000000>(UCR_BUS_LOG(100),UCR_DEVICE_LOG(0),UCR_FUNCTION_LOG(0),UCR_SEGMENT_LOG(0))IIO Machine Check from device(s):RPT(100,0,0):ErrSrcID(CorrSrc(0x6670),UCorrSrc(0x66a0)), PLX PCIE 9797 switch on Controller, Br[9797](102,20,0): Link down, PLX PCIE 9797 switch on Controller, Br[9797](102,21,0): Link down. ,. in process idle: cpu10 on release 9.13.1P6 (C)
Waiting for giveback...(Press Ctrl-C to abort wait)
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp0) is not readable.
Jul 04 10:24:42 [node1:monitor.temp.unreadable:error]: The controller temperature (HIC2 Temp1) is not readable.
Jul 04 10:26:12 [node1:callhome.chassis.hitemp:error]: Call home for CHASSIS OVER TEMPERATURE
- PLATFORM-SENSORS.XML shows that a sensor is not readable
Working card:
HIC1_TEMP0 | 55.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC1_TEMP1 | 57.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
Cards with an unreadable sensor (HIC2_TEMP0 or HIC2_TEMP1 reads na):
HIC2_TEMP0 | na | degrees C | na | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC2_TEMP1 | 53.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC2_TEMP0 | 52.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC2_TEMP1 | na | degrees C | na | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
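The sensor rows above are pipe-delimited, so the unreadable readings can be picked out mechanically. The following is a minimal Python sketch, not a NetApp tool: the function name `unreadable_sensors` and the `prefix` parameter are hypothetical, and it assumes the first four columns are name, reading, unit, and status, as in the excerpt.

```python
def unreadable_sensors(text, prefix="HIC"):
    """Return names of sensors whose reading or status column is 'na'.

    Assumes the column order seen in the excerpt:
    name | reading | unit | status | thresholds...
    """
    flagged = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue  # not a sensor row
        name, reading, _unit, status = fields[:4]
        if not name.startswith(prefix):
            continue  # only look at the mezzanine card (HIC) sensors
        if reading.lower() == "na" or status.lower() == "na":
            flagged.append(name)
    return flagged


sample_sensors = """\
HIC1_TEMP0 | 55.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC1_TEMP1 | 57.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC2_TEMP0 | na | degrees C | na | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
HIC2_TEMP1 | 53.000 | degrees C | ok | 1.000 | 3.000 | 5.000 | 101.000 | 103.000 | 105.000
"""

flagged_sensors = unreadable_sensors(sample_sensors)
print(flagged_sensors)
```

Restricting the check to names that start with HIC keeps the sketch focused on the sensors this article is about; other sensors would need their own judgment.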
- The SP event log shows a link speed/width downgrade for the NIC in slot 2
617 | 02/13/2024 | 19:42:41 | Temperature #0x10 | Lower Non-recoverable going low
618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0
- Expected GEN1, actual GEN1
- Expected x16, actual x8
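The link-training records above pair a slot number with "Expected ..., actual ..." lines. As a rough illustration, the Python sketch below collects the mismatched pairs per slot. It assumes the exact wording shown in the excerpt; the function name `link_downgrades` is hypothetical and is not part of any SP or ONTAP interface.

```python
import re

# Patterns copied from the wording in the excerpt above.
TRAIN_RE = re.compile(r"\(slot (\d+)\) Failed to train at max link speed/width")
EXPECT_RE = re.compile(r"Expected (\w+), actual (\w+)")


def link_downgrades(text):
    """Map slot number -> list of (expected, actual) pairs that differ."""
    result = {}
    current_slot = None
    for line in text.splitlines():
        m = TRAIN_RE.search(line)
        if m:
            current_slot = int(m.group(1))
            result.setdefault(current_slot, [])
            continue
        m = EXPECT_RE.search(line)
        if m and current_slot is not None:
            expected, actual = m.groups()
            if expected != actual:  # keep only real downgrades
                result[current_slot].append((expected, actual))
    return result


sample_sel = """\
617 | 02/13/2024 | 19:42:41 | Temperature #0x10 | Lower Non-recoverable going low
618 | OEM record ee | Device Bus: 117 Dev: 0 Fun: 0 (slot 2) Failed to train at max link speed/width, retraining cycle 0
- Expected GEN1, actual GEN1
- Expected x16, actual x8
"""

downgrades = link_downgrades(sample_sel)
print(downgrades)
```

For the excerpt, the speed line matches (GEN1 on both sides) and only the width line differs, so the slot 2 entry carries the x16 versus x8 pair.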
- SYSCONFIG-A shows ports missing from the NIC
slot 2: Quad 10G/25G Ethernet Controller CX5-Mezz
e2a MAC Address: d0:39:ea:52:c8:5f (auto-unknown-fd-down)
e2b MAC Address: d0:39:ea:52:c8:60 (auto-unknown-fd-down)
Device Type: CX5 PSID(NAP0000000014)
Firmware Version: 16.26.4012
Part Number: 111-04587
Hardware Revision: B0
Serial Number: 032249003452
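In the SYSCONFIG-A excerpt above, the card is described as "Quad" but only two ports (e2a and e2b) are listed. The Python sketch below shows one way to spot that mismatch. It is an illustration under stated assumptions: the function name `slots_missing_ports` is hypothetical, the mapping from the words "Dual" and "Quad" to a port count is an assumption made here, and the line patterns are taken only from this excerpt.

```python
import re

SLOT_RE = re.compile(r"^slot (\d+): (.*)$")
PORT_RE = re.compile(r"^(e\d+[a-z]) MAC Address:")

# Assumption for this sketch: the first word of the card description
# states how many ports the card should present.
EXPECTED_PORTS = {"Dual": 2, "Quad": 4}


def slots_missing_ports(text):
    """Map slot number -> (expected ports, ports found) where fewer are found."""
    expected = {}
    found = {}
    slot = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SLOT_RE.match(line)
        if m:
            slot = int(m.group(1))
            words = m.group(2).split()
            first_word = words[0] if words else ""
            if first_word in EXPECTED_PORTS:
                expected[slot] = EXPECTED_PORTS[first_word]
                found[slot] = 0
            else:
                slot = None  # description does not state a port count
            continue
        if slot is not None and PORT_RE.match(line):
            found[slot] += 1
    return {s: (expected[s], found[s]) for s in expected if found[s] < expected[s]}


sample_sysconfig = """\
slot 2: Quad 10G/25G Ethernet Controller CX5-Mezz
e2a MAC Address: d0:39:ea:52:c8:5f (auto-unknown-fd-down)
e2b MAC Address: d0:39:ea:52:c8:60 (auto-unknown-fd-down)
Device Type: CX5 PSID(NAP0000000014)
Firmware Version: 16.26.4012
"""

missing = slots_missing_ports(sample_sysconfig)
print(missing)
```

The result pairs the expected count with the count actually listed, so a healthy four-port card would not appear in the output at all.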