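CONTUP-170136: FAS8200 and AFF A300 systems may experience an unresponsive CPU followed by multiple watchdog controller interrupts
Issue
- Many sensor values cannot be read correctly.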
> system sensors
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
CPU0_Temp_Margin | na | degrees C | na | na | na | -11.000 | -1.000
In_Flow_Temp | 20.000 | degrees C | ok | 0.000 | 5.000 | 50.000 | 55.000
Out_Flow_Temp | 27.000 | degrees C | ok | 0.000 | 5.000 | 65.000 | 75.000
PCI_Slot_Temp | 25.000 | degrees C | ok | 0.000 | 5.000 | 60.000 | 70.000
Smart_Bat_Temp | 22.000 | degrees C | ok | 0.000 | 5.000 | 60.000 | 70.000
CPU0_Error | 0x0 | discrete | Asserted | na | na | na | na
CPU0_Therm_Trip | 0x0 | discrete | Asserted | na | na | na | na
Wrench_Port_Up | 0x0 | discrete | Enabled | na | na | na | na
Attn_Sensor1 | 0x0 | discrete | Asserted | na | na | na | na
- FAS8200 and AFF A300 storage systems may experience an unresponsive CPU followed by a watchdog controller interrupt.
watchdog nmi on cpu 0, hang cpu is 0 in process idle: cpu0
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1111: Sat Apr 30 05:01:39 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 1112: Sat Apr 30 05:01:45 2022 [IPMI.notice]: e900 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"
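
To check saved console output for both symptoms, a small script along the following lines may help. It is a minimal sketch and not a NetApp tool: the function names `unreadable_sensors` and `watchdog_records` are hypothetical, the column layout and record format are assumed from the samples shown above, and the "watchdog"/"NMI" keyword match is an illustrative heuristic, not an official signature for this bug.

```python
import re

# Assumed record layout, taken from the sample above:
# "Record N: <timestamp> [<source>.<severity>]: <message>"
RECORD_RE = re.compile(r"^Record (\d+): (.+?) \[([^\]]+)\]: (.*)$")


def unreadable_sensors(sensor_output):
    """Return the names of sensors whose Current column reads 'na'."""
    names = []
    for line in sensor_output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows have 8 pipe-separated columns; this skips the
        # dashed rule, blank lines, and the header row.
        if len(cols) != 8 or cols[0] == "Sensor Name":
            continue
        if cols[1].lower() == "na":
            names.append(cols[0])
    return names


def watchdog_records(sel_output):
    """Return (record_number, message) for records mentioning a watchdog or NMI."""
    hits = []
    for line in sel_output.splitlines():
        m = RECORD_RE.match(line.strip())
        if m is None:
            continue  # not an event-log record line
        number, _timestamp, _source, message = m.groups()
        text = message.lower()
        if "watchdog" in text or re.search(r"\bnmi\b", text):
            hits.append((int(number), message))
    return hits


# Excerpts copied from the output shown above.
SAMPLE_SENSORS = '''
Sensor Name | Current | Unit | Status | LCR | LNC | UNC | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
CPU0_Temp_Margin | na | degrees C | na | na | na | -11.000 | -1.000
In_Flow_Temp | 20.000 | degrees C | ok | 0.000 | 5.000 | 50.000 | 55.000
'''

SAMPLE_SEL = '''
Record 1108: Sat Apr 30 05:01:38 2022 [IPMI Event.critical]: NMI
Record 1109: Sat Apr 30 05:01:38 2022 [IPMI.notice]: e800 | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1110: Sat Apr 30 05:01:39 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
'''

print(unreadable_sensors(SAMPLE_SENSORS))                 # ['CPU0_Temp_Margin']
print([n for n, _ in watchdog_records(SAMPLE_SEL)])       # [1108, 1109, 1110]
```

A sensor listed by `unreadable_sensors` together with one or more hits from `watchdog_records` around the same time would be consistent with the symptoms described here, though it is not proof of this specific issue.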