AFF A800 watchdog reset due to uncorrectable ECC errors
Applies to
AFF A800
Issue
- From SP-LATEST-CONSOLE-LOGS:
PANIC: watchdog nmi on cpu 28, hang cpu is 0 in process idle: cpu28 on release 9.10.1P4 (C) on Fri Jul 15 07:34:06 UTC 2022
version: 9.10.1P4: Mon May 9 18:11:44 EDT 2022
- From SP-LATEST-SYSTEM-EVENT-LOG:
Record 1262: Fri Jul 15 07:34:05.800000 2022 [IPMI.notice]: 0439 | 02 | EVT: 6fa10003 | PVDDQ_KLM | Assertion Event, "Uncorrectable ECC"
Record 1263: Fri Jul 15 07:34:05.810000 2022 [IPMI.notice]: 043a | 02 | EVT: 6fa10003 | PVDDQ_KLM | Assertion Event, "Uncorrectable ECC"
Record 1264: Fri Jul 15 07:34:07.770000 2022 [IPMI.notice]: 043b | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 1265: Fri Jul 15 07:34:08.210000 2022 [IPMI Event.critical]: NMI
Record 1266: Fri Jul 15 07:34:08.210000 2022 [IPMI.notice]: 043c | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
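The ordering in the SEL excerpt above (uncorrectable ECC assertions, then the watchdog timer interrupt about two seconds later) can be checked programmatically. The following is a minimal sketch, not a NetApp tool: the function names `parse_sel` and `ecc_precedes_watchdog`, the record regex, and the 5-second window are all assumptions made for illustration, and the timestamp parsing assumes an English locale.

```python
import re
from datetime import datetime

# Sample records copied from the SEL excerpt above.
SAMPLE_SEL = [
    'Record 1262: Fri Jul 15 07:34:05.800000 2022 [IPMI.notice]: 0439 | 02 | EVT: 6fa10003 | PVDDQ_KLM | Assertion Event, "Uncorrectable ECC"',
    'Record 1264: Fri Jul 15 07:34:07.770000 2022 [IPMI.notice]: 043b | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"',
    'Record 1265: Fri Jul 15 07:34:08.210000 2022 [IPMI Event.critical]: NMI',
]

# Hypothetical pattern inferred from the excerpt; other SEL layouts may differ.
RECORD_RE = re.compile(
    r'^Record (?P<num>\d+): '
    r'(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d+ \d{4}) '
    r'\[(?P<tag>[^\]]+)\]: (?P<body>.*)$'
)


def parse_sel(lines):
    """Turn SEL text lines into dicts with record number, time, sensor, text."""
    events = []
    for line in lines:
        m = RECORD_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a record line
        fields = [f.strip() for f in m.group('body').split('|')]
        # Pipe-delimited records carry the sensor name in the fourth field;
        # short records such as "NMI" have no sensor field.
        sensor = fields[3] if len(fields) >= 5 else None
        events.append({
            'record': int(m.group('num')),
            'time': datetime.strptime(m.group('ts'), '%a %b %d %H:%M:%S.%f %Y'),
            'sensor': sensor,
            'text': fields[-1],
        })
    return events


def ecc_precedes_watchdog(events, window_seconds=5.0):
    """True if an uncorrectable ECC event occurs at most window_seconds
    before a System_Watchdog event. The window is an arbitrary choice."""
    ecc_times = [e['time'] for e in events if 'Uncorrectable ECC' in e['text']]
    wd_times = [e['time'] for e in events if e['sensor'] == 'System_Watchdog']
    return any(
        0 <= (wd - ecc).total_seconds() <= window_seconds
        for wd in wd_times
        for ecc in ecc_times
    )


events = parse_sel(SAMPLE_SEL)
print(ecc_precedes_watchdog(events))
```

A match here only shows that the two events are close in time. It does not prove causation on its own, which is why the article also looks at the DIMM attention LED and the console log.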
- From SP-LATEST-RUNTIME:
======================
FRU LEDs status
======================
FRU LED ID 1 = BMC Controller Active LED
FRU LED ID 2 = BMC Controller Attention LED
FRU LED ID 3 = BMC System LED
(...)
FRU LED ID 37 = BMC DIMM 16 Attention LED
FRU LED ID 1 is on
FRU LED ID 2 is on
FRU LED ID 3 is on. Set by BMC
(...)
FRU LED ID 37 is on
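The FRU LED section above lists each LED ID with its name and then each ID with its state, so identifying the flagged DIMM means joining the two lists. A small sketch of that join follows. The helper name `lit_dimm_leds` and both regular expressions are hypothetical and are inferred only from the lines shown in this excerpt.

```python
import re

# Lines copied from the SP-LATEST-RUNTIME excerpt above (elided rows omitted).
SAMPLE_RUNTIME = """\
FRU LED ID 1 = BMC Controller Active LED
FRU LED ID 2 = BMC Controller Attention LED
FRU LED ID 3 = BMC System LED
FRU LED ID 37 = BMC DIMM 16 Attention LED
FRU LED ID 1 is on
FRU LED ID 2 is on
FRU LED ID 3 is on. Set by BMC
FRU LED ID 37 is on
"""

NAME_RE = re.compile(r'^FRU LED ID (\d+) = (.+)$')
STATE_RE = re.compile(r'^FRU LED ID (\d+) is (\w+)')


def lit_dimm_leds(text):
    """Return the names of DIMM-related LEDs that are reported as on."""
    names, lit = {}, []
    for line in text.splitlines():
        line = line.strip()
        if (m := NAME_RE.match(line)):
            names[int(m.group(1))] = m.group(2)
        elif (m := STATE_RE.match(line)) and m.group(2) == 'on':
            lit.append(int(m.group(1)))
    # Keep only lit LEDs whose name mentions a DIMM.
    return [names[i] for i in lit if 'DIMM' in names.get(i, '')]


print(lit_dimm_leds(SAMPLE_RUNTIME))  # ['BMC DIMM 16 Attention LED']
```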
- The DIMM is tested and requalified, as shown in SP-LATEST-CONSOLE-LOGS:
Running full memory initialization.
PPR:Hard
PPR:Processing 0x0/0x0/0x1/0x0/0x0/0x0/0x0/0xD/0x0
PPR:Pre-PPR Row test PASS. dramMask = 0x0
PPR:Post-PPR Row test PASS. dramMask = 0x0
PPR:Sequence PASS
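The PPR lines above (PPR most likely refers to DDR4 post package repair, which is an assumption since the article does not expand the acronym) report one result per step and an overall sequence result. The sketch below collects those results. The function names and the regex are hypothetical, and any result token other than `PASS` is simply treated as not passed, since the excerpt shows no failing output.

```python
import re

# Lines copied from the console log excerpt above.
SAMPLE_PPR = [
    'Running full memory initialization.',
    'PPR:Hard',
    'PPR:Processing 0x0/0x0/0x1/0x0/0x0/0x0/0x0/0xD/0x0',
    'PPR:Pre-PPR Row test PASS. dramMask = 0x0',
    'PPR:Post-PPR Row test PASS. dramMask = 0x0',
    'PPR:Sequence PASS',
]

# A step name followed by an upper-case result word, e.g. "Sequence PASS".
STEP_RE = re.compile(r'^PPR:(?P<step>.+?) (?P<result>[A-Z]{2,})\b')


def ppr_steps(lines):
    """Map each PPR step name to the result word printed for it."""
    steps = {}
    for line in lines:
        m = STEP_RE.match(line.strip())
        if m:
            steps[m.group('step')] = m.group('result')
    return steps


def ppr_passed(lines):
    """True only if a Sequence result exists and every step reports PASS."""
    steps = ppr_steps(lines)
    return 'Sequence' in steps and all(r == 'PASS' for r in steps.values())


print(ppr_steps(SAMPLE_PPR))
print(ppr_passed(SAMPLE_PPR))
```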