SG1100 Panic: CPUs not responding to MCE broadcast
Applies to
- NetApp StorageGRID Admin Node
- SG1100
Issue
- The Admin Node (SG1100) reboots unexpectedly and is temporarily unreachable.

- The BMC logs a CPU catastrophic error (CATERR):
331 Mar/13/2026 01:37:41 [Information] [Host Res Warning] [OEM] Host Partition Reset triggered 255 minutes - Asserted
330 Mar/13/2026 01:36:37 [Critical] [CATERR] [Processor] IERR - Asserted
329 Mar/13/2026 01:35:10 [Critical] [CATERR] [Processor] Machine Check Exception (MCERR) - Asserted
- storagegrid_crash_dmesg.log shows that the kernel panicked because of CPUs not responding to MCE broadcast:
[5048608.845286] watchdog: BUG: soft lockup - CPU#75 stuck for 78s! [prometheus-node:46612]
...
[5048616.006133] mce: CPUs not responding to MCE broadcast (may include false positives): 10,58
[5048616.006138] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
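
To check quickly whether a saved crash log contains this signature, the excerpt above can be scanned with a short script. The following is only a minimal sketch, not a NetApp tool: the function name unresponsive_cpus and the regular expression are assumptions made for illustration, and the sample text is the log excerpt quoted in this article.

```python
import re

# Matches the kernel message quoted in this article and captures the
# comma-separated CPU list that follows the final colon.
# (Illustrative pattern; it is not taken from any NetApp or kernel tooling.)
MCE_BROADCAST = re.compile(
    r"mce: CPUs not responding to MCE broadcast[^:]*:\s*(\d+(?:,\d+)*)"
)


def unresponsive_cpus(dmesg_text):
    """Return the sorted CPU numbers named in 'not responding to MCE broadcast' lines."""
    cpus = set()
    for match in MCE_BROADCAST.finditer(dmesg_text):
        cpus.update(int(token) for token in match.group(1).split(","))
    return sorted(cpus)


if __name__ == "__main__":
    # Sample taken from the excerpt above; in practice, read the contents of
    # storagegrid_crash_dmesg.log instead.
    sample = (
        "[5048616.006133] mce: CPUs not responding to MCE broadcast "
        "(may include false positives): 10,58\n"
        "[5048616.006138] Kernel panic - not syncing: Timeout: "
        "Not all CPUs entered broadcast exception handler\n"
    )
    print(unresponsive_cpus(sample))  # prints [10, 58]
```

An empty list means the signature was not found in the text. As the kernel message itself notes, the reported CPU list may include false positives, so treat the result as a pointer for further hardware diagnosis rather than a definitive fault location.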