PCI panic after L2 watchdog reset
Applies to
FAS 2750
Issue
- A takeover occurred because of an L2 watchdog reset:
Record 1824: Thu Mar 26 01:01:35.735405 2026 [IPMI Event.critical]: NMI
Record 1825: Thu Mar 26 01:01:36.373886 2026 [IPMI.notice]: 0381 | 02 | EVT: 6fc124ff | System_Watchdog | Assertion Event, "Hard reset"
Record 1826: Thu Mar 26 01:01:36.692786 2026 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 1827: Thu Mar 26 01:01:36.692874 2026 [IPMI Event.critical]: System reset
Record 1828: Thu Mar 26 01:01:36.812256 2026 [IPMI Event.critical]: L2 watchdog action completed
[?] Thu Mar 26 06:31:37 +0530 [node-02: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(node-01), system_down because l2_watchdog_reset.
- After the watchdog reset, a PCI NMI panic is reported against the device in slot 1 during boot:
random: registering fast source Intel Secure Key RNG
nvme0: doorbell stride #17.
nvme0: NSSR Occurred, clearing
PANIC : PCI Error NMI from device(s):ErrSrcID(CorrSrc(0x100),UCorrSrc(0x8)), RPT(0,1,0):PCI Device 144d:a808 in slot 1 on Controller.
version: 9.15.1P13: Tue Jul 15 10:07:45 EDT 2025
conf : x86_64.optimize
model : FAS2750
memory : 22508 MB SK + 9847 MB BSD (1696 MB FuseN)
cpuid = 0
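To check whether logs collected from another system show the same two symptoms, the key strings from the excerpts above can be matched as text. The following is a minimal Python sketch, not a NetApp tool: the names `scan_logs` and `SignatureResult` are hypothetical, and it assumes the IPMI/EMS events and the console output have already been saved as plain text. It looks only for the exact "L2 watchdog timeout hard reset" phrase and the "PANIC : PCI Error NMI" line shown here, and extracts the PCI vendor:device ID and slot number from the panic line.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Marker copied from the IPMI event excerpt (Record 1826).
WATCHDOG_MARKER = "L2 watchdog timeout hard reset"

# Pattern built from the console PANIC line; captures the PCI vendor:device ID and slot.
PANIC_RE = re.compile(
    r"PANIC\s*:\s*PCI Error NMI from device\(s\):.*?"
    r"PCI Device (?P<pci_id>[0-9a-fA-F]{4}:[0-9a-fA-F]{4}) in slot (?P<slot>\d+)"
)


@dataclass
class SignatureResult:
    """Hypothetical result type: what the scan found in the supplied text."""
    watchdog_reset: bool
    pci_id: Optional[str] = None
    slot: Optional[int] = None

    @property
    def matches(self) -> bool:
        # Both symptoms must be present for the text to match this article.
        return self.watchdog_reset and self.pci_id is not None


def scan_logs(log_text: str) -> SignatureResult:
    """Hypothetical helper: search plain-text logs for both symptoms."""
    watchdog = WATCHDOG_MARKER in log_text
    panic = PANIC_RE.search(log_text)
    if panic is None:
        return SignatureResult(watchdog_reset=watchdog)
    return SignatureResult(
        watchdog_reset=watchdog,
        pci_id=panic.group("pci_id").lower(),  # normalize hex case
        slot=int(panic.group("slot")),
    )


if __name__ == "__main__":
    sample = (
        "Record 1826: Thu Mar 26 01:01:36.692786 2026 [IPMI Event.critical]: "
        "L2 watchdog timeout hard reset\n"
        "PANIC : PCI Error NMI from device(s):ErrSrcID(CorrSrc(0x100),UCorrSrc(0x8)), "
        "RPT(0,1,0):PCI Device 144d:a808 in slot 1 on Controller.\n"
    )
    result = scan_logs(sample)
    print(result.matches, result.pci_id, result.slot)  # True 144d:a808 1
```

The sketch deliberately does not check event order or timestamps, because the IPMI events and the console output usually come from different log sources; it only reports whether both strings are present.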