CHS-3129: Unexpected L2 watchdog reset on AFF A70 due to an unreadable sensor
Issue
A node takeover event occurs, as shown below:
Fri Mar 28 16:30:19 +0100 [NETAPP-HOT-02: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(NETAPP-HOT-01), system_down because l2_watchdog_reset.
The BMC logs indicate that the BMC triggered the reset because of an unreadable-sensor condition on an unidentified sensor:
================ Log end time Fri Mar 14 11:38:20 2025
================ Log start time Tue Jan 3 20:25:31 2017
BIOS Version: 20.4
...
Waiting for giveback...(Press Ctrl-C to abort wait)
Continuing boot...
https://smartsolve.netapp.com/#cdv?refasup=2025033105130016&view=vertical&section=asupids=2025033106570204 SP
Record 484: Fri Mar 28 15:30:18.448261 2025 [IPMI.notice]: 014c | 02 | EVT: 6fc824ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 485: Fri Mar 28 15:30:18.824581 2025 [IPMI Event.critical]: NMI
Record 486: Fri Mar 28 15:30:18.861607 2025 [IPMI.notice]: 014d | 02 | EVT: 6f00ffff | CriticalInt | Assertion Event, "NMI/Diag Interrupt"
Record 487: Fri Mar 28 15:30:19.705186 2025 [IPMI.notice]: 014e | 02 | EVT: 6fc124ff | System_Watchdog | Assertion Event, "Hard reset"
Record 488: Fri Mar 28 15:30:19.801360 2025 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 489: Fri Mar 28 15:30:19.832180 2025 [IPMI Event.critical]: System reset
...
Record 495: Fri Mar 28 15:40:22.584522 2025 [IPMI.warning]: Recovering BMC due to a non-readable sensor
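
The signature in the excerpt above (an L2 watchdog hard reset followed by a BMC recovery for a non-readable sensor) can be checked for in a saved copy of the BMC event log. The following is only a minimal sketch: the function name `find_signature_records`, the regular expression, and the assumption that each record sits on its own line in the `Record N: <timestamp> [<source>]: <message>` layout are illustrative and derived from the lines shown here, not from any NetApp tool.

```python
import re

# Assumed record layout, inferred from the excerpt above:
#   Record <n>: <timestamp ending in a 4-digit year> [<source>.<severity>]: <message>
RECORD_RE = re.compile(
    r"^Record (?P<num>\d+): (?P<ts>.+? \d{4}) \[(?P<source>[^\]]+)\]: (?P<msg>.*)$"
)

# Message substrings copied from the log excerpt in this article.
SIGNATURES = (
    "L2 watchdog timeout hard reset",
    "Recovering BMC due to a non-readable sensor",
)


def find_signature_records(lines):
    """Return (record number, timestamp, message) for each matching record."""
    hits = []
    for line in lines:
        match = RECORD_RE.match(line.strip())
        if match and any(sig in match.group("msg") for sig in SIGNATURES):
            hits.append(
                (int(match.group("num")), match.group("ts"), match.group("msg"))
            )
    return hits


# Sample input: three records taken verbatim from the excerpt above.
sample = [
    'Record 487: Fri Mar 28 15:30:19.705186 2025 [IPMI.notice]: 014e | 02 | '
    'EVT: 6fc124ff | System_Watchdog | Assertion Event, "Hard reset"',
    "Record 488: Fri Mar 28 15:30:19.801360 2025 [IPMI Event.critical]: "
    "L2 watchdog timeout hard reset",
    "Record 495: Fri Mar 28 15:40:22.584522 2025 [IPMI.warning]: "
    "Recovering BMC due to a non-readable sensor",
]

for num, ts, msg in find_signature_records(sample):
    print(f"Record {num} at {ts}: {msg}")
```

Seeing both messages in the same log, a few minutes apart, is consistent with the sequence described in this article; either message alone is not enough to confirm it.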