由于watchdog重置、AFF A700/FAS9000重新启动
适用场景
- AFF A700
- FAS9000
问题描述
- L2监护程序重置导致节点意外重新启动
- 节点报告中的服务处理器(SP)事件日志:
Record 676: Sat Sep 03 07:48:57.485823 2022 [IPMI Event.critical]: NMI
Record 677: Sat Sep 03 07:48:58.692798 2022 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 678: Sat Sep 03 07:48:58.730798 2022 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
Record 683: Sat Sep 03 07:49:13.116765 2022 [IPMI.notice]: 3f04 | 02 | EVT: 6fc804ff | System_Watchdog | Assertion Event, "Timer interrupt"
Record 684: Sat Sep 03 07:49:13.132351 2022 [IPMI.notice]: 4004 | 02 | EVT: 6fc104ff | System_Watchdog | Assertion Event, "Hard reset"
Record 685: Sat Sep 03 07:49:43.344481 2022 [IPMI.notice]: 4104 | 02 | EVT: 6f02ffff | PCM_Status | Assertion Event, "Fault"
- 重新启动后、在启动期间和EMS中可能会观察到类似以下内容的错误:
[cluster-01:mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a watchdog reset.