Handling L2 watchdog resets on the AFF A700s platform
Applies to
- AFF A700
Issue
- The node reboots unexpectedly
- The node does not reboot after an unexpected shutdown
The BMC log on the affected node shows the following:
453 | 05/10/2022 | 23:21:58 | CriticalInt | Software NMI | Asserted
454 | 05/10/2022 | 23:21:58 | Watchdog2 | Timer interrupt | Asserted
455 | 05/10/2022 | 23:21:59 | Watchdog2 | Hard reset | Asserted
456 | 05/10/2022 | 23:21:59 | SysReset | State Asserted | Asserted
- If the node reboots, the EMS log file may show the following errors:
Wed May 11 00:21:59 +0100 [NetApp: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(n4-nht-fas-c03-02), system_down because l2_watchdog_reset.
Wed May 11 00:21:59 +0100 [NetApp: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(n4-nht-fas-c03-02), system_down because reset_via_sp.
Wed May 11 00:22:00 +0100 [NetApp: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
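To check saved log text for this signature, the pattern above can be sketched in a few lines of Python. This is only an illustration, not a NetApp tool: it assumes the BMC events have been saved as text in the pipe-delimited layout shown above, and the names `SelEvent`, `parse_sel_line`, `find_watchdog_resets`, and `ems_mentions_l2_watchdog` are hypothetical helpers introduced here.

```python
from typing import List, NamedTuple, Optional


class SelEvent(NamedTuple):
    """One BMC event line: id | date | time | sensor | event | state."""
    record_id: int
    date: str
    time: str
    sensor: str
    event: str
    state: str


def parse_sel_line(line: str) -> Optional[SelEvent]:
    """Parse one pipe-delimited line; return None if it does not match the assumed layout."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 6 or not parts[0].isdigit():
        return None
    return SelEvent(int(parts[0]), *parts[1:])


def find_watchdog_resets(lines: List[str]) -> List[SelEvent]:
    """Return the events where the Watchdog2 sensor asserted a hard reset."""
    events = (parse_sel_line(line) for line in lines)
    return [
        e for e in events
        if e is not None
        and e.sensor == "Watchdog2"
        and e.event == "Hard reset"
        and e.state == "Asserted"
    ]


def ems_mentions_l2_watchdog(ems_text: str) -> bool:
    """True if the EMS text contains the l2_watchdog_reset reason string."""
    return "l2_watchdog_reset" in ems_text


# Sample input copied from the BMC log excerpt in this article.
SAMPLE_SEL = """\
453 | 05/10/2022 | 23:21:58 | CriticalInt | Software NMI | Asserted
454 | 05/10/2022 | 23:21:58 | Watchdog2 | Timer interrupt | Asserted
455 | 05/10/2022 | 23:21:59 | Watchdog2 | Hard reset | Asserted
456 | 05/10/2022 | 23:21:59 | SysReset | State Asserted | Asserted
"""

if __name__ == "__main__":
    for event in find_watchdog_resets(SAMPLE_SEL.splitlines()):
        print(f"Watchdog2 hard reset at {event.date} {event.time} (record {event.record_id})")
```

A match on both the Watchdog2 hard reset in the BMC log and the `l2_watchdog_reset` reason in EMS indicates the same symptom described in this article; the check does not by itself identify the cause.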