AFF A400, FAS8300, and FAS8700 reboot due to L2 watchdog reset
Applies to
- ONTAP 9
- AFF A400
- FAS 8300
- FAS 8700
Issue
- The node reboots unexpectedly due to an L2 watchdog reset.
- The healthy partner node reports ONTAP Event Management System (EMS) errors:
NOTICE cf.hwassist.takeoverTrapRecv: hw_assist: Received takeover hw_assist alert from partner(node-01), system_down because reset_via_sp.
NOTICE cf.hwassist.takeoverTrapRecv: hw_assist: Received takeover hw_assist alert from partner(node-01), system_down because l2_watchdog_reset.
or
[node-1: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(node_name-2), system_down because power_off_via_sp.
- ONTAP panic message from the affected node:
[node-2: send_boot_msg_thread: mgr.stack.string:notice]: Panic string: watchdog nmi on cpu 8, hang cpu is 1 in process idle: cpu8 on release...
- BMC logs report NMI errors:
BMC> system log sel
df | 11/06/2021 | 01:58:24 | System Event #0xff | Timestamp Clock Sync | Asserted
e0 | 11/06/2021 | 02:12:53 | Watchdog 2 #0xb1 | Timer interrupt (NMI/SMS/OS) | Asserted
e1 | 11/06/2021 | 02:12:53 | Critical Interrupt #0xb0 | NMI/Diag Interrupt | Asserted
e2 | 11/06/2021 | 02:12:56 | Watchdog 2 #0xb1 | Hard reset (NMI/SMS/OS) | Asserted
e3 | 11/06/2021 | 02:12:56 | Power Unit #0xb2 | Power reset | Asserted | from channel 15
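
When triaging saved log output offline, the symptoms above can be checked with a short script. The following is a minimal Python sketch, not a NetApp tool. It assumes the pipe-delimited SEL layout and the MM/DD/YYYY date format seen in the sample, and it treats a "Timer interrupt" followed within 60 seconds by a "Hard reset" on the Watchdog 2 sensor as the signature. The function names (parse_sel, find_watchdog_resets, takeover_reason) and the 60-second window are illustrative assumptions.

```python
import re
from datetime import datetime

# Sample lines copied from this article. The column layout is assumed to be
# stable, which may not hold across BMC firmware versions.
SEL_SAMPLE = """\
df | 11/06/2021 | 01:58:24 | System Event #0xff | Timestamp Clock Sync | Asserted
e0 | 11/06/2021 | 02:12:53 | Watchdog 2 #0xb1 | Timer interrupt (NMI/SMS/OS) | Asserted
e1 | 11/06/2021 | 02:12:53 | Critical Interrupt #0xb0 | NMI/Diag Interrupt | Asserted
e2 | 11/06/2021 | 02:12:56 | Watchdog 2 #0xb1 | Hard reset (NMI/SMS/OS) | Asserted
e3 | 11/06/2021 | 02:12:56 | Power Unit #0xb2 | Power reset | Asserted | from channel 15
"""

# Hypothetical helper pattern: pulls the reason token out of an hw_assist EMS line.
EMS_REASON = re.compile(r"system_down because (\w+)")


def takeover_reason(ems_line):
    """Return the system_down reason from an hw_assist EMS line, or None."""
    match = EMS_REASON.search(ems_line)
    return match.group(1) if match else None


def parse_sel(text):
    """Split each SEL line into id, timestamp, sensor, event, and state."""
    records = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 6:
            continue  # skip prompts, blank lines, and truncated entries
        try:
            # Assumption: dates are MM/DD/YYYY.
            when = datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue
        records.append({"id": fields[0], "time": when, "sensor": fields[3],
                        "event": fields[4], "state": fields[5]})
    return records


def find_watchdog_resets(records, window_seconds=60):
    """Pair each Watchdog 2 timer interrupt with a hard reset that follows it."""
    hits = []
    last_nmi = None
    for rec in records:
        if not rec["sensor"].startswith("Watchdog 2"):
            continue
        if rec["event"].startswith("Timer interrupt"):
            last_nmi = rec
        elif rec["event"].startswith("Hard reset") and last_nmi is not None:
            delta = (rec["time"] - last_nmi["time"]).total_seconds()
            if 0 <= delta <= window_seconds:
                hits.append((last_nmi["id"], rec["id"], delta))
            last_nmi = None
    return hits


if __name__ == "__main__":
    for nmi_id, reset_id, delta in find_watchdog_resets(parse_sel(SEL_SAMPLE)):
        print(f"Watchdog NMI {nmi_id} followed by hard reset {reset_id} after {delta:.0f}s")
```

A match only indicates that the log pattern described in this article is present. It does not identify the root cause of the watchdog timeout.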