CONTAP-449185: PANIC: Failover Monitor: unable to transit - takeover process is hung (wafl) in SK process cf_main on release 9.9.1P16 (C)

Visibility: Public
Category: ontap-9
Specialty: core

Issue

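During a SnapMirror update, the source node hit multiple out-of-memory (OOM) errors, which caused subsequent SnapMirror operations to fail. Eventually, a failover attempt caused the partner node to panic: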
Panic on cpu#10: PANIC: Failover Monitor: unable to transit - takeover process is hung (wafl) in SK process cf_main on release 9.9.1P16 (C) on Tue Apr 29 15:40:36 CST 2025
This node had started taking over its partner, which had already panicked:
Tue Apr 29 15:30:34 +0800 [node01: cf_firmware: cf.fm.partnerFwTransition:info]: params: {'prevstate': 'SF_UP', 'newstate': 'SF_SPARECORE', 'progresscounter': '2'}
Tue Apr 29 15:30:34 +0800 [node01: cf_main: cf.fsm.firmwareStatus:info]: Failover monitor: partner Dumping sparecore
Tue Apr 29 15:30:34 +0800 [node01: cf_main: cf.fsm.takeover.panic:alert]: Failover monitor: takeover attempted after partner panic.
Tue Apr 29 15:30:34 +0800 [node01: cf_main: cf.fsm.stateTransit:info]: Failover monitor: UP --> TAKEOVER
Tue Apr 29 15:30:34 +0800 [node01: cf_takeover: ha.takeover.stateChng:debug]: params: {'old_state': 'NOT_IN_TAKEOVER', 'new_state': 'IN_CFO_TAKEOVER'}
Tue Apr 29 15:30:34 +0800 [node01: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started
...
Tue Apr 29 15:30:34 +0800 [node01: cf_takeover: cf.fm.takeoverCommitted:debug]: Failover monitor: takeover committed
Tue Apr 29 15:30:34 +0800 [node01: ThreadHandlerun: clam.update.partner.state:info]: CLAM on node (ID=1000) updated failover state of partner (ID=1001) to to.
...
Tue Apr 29 15:31:00 +0800 [node01: monitor: monitor.globalStatus.ok:notice]: This node is attempting to takeover node02.
However, the transit event timed out after 10 minutes, causing this node to panic as well.
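
The 10-minute gap can be checked against the timestamps in the excerpts. The following is a minimal sketch, not a NetApp tool: the regular expression reflects only the EMS line layout visible above, the helper names are hypothetical, the year is taken from the panic string because EMS lines omit it, and "CST" is assumed to mean China Standard Time (UTC+8), consistent with the +0800 offset on the EMS lines.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: "CST" in the panic string is China Standard Time (UTC+8),
# matching the +0800 offset printed on the EMS lines.
CST = timezone(timedelta(hours=8))

# Layout inferred from the excerpt only:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<process>[^:]+): (?P<event>[^:]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)

def parse_ems_time(ts, year):
    """EMS timestamps carry no year, so the caller supplies it."""
    return datetime.strptime(f"{ts} {year}", "%a %b %d %H:%M:%S %z %Y")

def parse_panic_time(ts):
    """Parse 'Tue Apr 29 15:40:36 CST 2025' without relying on %Z support."""
    dow, mon, day, clock, _zone, year = ts.split()
    naive = datetime.strptime(f"{dow} {mon} {day} {clock} {year}", "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=CST)

takeover_line = (
    "Tue Apr 29 15:30:34 +0800 [node01: cf_takeover: "
    "cf.fm.takeoverStarted:notice]: Failover monitor: takeover started"
)
fields = EMS_LINE.match(takeover_line).groupdict()

started = parse_ems_time(fields["ts"], 2025)
panicked = parse_panic_time("Tue Apr 29 15:40:36 CST 2025")
elapsed = panicked - started

print(fields["event"], elapsed)  # cf.fm.takeoverStarted 0:10:02
```

The takeover started at 15:30:34 and the panic was logged at 15:40:36, so the 10 minutes 2 seconds between them is consistent with a 10-minute transit timeout.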
