Unexpected node reboot in MetroCluster IP

Category: metrocluster
Specialty: MetroCluster

Applies to

  • ONTAP 9
  • MetroCluster IP
  • AFF-A700
  • X91146A T6 card

Issue

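  • A node reboots unexpectedly with no prior indication of a problem.
  • The SP logs show that the HA partner has taken over the disk reservations, which happens after a takeover (CLAM takeover):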

Apr 18 04:01:40 [NodeA1:clam.node.ooq:EMERGENCY]: Node (name=NodeA2, ID=1001) is out of "CLAM quorum" (reason=quorum update).

A disk reservation was detected on disk 7a.10.3P3 at 18Apr2023 04:01:44   
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: 47d18h37m13s
System rebooting...

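  • Shortly before the reboot, HA interconnect timeouts are reported and a takeover is triggered because the partner heartbeat was lost: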

Sun Apr 18 20:35:39 +0200 [NodeA1: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.
Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Sun Apr 18 20:35:58 +0200 [NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node
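
To check a saved EMS log for the same pattern, a short script can look for cf.ic.xferTimedOut events followed closely by cf.fsm.takeover.noHeartbeat. The following is a minimal sketch, not a NetApp tool. It assumes the line layout shown in the excerpt above and an English locale for month names. The helper names, the 60-second window, and the default year (the excerpt carries no year) are all hypothetical choices made for illustration.

```python
import re
from datetime import datetime

TIMEOUT_EVENT = "cf.ic.xferTimedOut"
TAKEOVER_EVENT = "cf.fsm.takeover.noHeartbeat"

# Timestamp prefix as it appears in the excerpt, e.g. "Sun Apr 18 20:35:39 +0200 ".
TS = re.compile(r"^\w{3} (\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) [+-]\d{4} ")


def parse_ts(line, year=2023):
    """Return the line's timestamp, or None if the line has no timestamp prefix.

    The year is an assumption: the EMS excerpt does not include one.
    """
    match = TS.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def timeouts_before_takeover(lines, window_seconds=60):
    """Find takeovers preceded by interconnect timeouts within the window.

    Returns a list of (takeover_time, [timeout_times]) tuples.
    The 60-second default window is an illustrative assumption.
    """
    timeouts = []
    findings = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # wrapped continuation lines and other formats are skipped
        if TIMEOUT_EVENT in line:
            timeouts.append(ts)
        elif TAKEOVER_EVENT in line:
            recent = [
                t for t in timeouts
                if 0 <= (ts - t).total_seconds() <= window_seconds
            ]
            if recent:
                findings.append((ts, recent))
    return findings


# The three EMS lines from the excerpt above, including the wrapped last line.
SAMPLE = """\
Sun Apr 18 20:35:39 +0200 [NodeA1: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.
Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Sun Apr 18 20:35:58 +0200 [NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after
no heartbeat was detected from the partner node
""".splitlines()

findings = timeouts_before_takeover(SAMPLE)
for takeover, recent in findings:
    gap = (takeover - recent[-1]).total_seconds()
    print(f"{takeover:%H:%M:%S} takeover preceded by {len(recent)} timeout(s), "
          f"last one {gap:.0f}s earlier")
```

On the excerpt, the script reports one takeover at 20:35:58 preceded by two interconnect timeouts, the last of them 19 seconds earlier. Matching on event names rather than on message text keeps the check independent of message wording.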
