Unexpected node reboot in MetroCluster IP
Applies to
- ONTAP 9
- MetroCluster IP
- AFF-A700
- X91146A T6 card
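Issue
- A node reboots unexpectedly, with no prior indication of a problem
- The SP log shows that the HA partner holds the disk reservations, which normally happens only after a takeover (CLAM takeover):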
Apr 18 04:01:40 [NodeA1:clam.node.ooq:EMERGENCY]: Node (name=NodeA2,
ID=1001) is out of "CLAM quorum" (reason=quorum update).
A disk reservation was detected on disk 7a.10.3P3 at 18Apr2023 04:01:44
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: 47d18h37m13s
System rebooting...
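- Shortly before the reboot, HA interconnect transfer timeouts are reported, and a takeover is initiated because no heartbeat is received from the partner:
Sun Apr 18 20:35:39 +0200 [NodeA1: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.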
Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Sun Apr 18 20:35:58 +0200 [NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after
no heartbeat was detected from the partner node
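When reviewing a larger log collection, it can help to confirm the same ordering: interconnect timeouts first, then the no-heartbeat takeover. The following is a minimal sketch that assumes EMS lines look like the excerpts above, with a bracketed field of the form `[node: thread: event:severity]` or `[node:event:severity]`. The helper names `extract_events` and `takeover_followed_timeout` are hypothetical, and the event list contains only the events quoted in this article. It is not a NetApp tool.

```python
import re

# Event names quoted in the log excerpts above (illustrative, not exhaustive).
INTERESTING = {
    "cf.ic.xferTimedOut",
    "cf.fsm.takeover.noHeartbeat",
    "clam.node.ooq",
}

# Matches the first bracketed field, e.g. "[NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]".
BRACKET = re.compile(r"\[([^\]]+)\]")


def extract_events(lines):
    """Return (event, severity) pairs, in log order, for the events of interest."""
    found = []
    for line in lines:
        match = BRACKET.search(line)
        if not match:
            continue
        # The event name is the second-to-last colon-separated field; severity is the last.
        parts = [p.strip() for p in match.group(1).split(":")]
        if len(parts) < 3:
            continue
        event, severity = parts[-2], parts[-1]
        if event in INTERESTING:
            found.append((event, severity))
    return found


def takeover_followed_timeout(events):
    """True if an HA interconnect timeout was logged before the no-heartbeat takeover."""
    names = [name for name, _ in events]
    if "cf.fsm.takeover.noHeartbeat" not in names:
        return False
    takeover_at = names.index("cf.fsm.takeover.noHeartbeat")
    return "cf.ic.xferTimedOut" in names[:takeover_at]


if __name__ == "__main__":
    # Lines copied from the excerpt above.
    log = [
        "Sun Apr 18 20:35:39 +0200 [NodeA1: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.",
        "Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.",
        "Sun Apr 18 20:35:58 +0200 [NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after",
    ]
    events = extract_events(log)
    print(events)
    print(takeover_followed_timeout(events))
```

The script only checks event order within the lines it is given. It does not identify the root cause of the interconnect timeouts.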