Unexpected node reboot in MetroCluster IP
Applies to
- ONTAP 9
- MetroCluster IP
- AF-A700
- X91166A T6 card
Issue
- A node reboots unexpectedly, with no indication of the cause
- SP logs show the HA partner node taking disk reservations, which can happen after a takeover (administrative takeover):
Apr 18 04:01:40 [NodeA1:clam.node.ooq:EMERGENCY]: Node (name=NodeA2,
ID=1001) is out of "CLAM quorum" (reason=quorum update).
A disk reservation was detected on disk 7a.10.3P3 at 18Apr2023 04:01:44
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: 47d18h37m13s
System rebooting...
- Shortly before the reboot and the takeover are triggered, HA interconnect timeouts are reported because the heartbeat was lost:
Sun Apr 18 20:35:39 +0200 [NodeA1: DR_heartbeat_thread: cf.ic.xferTimedOut:error]: HA interconnect: MCC_DRSOM transfer timed out.
Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Sun Apr 18 20:35:58 +0200 [NodeA1: cf_main:cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
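To check whether saved log text contains the same signature, the event names from the excerpts above can be searched for with a short script. The following is a minimal sketch, not a NetApp tool: the function name `find_events` and the regular expression are illustrative assumptions, and the bracket layout it expects (`[node: process: event:severity]` or `[node:event:severity]`) is inferred only from the sample lines in this article.

```python
import re

# Event names copied from the log excerpts in this article.
EVENTS_OF_INTEREST = (
    "clam.node.ooq",
    "cf.ic.xferTimedOut",
    "cf.fsm.takeover.noHeartbeat",
)

# Assumed layout: "[node: process: event:severity]" with the process part optional.
EVENT_RE = re.compile(
    r"\[(?P<node>[^:\]\s]+):\s*"
    r"(?:(?P<proc>[^:\]\s]+):\s*)?"
    r"(?P<event>[A-Za-z0-9_.]+):(?P<severity>[A-Za-z]+)\]"
)


def find_events(log_text):
    """Return (node, event, severity) tuples for the events of interest.

    The search runs over the whole text, so it also works when several
    log entries have been joined onto a single line.
    """
    hits = []
    for match in EVENT_RE.finditer(log_text):
        if match.group("event") in EVENTS_OF_INTEREST:
            hits.append(
                (
                    match.group("node"),
                    match.group("event"),
                    match.group("severity").lower(),
                )
            )
    return hits


if __name__ == "__main__":
    sample = (
        "Sun Apr 18 20:35:39 +0200 [NodeA1: cf_firmware: "
        "cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out."
    )
    for node, event, severity in find_events(sample):
        print(node, event, severity)
```

A `cf.ic.xferTimedOut` hit followed shortly by `cf.fsm.takeover.noHeartbeat` on the same node matches the sequence described in this article. The script only locates the entries and does not establish the root cause.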