MetroCluster: LIFs go offline when performing a giveback
Applies to
- MetroCluster IP
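- Cluster ports on MetroCluster backend ports
- MetroCluster backend ports offline
- Giveback
Issue
- If a node is booted and a giveback is performed while one or more MetroCluster backend ports are offline, the cluster may lose quorum
- VifMgr (Virtual Interface Manager) then goes offline, which in turn triggers FreeBSD to take all LIFs offline to avoid duplicate IP conflicts
Example:
NODE_03 VifMgr is unable to join quorum using cluster LIF 1 on e0a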
[kern_vifmgr:info:9017] A [src/rdb/TM.cc 1621 (0x80ea38600)]: _triggerOnlineStatusCallback: TM 1002: Report UNIT_IS_OFFLINE (epoch 0, master 0). Reason: RW_TXN txn could not acquire transaction: RPC failure ().
[kern_vifmgr:info:9017] A [src/rdb/TM.cc 1625 (0x80ea38600)]: _triggerOnlineStatusCallback: FAILOVER rdb: Local unit VifMgr offline
NODE_03 VifMgr attempts to move cluster LIF 1 to another port, but the move fails because VifMgr is out of quorum (OOQ)
[kern_vifmgr:info:9017] [0x812356d00] [Net::CdbLifHandle::avoidDownPorts] LIF lif:cdb:node_03:node_03_clus1 (1000) is assigned to a down port (node_03:e0a). Attempting to reassign.
[kern_vifmgr:info:9017] Warning: Unable to list entries on node node_04. RPC: Port mapper failure [from vifmgr on node "node_03" (VSID: -3) to mgwd at 169.254.249.59]
NODE_04 VifMgr loses quorum because it cannot communicate with NODE_03
[kern_vifmgr:info:9156] A [src/rdb/cluster_events.cc 88 (0x80e836c00)]: Report: Cluster event: cluster-quorum-ends, epoch 31, site 1003 [not enough healthy nodes (1/2 healthy)].
[kern_vifmgr:info:9156] A [src/rdb/quorum/qm_states/inq/HoldingQuorumState.cc 55 (0x80e836c00)]: doWork: Master losing quorum, not enough votes to maintain quorum at 2248s.
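NODE_04 does not regain quorum within the 65-second grace period and takes offline any LIFs that might be hosted on NODE_03, to avoid a split-brain or duplicate IP scenario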
[kern_vifmgr:info:9156] [0x80ae37300] [EventMgr::unitOffline] Setting VifMgr operational status as OOQ
[kern_vifmgr:info:9156] [0x80ae37300] [FailoverMgr::localNodeDown] VifMgr on node node_04 is now out of quorum.
[node_04: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_01 (on virtual server 7), IP address 1.11.20.12, is being removed from node node_04, port a0a-120.
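The log sequence above can be recognized mechanically when reviewing a saved vifmgr log. The following is a minimal Python sketch, not a NetApp tool: the marker strings are copied from the excerpts in this article, while the label names, the function names classify and removed_lif, and the regular expression are illustrative assumptions. Message wording may differ between ONTAP releases.

```python
import re

# Substrings taken from the log excerpts in this article.
# The label names on the left are hypothetical, chosen for illustration.
MARKERS = {
    "unit_offline": "Local unit VifMgr offline",
    "quorum_lost": "cluster-quorum-ends",
    "out_of_quorum": "is now out of quorum",
    "lif_removed": "vifmgr.lifBeingRemoved",
}

# Assumed layout of the lifBeingRemoved message, based on the single
# example line shown in this article.
LIF_REMOVED = re.compile(
    r"LIF (?P<lif>\S+) \(on virtual server \d+\), IP address (?P<ip>\S+), "
    r"is being removed from node (?P<node>\S+), port (?P<port>\S+?)\.?$"
)


def classify(line):
    """Return the labels of all markers found in one log line."""
    return [label for label, needle in MARKERS.items() if needle in line]


def removed_lif(line):
    """Return LIF name, IP, node and port from a lifBeingRemoved line, else None."""
    match = LIF_REMOVED.search(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "[node_04: vifmgr: vifmgr.lifBeingRemoved:notice]: LIF data_01 "
        "(on virtual server 7), IP address 1.11.20.12, is being removed "
        "from node node_04, port a0a-120."
    )
    print(classify(sample))
    print(removed_lif(sample))
```

Running classify over each line of a log and checking whether "quorum_lost" and "out_of_quorum" appear shortly before "lif_removed" gives a quick indication that the LIF removals match the pattern described here.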