Node panics and does not rejoin the cluster after reboot
Applies to
- FAS2750 / AFF-A220
- AFF-A250
- MCC-IP(MetroCluster IP)
- MetroCluster switch RCF upgrade
Issue
- While upgrading the RCF on the switches in an MCC-IP configuration, one controller panics with a CLAM panic
Aug 01 17:40:16 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:47:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:52:10 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 18:21:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node Node-01 has gone down unexpectedly.
PANIC : Received PANIC packet from partner, receiving message is (Coredump and takeover initiated because Connectivity, Liveliness and Availability Monitor (CLAM) has determined this node is out of quorum.)
- After the reboot, cluster ports e0a/e0b are up, but the node is not healthy
::*> network port show -role cluster
                                      Auto-Negot  Duplex     Speed (Mbps)
Node   Port   Role         Link   MTU Admin/Oper  Admin/Oper Admin/Oper
------ ------ ------------ ---- ----- ----------- ---------- ------------
node01
       e0a    cluster      up    9000 true/true   full/full  auto/10000
       e0b    cluster      up    9000 true/true   full/full  auto/10000
::> cluster show
Node                 Health  Eligibility  Epsilon
-------------------- ------- ------------ ------------
node01               false   true         false
node02               false   true         false
- storage failover show reports that the node has not yet started its applications
::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node01
               node02         true     Connected to node02
node02
               node01         true     Connected to node01.
                                       Waiting for cluster applications to
                                       come online on the local node.
                                       Offline applications: vldb, vifmgr,
                                       bcomd, crs, scsi blade, clam.
2 entries were displayed.
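
When this symptom has to be checked across several captured outputs, the list of offline applications can be pulled out of the storage failover show text with a small script. The following is only a minimal sketch, not a NetApp tool: the function name offline_applications and the SAMPLE text are hypothetical, and it assumes the output has already been saved as plain text and that the list ends with a period, as it does in the output above.

```python
import re


def offline_applications(failover_output: str) -> list[str]:
    """Return the names listed after 'Offline applications:' (hypothetical helper)."""
    # The State Description column wraps across lines, so collapse all
    # whitespace (including newlines) into single spaces first.
    flat = " ".join(failover_output.split())
    # Assumption: the list is comma-separated and terminated by a period.
    match = re.search(r"Offline applications:\s*([^.]*)\.", flat)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Sample text copied from the storage failover show output in this article.
SAMPLE = """\
node02
               node01         true     Connected to node01.
                                       Waiting for cluster applications to
                                       come online on the local node.
                                       Offline applications: vldb, vifmgr,
                                       bcomd, crs, scsi blade, clam.
"""

print(offline_applications(SAMPLE))
```

An empty list means no "Offline applications:" entry was found in the text passed in; it does not by itself prove that the node is healthy.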