Node panics and does not rejoin the cluster after reboot
Applies to
- FAS2750 / AFF-A220
- AFF-A250
- MCC-IP(MetroCluster IP)
- MetroCluster switch RCF upgrade
Issue
- During an RCF upgrade of the switches in an MCC-IP configuration, one controller panics with a CLAM panic
Aug 01 17:40:16 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:47:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 17:52:10 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly.
Aug 01 18:21:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node Node-01 has gone down unexpectedly.
PANIC : Received PANIC packet from partner, receiving message is (Coredump and takeover initiated because Connectivity, Liveliness and Availability Monitor (CLAM) has determined this node is out of quorum.)
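When reviewing a longer EMS log, it can help to count how often each cluster port went down before the panic. The following is a minimal Python sketch, not a NetApp tool: the names EMS_TEXT, LINKDOWN_RE, linkdown_events and ports_down are hypothetical, and the pattern assumes the messages have exactly the layout shown in the excerpt above.

```python
import re

# Sample text copied from the EMS excerpt in this article.
EMS_TEXT = (
    "Aug 01 17:40:16 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly. "
    "Aug 01 17:47:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly. "
    "Aug 01 17:52:10 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0a on node Node-01 has gone down unexpectedly. "
    "Aug 01 18:21:34 [node01:vifmgr.clus.linkdown:EMERGENCY]: The cluster port e0b on node Node-01 has gone down unexpectedly."
)

# Assumed message layout: "<Mon DD HH:MM:SS> [<host>:vifmgr.clus.linkdown:<SEVERITY>]: The cluster port <port> on node <node> has gone down"
LINKDOWN_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<host>[^:\]]+):vifmgr\.clus\.linkdown:(?P<severity>[A-Z]+)\]: "
    r"The cluster port (?P<port>\S+) on node (?P<node>\S+) has gone down"
)


def linkdown_events(text):
    """Return one dict per vifmgr.clus.linkdown message found in text."""
    return [m.groupdict() for m in LINKDOWN_RE.finditer(text)]


def ports_down(text):
    """Count link-down events per cluster port."""
    counts = {}
    for event in linkdown_events(text):
        counts[event["port"]] = counts.get(event["port"], 0) + 1
    return counts


if __name__ == "__main__":
    for port, count in sorted(ports_down(EMS_TEXT).items()):
        print(f"{port}: {count} link-down event(s)")
```

The pattern works whether the messages are on separate lines or run together on one line, because it matches each message individually rather than splitting on line breaks.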
- After the reboot, cluster ports e0a/e0b are up, but the node is not healthy
::*> network port show -role cluster
                     Auto-Negot  Duplex    Speed (Mbps)
 Node   Port   Role      Link   MTU Admin/Oper  Admin/Oper Admin/Oper
 ------ ------ ------------ ---- ----- ----------- ---------- ------------
 node01
     e0a   cluster    up   9000  true/true  full/full   auto/10000
     e0b   cluster    up   9000  true/true  full/full   auto/10000
::> cluster show
 Node          Health  Eligibility   Epsilon
 -------------------- ------- ------------  ------------
 node01           false   true      false
 node02           false   true      false
  
- storage failover show reports that the node has not yet brought its cluster applications online
::> storage failover show
                 Takeover
 Node       Partner     Possible State Description
 -------------- -------------- -------- -------------------------------------
 node01
         node02      true    Connected to node02
 node02
         node01      true    Connected to node01.
                     Waiting for cluster applications to
                     come online on the local node.
                     Offline applications: vldb, vifmgr,
                     bcomd, crs, scsi blade, clam.
 2 entries were displayed.
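If this output is captured as text (for example from a saved console session), the offline applications can be extracted per node. The following is a rough Python sketch under stated assumptions, not an official parser: the names FAILOVER_OUTPUT and offline_applications are hypothetical, a node name is assumed to be the only word on its line, and the application list is assumed to end at the first period.

```python
import re

# Sample text copied from the storage failover show output in this article.
FAILOVER_OUTPUT = """\
                 Takeover
 Node       Partner     Possible State Description
 -------------- -------------- -------- -------------------------------------
 node01
         node02      true    Connected to node02
 node02
         node01      true    Connected to node01.
                     Waiting for cluster applications to
                     come online on the local node.
                     Offline applications: vldb, vifmgr,
                     bcomd, crs, scsi blade, clam.
 2 entries were displayed.
"""


def offline_applications(output):
    """Return {node: [application, ...]} for nodes that list offline applications."""
    descriptions = {}
    node = None
    in_body = False
    for raw in output.splitlines():
        line = " ".join(raw.split())  # collapse column padding
        if not in_body:
            # Skip the header; the body starts after the dashed separator.
            in_body = line.startswith("---")
            continue
        if not line or re.fullmatch(r"\d+ entr(y|ies) (was|were) displayed\.", line):
            continue
        if " " not in line:
            # Assumption: a single word on its own line is a node name.
            node = line
            descriptions[node] = []
        elif node is not None:
            descriptions[node].append(line)

    offline = {}
    for name, parts in descriptions.items():
        # Rejoin wrapped description lines before searching for the list.
        match = re.search(r"Offline applications:\s*([^.]+)\.", " ".join(parts))
        if match:
            offline[name] = [app.strip() for app in match.group(1).split(",")]
    return offline


if __name__ == "__main__":
    for name, apps in offline_applications(FAILOVER_OUTPUT).items():
        print(f"{name}: {', '.join(apps)}")
```

Rejoining the wrapped description lines before matching is what lets the list be read whole even though the CLI breaks it across two lines.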