Continued packet loss when pinging from cluster LIFs after upgrading the cluster switches

Category: fabric-interconnect-and-management-switches 2009804577
Specialty: hw

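Applies to

  • Cisco NX3232C cluster network switch (CNS)
  • NX-OS firmware update
  • RCF (reference configuration file) update from version 1.8 or earlier to version 1.10 or later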

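Issue

  • All nodes continuously report the following message when pinging each other's cluster LIFs:
[vifmgr: vifmgr.cluscheck.ctdpktloss:debug]: Continued packet loss when pinging from cluster lif node-01_clus-1 (node node-01) to cluster lif node-02_clus2 (node node-02).
  • Half of the basic connectivity checks in cluster ping-cluster fail. Example: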

::*> cluster ping-cluster -node node-01
...
 Basic connectivity succeeds on 14 path(s)
 Basic connectivity fails on 14 path(s)
 ...
 Larger than PMTU communication succeeds on 14 path(s)
 RPC status:
 14 paths up, 0 paths down (tcp check)
 14 paths up, 0 paths down (udp check)
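
The summary lines in the example output can be checked mechanically. The following is a minimal Python sketch, assuming the output has been saved as text. The function names `summarize_ping_cluster` and `failed_fraction` are hypothetical helpers, not part of ONTAP, and the regular expressions are based only on the lines shown in this article.

```python
import re

# Sample copied from the example output in this article ("..." kept as-is).
SAMPLE_OUTPUT = """\
 Basic connectivity succeeds on 14 path(s)
 Basic connectivity fails on 14 path(s)
 ...
 Larger than PMTU communication succeeds on 14 path(s)
 RPC status:
 14 paths up, 0 paths down (tcp check)
 14 paths up, 0 paths down (udp check)
"""

# Patterns derived from the lines above; other output formats are not handled.
BASIC_RE = re.compile(r"Basic connectivity (succeeds|fails) on (\d+) path\(s\)")
RPC_RE = re.compile(r"(\d+) paths up, (\d+) paths down \((tcp|udp) check\)")


def summarize_ping_cluster(text):
    """Collect basic-connectivity and RPC path counts from the summary text."""
    summary = {"succeeds": 0, "fails": 0, "rpc": {}}
    for line in text.splitlines():
        basic = BASIC_RE.search(line)
        if basic:
            summary[basic.group(1)] = int(basic.group(2))
            continue
        rpc = RPC_RE.search(line)
        if rpc:
            # Store (paths up, paths down) per protocol.
            summary["rpc"][rpc.group(3)] = (int(rpc.group(1)), int(rpc.group(2)))
    return summary


def failed_fraction(summary):
    """Fraction of basic-connectivity paths that failed (0.0 if none were seen)."""
    total = summary["succeeds"] + summary["fails"]
    return summary["fails"] / total if total else 0.0


if __name__ == "__main__":
    result = summarize_ping_cluster(SAMPLE_OUTPUT)
    print(result)
    print(f"{failed_fraction(result):.0%} of basic-connectivity paths fail")
```

For the sample above, the failed fraction is 0.5 while the RPC checks report no paths down, which matches the symptom described in this article.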

  • Each time the LIFs are reverted from switch 2 to the cluster ports connected to switch 1:
    •  EMS reports messages similar to:
[vifmgr: vifmgr.dbase.checkerror:alert]: VIFMgr experienced an error verifying cluster database consistency. Some LIFs might not be hosted properly as a result.
[vifmgr: vifmgr.startup.failover.err:alert]: VIFMgr encountered errors during startup.
  • vifmgr reports messages similar to:
[kern_vifmgr:info:6537] rdb::qm:...:src/rdb/quorum/qm_states/inq/SecondaryState.cc:222 (thr_id:0x80c138500) SecondaryState::receivePoll Leaving quorum at 21170636s apparent starvation or RPC failure at sender 1003. Sender expected VS_Unknown, actual WS_QuorumMember.
  • mgwd reports messages similar to:
[kern_mgwd:info:2343] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 217 (0x823d60300)]: receivePoll: Leaving quorum at 9068946s apparent starvation or RPC failure at sender 1003. Sender expected VS_Unknown, actual WS_QuorumMember.
[kern_mgwd:info:2343] A [src/rdb/cluster_events.cc 88 (0x823d60300)]: Report: Cluster event: node-event, epoch 31, site 1004 [apparent starvation detected in voting protocol].
[kern_mgwd:info:2325] W [src/rdb/TM.cc 3923 (0x821377f00)]: _coord_commit: TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; declaring unstable quorum in epoch 31.  Total participating sites: 3, number of sites committed: 3, epsilon commit: true
[kern_mgwd:info:2325] rdb::TM:Mon Nov 06 11:06:47 2023:src/rdb/TM.cc:3933 (thr_id:0x821377f00) TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; declaring unstable quorum in epoch 31.  Total participating sites: 3, number of sites committed: 3, epsilon commit: true
  • The symptoms are the same whether the ISL is enabled or disabled (disabling it isolates the traffic on each switch)
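
The message signatures listed above can be searched for in collected log text. The following is a minimal Python sketch under stated assumptions: the logs are available as plain text lines, and the matching strings are taken only from the messages quoted in this article. The function `classify` and the labels in `SIGNATURES` are hypothetical and are not ONTAP tooling.

```python
import re
from collections import Counter

# Substrings taken from the messages quoted in this article; labels are illustrative.
SIGNATURES = {
    "vifmgr.cluscheck.ctdpktloss": "cluster LIF packet loss",
    "vifmgr.dbase.checkerror": "VIFMgr database consistency error",
    "vifmgr.startup.failover.err": "VIFMgr startup error",
    "Leaving quorum": "node left quorum",
    "declaring unstable quorum": "unstable quorum declared",
}

# Extracts the source and destination LIF names from a packet-loss event.
PKTLOSS_RE = re.compile(
    r"vifmgr\.cluscheck\.ctdpktloss.*?"
    r"from cluster lif (\S+) \(node (\S+)\) "
    r"to cluster lif (\S+) \(node (\S+)\)"
)

# Sample lines copied verbatim from this article.
SAMPLE_LINES = [
    "[vifmgr: vifmgr.cluscheck.ctdpktloss:debug]: Continued packet loss when "
    "pinging from cluster lif node-01_clus-1 (node node-01) to cluster lif "
    "node-02_clus2 (node node-02).",
    "[kern_mgwd:info:2343] A [src/rdb/quorum/qm_states/inq/SecondaryState.cc 217 "
    "(0x823d60300)]: receivePoll: Leaving quorum at 9068946s apparent starvation "
    "or RPC failure at sender 1003. Sender expected VS_Unknown, actual "
    "WS_QuorumMember.",
    "[kern_mgwd:info:2325] W [src/rdb/TM.cc 3923 (0x821377f00)]: _coord_commit: "
    "TM 1003: Transaction TID <31,277502,277502> commit failed: UNIT_OFFLINE; "
    "declaring unstable quorum in epoch 31.  Total participating sites: 3, "
    "number of sites committed: 3, epsilon commit: true",
]


def classify(lines):
    """Count matching signatures and collect (source LIF, destination LIF) pairs."""
    counts = Counter()
    lif_pairs = set()
    for line in lines:
        for needle, label in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
        match = PKTLOSS_RE.search(line)
        if match:
            lif_pairs.add((match.group(1), match.group(3)))
    return counts, lif_pairs


if __name__ == "__main__":
    counts, lif_pairs = classify(SAMPLE_LINES)
    for label, count in counts.items():
        print(f"{label}: {count}")
    for src, dst in sorted(lif_pairs):
        print(f"packet loss: {src} -> {dst}")
```

Plain substring matching is used for most signatures because the surrounding log prefixes differ between EMS, vifmgr, and mgwd; only the packet-loss event is parsed in detail, since it names the affected LIFs.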
