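Cluster health degraded due to bus overruns on a port
Applies to
- Cluster network ports
- Bus overruns detected
- AFF A700
Issue
- Nodes report cluster ports in a degraded state, with messages similar to the following: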
[node_name-1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Total Packet Loss - Ping failures detected between node_name-1_cluster1 ( 169.254.123.145 ) on node_name-1 and node_name-2_cluster1 ( 169.254.123.167 ) on node_name-2
[node_name-1: vifmgr: vifmgr.cluscheck.droppedall:alert]: Total packet loss when pinging from cluster lif node_name-1_cluster1 (node node_name-1) to cluster lif node_name-2_cluster1 (node node_name-2).
and/or
[node_name-1: vifmgr: vifmgr.port.monitor.failed:error]: The "l2_reachability" health check for port e0a (node node_name-1) has failed. The port is operating in a degraded state.
[node_name-1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Insufficient L2 Reachability - Insufficient L2 Reachability detected from cluster port e0a on node node_name-1.
- ONTAP event messages and the VIFMGR-LOG.GZ output contain entries such as:
::> event log show -messagename vifmgr*
Time Node Severity Event
---- ----------- ------------- ---------------------------
... node_name-1 ERROR vifmgr.cluscheck.droppedall: Total packet loss when pinging from cluster lif node_name-1_cluster2 (node node_name-1) to cluster lif node_name-2_cluster1 (node node_name-2).
... node_name-1 INFORMATIONAL vifmgr.portdown: A link down event was received on node node_name-1, port e0a.
... node_name-1 CRITICAL vifmgr.clus.linkdown: The cluster port e0a on node node_name-1 has gone down unexpectedly.
... node_name-1 INFORMATIONAL vifmgr.portdown: A link down event was received on node node_name-1, port e0a.
- The cluster network port reports no-reachability. Example:
::> network port reachability show -detail -node node_name-1 -port e0a
Node Port Expected Reachability Reachability Status
------------ -------- ---------------------------- --------------------------
node_name-1 e0a Cluster:Cluster no-reachability
Unreachable Ports: node_name-2:e0b, node_name-2:e0a, node_name-1:e0b
Unexpected Ports: -
- Total discards and Bus overruns keep increasing in the ifstat output for the cluster network port. Example:
::> system node run -node node_name -command ifstat e0a
-- interface e0a (0 hours, 38 minutes, 59 seconds) --
RECEIVE
Total frames: 217k | Frames/second: 93 | Total bytes: 98483k
Bytes/second: 42105 | Total errors: 0 | Errors/minute: 0
Total discards: 31183 | Discards/minute: 800 | Multi/broadcast: 131
Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 0
...
Noproto: 0 | Error symbol: 0 | Illegal symbol: 0
Bus overruns: 31183 | Queue drops: 0 | LRO segments: 206k
LRO bytes: 95312k | LRO6 segments: 0 | LRO6 bytes: 0
...
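The ifstat symptom above can be checked from captured output. The following is a minimal Python sketch, not a NetApp tool: the function names are hypothetical, it assumes the `Name: value | Name: value` layout shown in the example, it assumes a `k` suffix means thousands, and it treats any all-uppercase line (such as `RECEIVE`) as a section header. It reads only the RECEIVE section and reports whether bus overruns account for all of the discards, as they do in the example (31183 of each).

```python
import re

# Sample copied from the ifstat example above (elided lines kept as "...").
SAMPLE_IFSTAT = """\
-- interface e0a (0 hours, 38 minutes, 59 seconds) --
RECEIVE
Total frames: 217k | Frames/second: 93 | Total bytes: 98483k
Bytes/second: 42105 | Total errors: 0 | Errors/minute: 0
Total discards: 31183 | Discards/minute: 800 | Multi/broadcast: 131
Non-primary u/c: 0 | CRC errors: 0 | Runt frames: 0
...
Noproto: 0 | Error symbol: 0 | Illegal symbol: 0
Bus overruns: 31183 | Queue drops: 0 | LRO segments: 206k
LRO bytes: 95312k | LRO6 segments: 0 | LRO6 bytes: 0
...
"""

FIELD = re.compile(r"([^:]+):\s*(\d+)(k?)")


def parse_ifstat_receive_counters(text):
    """Return {counter name: int} for the RECEIVE section only."""
    counters = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: section headers are bare all-uppercase lines.
        if stripped.isupper() and ":" not in stripped:
            section = stripped
            continue
        if section != "RECEIVE":
            continue
        for field in stripped.split("|"):
            match = FIELD.fullmatch(field.strip())
            if match:
                name, number, suffix = match.groups()
                # Assumption: "k" abbreviates thousands.
                counters[name.strip()] = int(number) * (1000 if suffix else 1)
    return counters


def overruns_explain_discards(counters):
    """True when discards are non-zero and bus overruns cover all of them."""
    discards = counters.get("Total discards", 0)
    overruns = counters.get("Bus overruns", 0)
    return discards > 0 and overruns >= discards


if __name__ == "__main__":
    rx = parse_ifstat_receive_counters(SAMPLE_IFSTAT)
    print("Total discards:", rx["Total discards"])
    print("Bus overruns:", rx["Bus overruns"])
    print("Overruns explain discards:", overruns_explain_discards(rx))
```

Because the symptom is that the counters are increasing, a single capture is only a snapshot. Running the parser on two captures taken a few minutes apart and comparing the values gives a clearer picture than one reading.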