Inbound packet error warnings observed after adding new nodes to a cluster on new BES switches
Applies to
- Existing cluster to which AFF A700 nodes are added
- Cluster network switches upgraded to BES-53248
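Issue
- The cluster network switches were replaced with BES-53248
- The cluster was expanded with new AFF A700 nodes
- Packet loss messages appear in EMS for the cluster ports of the newly added nodes. Example: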
[?] Sat Oct 22 10:44:59 +0300 [node-02: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: Continued packet loss when pinging from cluster lif node-02_clus1 (node node-02) to cluster lif node-07_clus1 (node node-07).
[?] Sat Oct 22 10:51:28 +0300 [node-02: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process cshm: ClusterIfInErrorsWarn_Alert[cluster-SW2/Slot: 0 Port: 49 100G - Level].
- Health monitor alerts are raised for the cluster ports of the newly added nodes. Example:
Node: node-05
Monitor: ethernet-switch
Alert ID: ClusterIfInErrorsWarn_Alert
Alerting Resource: cluster-SW1/Slot: 0 Port: 49 100G - Level
Description: The percentage of inbound packet errors of switch interface "cluster-SW1/Slot: 0 Port: 49 100G - Level" is above the warning threshold.
- The switch interface counters OutDropPkts, Rx Error, and Tx Error keep increasing on the affected ports. Example:
*************** show interface counters ***************
Port      InOctets         InUcastPkts      InMcastPkts      InBcastPkts      InDropPkts       Rx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
(...)
0/49      94694437047      172458306        29633            20635            0                260111
0/50      65891725795      148441933        29600            20712            1                212051
0/51      0                0                0                0                0                0
0/52      0                0                0                0                0                0
0/53      51009189766      93580229         12853            8950             7                124798
0/54      53578904828      89873168         12841            8869             0                126181
(...)
Port      OutOctets        OutUcastPkts     OutMcastPkts     OutBcastPkts     OutDropPkts      Tx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
(...)
0/49      79762420296      161280006        414369           545579           210497           63368
0/50      77932724095      157365236        414727           545696           212537           63588
0/51      0                0                0                0                0                0
0/52      0                0                0                0                0                0
0/53      40437115567      88005728         179938           287375           102559           38203
0/54      39124214192      80158224         179601           287091           103037           38216
(...)
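The health monitor alert above refers to a percentage of inbound packet errors. As a rough illustration only, the sketch below parses captured `show interface counters` text and estimates an Rx error percentage per port. The function name `rx_error_percent` and the formula (Rx Error divided by the sum of inbound unicast, multicast, and broadcast packets) are assumptions made for this example; the article does not state how the health monitor computes its value or what the warning threshold is.

```python
import re

# One data row: port followed by six numeric counter columns.
ROW = re.compile(r"^(\d+/\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

def rx_error_percent(text):
    """Return {port: percent or None} for rows of the inbound counter table.

    Assumed formula (not taken from the article):
        100 * Rx Error / (InUcastPkts + InMcastPkts + InBcastPkts)
    Ports that received no packets map to None.
    """
    result = {}
    inbound = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Port"):
            # The header row tells us which table the following rows belong to.
            inbound = "InOctets" in line
            continue
        match = ROW.match(line)
        if not (inbound and match):
            continue
        port = match.group(1)
        _octets, ucast, mcast, bcast, _drops, errors = map(int, match.groups()[1:])
        received = ucast + mcast + bcast
        result[port] = 100.0 * errors / received if received else None
    return result

# Excerpt of the counters shown above, used as sample input.
SAMPLE = """\
Port InOctets InUcastPkts InMcastPkts InBcastPkts InDropPkts Rx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/49 94694437047 172458306 29633 20635 0 260111
0/51 0 0 0 0 0 0
Port OutOctets OutUcastPkts OutMcastPkts OutBcastPkts OutDropPkts Tx Error
--------- ---------------- ---------------- ---------------- ---------------- ---------------- ----------------
0/49 79762420296 161280006 414369 545579 210497 63368
"""

if __name__ == "__main__":
    for port, pct in rx_error_percent(SAMPLE).items():
        print(port, "no traffic" if pct is None else f"{pct:.4f}%")
```

Because the counters are cumulative, a single snapshot only gives a lifetime average. Comparing two captures taken some minutes apart shows whether the error counters are still increasing.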