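Motherboard status degraded after ONTAP upgrade
Applies to
- ONTAP 9
- Cluster network switches
Issue
- A health check run after the ONTAP upgrade shows the Motherboard subsystem as degraded.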
::> system health status show
Status
---------------
degraded
::> system health subsystem show
Subsystem Health
----------------- ------------------
SAS-connect ok
Environment ok
Memory ok
Service-Processor ok
Switch-Health ok
CIFS-NDO ok
Motherboard degraded
IO ok
MetroCluster ok
MetroCluster_Node ok
FHM-Switch ok
FHM-Bridge ok
SAS-connect_Cluster ok
13 entries were displayed.
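When this output is captured to a text file, the degraded subsystem can be picked out programmatically. The following is a minimal Python sketch, not a NetApp tool: the function name `degraded_subsystems` is hypothetical, and it assumes the two-column layout shown above, where every data row is a subsystem name followed by its health value.

```python
def degraded_subsystems(output: str) -> list[str]:
    """Return the subsystems whose health column is anything other than 'ok'."""
    flagged = []
    for line in output.splitlines():
        parts = line.split()
        # Assumption: data rows have exactly two columns (name, health).
        # The trailing "N entries were displayed." line has more and is skipped.
        if len(parts) != 2:
            continue
        name, health = parts
        # Skip the header row and the dashed separator row.
        if name == "Subsystem" or set(name) == {"-"}:
            continue
        if health.lower() != "ok":
            flagged.append(name)
    return flagged


# Excerpt of the output shown above.
SAMPLE = """\
Subsystem Health
----------------- ------------------
SAS-connect ok
Environment ok
Motherboard degraded
IO ok
13 entries were displayed.
"""

if __name__ == "__main__":
    print(degraded_subsystems(SAMPLE))
```

The sketch flags every value that is not exactly `ok`, so it errs on the side of reporting too much rather than too little.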
- The NodeIfInErrorsWarnAlert health alert is reported for port e0c on node1 and node2.
::> system health alert show
Node: node2
Alert ID: NodeIfInErrorsWarnAlert
Resource: e0c
Severity: Major
Indication Time: Thu Mar 27 18:33:07 2025
Suppress: false
Acknowledge: false
Probable Cause: The percentage of inbound packet errors of node
"node2" on interface "e0c" is above the
warning threshold.
Possible Effect: Communication from this node to the cluster might be
degraded
Corrective Actions: 1) Migrate any cluster LIF that uses this connection to another port connected to a cluster switch.
For example, if cluster LIF "clus1" is on port e0a and the other LIF is on e0b,
run the following command to move "clus1" to e0b:
"network interface migrate -vserver vs1 -lif clus1 -sourcenode node1 -destnode node1 -dest-port e0b"
2) Replace the network cable with a known-good cable.
If errors are corrected, stop. No further action is required.
Otherwise, continue to Step 3.
3) Move the network cable to another port on the node (if available).
Migrate the cluster LIF to the new port.
If errors are corrected, contact technical support to troubleshoot the original node port.
Otherwise, continue to Step 4.
4) Move the network cable to another available cluster switch port.
Migrate the cluster LIF back to the original port.
If errors are corrected, contact technical support to troubleshoot the original switch port.
If errors persist, contact technical support for
further assistance.
Node: node1
Alert ID: NodeIfInErrorsWarnAlert
Resource: e0c
Severity: Major
Indication Time: Thu Mar 27 18:33:01 2025
Suppress: false
Acknowledge: false
Probable Cause: The percentage of inbound packet errors of node
"node1" on interface "e0c" is above the
warning threshold.
Possible Effect: Communication from this node to the cluster might be
degraded
Corrective Actions: 1) Migrate any cluster LIF that uses this connection to another port connected to a cluster switch.
For example, if cluster LIF "clus1" is on port e0a and the other LIF is on e0b,
run the following command to move "clus1" to e0b:
"network interface migrate -vserver vs1 -lif clus1 -sourcenode node1 -destnode node1 -dest-port e0b"
2) Replace the network cable with a known-good cable.
If errors are corrected, stop. No further action is required.
Otherwise, continue to Step 3.
3) Move the network cable to another port on the node (if available).
Migrate the cluster LIF to the new port.
If errors are corrected, contact technical support to troubleshoot the original node port.
Otherwise, continue to Step 4.
4) Move the network cable to another available cluster switch port.
Migrate the cluster LIF back to the original port.
If errors are corrected, contact technical support to troubleshoot the original switch port.
If errors persist, contact technical support for
further assistance.
2 entries were displayed.
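Because each alert record is long, it can help to reduce the output to one line per alert. The sketch below is illustrative only: `parse_alerts` is a hypothetical helper, and it assumes that every record starts with a `Node:` line and uses the `Key: value` field labels exactly as they appear in the output above.

```python
import re

# Only these field labels are extracted; all other lines are ignored.
FIELD_RE = re.compile(r"\s*(Node|Alert ID|Resource|Severity):\s*(\S.*)$")


def parse_alerts(output: str) -> list[dict[str, str]]:
    """Collect Node, Alert ID, Resource and Severity for each alert record."""
    alerts: list[dict[str, str]] = []
    current = None
    for line in output.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        # Assumption: a "Node:" line marks the start of a new alert record.
        if key == "Node":
            current = {}
            alerts.append(current)
        if current is not None:
            current[key] = value
    return alerts


# Abbreviated excerpt of the alert output shown above.
SAMPLE = """\
Node: node2
Alert ID: NodeIfInErrorsWarnAlert
Resource: e0c
Severity: Major
Probable Cause: The percentage of inbound packet errors of node
Node: node1
Alert ID: NodeIfInErrorsWarnAlert
Resource: e0c
Severity: Major
2 entries were displayed
"""

if __name__ == "__main__":
    for alert in parse_alerts(SAMPLE):
        print(alert["Node"], alert["Resource"], alert["Alert ID"], alert["Severity"])
```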
- The NodeIfInErrorsWarnAlert alert is raised because CRC errors are increasing on cluster port e0c of node1 and node2.
EMS
The percentage of inbound packet errors of node "node1" on interface "e0c" is above the warning threshold.
The percentage of inbound packet errors of node "node2" on interface "e0c" is above the warning threshold.
[node1: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: Continued packet loss when pinging from cluster lif node2_clus2 (node node2) to cluster lif node1 (node node1).
[node1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Large MTU Packet Loss - Ping failures detected between node2 ( 169.XXX.XX.217 ) on node2 and node1_clus1 ( 169.XXX.XX.173 ) on node1
ifconfig -v
node2
-- interface e0c (16 hours, 4 minutes, 52 seconds) --
RECEIVE
Total frames: 354m | Frames/second: 6130 | Total bytes: 499g
Bytes/second: 8631k | Total errors: 32176k | Errors/minute: 33348
Total discards: 4 | Discards/minute: 0 | Multi/broadcast: 1545k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 28157k | Runt frames: 0 | Fragment: 111k
Long frames: 221 | Jabber: 4 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 36975k | Error symbol: 3907k | Bus overruns: 4
Queue drops: 0 | LRO segments: 291m | LRO bytes: 481g
LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
node1
-- interface e0c (8 hours, 25 minutes, 21 seconds) --
RECEIVE
Total frames: 157m | Frames/second: 5185 | Total bytes: 130g
Bytes/second: 4288k | Total errors: 14338k | Errors/minute: 28374
Total discards: 0 | Discards/minute: 0 | Multi/broadcast: 114k
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 13012k | Runt frames: 0 | Fragment: 59563
Long frames: 365 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 5423k | Error symbol: 1266k | Bus overruns: 0
Queue drops: 0 | LRO segments: 112m | LRO bytes: 121g
LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
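To put the counters above in proportion, the CRC error count can be compared with the total received frames. The following Python sketch is a rough aid under stated assumptions: the helper names are hypothetical, the `k`, `m` and `g` suffixes are assumed to be decimal multipliers (the output does not say), and it is not known from this output whether "Total frames" already includes errored frames. The result is therefore an estimate and is not the value ONTAP compares against its alert threshold.

```python
import re

# Assumption: suffixes are decimal multipliers (k = 1e3, m = 1e6, g = 1e9).
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}
VALUE_RE = re.compile(r"\s*\d+[kmg]?\s*", re.IGNORECASE)


def to_number(text: str) -> int:
    """Convert a counter such as '28157k' or '221' to an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIX:
        return int(text[:-1]) * SUFFIX[text[-1]]
    return int(text)


def parse_counters(block: str) -> dict[str, int]:
    """Parse 'Name: value | Name: value' rows into a dictionary."""
    counters: dict[str, int] = {}
    for line in block.splitlines():
        for cell in line.split("|"):
            name, sep, value = cell.partition(":")
            if sep and VALUE_RE.fullmatch(value):
                counters[name.strip()] = to_number(value)
    return counters


def crc_error_percent(counters: dict[str, int]) -> float:
    """Approximate CRC errors as a percentage of total received frames."""
    total = counters.get("Total frames", 0)
    return 100.0 * counters.get("CRC errors", 0) / total if total else 0.0


# Excerpt of the node2 e0c RECEIVE counters shown above.
NODE2 = """\
Total frames: 354m | Frames/second: 6130 | Total bytes: 499g
Bytes/second: 8631k | Total errors: 32176k | Errors/minute: 33348
CRC errors: 28157k | Runt frames: 0 | Fragment: 111k
"""

if __name__ == "__main__":
    print(f"node2 e0c CRC errors: {crc_error_percent(parse_counters(NODE2)):.2f}% of frames")
```

With the node2 values above, this gives roughly 8% of received frames, and the node1 counters (13012k CRC errors against 157m frames) come out similarly.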