
Motherboard health status degraded after an ONTAP upgrade

Applies to

  • ONTAP 9
  • Cluster network switches

Issue

  • A health check run after an ONTAP upgrade shows the Motherboard subsystem as degraded.

::> system health status show
Status
---------------
degraded

::> system health subsystem show
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Environment       ok
Memory            ok
Service-Processor ok
Switch-Health     ok
CIFS-NDO          ok
Motherboard       degraded
IO                ok
MetroCluster      ok
MetroCluster_Node ok
FHM-Switch        ok
FHM-Bridge        ok
SAS-connect_Cluster ok
13 entries were displayed.
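
When many subsystems are listed, the degraded one can be picked out with a short script. The following is a minimal Python sketch, assuming the `system health subsystem show` output has been captured as text in the two-column layout shown above. The function name `degraded_subsystems` and the shortened sample are illustrative and not part of ONTAP.

```python
def degraded_subsystems(output: str) -> list[str]:
    """Return the names of subsystems whose Health column is not 'ok'."""
    found = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            in_table = True  # data rows begin after the dashed separator
            continue
        if not in_table or not stripped:
            continue
        parts = stripped.split()
        if len(parts) != 2:  # skips the "N entries were displayed." footer
            continue
        name, health = parts
        if health != "ok":
            found.append(name)
    return found


# Shortened copy of the output above, used only as sample input.
SAMPLE = """\
Subsystem         Health
----------------- ------------------
SAS-connect       ok
Motherboard       degraded
IO                ok
13 entries were displayed.
"""

print(degraded_subsystems(SAMPLE))
```

For the output in this article, the only subsystem returned is Motherboard.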

  • The NodeIfInErrorsWarnAlert health alert is reported for port e0c on both node1 and node2.

::> system health alert show
               Node: node2
           Alert ID: NodeIfInErrorsWarnAlert
           Resource: e0c
           Severity: Major
    Indication Time: Thu Mar 27 18:33:07 2025
           Suppress: false
        Acknowledge: false
     Probable Cause: The percentage of inbound packet errors of node
                    "node2" on interface "e0c" is above the
                     warning threshold.
    Possible Effect: Communication from this node to the cluster might be
                     degraded
Corrective Actions: 1) Migrate any cluster LIF that uses this connection to another port connected to a cluster switch.
                      For example, if cluster LIF "clus1" is on port e0a and the other LIF is on e0b,
                      run the following command to move "clus1" to e0b:
                     "network interface migrate -vserver vs1 -lif clus1 -sourcenode node1 -destnode node1 -dest-port e0b"
                     2) Replace the network cable with a known-good cable.
                     If errors are corrected, stop. No further action is required.
                     Otherwise, continue to Step 3.
                     3) Move the network cable to another port on the node (if available).
                     Migrate the cluster LIF to the new port.
                     If errors are corrected, contact technical support to troubleshoot the original node port.
                      Otherwise, continue to Step 4.
                     4) Move the network cable to another available cluster switch port.
                     Migrate the cluster LIF back to the original port.
                     If errors are corrected, contact technical support to troubleshoot the original switch port.
                     If errors persist, contact technical support for
                     further assistance.

               Node: node1
           Alert ID: NodeIfInErrorsWarnAlert
           Resource: e0c
           Severity: Major
    Indication Time: Thu Mar 27 18:33:01 2025
           Suppress: false
        Acknowledge: false
     Probable Cause: The percentage of inbound packet errors of node
                    "node1" on interface "e0c" is above the
                     warning threshold.
    Possible Effect: Communication from this node to the cluster might be
                     degraded
Corrective Actions: 1) Migrate any cluster LIF that uses this connection to another port connected to a cluster switch.
                      For example, if cluster LIF "clus1" is on port e0a and the other LIF is on e0b,
                      run the following command to move "clus1" to e0b:
                     "network interface migrate -vserver vs1 -lif clus1 -sourcenode node1 -destnode node1 -dest-port e0b"
                     2) Replace the network cable with a known-good cable.
                     If errors are corrected, stop. No further action is required.
                     Otherwise, continue to Step 3.
                     3) Move the network cable to another port on the node (if available).
                     Migrate the cluster LIF to the new port.
                     If errors are corrected, contact technical support to troubleshoot the original node port.
                      Otherwise, continue to Step 4.
                     4) Move the network cable to another available cluster switch port.
                     Migrate the cluster LIF back to the original port.
                     If errors are corrected, contact technical support to troubleshoot the original switch port.
                     If errors persist, contact technical support for
                     further assistance.

2 entries were displayed
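
To summarize which node and port each alert refers to, the `system health alert show` output can be reduced to one record per alert. The following Python sketch assumes the "Field: value" layout shown above, with wrapped values continued on the following lines. The field list is taken from this article's output only, and `parse_alerts` is a hypothetical helper name.

```python
# Field labels as they appear in the alert output in this article.
FIELDS = {
    "Node", "Alert ID", "Resource", "Severity", "Indication Time",
    "Suppress", "Acknowledge", "Probable Cause", "Possible Effect",
    "Corrective Actions",
}


def parse_alerts(output: str) -> list[dict[str, str]]:
    """Split alert output into one dict per alert, joining wrapped lines."""
    alerts = []
    current = None  # alert record being filled in
    field = None    # field that continuation lines belong to
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            field = None  # a blank line ends any wrapped value
            continue
        key, sep, value = stripped.partition(":")
        key = key.strip()
        if sep and key in FIELDS:
            if key == "Node":  # every alert starts with its Node line
                current = {}
                alerts.append(current)
            if current is None:
                continue
            field = key
            current[field] = value.strip()
        elif current is not None and field is not None:
            current[field] += " " + stripped  # wrapped continuation line
    return alerts


# Shortened copy of the output above, used only as sample input.
SAMPLE = """\
               Node: node2
           Alert ID: NodeIfInErrorsWarnAlert
           Resource: e0c
           Severity: Major
     Probable Cause: The percentage of inbound packet errors of node
                    "node2" on interface "e0c" is above the
                     warning threshold.

               Node: node1
           Alert ID: NodeIfInErrorsWarnAlert
           Resource: e0c
           Severity: Major

2 entries were displayed
"""

for alert in parse_alerts(SAMPLE):
    print(alert["Node"], alert["Resource"], alert["Alert ID"], alert["Severity"])
```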

  • The NodeIfInErrorsWarnAlert alert is raised because CRC errors are increasing on cluster port e0c of node1 and node2.

EMS

The percentage of inbound packet errors of node "node1" on interface "e0c" is above the warning threshold.
The percentage of inbound packet errors of node "node2" on interface "e0c" is above the warning threshold.

[node1: vifmgr: vifmgr.cluscheck.ctdpktloss:alert]: Continued packet loss when pinging from cluster lif node2_clus2 (node node2) to cluster lif node1 (node node1).

[node1: vifmgr: callhome.clus.net.degraded:alert]: Call home for CLUSTER NETWORK DEGRADED: Large MTU Packet Loss - Ping failures detected between node2 ( 169.XXX.XX.217 ) on node2 and node1_clus1 ( 169.XXX.XX.173 ) on node1

ifconfig -v

node2

    -- interface  e0c  (16 hours, 4 minutes, 52 seconds) --

    RECEIVE
     Total frames:      354m | Frames/second:    6130  | Total bytes:       499g
     Bytes/second:     8631k | Total errors:    32176k | Errors/minute:   33348
     Total discards:      4  | Discards/minute:     0  | Multi/broadcast:  1545k
     Non-primary u/c:     0  | Errored frames:      0  | Unsupported Op:      0
     CRC errors:      28157k | Runt frames:         0  | Fragment:          111k
     Long frames:       221  | Jabber:              4  | Length errors:       0
     Alignment errors:    0  | No buffer:           0  | Pause:               0
     Jumbo:           36975k | Error symbol:     3907k | Bus overruns:        4
     Queue drops:         0  | LRO segments:      291m | LRO bytes:         481g
     LRO6 segments:       0  | LRO6 bytes:          0  | Bad UDP cksum:       0
     Bad UDP6 cksum:      0  | Bad TCP cksum:       0  | Bad TCP6 cksum:      0
     Mcast v6 solicit:    0  | Lagg errors:         0  | Lacp errors:         0
     Lacp PDU errors:     0



node1

    -- interface  e0c  (8 hours, 25 minutes, 21 seconds) --

    RECEIVE
     Total frames:      157m | Frames/second:    5185  | Total bytes:       130g
     Bytes/second:     4288k | Total errors:    14338k | Errors/minute:   28374
     Total discards:      0  | Discards/minute:     0  | Multi/broadcast:   114k
     Non-primary u/c:     0  | Errored frames:      0  | Unsupported Op:      0
     CRC errors:      13012k | Runt frames:         0  | Fragment:        59563
     Long frames:       365  | Jabber:              0  | Length errors:       0
     Alignment errors:    0  | No buffer:           0  | Pause:               0
     Jumbo:            5423k | Error symbol:     1266k | Bus overruns:        0
     Queue drops:         0  | LRO segments:      112m | LRO bytes:         121g
     LRO6 segments:       0  | LRO6 bytes:          0  | Bad UDP cksum:       0
     Bad UDP6 cksum:      0  | Bad TCP cksum:       0  | Bad TCP6 cksum:      0
     Mcast v6 solicit:    0  | Lagg errors:         0  | Lacp errors:         0
     Lacp PDU errors:     0
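
The RECEIVE counters above give a rough sense of how large the inbound error rate is. The following Python sketch works that arithmetic out from the values shown. It assumes the k, m and g suffixes are decimal multipliers, and that dividing total errors by total frames is a reasonable approximation. The exact formula and threshold ONTAP uses for the alert are not stated in this article, so this is only an estimate.

```python
# Assumed decimal meaning of the counter suffixes printed by ifconfig -v.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a counter such as '354m' or '4' to an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


# Values copied from the RECEIVE sections above.
COUNTERS = {
    "node2": {"frames": "354m", "errors": "32176k", "crc": "28157k"},
    "node1": {"frames": "157m", "errors": "14338k", "crc": "13012k"},
}

for node, c in COUNTERS.items():
    frames = parse_count(c["frames"])
    errors = parse_count(c["errors"])
    crc = parse_count(c["crc"])
    print(
        f"{node}: {percent(errors, frames):.2f}% of received frames errored, "
        f"{percent(crc, errors):.1f}% of those errors are CRC errors"
    )
```

Under these assumptions, about 9% of the frames received on e0c are errored on both nodes, and CRC errors account for roughly 88% to 91% of those errors, which matches the description of CRC errors driving the alert.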

 
