Repeated Link Resetting messages on CX6 NIC X91153A

Visibility: Public
Category: fas-systems
Specialty: hw

Applies to

  • AFF-A900
  • ONTAP 9
  • CX6 PSID card

Issue

  • Since June 30, 2024, Link Resetting messages have been recurring for slot 2 on node node-01
sysconfig -a
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
 
sysconfig -ac
sysconfig: slot 2 OK: X91153A: 2p 40G/100G RoCE QSFP28
 
EMS
(June 2024)
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) failed to generate a register dump with error = 17 : Link Resetting.
 
(2025...)
Thu Sep 25 20:00:55 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.
Thu Sep 25 20:08:50 +0900 [node-01: CCMA-Worker: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.
Thu Sep 25 20:11:05 +0900 [node-01: CCMA-Worker: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
Thu Sep 25 20:15:27 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) failed to generate a register dump with error = 17 : Link Resetting.
Thu Sep 25 20:17:42 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.
  • During the NDU from ONTAP 9.12.1P7 to 9.15.1P14, node node-01, which has the unstable CX6 NIC, panicked
cluster::*> storage failover takeover -ofnode node-01
cluster::*> Files /cfcard/x86_64/freebsd/image1/VERSION and /var/VERSION differ
ERROR: /var cannot be downgraded.
Waiting for PIDS:  1392.
Terminated
.
Setting default boot image to image1...
done.
Uptime: 722d2h54m27s
PANIC  : peg_nvmeof_qpair_flush_request: Failed to move RDMA qp (0xfffff804eac60c00) to error state: -60
 
version: 9.12.1P7: Fri Sep 15 02:00:51 EDT 2023
conf  : x86_64.optimize
cpuid = 3
KDB: stack backtrace:
vpanic() at vpanic+0x429/frame 0xfffffe121d094210
panic() at panic+0x42/frame 0xfffffe121d094270
peg_nvmeof_qpair_flush_request() at peg_nvmeof_qpair_flush_request+0x74a/frame 0xfffffe121d094360
peg_nvmeof_ctrlr_fail_task() at peg_nvmeof_ctrlr_fail_task+0xa8/frame 0xfffffe121d094390
stack_zero() at stack_zero+0x137/frame 0xfffffe121d0943f0
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfffffe121d094430
fork_exit() at fork_exit+0xb2/frame 0xfffffe121d094470
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe121d094470
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 722d2h56m51s
 
PANIC: peg_nvmeof_qpair_flush_request: Failed to move RDMA qp (0xfffff804eac60c00) to error state: -60 in process peg nvmeof taskq_31 on release 9.12.1P7 (C) on Thu Sep 25 20:19:51 KST 2025
version: 9.12.1P7: Fri Sep 15 02:00:51 EDT 2023
 
  • After the node rebooted from the panic, the CX6 NIC on node node-01 is no longer recognized in the sysconfig -a output
Before NDU:
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
e2a MAC Address:   xx:xx:xx:xx:xx:90 (auto-100g_cr4-fd-up)
e2b MAC Address:   xx:xx:xx:xx:xx:91 (auto-100g_cr4-fd-up)
slot 3: Quad 10G/25G Ethernet Controller CX5
 
 
After NDU:
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 3: Quad 10G/25G Ethernet Controller CX5
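
The EMS excerpts above can be tallied per port to show how often each adapter resets and how often the register dump fails with error = 17. The following is a minimal Python sketch, not a NetApp tool. It assumes the EMS messages have been saved as plain text with one event per line in the format shown above. The names `LINK_RESET_RE`, `summarize_link_resets`, and `SAMPLE_EMS` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for the netif.linkInfo "Link Resetting" EMS lines quoted
# in this article. Assumption: one event per line in exactly this layout; other
# ONTAP releases may format EMS output differently.
LINK_RESET_RE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<source>[^:]+): netif\.linkInfo:info\]: "
    r"Ethernet adapter (?P<port>\w+)\((?P<pci>[^)]+)\) "
    r"(?P<outcome>has generated|failed to generate) a register dump"
    r"(?: with error = (?P<error>\d+))?.*Link Resetting\."
)


def summarize_link_resets(lines):
    """Count Link Resetting events per (node, port, outcome, error code)."""
    counts = Counter()
    for line in lines:
        m = LINK_RESET_RE.search(line)
        if m is None:
            continue  # not a Link Resetting event; skip it
        # error is None when the register dump was generated successfully
        counts[(m["node"], m["port"], m["outcome"], m["error"])] += 1
    return counts


# Sample input copied from the June 2024 EMS excerpt in this article.
SAMPLE_EMS = [
    "Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.",
    "Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2a(pci0:51:0:0) failed to generate a register dump with error = 17 : Link Resetting.",
    "Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) has generated a register dump in /mroot/etc/mlx5log : Link Resetting.",
    "Sun Jun 30 00:17:51 +0900 [node-01: kernel: netif.linkInfo:info]: Ethernet adapter e2b(pci0:51:0:1) failed to generate a register dump with error = 17 : Link Resetting.",
]

if __name__ == "__main__":
    # key=str avoids comparing None with "17" when sorting the tuples
    for key, count in sorted(summarize_link_resets(SAMPLE_EMS).items(), key=str):
        print(key, count)
```

Feeding a longer EMS export through the same function (for example, the 2025 events) shows whether the resets stay limited to ports e2a and e2b and whether they come from the kernel or from the CCMA-Worker source.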

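The before/after sysconfig -a comparison can also be scripted to flag adapters that disappear after an NDU or a takeover/giveback. This is a minimal Python sketch under two assumptions: both outputs are captured as plain text, and only the `slot N: ...` header lines matter. The names `parse_slots`, `missing_slots`, `BEFORE_NDU`, and `AFTER_NDU` are hypothetical. The sample text is the excerpt from this article.

```python
import re

# Matches slot header lines such as
# "slot 2: Dual 40G/100G/200G Ethernet Controller CX6".
# Assumption: port/MAC lines under each slot are not needed for this check.
SLOT_RE = re.compile(r"^slot (\d+): (.+?)\s*$")


def parse_slots(sysconfig_text):
    """Return {slot_number: description} for each slot header line."""
    slots = {}
    for line in sysconfig_text.splitlines():
        m = SLOT_RE.match(line.strip())
        if m:
            slots[int(m.group(1))] = m.group(2)
    return slots


def missing_slots(before_text, after_text):
    """Return the slots present in the 'before' output but absent 'after'."""
    before = parse_slots(before_text)
    after = parse_slots(after_text)
    return {slot: desc for slot, desc in before.items() if slot not in after}


BEFORE_NDU = """\
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 2: Dual 40G/100G/200G Ethernet Controller CX6
e2a MAC Address:   xx:xx:xx:xx:xx:90 (auto-100g_cr4-fd-up)
e2b MAC Address:   xx:xx:xx:xx:xx:91 (auto-100g_cr4-fd-up)
slot 3: Quad 10G/25G Ethernet Controller CX5
"""

AFTER_NDU = """\
slot 1: Dual 40G/100G/200G Ethernet Controller CX6
slot 3: Quad 10G/25G Ethernet Controller CX5
"""

if __name__ == "__main__":
    for slot, desc in sorted(missing_slots(BEFORE_NDU, AFTER_NDU).items()):
        print(f"slot {slot} no longer reported: {desc}")
```

Keying on slot number rather than on the description means two identical CX6 cards (slots 1 and 2 here) are still told apart.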
NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.