ESXi hosts intermittently lose access to iSCSI datastores

Visibility:
Public
Category:
fabric-interconnect-and-management-switches
Specialty:
san

Applies to

  • NetApp FAS/AFF storage
  • VMware ESXi hosts
  • iSCSI
  • Cisco IP switches connecting the ESXi hosts and the NetApp storage

Issue

  • ESXi hosts intermittently lose access to iSCSI datastores.
  • Hosts report errors similar to the following:

Apr  7 06:10:42 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 12:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Apr  7 06:10:42 va1plxld001 iscsid: iscsid: connection14:0 is operational after recovery (3 attempts)
Apr  7 06:10:44 va1plxld001 kernel: connection11:0: ping timeout of 5 secs expired, recv timeout 5, last rx 6465840173, last ping 6465845184, now 6465850192
Apr  7 06:10:44 va1plxld001 kernel: connection11:0: detected conn error (1022)
Apr  7 06:10:44 va1plxld001 kernel: connection10:0: ping timeout of 5 secs expired, recv timeout 5, last rx 6465840173, last ping 6465845184, now 6465850192
Apr  7 06:10:44 va1plxld001 kernel: connection10:0: detected conn error (1022)
Apr  7 06:10:44 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 11:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Apr  7 06:10:44 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 10:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)
Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection23:0 is operational after recovery (3 attempts)
Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection24:0 is operational after recovery (3 attempts)
Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection13:0 is operational after recovery (4 attempts)
Apr  7 06:10:49 va1plxld001 iscsid: iscsid: connection9:0 is operational after recovery (6 attempts)
Apr  7 06:10:50 va1plxld001 iscsid: iscsid: connection19:0 is operational after recovery (6 attempts)
Apr  7 06:10:50 va1plxld001 iscsid: iscsid: connection20:0 is operational after recovery (6 attempts)
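The host-side messages above can be tallied per connection to see how many sessions hit a NOP timeout and how many attempts each needed to recover. The following Python sketch assumes the log has been saved as plain text in the format shown; the regular expressions and the `summarize_iscsid` name are illustrative and are not part of any NetApp or open-iscsi tool.

```python
import re
from collections import Counter

# Patterns modeled on the sample iscsid lines above (an assumption, not a documented format).
NOP_TIMEOUT = re.compile(r"iSCSI connection (\d+):\d+ error \(1022 - ISCSI_ERR_NOP_TIMEDOUT")
RECOVERED = re.compile(r"connection(\d+):\d+ is operational after recovery \((\d+) attempts\)")


def summarize_iscsid(lines):
    """Return (NOP timeouts per connection ID, recovery attempts per connection ID)."""
    timeouts = Counter()
    attempts = {}
    for line in lines:
        m = NOP_TIMEOUT.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
            continue
        m = RECOVERED.search(line)
        if m:
            # Keep the most recent recovery report for each connection.
            attempts[int(m.group(1))] = int(m.group(2))
    return timeouts, attempts


SAMPLE = [
    "Apr  7 06:10:42 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 12:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)",
    "Apr  7 06:10:42 va1plxld001 iscsid: iscsid: connection14:0 is operational after recovery (3 attempts)",
    "Apr  7 06:10:44 va1plxld001 kernel: connection11:0: detected conn error (1022)",
    "Apr  7 06:10:44 va1plxld001 iscsid: iscsid: Kernel reported iSCSI connection 11:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3)",
    "Apr  7 06:10:45 va1plxld001 iscsid: iscsid: connection13:0 is operational after recovery (4 attempts)",
]

if __name__ == "__main__":
    timeouts, attempts = summarize_iscsid(SAMPLE)
    for conn in sorted(set(timeouts) | set(attempts)):
        print(f"connection {conn}: NOP timeouts={timeouts[conn]}, "
              f"recovery attempts={attempts.get(conn, '-')}")
```

Many connections timing out within the same few seconds, as in the sample, suggests a shared path problem rather than a fault in a single session.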

  • The following errors are observed in the ESXi logs:

2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:963: vmhba64:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x12c6, opcode TMF Request, reason Immediate Command Reject
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:965: Sess [ISID: 00023d000001 TARGET: iqn.1992-08.com.netapp:sn.9345f8e16a7e11exxxfe00a098dbab70:vs.3 TPGT: 402 TSIH: 0]
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:966: Conn [CID: 0 L: 10.164.60.64:17687 R: 10.164.56.69:3260]
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:991: vmhba64:CH:0 T:0 CN:0: Rejected TMF Task not found: itt 0x12c6
2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:992: Sess [ISID: 00023d000001 TARGET: iqn.1992-08.com.netapp:sn.9345f8e16a7e11exxxfe00a098dbab70:vs.3 TPGT:
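To tie the ESXi-side rejects to a specific network path, each rejected PDU can be paired with the `Conn [... L: ... R: ...]` line that follows it, which names the local (initiator) address and the remote target portal. This Python sketch assumes the grouping of lines shown above; the patterns and the `reject_paths` name are illustrative only.

```python
import re

# Patterns modeled on the iscsi_vmk warnings above (assumed format, for illustration).
REJECT = re.compile(r"iSCSI pdu rejected: itt (0x[0-9a-fA-F]+), opcode (.+?), reason (.+)$")
CONN = re.compile(r"Conn \[CID: (\d+) L: ([\d.]+):(\d+) R: ([\d.]+):(\d+)\]")


def reject_paths(lines):
    """Pair each rejected PDU with the local/remote endpoints on the next Conn line."""
    results = []
    pending = None
    for line in lines:
        m = REJECT.search(line)
        if m:
            pending = {"itt": m.group(1), "opcode": m.group(2), "reason": m.group(3)}
            continue
        m = CONN.search(line)
        if m and pending is not None:
            pending["local_ip"] = m.group(2)
            pending["target_portal"] = f"{m.group(4)}:{m.group(5)}"
            results.append(pending)
            pending = None
    return results


SAMPLE = [
    "2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:963: vmhba64:CH:0 T:0 CN:0: iSCSI pdu rejected: itt 0x12c6, opcode TMF Request, reason Immediate Command Reject",
    "2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:965: Sess [ISID: 00023d000001 TARGET: iqn.1992-08.com.netapp:sn.9345f8e16a7e11exxxfe00a098dbab70:vs.3 TPGT: 402 TSIH: 0]",
    "2024-01-11T08:59:02.137Z cpu56:2098325)WARNING: iscsi_vmk: iscsivmk_ConnProcessReject:966: Conn [CID: 0 L: 10.164.60.64:17687 R: 10.164.56.69:3260]",
]

if __name__ == "__main__":
    for r in reject_paths(SAMPLE):
        print(f"{r['local_ip']} -> {r['target_portal']}: {r['opcode']} ({r['reason']})")
```

Grouping the results by target portal shows which ONTAP LIF, and therefore which home port, the rejected traffic used, so it can be compared against the ports reporting CRC errors.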

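  • Performance issues are observed on virtual machines residing on NetApp LUNs, some of which host web servers such as Apache and Tomcat.
  • A large number of CRC errors are observed on the ONTAP Ethernet ports: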

-- interface  e0c  (421 days, 4 hours, 17 minutes, 52 seconds) --RECEIVE
 Total frames:    220g | Frames/second:   6059  | Total bytes:     296t
 Bytes/second:    8149k | Total errors:   38204k | Errors/minute:    63
 Total discards:    1  | Discards/minute:    0  | Multi/broadcast:   618m
 Non-primary u/c:    0  | CRC errors:    38204k | Runt frames:      0
 Long frames:      0  | Length errors:   4116  | Alignment errors:   0
 No buffer:       1  | Pause:         0  | Jumbo:         0
 Noproto:        0  | Bus overruns:     0  | LRO segments:    181g
 LRO bytes:      259t | LRO6 segments:     0  | LRO6 bytes:      0
 Bad UDP cksum:     0  | Bad UDP6 cksum:    0  | Bad TCP cksum:   4800
 Bad TCP6 cksum:    0  | Mcast v6 solicit:   0  | Lagg errors:      0
 Lacp errors:      0  | Lacp PDU errors:    0
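The ifstat counters above can be turned into rates for a rough sense of scale. The Python sketch below assumes the abbreviated suffixes are decimal (k = 10^3, m = 10^6, g = 10^9, t = 10^12) and that the interval in the header is the period over which the counters accumulated; both are assumptions, and `parse_counter` and `crc_summary` are hypothetical helpers.

```python
# Assumed decimal multipliers for ifstat's abbreviated counters.
MULTIPLIER = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_counter(text):
    """Convert an abbreviated counter such as '38204k' or '220g' to an integer."""
    text = text.strip().lower()
    if text[-1] in MULTIPLIER:
        return int(text[:-1]) * MULTIPLIER[text[-1]]
    return int(text)


def crc_summary(crc, frames, days, hours, minutes, seconds):
    """Return CRC errors as a percentage of received frames and the average per minute."""
    crc_n = parse_counter(crc)
    frames_n = parse_counter(frames)
    interval_min = days * 1440 + hours * 60 + minutes + seconds / 60
    return {
        "crc_pct_of_frames": 100 * crc_n / frames_n,
        "avg_crc_per_minute": crc_n / interval_min,
    }


if __name__ == "__main__":
    # Values copied from the e0c RECEIVE output above.
    s = crc_summary("38204k", "220g", 421, 4, 17, 52)
    print(f"CRC errors: {s['crc_pct_of_frames']:.4f}% of received frames")
    print(f"Average CRC errors per minute over the interval: {s['avg_crc_per_minute']:.1f}")
```

Under these assumptions, CRC errors come to roughly 0.017% of received frames and average about 63 per minute over the interval, in line with the Errors/minute field. A long-term average cannot show whether errors are still occurring, so differencing two samples taken a few minutes apart is a more direct check.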

  • Apart from iSCSI login events, no error events are reported in the EMS log.

Mon Apr 7 10:05:04 EDT [netapp-01: iswt_admin_thread: iscsi.notice:notice]: ISCSI: New session from initiator iqn.1994-05.com.redhat:902acxxxc6f4 at IP addr 10.xx.22x.35
Mon Apr 7 10:05:16 EDT [netapp-01: iswt_admin_thread: iscsi.notice:notice]: ISCSI: New session from initiator iqn.1994-05.com.redhat:902acxxxc6f4 at IP addr 10.xx.22x.36

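  • For fault isolation, if multipathing is configured, the Ethernet port reporting CRC errors can be taken offline. If CRC errors then begin to increment on the partner node's Ethernet port after that port is disabled, the issue lies upstream of ONTAP.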
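The isolation test above reduces to comparing the partner port's CRC counter before and after traffic moves to it. This Python sketch encodes only the conclusion stated in this article (new CRC errors on the partner port point upstream of ONTAP); the `interpret_failover_test` name is hypothetical, and any other outcome is treated as inconclusive rather than as proof of a fault inside ONTAP.

```python
def interpret_failover_test(partner_crc_before: int, partner_crc_after: int) -> str:
    """Interpret the partner port's CRC counter after the suspect port is taken offline.

    Encodes only the rule stated in the article: if CRC errors start incrementing on
    the partner node's port once traffic fails over to it, the corruption followed the
    traffic, which points upstream of ONTAP. Any other result is reported as inconclusive.
    """
    if partner_crc_after < partner_crc_before:
        # A lower value suggests the counters were cleared between the two samples.
        raise ValueError("CRC counter decreased; were the statistics reset between samples?")
    if partner_crc_after > partner_crc_before:
        return "upstream of ONTAP"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical counter readings, for illustration only.
    print(interpret_failover_test(0, 1520))
    print(interpret_failover_test(0, 0))
```

Sampling the partner port's counter immediately before taking the suspect port offline gives a clean baseline, so any increase can be attributed to the failed-over traffic.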

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.