
High latency due to FCVI errors

Category:
metrocluster
Specialty:
metrocluster

Applies to

  • ONTAP 9
  • MetroCluster FC (MCC-FC)

Issue

  • Nodes at both sites experience high latency.
  • EMS on node SITEB-NODE-A reports PCIe errors.

Sat Oct 07 00:30:15 +0800 [SITEB-NODE-A: HSWL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'LVMR,1_0.PLX PCIE 8780 switch on Controller, PLX PCIE 8780 switch on Controller, PLX PCIE 8764 switch in slot 6 on Controller. IIO0:RPT(0,3,0): Br[8780](56,16,0): RcvErr(P17(255)), Br[8780](56,17,0): BadTLP(262804), BadDLLP(860367); Br[8780](56,17,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim); Br[8764](94,0,0) in slot 6: DevStatus(Corr), CorrErr(BTLP,RNRov,RpTim), BadTLP(1). '}
Sat Oct 07 00:32:15 +0800 [SITEB-NODE-A: HSWL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'LVMR,1_0.PLX PCIE 8780 switch on Controller, PLX PCIE 8780 switch on Controller, PLX PCIE 8764 switch in slot 6 on Controller. IIO0:RPT(0,3,0): Br[8780](56,16,0): RcvErr(P17(255)), Br[8780](56,17,0): BadTLP(295570), BadDLLP(926266); Br[8780](56,17,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim); Br[8764](94,0,0) in slot 6: DevStatus(Corr), CorrErr(RNRov,RpTim). '}
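
The two PCIe samples above are two minutes apart, and the BadTLP and BadDLLP counters on bridge Br[8780](56,17,0) differ between them. A minimal Python sketch of that arithmetic follows. It assumes the counters are cumulative and that both samples fall on the same day. The sample strings are abbreviated copies of the log lines, and the helper names (`parse_sample`, `growth_per_minute`) are hypothetical, not part of any NetApp tool.

```python
import re

# Abbreviated copies of the two EMS samples above (only the Br[8780](56,17,0)
# counters are kept; the "..." stands for the omitted parts of each line).
SAMPLES = [
    "Sat Oct 07 00:30:15 +0800 [SITEB-NODE-A: ...]: Br[8780](56,17,0): BadTLP(262804), BadDLLP(860367);",
    "Sat Oct 07 00:32:15 +0800 [SITEB-NODE-A: ...]: Br[8780](56,17,0): BadTLP(295570), BadDLLP(926266);",
]

TIME_RE = re.compile(r"\b(\d{2}):(\d{2}):(\d{2})\b")
COUNTER_RE = re.compile(r"(BadTLP|BadDLLP)\((\d+)\)")


def parse_sample(line):
    """Return (seconds since midnight, {counter name: value}) for one log line."""
    hours, minutes, seconds = map(int, TIME_RE.search(line).groups())
    counters = {}
    for name, value in COUNTER_RE.findall(line):
        # Keep only the first occurrence of each counter on the line.
        counters.setdefault(name, int(value))
    return hours * 3600 + minutes * 60 + seconds, counters


def growth_per_minute(first, second):
    """Per-minute increase of each counter present in both samples."""
    t0, c0 = parse_sample(first)
    t1, c1 = parse_sample(second)
    elapsed_minutes = (t1 - t0) / 60
    return {name: (c1[name] - c0[name]) / elapsed_minutes for name in c0 if name in c1}


rates = growth_per_minute(*SAMPLES)
print(rates)  # {'BadTLP': 16383.0, 'BadDLLP': 32949.5}
```

Under those assumptions, the correctable-error counters on this bridge are still climbing by tens of thousands per minute rather than holding steady.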

  • EMS on node SITEB-NODE-A reports FC-VI disconnect errors.

Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: ispfcvi2500_main1: fcvi.qlgc.received.disconnect:debug]: FC-VI adapter: Disconnect request received on port 5a. QP name = WAFL, QP index = 9, Remote node's system id = 537415743.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: ispfcvi2500_main3: fcvi.qlgc.received.disconnect:debug]: FC-VI adapter: Disconnect request received on port 5c. QP name = WAFL, QP index = 3, Remote node's system id = 537415743.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: fcvi_cm: ic.rdma.qpDisconnected:debug]: WAFL is disconnected.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: fcvi_cm: ic.rdma.qpConnected:debug]: WAFL is connected.
Sat Oct 07 10:43:52 +0800 [SITEB-NODE-A: ispfcvi2500_main1: fcvi.qlgc.ioErr:debug]: FC-VI adapter: FCVI driver on port 5a received IO error. Status = FW detected response error(status code = 0x121), FCVI opcode = Write Request(0x1), QP name = WAFL, QP index = 9, Remote node's system id = 537415743.

  • EMS on the nodes in the MetroCluster reports QP errors.
07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"
07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"
07 Oct 2023 10:44:18 [SITEA-NODE-A: error] wafl_exempt03 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="3" systemID="537416369"
07 Oct 2023 10:43:37 [SITEA-NODE-B: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416379"
07 Oct 2023 10:43:51 [SITEA-NODE-B: error] ispfcvi2500_main1 fcvi qlgc qpErr: port="5a" qpname="WAFL" qpnum="0x3" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="2183" info=""
07 Oct 2023 10:45:28 [SITEA-NODE-B: error] ispfcvi2500_main3 fcvi qlgc qpErr: port="5c" qpname="WAFL" qpnum="0x4" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="5" info=""
07 Oct 2023 10:43:52 [SITEB-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="FW detected response error" status="0x121" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="9" systemID="537415743"
07 Oct 2023 10:44:00 [SITEB-NODE-A: error] ispfcvi2500_main4 fcvi qlgc qpErr: port="5d" qpname="MISC" qpnum="0x4" state_str="Error" state="0x3" suberror="Transport error on transmit path" code="0x5" system_id="537415743" errcnt="202" info=""
07 Oct 2023 10:44:00 [SITEB-NODE-A: error] wafl_exempt04 fcvi qlgc ioErr: port="5d" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"
07 Oct 2023 10:44:08 [SITEB-NODE-A: error] ispfcvi2500_main2 fcvi qlgc qpErr: port="5b" qpname="RAID" qpnum="0x3" state_str="Error" state="0x3" suberror="Timeout occured on the QP exchange" code="0xe" system_id="537415743" errcnt="199" info=""
07 Oct 2023 10:44:08 [SITEB-NODE-A: error] wafl_exempt06 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"
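
When many of these events arrive at once, it can help to tally them by node and reason. The sketch below is one possible way to do that in Python. It assumes the `key="value"` layout shown in the excerpts above and uses five of those lines verbatim as input. The function name `summarize` and the grouping key are illustrative choices, not an ONTAP feature.

```python
import re
from collections import Counter

# Five lines copied verbatim from the EMS excerpts above.
LINES = [
    '07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"',
    '07 Oct 2023 10:44:18 [SITEA-NODE-A: error] wafl_exempt03 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="3" systemID="537416369"',
    '07 Oct 2023 10:43:51 [SITEA-NODE-B: error] ispfcvi2500_main1 fcvi qlgc qpErr: port="5a" qpname="WAFL" qpnum="0x3" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="2183" info=""',
    '07 Oct 2023 10:44:00 [SITEB-NODE-A: error] wafl_exempt04 fcvi qlgc ioErr: port="5d" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"',
    '07 Oct 2023 10:44:08 [SITEB-NODE-A: error] wafl_exempt06 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"',
]

# Node name and event type from the fixed part of the line.
HEADER_RE = re.compile(r"\[(?P<node>[\w-]+): \w+\] \S+ fcvi qlgc (?P<event>ioErr|qpErr):")
# Every key="value" pair that follows.
KV_RE = re.compile(r'(\w+)="([^"]*)"')


def summarize(lines):
    """Count FC-VI ioErr/qpErr events per (node, reason)."""
    counts = Counter()
    for line in lines:
        header = HEADER_RE.search(line)
        if header is None:
            continue  # not an FC-VI ioErr/qpErr line
        fields = dict(KV_RE.findall(line))
        # ioErr lines carry the reason in status_str, qpErr lines in suberror.
        reason = fields.get("status_str") or fields.get("suberror", "")
        counts[(header["node"], reason)] += 1
    return counts


counts = summarize(LINES)
for (node, reason), n in sorted(counts.items()):
    print(f"{node}: {reason} x{n}")
```

Grouping by port instead of node (using `fields["port"]`) gives a quick view of whether the errors are confined to a single FC-VI port or spread across all of them.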
  • EMS on the nodes in the MetroCluster also reports NVMM mirror errors.

07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] fcvi_cm nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_SYNCING_OTHER" error="NVMM_ERR_LINK_DOWN"
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] nvmm_error nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_LAYOUT_SYNCING" error="NVMM_ABORT_SYNCING_MIRROR"
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] nvmm_error nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_OFFLINE" error="NVMM_ABORT_SYNCING_MIRROR"
07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="RAID" qpnum="5" system_id="537416369" info=""
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] fcvi_cm rdma rlib connected: qp_name="RAID" port="3" client_addr="23.0.1.7" server_addr="23.0.1.5"
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] nvmm_mirror_sync nvmm mirror state change: partner_sysid="2" partner_type="DR PARTNER" prev_mirror_state="NVMM_MIRROR_LAYOUT_SYNCING" new_mirror_state="NVMM_MIRROR_LAYOUT_SYNCED" state_time="27"
07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="WAFL" qpnum="4" system_id="537416369" info=""
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] fcvi_cm nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_CP2_FINISH" error="NVMM_ERR_LINK_DOWN"

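  • Multiple networks are in use and correctly configured.
  • With only one FCVI port enabled and the other FCVI ports disabled, EMS still reports the same errors and the high latency is not resolved.
  • With all FCVI ports disabled, the errors stop and the high latency is resolved.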
