High latency due to FCVI errors
Applies to
- ONTAP 9
- MetroCluster FC (MCC-FC)
Issue
- Nodes at both sites experience high latency.
- EMS on node SITEB-NODE-A reports PCIe errors.
Sat Oct 07 00:30:15 +0800 [SITEB-NODE-A: HSWL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'LVMR,1_0.PLX PCIE 8780 switch on Controller, PLX PCIE 8780 switch on Controller, PLX PCIE 8764 switch in slot 6 on Controller. IIO0:RPT(0,3,0): Br[8780](56,16,0): RcvErr(P17(255)), Br[8780](56,17,0): BadTLP(262804), BadDLLP(860367); Br[8780](56,17,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim); Br[8764](94,0,0) in slot 6: DevStatus(Corr), CorrErr(BTLP,RNRov,RpTim), BadTLP(1). '}
Sat Oct 07 00:32:15 +0800 [SITEB-NODE-A: HSWL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'LVMR,1_0.PLX PCIE 8780 switch on Controller, PLX PCIE 8780 switch on Controller, PLX PCIE 8764 switch in slot 6 on Controller. IIO0:RPT(0,3,0): Br[8780](56,16,0): RcvErr(P17(255)), Br[8780](56,17,0): BadTLP(295570), BadDLLP(926266); Br[8780](56,17,0): DevStatus(Corr), CorrErr(Rcvr,BTLP,BDLLP,RNRov,RpTim); Br[8764](94,0,0) in slot 6: DevStatus(Corr), CorrErr(RNRov,RpTim). '}
- EMS on node SITEB-NODE-A reports FCVI disconnect errors.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: ispfcvi2500_main1: fcvi.qlgc.received.disconnect:debug]: FC-VI adapter: Disconnect request received on port 5a. QP name = WAFL, QP index = 9, Remote node's system id = 537415743.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: ispfcvi2500_main3: fcvi.qlgc.received.disconnect:debug]: FC-VI adapter: Disconnect request received on port 5c. QP name = WAFL, QP index = 3, Remote node's system id = 537415743.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: fcvi_cm: ic.rdma.qpDisconnected:debug]: WAFL is disconnected.
Sat Oct 07 10:43:37 +0800 [SITEB-NODE-A: fcvi_cm: ic.rdma.qpConnected:debug]: WAFL is connected.
Sat Oct 07 10:43:52 +0800 [SITEB-NODE-A: ispfcvi2500_main1: fcvi.qlgc.ioErr:debug]: FC-VI adapter: FCVI driver on port 5a received IO error. Status = FW detected response error(status code = 0x121), FCVI opcode = Write Request(0x1), QP name = WAFL, QP index = 9, Remote node's system id = 537415743.
- EMS on the nodes in the MetroCluster reports QP errors.
07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"
07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"
07 Oct 2023 10:44:18 [SITEA-NODE-A: error] wafl_exempt03 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="3" systemID="537416369"
07 Oct 2023 10:43:37 [SITEA-NODE-B: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416379"
07 Oct 2023 10:43:51 [SITEA-NODE-B: error] ispfcvi2500_main1 fcvi qlgc qpErr: port="5a" qpname="WAFL" qpnum="0x3" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="2183" info=""
07 Oct 2023 10:45:28 [SITEA-NODE-B: error] ispfcvi2500_main3 fcvi qlgc qpErr: port="5c" qpname="WAFL" qpnum="0x4" state_str="Error" state="0x3" suberror="Mismatch in data relative offset" code="0x13" system_id="537416379" errcnt="5" info=""
07 Oct 2023 10:43:52 [SITEB-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="FW detected response error" status="0x121" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="9" systemID="537415743"
07 Oct 2023 10:44:00 [SITEB-NODE-A: error] ispfcvi2500_main4 fcvi qlgc qpErr: port="5d" qpname="MISC" qpnum="0x4" state_str="Error" state="0x3" suberror="Transport error on transmit path" code="0x5" system_id="537415743" errcnt="202" info=""
07 Oct 2023 10:44:00 [SITEB-NODE-A: error] wafl_exempt04 fcvi qlgc ioErr: port="5d" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"
07 Oct 2023 10:44:08 [SITEB-NODE-A: error] ispfcvi2500_main2 fcvi qlgc qpErr: port="5b" qpname="RAID" qpnum="0x3" state_str="Error" state="0x3" suberror="Timeout occured on the QP exchange" code="0xe" system_id="537415743" errcnt="199" info=""
07 Oct 2023 10:44:08 [SITEB-NODE-A: error] wafl_exempt06 fcvi qlgc ioErr: port="5b" status_str="Request timed out" status="0x104" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="2" systemID="537415743"
- EMS on the nodes in the MetroCluster also reports NVMM mirror errors.
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] fcvi_cm nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_SYNCING_OTHER" error="NVMM_ERR_LINK_DOWN"
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] nvmm_error nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_LAYOUT_SYNCING" error="NVMM_ABORT_SYNCING_MIRROR"
07 Oct 2023 10:44:00 [SITEA-NODE-A: debug] nvmm_error nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_OFFLINE" error="NVMM_ABORT_SYNCING_MIRROR"
07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="RAID" qpnum="5" system_id="537416369" info=""
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] fcvi_cm rdma rlib connected: qp_name="RAID" port="3" client_addr="23.0.1.7" server_addr="23.0.1.5"
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] nvmm_mirror_sync nvmm mirror state change: partner_sysid="2" partner_type="DR PARTNER" prev_mirror_state="NVMM_MIRROR_LAYOUT_SYNCING" new_mirror_state="NVMM_MIRROR_LAYOUT_SYNCED" state_time="27"
07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="WAFL" qpnum="4" system_id="537416369" info=""
07 Oct 2023 10:44:06 [SITEA-NODE-A: debug] fcvi_cm nvmm mirror aborting: partner_sysid="2" partner_type="DR PARTNER" mirror_state="NVMM_MIRROR_CP2_FINISH" error="NVMM_ERR_LINK_DOWN"
- Multiple networks are used correctly.
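- With only one FCVI port enabled and the other FCVI ports disabled, EMS still reports the same errors and the high latency is not resolved.
- With all FCVI ports disabled, the errors and the high latency are resolved.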
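When many EMS lines like the ones quoted above have to be reviewed, it can help to tally the FCVI events per node and port to see whether the errors are confined to one port or spread across all of them. The following is a minimal sketch in Python, not a NetApp tool: the function name `tally_fcvi_events` and the regular expressions are illustrative assumptions, and the line format is inferred only from the `key="value"` style excerpts in this article.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpts in this article:
#   <timestamp> [<node>: <severity>] <source> fcvi qlgc <event>: key="value" ...
LINE_RE = re.compile(
    r'\[(?P<node>[\w-]+): (?P<severity>\w+)\] '
    r'(?P<source>\S+) (?P<event>fcvi qlgc [\w ]+?): (?P<params>.*)'
)
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def tally_fcvi_events(lines):
    """Count FCVI events per (node, port, event name)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an FCVI line in the assumed format; skip it
        params = dict(PARAM_RE.findall(match.group("params")))
        key = (match.group("node"), params.get("port", "?"), match.group("event"))
        counts[key] += 1
    return counts


# Sample input copied from the EMS excerpts above.
sample = [
    '07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"',
    '07 Oct 2023 10:43:37 [SITEA-NODE-A: error] ispfcvi2500_main3 fcvi qlgc ioErr: port="5c" status_str="Invalid VI state" status="0x10c" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="4" systemID="537416369"',
    '07 Oct 2023 10:43:52 [SITEB-NODE-A: error] ispfcvi2500_main1 fcvi qlgc ioErr: port="5a" status_str="FW detected response error" status="0x121" opcode_str="Write Request" opcode="0x1" qpName="WAFL" qpIndex="9" systemID="537415743"',
    '07 Oct 2023 10:44:00 [SITEB-NODE-A: error] ispfcvi2500_main4 fcvi qlgc qpErr: port="5d" qpname="MISC" qpnum="0x4" state_str="Error" state="0x3" suberror="Transport error on transmit path" code="0x5" system_id="537415743" errcnt="202" info=""',
    '07 Oct 2023 10:44:06 [SITEA-NODE-A: notice] ispfcvi2500_main3 fcvi qlgc received disconnect: port="5c" qpname="RAID" qpnum="5" system_id="537416369" info=""',
]

if __name__ == "__main__":
    for (node, port, event), count in sorted(tally_fcvi_events(sample).items()):
        print(f"{node} port {port}: {event} x{count}")
```

Events that appear on several ports and on nodes at both sites would be consistent with the observation above that leaving a single FCVI port enabled does not clear the errors.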