FCP targets and paths lost on multiple blade servers in an enclosure
Applies to
- ONTAP 9
- HPE Synergy
- Brocade Fabric OS 9.1
- VMware ESXi
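Issue
- Sporadic and intermittent loss of paths and targets on different ESXi hosts (blade servers in an HPE Synergy enclosure)
- ONTAP confirms that the affected initiators are not logged in. The affected initiators and LIFs can change every few minutes.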
::*> fcp ping-igroup show -vserver SVM -igroup * -ext-status wwpn-not-logged_in
(vserver fcp ping-igroup show)
Igroup Logical Node Ping Extended
Vserver Name WWPN Interface Name Status Status
--------- ----------- -------------- ---------- --------- -------- -----------
SVM
SYNERGYESXGRP1 20:00:xx:xx:xx:xx:xx:29 SVM_fc07 NODEA12 reachable wwpn-not-logged_in
SYNERGYESXGRP2 20:00:xx:xx:xx:xx:xx:09 SVM_fc07 NODEA12 reachable wwpn-not-logged_in
SYNERGYESXGRP4 20:00:xx:xx:xx:xx:xx:01 SVM_fc02 NODEA11 reachable wwpn-not-logged_in
SYNERGYESXGRP4 20:00:xx:xx:xx:xx:xx:01 SVM_fc04 NODEA12 reachable wwpn-not-logged_in
SYNERGYESXGRP7 20:00:xx:xx:xx:xx:xx:31 SVM_fc02 NODEA11 reachable wwpn-not-logged_in
SYNERGYESXGRP8 20:00:xx:xx:xx:xx:xx:31 SVM_fc04 NODEA12 reachable wwpn-not-logged_in
- Sometimes the initiators are reported as logged in, but the related hosts still miss targets and have dead paths
- WQE errors with Ext_Status 0x16 appear in EMS (event log show):
fcp.io.status: STIO Adapter:2a IO WQE failure, Handle 0x5, Type 8, S_ID: 10902, VPI: 259, OX_ID: 24C, Status 0x3 Ext_Status 0x16
- FC target host bus adapter resets due to "command termination hung" SRAM dumps (possibly, but not necessarily, on multiple storage controllers)
::> event log show -severity debug -event *fcp.io.status*hung*|*SRAM*
Time Node Severity Event
------------------- ---------------- ------------- ---------------------------
12/21/2022 12:44:48 NODEA12 DEBUG scsitarget.fcp.dump: FCP target SRAM dump generated for adapter 2a, fct_tpd_check_hung_commands: Command termination hung. cmd:0xfffff80917a41c60 (state=0xa, flags=0x2,ctio_sent=2/2, RecvExAddr=0x1aec, OX_ID=0x72, RX_ID=0xffff, SID=0x10902)
12/21/2022 12:44:48 NODEA12 DEBUG fcp.io.status: STIO Adapter:2a, found hung cmd:0xfffff80917a41c60(state=10, flags=0x2, ctio_sent=2/2,RecvExAddr=0x1aec, OX_ID=0x72, RX_ID=0xffff,SID=0x10902, Cmd[28], req_q_free:0)
12/21/2022 11:56:38 NODEA12 DEBUG fcp.io.status: STIO Adapter:1a, found hung cmd:0xfffff8090d1a4010(state=5, flags=0x0, ctio_sent=1/1,RecvExAddr=0x14ef, OX_ID=0x264, RX_ID=0xffff,SID=0x10902, Cmd[2A], req_q_free:0)
12/21/2022 11:55:51 NODEA11 DEBUG scsitarget.fcp.dump: FCP target SRAM dump generated for adapter 2a, fct_tpd_check_hung_commands: Command termination hung. cmd:0xfffff80917b392f8 (state=0xa, flags=0x2,ctio_sent=2/3, RecvExAddr=0x146e, OX_ID=0x178, RX_ID=0xffff, SID=0x10c03)
12/21/2022 11:55:46 NODEA11 DEBUG fcp.io.status: STIO Adapter:2a, found hung cmd:0xfffff80917b392f8(state=7, flags=0x0, ctio_sent=1/2,RecvExAddr=0x146e, OX_ID=0x178, RX_ID=0xffff,SID=0x10c03, Cmd[8A], req_q_free:0)
5 entries were displayed.
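To see which initiators the hung commands come from, the EMS lines above can be grouped by node, adapter, and SID. The following is a minimal sketch, not an ONTAP tool: it assumes the event log show output has been saved as text in the layout shown above, and the function name hung_commands is hypothetical. The SID is the Fibre Channel source address of the initiator, not its WWPN.

```python
import re
from collections import Counter

# Patterns written against the sample EMS lines above (an assumption:
# other ONTAP releases may format these events differently).
_HEAD = re.compile(r"^\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2}\s+(\S+)\s+\S+\s+([\w.]+):")
_ADAPTER = re.compile(r"adapter[:\s]\s*(\w+)", re.IGNORECASE)
_SID = re.compile(r"\bSID=0x([0-9a-fA-F]+)")


def hung_commands(text):
    """Count hung-command events per (node, adapter, SID)."""
    hits = Counter()
    for line in text.splitlines():
        head = _HEAD.match(line)
        adapter = _ADAPTER.search(line)
        sid = _SID.search(line)
        if head and adapter and sid and "hung" in line.lower():
            hits[(head.group(1), adapter.group(1), sid.group(1).lower())] += 1
    return hits


# The five events from the example output above, copied verbatim.
SAMPLE_EMS = """\
12/21/2022 12:44:48 NODEA12 DEBUG scsitarget.fcp.dump: FCP target SRAM dump generated for adapter 2a, fct_tpd_check_hung_commands: Command termination hung. cmd:0xfffff80917a41c60 (state=0xa, flags=0x2,ctio_sent=2/2, RecvExAddr=0x1aec, OX_ID=0x72, RX_ID=0xffff, SID=0x10902)
12/21/2022 12:44:48 NODEA12 DEBUG fcp.io.status: STIO Adapter:2a, found hung cmd:0xfffff80917a41c60(state=10, flags=0x2, ctio_sent=2/2,RecvExAddr=0x1aec, OX_ID=0x72, RX_ID=0xffff,SID=0x10902, Cmd[28], req_q_free:0)
12/21/2022 11:56:38 NODEA12 DEBUG fcp.io.status: STIO Adapter:1a, found hung cmd:0xfffff8090d1a4010(state=5, flags=0x0, ctio_sent=1/1,RecvExAddr=0x14ef, OX_ID=0x264, RX_ID=0xffff,SID=0x10902, Cmd[2A], req_q_free:0)
12/21/2022 11:55:51 NODEA11 DEBUG scsitarget.fcp.dump: FCP target SRAM dump generated for adapter 2a, fct_tpd_check_hung_commands: Command termination hung. cmd:0xfffff80917b392f8 (state=0xa, flags=0x2,ctio_sent=2/3, RecvExAddr=0x146e, OX_ID=0x178, RX_ID=0xffff, SID=0x10c03)
12/21/2022 11:55:46 NODEA11 DEBUG fcp.io.status: STIO Adapter:2a, found hung cmd:0xfffff80917b392f8(state=7, flags=0x0, ctio_sent=1/2,RecvExAddr=0x146e, OX_ID=0x178, RX_ID=0xffff,SID=0x10c03, Cmd[8A], req_q_free:0)
"""

if __name__ == "__main__":
    for (node, adapter, sid), count in sorted(hung_commands(SAMPLE_EMS).items()):
        print(f"{node} adapter {adapter} SID 0x{sid}: {count} event(s)")
```

In this sample, only two source addresses (0x10902 and 0x10c03) account for all hung commands across both nodes.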
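- On the Brocade switches, c3timeout counters increase on the TX side of host ports and on the RX side of storage ports, which indicates end-device congestion. To check: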
- Run statsclear
- Wait 15 minutes
- Run porterrshow
Example:
FID128:admin> porterrshow 8-13
frames enc crc crc too too bad enc disc link loss loss frjt fbsy c3timeout pcs uncor
tx rx in err g_eof shrt long eof out c3 fail sync sig tx rx err err
8: 4.8m 8.4m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9: 1.1k 1.1k 0 0 0 0 0 0 0 717 0 0 0 0 0 717 0 0 0
10: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12: 3.9m 6.8m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13: 976 1.2k 0 0 0 0 0 0 0 840 0 0 0 0 0 840 0 0 0
FID128:admin> porterrshow 20-23
frames enc crc crc too too bad enc disc link loss loss frjt fbsy c3timeout pcs uncor
tx rx in err g_eof shrt long eof out c3 fail sync sig tx rx err err
20: 6.7m 7.6m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21: 3.9m 2.5m 0 0 0 0 0 0 0 518 0 0 0 0 0 0 259 0 0
22: 3.9m 2.5m 0 0 0 0 0 0 0 974 0 0 0 0 0 0 487 0 0
23: 6.4m 3.0m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
- SFP readings on the storage system and the switches are healthy (TX/RX levels are also good for the Synergy ports)
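The c3timeout check described above can be scripted once the porterrshow output is captured as text. The following is a minimal sketch under stated assumptions: the column order is taken from the 19-counter layout in the example output above and may differ on other Fabric OS releases, and the names parse_porterrshow and c3timeout_ports are hypothetical helpers, not Brocade commands.

```python
import re

# Counter order as it appears in the example output above (19 columns per
# port). This layout is an assumption; verify it against your FOS release.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
    "c3timeout_tx", "c3timeout_rx", "pcs_err", "uncor_err",
]

_SUFFIX = {"k": 1e3, "m": 1e6, "g": 1e9}
_ROW = re.compile(r"^\s*(\d+):\s+(.*)$")


def parse_count(token):
    """Convert '717', '1.1k' or '4.8m' to an (approximate) integer."""
    token = token.strip().lower()
    if token and token[-1] in _SUFFIX:
        return int(round(float(token[:-1]) * _SUFFIX[token[-1]]))
    return int(token)


def parse_porterrshow(text):
    """Return {port_number: {counter_name: value}} for every port row."""
    ports = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if not match:
            continue  # prompt and header lines
        values = match.group(2).split()
        if len(values) != len(COLUMNS):
            continue  # unexpected layout, skip rather than guess
        ports[int(match.group(1))] = dict(zip(COLUMNS, map(parse_count, values)))
    return ports


def c3timeout_ports(ports):
    """Return {port: (c3timeout_tx, c3timeout_rx)} for ports with timeouts."""
    return {
        port: (c["c3timeout_tx"], c["c3timeout_rx"])
        for port, c in ports.items()
        if c["c3timeout_tx"] or c["c3timeout_rx"]
    }


# First port range from the example above, copied verbatim.
SAMPLE_PORTERRSHOW = """\
FID128:admin> porterrshow 8-13
frames enc crc crc too too bad enc disc link loss loss frjt fbsy c3timeout pcs uncor
tx rx in err g_eof shrt long eof out c3 fail sync sig tx rx err err
8: 4.8m 8.4m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9: 1.1k 1.1k 0 0 0 0 0 0 0 717 0 0 0 0 0 717 0 0 0
10: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12: 3.9m 6.8m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13: 976 1.2k 0 0 0 0 0 0 0 840 0 0 0 0 0 840 0 0 0
"""

if __name__ == "__main__":
    flagged = c3timeout_ports(parse_porterrshow(SAMPLE_PORTERRSHOW))
    for port, (tx, rx) in sorted(flagged.items()):
        print(f"port {port}: c3timeout tx={tx} rx={rx}")
```

For the sample range, ports 9 and 13 are flagged with TX timeouts only, which matches the host-port pattern described above; storage ports would be expected to show the counts in the RX column instead.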