C3 Tx discards and frame timeouts due to a faulty SFP on the end device
Applies to
- Brocade Fabric OS
- SAN
- ONTAP 9 and later
Issue
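errdump -a logs frame timeouts, stating the port on which each frame was received (Rx) and the port on which it could not be transmitted (Tx). It also logs loss-of-signal and oversubscription alerts.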
2022/02/09-22:31:04, [AN-1014], 2266, SLOT 2 | FID 128, INFO, switch, Frame timeout detected, tx port 9/7 rx port 9/27, sid c8f907, did 678740, timestamp 2022-02-09 22:31:04 .
2023/12/23-10:47:16 (IST), [MAPS-1003], 274485, SLOT 2 | FID 128 | PORT 11/8, WARNING, switch, IB_INFINI_3185_W_N3P6, F-Port 11/8, Condition=ALL_PORTS(PORT_BANDWIDTH/NONE==OVERSUBSCRIBED), Current Value:[PORT_BANDWIDTH, OVERSUBSCRIBED, (TXQL=914 us, TX=72.8%) ], RuleName=defALL_PORTS_OVERSUBSCRIBED, Dashboard Category=Fabric Performance Impact, Quiet Time=15 min.
2023/12/18-01:35:43 (IST), [MAPS-1003], 270764, SLOT 1 | FID 128 | PORT 10/21, WARNING, switch, U-Port 10/21, Condition=ALL_PORTS(LOSS_SIGNAL/min>5), Current Value:[LOSS_SIGNAL, 8 LOS], RuleName=defALL_PORTSLOSS_SIGNAL_5, Dashboard Category=Port Health, Quiet Time=None.
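When many AN-1014 entries are present, it can help to extract the Tx/Rx port pairs programmatically. The following is a minimal sketch, not a supported tool: the regular expression is derived only from the sample AN-1014 line above, and the function name parse_frame_timeout is hypothetical. Other Fabric OS releases may format the message differently.

```python
import re

# Pattern based solely on the sample AN-1014 line in this article (assumption).
FRAME_TIMEOUT = re.compile(
    r"\[AN-1014\].*?Frame timeout detected, "
    r"tx port (?P<tx>[\d/]+) rx port (?P<rx>[\d/]+), "
    r"sid (?P<sid>[0-9a-fA-F]+), did (?P<did>[0-9a-fA-F]+)"
)


def parse_frame_timeout(line):
    """Return tx/rx port and sid/did from an AN-1014 line, or None if no match."""
    match = FRAME_TIMEOUT.search(line)
    return match.groupdict() if match else None


sample = (
    "2022/02/09-22:31:04, [AN-1014], 2266, SLOT 2 | FID 128, INFO, switch, "
    "Frame timeout detected, tx port 9/7 rx port 9/27, sid c8f907, "
    "did 678740, timestamp 2022-02-09 22:31:04 ."
)
print(parse_frame_timeout(sample))
```

Counting how often each tx value appears across the parsed entries shows which port most often fails to transmit.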
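The frame timeouts indicate that these Tx ports could not forward the frames received on the specified Rx ports, so the frames timed out.
IO_FRAME_LOSS and IO_PERF_IMPACT indicate frame-latency events logged in the switch-side errdump: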
2024/12/02-04:56:04 (IST), [MAPS-1001], 1664654, SLOT 2 | FID 128 | PORT 12/8, CRITICAL, switch, slot12 port8, F-Port 12/8, Condition=ALL_PORTS(DEV_LATENCY_IMPACT/NONE==IO_FRAME_LOSS), Current Value:[DEV_LATENCY_IMPACT, IO_FRAME_LOSS, (174 ms Frame Delay in VC: 2) ], RuleName=ALL_PORTS_IO_FRAME_LOSS_UNQUAR, Dashboard Category=Fabric Performance Impact, Quiet Time=1 day.
2024/12/02-04:56:04 (IST), [MAPS-1003], 1664655, SLOT 2 | FID 128 | PORT 12/45, WARNING, switch, slot12 port45, F-Port 12/45, Condition=ALL_PORTS(DEV_LATENCY_IMPACT/NONE==IO_PERF_IMPACT), Current Value:[DEV_LATENCY_IMPACT, IO_PERF_IMPACT, (39.3% of 10 secs in VC: 3-7) ], RuleName=defALL_PORTS_IO_PERF_IMPACT_UNQUAR_1, Dashboard Category=Fabric Performance Impact, Quiet Time=1 day.
- Rx and Tx power values on the end device are optimal:
Rx - 958.1 (uWatts)
Tx - 958.8 (uWatts)
sfpshow reports Tx and Rx power within the recommended range.
=============
Slot 12/Port 45:
=============
RX Power: -2.7 dBm (532.0uW)
TX Power: -1.5 dBm (711.2 uW)
porterrshow shows incrementing error counters for link failures, link resets, C3 discards, and Tx timeouts.
/fabos/cliexec/porterrshow:
             frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs  uncor
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                     tx     rx    err    err
391:   45.3g   3.4g      0      0      0      0      0      0      0  43.1k      5      0      5      0      0  43.1k      0      0   2.5k
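Because the two header rows are easy to misread, the data row can be mapped to named counters. The sketch below is illustrative only: the column names are shorthand chosen here from the header rows of the sample output, not official Fabric OS field names, and it assumes the k, m, and g suffixes abbreviate thousands, millions, and billions.

```python
# Shorthand column names built from the two header rows above (assumption).
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
    "uncor_err",
]
# Assumed meaning of the abbreviated counter suffixes.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def expand(value):
    """Turn an abbreviated counter such as '43.1k' into an approximate number."""
    if value[-1] in SUFFIXES:
        return float(value[:-1]) * SUFFIXES[value[-1]]
    return float(value)


def parse_row(row):
    """Map one porterrshow data row to (port, {column: value})."""
    port, *values = row.split()
    if len(values) != len(COLUMNS):
        raise ValueError("unexpected number of columns")
    return port.rstrip(":"), dict(zip(COLUMNS, map(expand, values)))


port, counters = parse_row(
    "391: 45.3g 3.4g 0 0 0 0 0 0 0 43.1k 5 0 5 0 0 43.1k 0 0 2.5k"
)
# Keep only the counters that are not zero.
nonzero = {name: value for name, value in counters.items() if value}
print(port, nonzero)
```

For the sample row, the non-zero error counters are disc_c3, link_fail, loss_sig, c3timeout_tx, and uncor_err, and disc_c3 equals c3timeout_tx.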
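c3-timeout tx: The number of transmit Class 3 frames discarded at the transmission port due to timeout (platform- and port-specific).
- This indicates an issue with the device connected to the switch.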
sfpshow indicates low receive power:
=============
Port 391:
=============
RX Power: -8.1 dBm (155.8uW)
TX Power: -3.0 dBm (500.6 uW)
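The dBm and uW figures in sfpshow describe the same measurement, related by dBm = 10 * log10(power in mW). As a worked sketch, the snippet below converts the two RX readings quoted in this article and computes the gap between them. The function name uw_to_dbm and the variable names are hypothetical, and no pass/fail threshold is implied; acceptable ranges depend on the SFP type.

```python
import math


def uw_to_dbm(microwatts):
    """Convert optical power in microwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(microwatts / 1000.0)


reference_rx_uw = 532.0  # RX power of port 12/45 from the sfpshow output in this article
suspect_rx_uw = 155.8    # RX power of port 391 from the sfpshow output in this article

for label, uw in (("12/45", reference_rx_uw), ("391", suspect_rx_uw)):
    print(f"port {label}: {uw} uW = {uw_to_dbm(uw):.1f} dBm")

# Difference between the two readings, in dB.
gap_db = uw_to_dbm(reference_rx_uw) - uw_to_dbm(suspect_rx_uw)
print(f"gap: {gap_db:.1f} dB")
```

The gap is roughly 5 dB, meaning port 391 receives less than a third of the optical power seen on port 12/45.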
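portshow indicates that the remote device has reset the link more often than the local port has sent offline primitives (Lr_in is much higher than Ols_out):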
portshow 391
[...]
Lr_in: 133 Ols_in: 5
Lr_out: 7 Ols_out: 6
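The comparison of Lr_in against Ols_out can be scripted when many ports need checking. This is a rough sketch that assumes the counters appear as "Name: integer" pairs exactly as in the excerpt above; the function name link_counters is hypothetical.

```python
import re


def link_counters(portshow_text):
    """Collect 'Name: integer' pairs such as Lr_in and Ols_out into a dict."""
    return {
        name: int(value)
        for name, value in re.findall(r"(\w+):\s+(\d+)", portshow_text)
    }


sample = """Lr_in: 133 Ols_in: 5
Lr_out: 7 Ols_out: 6"""

counters = link_counters(sample)
# Link resets received beyond the offline sequences sent by the local port.
excess = counters["Lr_in"] - counters["Ols_out"]
print(counters, excess)
```

A large positive difference, as in this sample, is consistent with the remote device initiating the link resets.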
Multi RDY/Frame Loss is detected on the affected switch port:
2025/01/24-13:53:15, [C5-1040], 162569, SLOT 2 | CHASSIS, WARNING, Brocade_X7-8, Multi RDY/Frame Loss detected on Slot 12, Port 680(120) m_rdy(0x1)/m_frame(0x0). Link Reset done.
2025/01/24-13:53:16, [C5-1040], 162571, SLOT 2 | CHASSIS, WARNING, Brocade_X7-8, Multi RDY/Frame Loss detected on Slot 12, Port 680(120) m_rdy(0x1)/m_frame(0x0). Link Reset done.
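Before proceeding to the solution, verify the following troubleshooting steps.
Focus on analyzing the end-device workload, and perform hardware checks on the end devices connected to the ports that report timeouts and frame loss.
- Verify the following components. If an issue is found on any of them, replace that component on the end-device side:
- SFP
- Cabling
- HBA card
- Patch panel links and connectors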