Brocade switch port flapping due to a faulty cable
Applies to
Ports on a Brocade switch
Issue
- The switch port state is Online under switchshow:
/fabos/bin/switchshow :
Index Slot Port Address Media Speed State Proto
============================================================
129 4 33 708100 id N32 Online FC F-Port 10:00:00:10:9b:xx:xx:xx
- Under porterrshow, multiple link failure and loss of signal (loss sig) errors are reported, along with other media errors such as enc out, crc err, crc g_eof, c3timeout tx, pcs err and uncor err:
/fabos/cliexec/porterrshow :
             frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs  uncor
          tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err    err
100:   14.1k   5.8k   2.2k    964    932      0      0     32   3.9g      6     12      0   4.4k      0      0      0      0      0      0
/fabos/cliexec/porterrshow :
             frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs  uncor
          tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err    err
129:   20.6k  20.7k      0      0      0      0      0      0      0      9   2.5k      0   7.2k      0      0      9      0   1.1k  26.0k
/fabos/cliexec/porterrshow:
             frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs  uncor
          tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err    err
 37:    2.3g   6.4g      0      0      0      0      0      0      0     53      2      0      4      0      0     53      0      0      0
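The porterrshow rows above are easier to compare once each value is tied to its column. The Python sketch below is illustrative only: the column names come from the header above, the k/m/g suffixes are assumed to be decimal multipliers, and `parse_row` and `nonzero_errors` are hypothetical helpers, not Brocade tooling.

```python
# Column names for one porterrshow data row, in the order of the two-line
# header above ("frames" and "c3timeout" each span a tx and an rx column).
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
    "c3timeout_tx", "c3timeout_rx", "pcs_err", "uncor_err",
]

# Assumption: the k/m/g suffixes are decimal multipliers.
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_count(token: str) -> float:
    """Convert a porterrshow token such as '964', '2.5k' or '3.9g' to a number."""
    multiplier = SUFFIX.get(token[-1].lower())
    if multiplier is not None:
        return float(token[:-1]) * multiplier
    return float(token)


def parse_row(line: str) -> tuple[int, dict[str, float]]:
    """Split 'PORT: v1 v2 ...' into the port index and a column -> value map."""
    port, values = line.split(":", 1)
    tokens = values.split()
    if len(tokens) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(tokens)}")
    return int(port), {name: parse_count(tok) for name, tok in zip(COLUMNS, tokens)}


def nonzero_errors(counters: dict[str, float]) -> list[str]:
    """Names of all error counters (everything except frame counts) above zero."""
    return [name for name, value in counters.items()
            if not name.startswith("frames_") and value > 0]


if __name__ == "__main__":
    port, counters = parse_row(
        "129: 20.6k 20.7k 0 0 0 0 0 0 0 9 2.5k 0 7.2k 0 0 9 0 1.1k 26.0k")
    print(port, nonzero_errors(counters))
```

For port 129 this prints the port index followed by disc_c3, link_fail, loss_sig, c3timeout_tx, pcs_err and uncor_err; loss_sync stays at zero on all three ports.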
- sfpshow reports a low RX power value:
RX Power: -15.1 dBm (30.7 uW)
TX Power: -1.7 dBm (678.9 uW)
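sfpshow prints each optical power reading both in dBm and in uW; the two are related by dBm = 10 * log10(P / 1 mW). The sketch below reproduces the readings above and compares RX against the 63 uW threshold of the MAPS rule defALL_32GSWL_SFPRXP_63 from the errdump excerpt in this article. That threshold belongs to the ALL_32GSWL_SFP group and the SFP type is not shown in this output, so treat the comparison as an assumption; the helper names are hypothetical.

```python
import math

# Threshold of the MAPS rule defALL_32GSWL_SFPRXP_63 quoted in the errdump
# excerpt (Condition=ALL_32GSWL_SFP(RXP<=63)), in uW. Assumed to apply here.
RXP_THRESHOLD_UW = 63.0


def uw_to_dbm(microwatts: float) -> float:
    """Optical power in dBm relative to 1 mW: 10 * log10(P_mW)."""
    return 10 * math.log10(microwatts / 1000.0)


def dbm_to_uw(dbm: float) -> float:
    """Inverse conversion: P_uW = 1000 * 10 ** (dBm / 10)."""
    return 1000.0 * 10 ** (dbm / 10)


def rx_at_or_below_threshold(rx_uw: float, threshold_uw: float = RXP_THRESHOLD_UW) -> bool:
    """True when an RX reading would satisfy the RXP<=63 condition."""
    return rx_uw <= threshold_uw


if __name__ == "__main__":
    rx_uw, tx_uw = 30.7, 678.9  # values from the sfpshow output above
    print(f"RX Power: {uw_to_dbm(rx_uw):.1f} dBm ({rx_uw} uW)")
    print(f"TX Power: {uw_to_dbm(tx_uw):.1f} dBm ({tx_uw} uW)")
    print("RX at or below RXP threshold:", rx_at_or_below_threshold(rx_uw))
```

With these inputs the sketch prints -15.1 dBm for RX and -1.7 dBm for TX, and reports the RX reading as at or below the threshold. The same conversion turns the healthy switch-side readings further down (681.90 uW and 701.10 uW) into roughly -1.7 dBm and -1.5 dBm.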
- The portshow output confirms that the port is Online and In_Sync, and that Lr_in is greater than Ols_out, which indicates that the issue lies outside the switch port.
portshow 37
portDisableReason: None
[..]
portState: 1 Online
Protocol: FC
portPhys: 6 In_Sync portScn: 32 F_Port
FC Fastwrite: OFF
Interrupts: 48 Link_failure: 2 Frjt: 0
Unknown: 6 Loss_of_sync: 0 Fbsy: 0
Lli: 48 Loss_of_sig: 4
Proc_rqrd: 159 Protocol_err: 0
Timed_out: 0 Invalid_word: 0
Tx_unavail: 0 Invalid_crc: 0
Delim_err: 0 Address_err: 0
Lr_in: 6 Ols_in: 2
Lr_out: 2 Ols_out: 3
Cong_Prim_in: 0
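The portshow check above comes down to two pattern matches and one counter comparison. A minimal Python sketch, assuming the `Name: value` layout shown in this output; the function names are hypothetical, and the Lr_in > Ols_out rule is this article's reading, not a documented Brocade criterion.

```python
import re

# Matches "Name: 123" pairs, e.g. "Lr_in: 6   Ols_in: 2" in portshow output.
COUNTER_RE = re.compile(r"([A-Za-z_]+):\s+(\d+)")
ONLINE_RE = re.compile(r"portState:\s+\d+\s+Online")
IN_SYNC_RE = re.compile(r"portPhys:\s+\d+\s+In_Sync")


def parse_portshow_counters(text: str) -> dict[str, int]:
    """Collect every 'Name: <integer>' pair from a portshow dump."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def is_online_in_sync(text: str) -> bool:
    """Port is Online (portState) and physically In_Sync (portPhys)."""
    return bool(ONLINE_RE.search(text) and IN_SYNC_RE.search(text))


def points_outside_switch_port(counters: dict[str, int]) -> bool:
    """The article's reading: more link resets received than OLS sent."""
    return counters.get("Lr_in", 0) > counters.get("Ols_out", 0)


SAMPLE = """\
portState: 1 Online
portPhys: 6 In_Sync portScn: 32 F_Port
Lr_in: 6 Ols_in: 2
Lr_out: 2 Ols_out: 3
"""

if __name__ == "__main__":
    counters = parse_portshow_counters(SAMPLE)
    print(is_online_in_sync(SAMPLE), counters["Lr_in"], counters["Ols_out"],
          points_outside_switch_port(counters))
```

For the port 37 output above, the port is Online and In_Sync with Lr_in 6 against Ols_out 3, so the comparison returns True.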
- In the Fabriclog, both ports can be seen flapping:
Fabriclog:
Switch 0; Tue Oct 11 12:34:12 2022 IST (GMT+5:30)
12:34:12.020011 SCN Port Offline;rsn=0x2,g=0x530 D2,P0 D2,P0 15 NA
12:34:12.020017 *Removing all nodes from port D2,P0 D2,P0 15 NA
12:34:12.112102 SCN Port Offline;rsn=0x0,g=0x532 D2,P0 D2,P0 127 NA
12:34:12.112108 *Removing all nodes from port D2,P0 D2,P0 127 NA
12:36:40.840204 SCN LR_PORT(0);g=0x530 D2,P0 D2,P0 15 NA
12:36:40.860941 SCN Port Online; g=0x530,isolated=0 D2,P0 D2,P1 15 NA
12:36:40.861044 Port Elp engaged D2,P1 D2,P0 15 NA
12:36:40.861057 *Removing all nodes from port D2,P0 D2,P0 15 NA
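Flapping shows up in the fabriclog as repeated SCN Port Offline/Online events for the same port. The sketch below counts them per port. It assumes, from this excerpt alone, that the number just before NA is the port; the regular expression and helper names are hypothetical.

```python
import re
from collections import Counter

# Assumption: the second-to-last field of these fabriclog SCN lines is the
# port number (15 and 127 in the excerpt above).
SCN_RE = re.compile(r"SCN Port (Offline|Online)\b.*\s(\d+)\s+NA\s*$")


def count_state_changes(lines: list[str]) -> Counter:
    """Count 'SCN Port Offline' / 'SCN Port Online' events per (port, state)."""
    events: Counter = Counter()
    for line in lines:
        match = SCN_RE.search(line)
        if match:
            state, port = match.group(1), int(match.group(2))
            events[(port, state)] += 1
    return events


def ports_that_went_offline(events: Counter) -> list[int]:
    """Ports with at least one Offline SCN, i.e. candidates for flapping."""
    return sorted({port for port, state in events if state == "Offline"})


FABRICLOG = """\
12:34:12.020011 SCN Port Offline;rsn=0x2,g=0x530 D2,P0 D2,P0 15 NA
12:34:12.112102 SCN Port Offline;rsn=0x0,g=0x532 D2,P0 D2,P0 127 NA
12:36:40.840204 SCN LR_PORT(0);g=0x530 D2,P0 D2,P0 15 NA
12:36:40.860941 SCN Port Online; g=0x530,isolated=0 D2,P0 D2,P1 15 NA
""".splitlines()

if __name__ == "__main__":
    events = count_state_changes(FABRICLOG)
    print(dict(events))
    print(ports_that_went_offline(events))
```

LR_PORT and "Port Elp engaged" lines are deliberately not counted; only the Offline/Online state-change notifications are.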
- According to the sfpshow output, the optical values of the switch-side SFP are within the optimal range:
RX Power: -1.7 dBm (681.90uW)
TX Power: -1.5 dBm (701.10 uW)
- The switch logged C4-5040 messages:
2024/09/17-20:13:45:225851 (IST), [C4-5040], 2240091/0, SLOT 1 | CHASSIS | PORT 3/11, INFO, SWITCH, Link loss of sync debouncing event detected: Slot 3/Port 11(122)
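C4-5040 is logged when the port link is reset because of loss of sync. A debounce timer is started to allow the loss of sync to clear; if it has still not cleared when the timer expires, the C4-5040 debounce message is posted. In the fabriclog, the port can be seen going offline, followed by LR_IN on the switch: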
Switch 0; Tue Sep 17 20:13:23 2024 IST (GMT+5:30)
20:13:23.722607 SCN Port Offline;rsn=0x2,g=0x16a D2,P0 D2,P0 11 NA
20:13:23.722613 *Removing all nodes from port D2,P0 D2,P0 11 NA
20:13:24.365532 SCN LR_PORT(0);g=0x16a D2,P0 D2,P0 11 NA
20:13:24.423719 SCN Port Online; g=0x16a,isolated=0 D2,P0 D2,P1 11 NA
20:13:24.423926 Port Elp engaged D2,P1 D2,P0 11 NA
20:13:24.423938 *Removing all nodes from port D2,P0 D2,P0 11 NA
20:13:24.424123 SCN Port F_PORT D2,P1 D2,P0 11 NA
20:13:24.538573 SCN LR_PORT(0);g=0x16a LR_IN D2,P0 D2,P0 11 NA
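The debounce behaviour described for C4-5040 can be modelled as a timer that starts on loss of sync, is cancelled if sync returns, and produces one message if it expires first. The sketch below evaluates the timer only at the given samples; `debounce_s` is a hypothetical parameter, because the real timer value is not stated here, and the function does not reproduce Fabric OS internals.

```python
from typing import List, Optional, Tuple


def debounce_loss_of_sync(samples: List[Tuple[float, bool]], debounce_s: float) -> List[float]:
    """Return the sample times at which a C4-5040-style message would be logged.

    samples: (time_in_seconds, in_sync) pairs in time order.
    debounce_s: hypothetical debounce timer length (not documented here).
    """
    logged: List[float] = []
    timer_start: Optional[float] = None  # when the current loss of sync began
    reported = False                     # already logged for this loss of sync?
    for t, in_sync in samples:
        if in_sync:
            # Sync is back: cancel any running timer.
            timer_start, reported = None, False
        elif timer_start is None:
            # Loss of sync detected: start the debounce timer.
            timer_start = t
        elif not reported and t - timer_start >= debounce_s:
            # Timer expired and sync is still lost: log once.
            logged.append(t)
            reported = True
    return logged


if __name__ == "__main__":
    short_blip = [(0.0, True), (1.0, False), (1.5, True)]
    persistent = [(0.0, True), (1.0, False), (2.0, False), (4.0, False), (5.0, False)]
    print(debounce_loss_of_sync(short_blip, debounce_s=2.0))
    print(debounce_loss_of_sync(persistent, debounce_s=2.0))
```

A short blip that clears before the timer produces no message, while a loss of sync that persists past the timer produces exactly one.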
- Error counters shown under portstatsshow:
portstatsshow 11
er_bad_os 13
phy_stats_clear_ts 09-13-2024 IST Fri 03:01:23 Timestamp of phy_port stats clear
lgc_stats_clear_ts 09-13-2024 IST Fri 03:01:23 Timestamp of lgc_port stats clear
Lr_in 0 top_int : Number of link resets received
1 bottom_int : Number of link resets received
Link_failure 0 top_int : Number of link failures
1 bottom_int : Number of link failures
Loss_of_sig 0 top_int : Number of instances of signal loss detected
1 bottom_int : Number of instances of signal loss detected
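portstatsshow prints some counters over two lines, labelled top_int and bottom_int. Assuming these are the upper and lower 32-bit halves of one 64-bit counter (confirm against the Fabric OS command reference for your release), the excerpt above amounts to one link reset received, one link failure and one loss of signal. A small Python sketch with hypothetical helper names:

```python
def combine_counter(top_int: int, bottom_int: int) -> int:
    """Rebuild a 64-bit counter from its upper and lower 32-bit halves (assumed)."""
    if not (0 <= top_int < 2**32 and 0 <= bottom_int < 2**32):
        raise ValueError("each half must fit in 32 bits")
    return (top_int << 32) | bottom_int


def parse_split_counters(lines: list[str]) -> dict[str, int]:
    """Pair 'Name <top> top_int ...' with the following '<bottom> bottom_int ...'."""
    result: dict[str, int] = {}
    pending = None  # (name, top) waiting for its bottom_int line
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 3 and tokens[2] == "top_int":
            pending = (tokens[0], int(tokens[1]))
        elif len(tokens) >= 2 and tokens[1] == "bottom_int" and pending:
            name, top = pending
            result[name] = combine_counter(top, int(tokens[0]))
            pending = None
    return result


PORTSTATS = """\
er_bad_os 13
Lr_in 0 top_int : Number of link resets received
1 bottom_int : Number of link resets received
Link_failure 0 top_int : Number of link failures
1 bottom_int : Number of link failures
Loss_of_sig 0 top_int : Number of instances of signal loss detected
1 bottom_int : Number of instances of signal loss detected
""".splitlines()

if __name__ == "__main__":
    print(parse_split_counters(PORTSTATS))
```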
- In the MAPS policy, link fail, loss sig and LR are reported:
MAPS alerts for port 3/11:
LOSS_SYNC(SyncLoss) -
LF(LFs) 3/11(1)
LOSS_SIGNAL(LOS) 3/11(1)
PE(Errors)
STATE_CHG 3/11(2)
LR(LRs) 3/11(1)
- Checking sfpshow for the port in older supportsaves shows the RX power fluctuating, but still at a fairly high value:
Temperature: 51 Centigrade
Current: 7.786 mAmps
Voltage: 3329.70 mVolts
RX Power: -1.7 dBm (681.90uW)
TX Power: -1.5 dBm (701.10 uW)
12-SEP 2025 20h25
Current: 7.788 mAmps
Voltage: 3318.80 mVolts
RX Power: -1.6 dBm (685.00uW)
TX Power: -1.5 dBm (703.20 uW)
12-SEP 00h05
Current: 7.796 mAmps
Voltage: 3312.40 mVolts
RX Power: -1.6 dBm (694.90uW)
TX Power: -1.5 dBm (703.00 uW)
- In the errdump log, the rules defALL_32GSWL_SFPRXP_63 and defALL_OTHER_F_PORTSSTATE_CHG_5 are triggered, along with loss signal, link failure and frame timeout detected errors:
2022/10/11-12:41:00, [MAPS-1004], 46328, SLOT 1 | FID 128, INFO, XXX, SFP 3/15, Condition=ALL_32GSWL_SFP(RXP<=63), Current Value:[RXP, 0 uW], RuleName=defALL_32GSWL_SFPRXP_63, Dashboard Category=Port Health.
2022/10/11-12:41:00, [MAPS-1004], 46329, SLOT 1 | FID 128, INFO, XXX, SFP 12/15, Condition=ALL_32GSWL_SFP(RXP<=63), Current Value:[RXP, 0 uW], RuleName=defALL_32GSWL_SFPRXP_63, Dashboard Category=Port Health.
2022/10/11-12:43:00, [MAPS-1004], 46330, SLOT 1 | FID 128, INFO, XXX, SFP 3/15, Condition=ALL_32GSWL_SFP(RXP<=63), Current Value:[RXP, 0 uW], RuleName=defALL_32GSWL_SFPRXP_63, Dashboard Category=Port Health.
2022/10/11-12:43:00, [MAPS-1004], 46331, SLOT 1 | FID 128, INFO, XXX, SFP 12/15, Condition=ALL_32GSWL_SFP(RXP<=63), Current Value:[RXP, 0 uW], RuleName=defALL_32GSWL_SFPRXP_63, Dashboard Category=Port Health.
2023/02/20-16:42:20 (IST), [MAPS-1003], 9812, SLOT 2 | FID 128 | PORT 4/33, WARNING, switch, slot4 port33, F-Port 4/33, Condition=ALL_HOST_PORTS(STATE_CHG/min>5), Current Value:[STATE_CHG, 6], RuleName=defALL_HOST_PORTSSTATE_CHG_5, Dashboard Category=Port Health, Quiet Time=None.
2023/02/20-16:42:20 (IST), [MAPS-1003], 9813, SLOT 2 | FID 128 | PORT 4/33, WARNING, switch, slot4 port33, F-Port 4/33, Condition=ALL_OTHER_F_PORTS(STATE_CHG/min>5), Current Value:[STATE_CHG, 6], RuleName=defALL_OTHER_F_PORTSSTATE_CHG_5, Dashboard Category=Port Health, Quiet Time=None.
2022/06/26-03:53:42, [MAPS-1003], 42403, SLOT 2 | FID 128, WARNING, switch, slot6 port9, U-Port 6/9, Condition=ALL_PORTS(LOSS_SIGNAL/min>3), Current Value:[LOSS_SIGNAL, 386 LOS], RuleName=defALL_PORTSLOSS_SIGNAL_3, Dashboard Category=Port Health.
2022/06/26-03:54:42, [MAPS-1003], 42404, SLOT 2 | FID 128, WARNING, switch, slot6 port9, U-Port 6/9, Condition=ALL_PORTS(LOSS_SIGNAL/min>3), Current Value:[LOSS_SIGNAL, 378 LOS], RuleName=defALL_PORTSLOSS_SIGNAL_3, Dashboard Health.
2021/10/23-23:38:47, [MAPS-1003], 55690, SLOT 1 | FID 128, WARNING, Fabric1, slot10 port31, U-Port 10/31, Condition=ALL_PORTS(LF/min>3), Current Value:[LF, 4], RuleName=defALL_PORTSLF_3, Dashboard Category=Port Health.
2021/10/23-04:24:46, [PORT-1003], 53615, SLOT 1 | FID 128, WARNING, Fabric1, Port 223 Faulted because of many Link Failures.
2020/04/08-21:35:56, [C3-1014], 2556, CHASSIS, WARNING, Brocade6510, Link Reset on Port S0,P8(12) vc_no=0 crd(s)lost=12 auto trigger.
2020/04/09-09:03:42, [C3-1014], 2557, CHASSIS, WARNING, Brocade6510, Link Reset on Port S0,P8(12) vc_no=0 crd(s)lost=12 auto trigger.
2023/10/15-01:03:18, [AN-1014], 589, FID 128, INFO, SWITCH, Frame timeout detected, tx port 37 rx port 15, sid 650900, did 632501, timestamp 2023-10-15 01:03:18 .
2023/10/15-01:03:18, [LOG-1000], 594, FID 128, INFO, SWITCH, Previous message repeated 5 time(s).
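When errdump is long, counting message IDs and triggered MAPS rules gives a quick overview. The sketch below pulls the message ID, RuleName and Current Value fields out of lines formatted like the ones above; the regular expressions are fitted to these examples only and may need adjusting for other message formats, and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional, Tuple

MSG_ID_RE = re.compile(r"\[([A-Z0-9]+-\d+)\]")  # e.g. [MAPS-1003], [C3-1014]
RULE_RE = re.compile(r"RuleName=([^,]+)")
VALUE_RE = re.compile(r"Current Value:\[([^,\]]+),\s*([\d.]+)")


def summarize(lines: list[str]) -> tuple[Counter, Counter]:
    """Count message IDs and triggered MAPS rules in errdump lines."""
    msg_ids: Counter = Counter()
    rules: Counter = Counter()
    for line in lines:
        msg = MSG_ID_RE.search(line)
        if not msg:
            continue
        msg_ids[msg.group(1)] += 1
        rule = RULE_RE.search(line)
        if rule:
            rules[rule.group(1).strip()] += 1
    return msg_ids, rules


def current_value(line: str) -> Optional[Tuple[str, float]]:
    """Extract the monitored metric and its value, e.g. ('STATE_CHG', 6.0)."""
    match = VALUE_RE.search(line)
    return (match.group(1), float(match.group(2))) if match else None


LINES = [
    "2022/10/11-12:41:00, [MAPS-1004], 46328, SLOT 1 | FID 128, INFO, XXX, SFP 3/15, Condition=ALL_32GSWL_SFP(RXP<=63), Current Value:[RXP, 0 uW], RuleName=defALL_32GSWL_SFPRXP_63, Dashboard Category=Port Health.",
    "2023/02/20-16:42:20 (IST), [MAPS-1003], 9813, SLOT 2 | FID 128 | PORT 4/33, WARNING, switch, slot4 port33, F-Port 4/33, Condition=ALL_OTHER_F_PORTS(STATE_CHG/min>5), Current Value:[STATE_CHG, 6], RuleName=defALL_OTHER_F_PORTSSTATE_CHG_5, Dashboard Category=Port Health, Quiet Time=None.",
    "2021/10/23-04:24:46, [PORT-1003], 53615, SLOT 1 | FID 128, WARNING, Fabric1, Port 223 Faulted because of many Link Failures.",
]

if __name__ == "__main__":
    ids, rules = summarize(LINES)
    print(dict(ids))
    print(dict(rules))
    for line in LINES:
        print(current_value(line))
```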
The rule RuleName=defALL_32GSWL_SFPRXP_63 is triggered when RX power starts to degrade, which points to an upstream problem, i.e. a faulty SFP or cable.
On the storage side, the EMS log reports link break events:
[?] Tue Sep 17 20:13:44 +0530 [NetApp-2: fct_tpd_work_thread_0: scsitarget.slifct.linkBreak:error]: Link break detected on Fibre Channel target HBA 3d with event status 1 , topology type 1, status1 0x0, status2 0x0.
[?] Tue Sep 17 20:13:45 +0530 [NetApp-2: fct_tpd_work_thread_0: scsitarget.hwpfct.linkUp:notice]: Link up on Fibre Channel target adapter 3d.
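Correlating the switch and storage events is easier once all timestamps are on one timeline. The sketch below parses the three timestamp styles seen in this article and sorts the events. Assumptions: all devices log in IST (+05:30) as shown, the EMS year (not printed in the log) is 2024, and the device clocks are synchronized; clock skew between switch and storage would reorder events. Helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # the logs above are in IST (+0530)


def parse_raslog(ts: str) -> datetime:
    """Switch RASLog timestamp such as '2024/09/17-20:13:45:225851' (IST)."""
    return datetime.strptime(ts, "%Y/%m/%d-%H:%M:%S:%f").replace(tzinfo=IST)


def parse_fabriclog(day: str, time_of_day: str) -> datetime:
    """fabriclog time such as '20:13:23.722607'; the date comes from its header."""
    return datetime.strptime(f"{day} {time_of_day}", "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=IST)


def parse_ems(ts: str, year: int) -> datetime:
    """EMS timestamp such as 'Tue Sep 17 20:13:44 +0530'; the year is not printed."""
    return datetime.strptime(f"{year} {ts}", "%Y %a %b %d %H:%M:%S %z")


EVENTS = [
    (parse_fabriclog("2024-09-17", "20:13:23.722607"), "switch fabriclog: SCN Port Offline, port 11"),
    (parse_ems("Tue Sep 17 20:13:44 +0530", 2024), "storage EMS: linkBreak on adapter 3d"),
    (parse_raslog("2024/09/17-20:13:45:225851"), "switch RASLog: C4-5040 on Slot 3/Port 11"),
    (parse_ems("Tue Sep 17 20:13:45 +0530", 2024), "storage EMS: linkUp on adapter 3d"),
]

if __name__ == "__main__":
    for when, what in sorted(EVENTS):
        print(when.isoformat(timespec="milliseconds"), what)
```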
Performance archives covering the time of the issue show ITW errors and loss of sync occurring at the same time.
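On the switch side, loss of sig errors are also visible. This indicates that loss of signal occurred on the link between the switch and the storage, so the storage initiated a link reset to recover. This in turn triggers ITW (invalid transmission word) errors, because frames that arrive while the link is being reset are dropped. ITW errors point to a faulty cable.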
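The diagnosis in this article can be condensed into a rough decision rule: low RX power points upstream (SFP or cable), while healthy switch-side optics combined with loss of signal on the switch and link breaks or ITW errors on the storage point at the cable. The sketch below encodes that reasoning; the dataclass fields and labels are hypothetical, and the 63 uW floor is borrowed from the MAPS rule quoted earlier, so it is an assumption rather than a vendor limit.

```python
from dataclasses import dataclass

# Borrowed from MAPS rule defALL_32GSWL_SFPRXP_63 (RXP<=63 uW); whether it
# applies depends on the SFP type, so treat it as an assumption.
RX_FLOOR_UW = 63.0


@dataclass
class LinkObservations:
    """Hypothetical summary of what was collected from both ends of the link."""
    switch_rx_uw: float        # sfpshow RX Power on the switch port
    switch_loss_sig: int       # loss of signal count on the switch port
    storage_link_breaks: int   # EMS scsitarget linkBreak events
    storage_itw_errors: int    # ITW errors in the storage performance archives


def triage(obs: LinkObservations, rx_floor_uw: float = RX_FLOOR_UW) -> str:
    """Mirror this article's reasoning; not an official procedure."""
    if obs.switch_rx_uw <= rx_floor_uw:
        return "low RX power: upstream SFP or cable"
    storage_errors = obs.storage_link_breaks > 0 or obs.storage_itw_errors > 0
    if obs.switch_loss_sig > 0 and storage_errors:
        return "optics healthy but signal lost: suspect cable"
    return "no cable signature in these counters"


if __name__ == "__main__":
    # Only the 681.90 uW RX value comes from the article; the counts are placeholders.
    obs = LinkObservations(switch_rx_uw=681.90, switch_loss_sig=1,
                           storage_link_breaks=1, storage_itw_errors=1)
    print(triage(obs))
```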