Path redundancy degraded alerts are continuously received on the ESXi host
Applies to
- ONTAP 9
- FC LUN
- ESXi hosts
- Ports on Brocade switches
Issue
- ESXi hosts continuously receive the following alert on the VMware side:
Alarm alarm.StorageConnectivityAlarm on Host hostabc.xxx.com
because Path redundancy to storage device naa.600a098xxxxxx46c3f515xxxxxxxx degraded. Path vmhba2:C0:xx:xx0 is down. Affected datastores: xxx-NetApp-xyz..
Alarm name alarm.StorageConnectivityAlarm
Description alarm.StorageConnectivityAlarm
Target Host hostabc.xxx.com
Triggered time 04/10/2023 10:40:57 AM
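- FC LUNs are mapped to these hosts from NetApp storage.
- The alerts cause no disruption on the host side.
- The alerts recur frequently on the host side, roughly every 3 hours.
- At the timestamps of the issue, the LUNs on the storage side are online and mapped, and the EMS logs show no errors.
- The FC ports are up, and the Rx and Tx values are within the optimal range.
- Analysis of the Brocade switch logs shows a problem with the port between the switch and the NetApp storage.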
Switchshow reports the port as online.
/fabos/bin/switchshow :
Index Port Address Media Speed State Proto
==================================================
9 9 020900 id N8 Online FC F-Port 1 N Port + 1 NPIV public
Porterrshow indicates that the port reported multiple media errors.
/fabos/cliexec/porterrshow :
              frames     enc     crc     crc     too     too     bad     enc    disc    link    loss    loss    frjt    fbsy       c3timeout     pcs   uncor
          tx      rx      in     err   g_eof    shrt    long     eof     out      c3    fail    sync     sig                      tx      rx     err     err
  9:    3.4g    1.5g   39.4k   30.9m   30.5m       0   11.3k  454.4k    2.1m   79.2k     206       0    2.4k       0       0       0       0  264.1m       0
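The abbreviated counters in the porterrshow row are easier to compare once they are expanded to plain integers. The following is a minimal sketch, not a NetApp or Brocade tool. The helper names, the column labels (read off the two header lines above), and the reading of the k/m/g suffixes as decimal thousands, millions, and billions are all assumptions made for illustration.

```python
# Hypothetical helper for illustration only.
# Column labels are derived from the two porterrshow header lines in this article.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
    "uncor_err",
]

# Assumption: suffixes are decimal multipliers (k = 10^3, m = 10^6, g = 10^9).
SUFFIX = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_count(token):
    """Expand an abbreviated counter such as '30.9m' to an integer."""
    token = token.strip().lower()
    if token and token[-1] in SUFFIX:
        return int(round(float(token[:-1]) * SUFFIX[token[-1]]))
    return int(token)


def parse_porterrshow_row(row):
    """Split '<port>: v1 v2 ...' into a port number and a dict of counters."""
    port, _, values = row.partition(":")
    counts = [parse_count(tok) for tok in values.split()]
    if len(counts) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} counters, got {len(counts)}")
    return int(port), dict(zip(COLUMNS, counts))


# The port 9 row quoted in this article.
ROW = "9: 3.4g 1.5g 39.4k 30.9m 30.5m 0 11.3k 454.4k 2.1m 79.2k 206 0 2.4k 0 0 0 0 264.1m 0"

port, counters = parse_porterrshow_row(ROW)
crc_ratio = counters["crc_err"] / counters["frames_rx"]
print(f"port {port}: {counters['crc_err']} CRC errors, {crc_ratio:.1%} of received frames")
```

With the values shown, CRC errors amount to roughly 2% of the frames received on port 9. Because the abbreviated values have limited precision, the ratio is only approximate.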
- According to the sfpshow output, the Rx value is outside the optimal range, while the Tx value is within it.
Port 9:
=============
Identifier: 3 SFP
Connector: 7 LC
Transceiver: 7004404000000000 4,8,16_Gbps M5 sw Short_dist
[..]
Vendor Name: HP-F BROCADE
Vendor OUI: 00:05:1e
Vendor PN: QK724A
[..]
RX Power: -12.1 dBm (61.0 uW)
TX Power: -3.2 dBm (483.8 uW)
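sfpshow prints each optical power twice, in dBm and in microwatts. The two are related by dBm = 10 * log10(P / 1 mW). The short sketch below, with a hypothetical helper name, checks that relation against the RX and TX values above. It deliberately applies no pass/fail threshold, since this article does not state the limits of the optimal range.

```python
import math


def uw_to_dbm(power_uw):
    """Convert optical power from microwatts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(power_uw / 1000.0)


# Microwatt readings taken from the sfpshow output for port 9.
rx_dbm = uw_to_dbm(61.0)
tx_dbm = uw_to_dbm(483.8)

print(f"RX: {rx_dbm:.1f} dBm, TX: {tx_dbm:.1f} dBm")
```

Rounded to one decimal place, the results match the -12.1 dBm and -3.2 dBm figures that sfpshow reports.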
Portstatsshow indicates that the port is experiencing multiple problems.
portstatsshow 9
[..]
fec_cor_detected 0 Count of blocks that were corrected by FEC
fec_uncor_detected 0 Count of blocks that were left uncorrected by FEC
er_enc_in 39648 Encoding errors inside of frames
er_crc 30968151 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_toolong 11379 Frames longer than maximum
er_bad_eof 454430 Frames with bad end-of-frame
er_enc_out 2172705 Encoding error outside of frames
er_bad_os 3801509 Invalid ordered set
er_pcs_blk 264192520 PCS block errors
er_rx_c3_timeout 0 Class 3 receive frames discarded due to timeout
er_tx_c3_timeout 0 Class 3 transmit frames discarded due to timeout
er_unroutable 20153 Frames that are unroutable
er_unreachable 0 Frames with unreachable destination
er_other_discard 79253 Other discards
er_type1_miss 0 frames with FTB type 1 miss
er_type2_miss 0 frames with FTB type 2 miss
er_type6_miss 0 frames with FTB type 6 miss
er_zone_miss 71 frames with hard zoning miss
er_lun_zone_miss 0 frames with LUN zoning miss
er_crc_good_eof 30502349 Crc error with good eof
er_inv_arb 0 Invalid ARB
er_single_credit_loss 0 Single vcrdy/frame loss on link
er_multi_credit_loss 0 Multiple vcrdy/frame loss on link
other_credit_loss 0 Link timeout/complete credit loss
phy_stats_clear_ts 06-23-2022 UTC Thu 14:06:23 Timestamp of phy_port stats clear
lgc_stats_clear_ts 06-23-2022 UTC Thu 14:06:23 Timestamp of lgc_port stats clear
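To see at a glance which counters in a long portstatsshow listing are non-zero, the output can be filtered and sorted. This is a sketch under the assumption that each line has the form "name value description", as in the excerpt above. The function names are hypothetical and the sample contains only a few of the lines quoted in this article.

```python
def parse_portstats(text):
    """Return {counter_name: value} for lines whose second field is an integer."""
    counters = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip elisions such as "[..]" and timestamp lines such as phy_stats_clear_ts.
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def nonzero_errors(counters):
    """Return non-zero er_* counters, largest first."""
    hits = [(name, value) for name, value in counters.items()
            if name.startswith("er_") and value > 0]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# A subset of the portstatsshow lines quoted in this article.
SAMPLE = """\
fec_uncor_detected 0 Count of blocks that were left uncorrected by FEC
er_enc_in 39648 Encoding errors inside of frames
er_crc 30968151 Frames with CRC errors
er_trunc 0 Frames shorter than minimum
er_pcs_blk 264192520 PCS block errors
phy_stats_clear_ts 06-23-2022 UTC Thu 14:06:23 Timestamp of phy_port stats clear
"""

top = nonzero_errors(parse_portstats(SAMPLE))
for name, value in top:
    print(f"{name:<12} {value}")
```

Note that the counters accumulate from the time the statistics were last cleared (the `phy_stats_clear_ts` and `lgc_stats_clear_ts` lines), so large totals alone do not show when the errors occurred.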
Fabriclog indicates that the port has flapped repeatedly.
Switch 0; Mon Apr 10 09:34:46 2023 GMT (GMT+0:00)
09:34:56.224522 SCN Port Offline;rsn=0x2,g=0xc0 D0,P0 D0,P0 9 NA
09:34:56.224540 *Removing all nodes from port D0,P0 D0,P0 9 NA
09:35:08.666250 SCN LR_PORT(0);g=0xc0 D0,P0 D0,P0 9 NA
09:35:08.671197 SCN Port Online; g=0xc0,isolated=0 D0,P0 D0,P1 9 NA
09:35:08.671407 Port Elp engaged D0,P1 D0,P0 9 NA
[..]
13:00:36.370320 SCN Port Online; g=0xc4,isolated=0 D0,P0 D0,P1 9 NA
13:00:36.370531 Port Elp engaged D0,P1 D0,P0 9 NA
13:00:36.370601 *Removing all nodes from port D0,P0 D0,P0 9 NA
13:00:36.370803 SCN Port F_PORT D0,P1 D0,P0 9 NA
13:11:22.396502 SCN LR_PORT(0);g=0xc4 LR_IN D0,P0 D0,P0 9 NA
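The timing of the flaps can be compared with the alert frequency seen on the host by extracting the port state changes from the fabriclog. The sketch below is illustrative only. The regular expression is fitted to the lines quoted above, the function names are hypothetical, and all entries are assumed to fall on the same day. Because the excerpt elides entries with "[..]", the interval it computes is between the two Port Online events shown, not necessarily between consecutive flaps.

```python
import re
from datetime import datetime

# Matches lines such as "09:35:08.671197 SCN Port Online; g=0xc0,isolated=0 ...".
EVENT_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s+SCN Port (Online|Offline)")


def port_events(log_text):
    """Return (timestamp, state) tuples for Port Online/Offline state changes."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return events


def online_intervals(events):
    """Return the time between successive Port Online events."""
    online = [stamp for stamp, state in events if state == "Online"]
    return [later - earlier for earlier, later in zip(online, online[1:])]


# State-change lines copied from the fabriclog excerpt in this article.
LOG = """\
09:34:56.224522 SCN Port Offline;rsn=0x2,g=0xc0 D0,P0 D0,P0 9 NA
09:35:08.671197 SCN Port Online; g=0xc0,isolated=0 D0,P0 D0,P1 9 NA
13:00:36.370320 SCN Port Online; g=0xc4,isolated=0 D0,P0 D0,P1 9 NA
13:00:36.370803 SCN Port F_PORT D0,P1 D0,P0 9 NA
"""

events = port_events(LOG)
intervals = online_intervals(events)
for gap in intervals:
    print(f"time between Port Online events: {gap}")
```

For the lines shown, the two Port Online events are a little under three and a half hours apart, which is in line with the alert frequency of roughly every 3 hours reported on the host side.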