
All HIC ports on a StorageGRID appliance go down frequently

Category:
storagegrid
Specialty:
sgrid

Environment

NetApp StorageGRID appliance

Issue

Some ports on a StorageGRID node lose connectivity at random. The ports go down and, when they reconnect, may resynchronize with LACP (if it is configured).

  • On the affected node, the warn log under /var/local/log shows instances of Tx timeout for the HIC ports.

Jan 10 03:12:23 localhost kernel: [1456351.753113] [qede_tx_timeout:991(hic2)]Tx timeout!
Jan 10 03:12:23 localhost kernel: [1456351.753338] [qed_mfw_report:3613(hic2)]Txq[1]: FW cons [host] fce8, SW cons fc97, SW prod fce8 [idx c6] [Jiffies 4658987302]
Jan 10 03:12:23 localhost kernel: [1456351.753588] [qed_mfw_report:3613(hic2)]Txq[1]: SB[0x0002] - IGU: prod 00339d9f cons 00339b03 CAU Tx fce8
Jan 10 03:12:23 localhost kernel: [1456351.753832] [qed_mfw_report:3613(hic2)]Last DB: 0000fce8 [Jiffies 4658985126]

Jan 10 03:11:57 localhost kernel: [1456325.502522] NETDEV WATCHDOG: hic4 (qede): transmit queue 6 timed out
Jan 10 03:11:58 localhost kernel: [1456326.281083] [qede_tx_timeout:991(hic4)]Tx timeout!
Jan 10 03:11:58 localhost kernel: [1456326.337487] bond0: link status down for interface hic4, disabling it in 200 ms
Jan 10 03:11:58 localhost kernel: [1456326.337490] bond0: invalid new link 1 on slave hic4
Jan 10 03:11:58 localhost kernel: [1456326.474543] qede 0000:42:00.3 hic4: speed changed to 0 for port hic4
Jan 10 03:11:58 localhost kernel: [1456326.497102] [qede_generic_hw_err_handler:4012(hic4)]Starting a generic HW error handling (sleep requiring operations) - err_flags 0x80000002, err_flags_override 0x0

  • The HIC ports recover afterward.

Jan 10 03:34:59 localhost kernel: [    9.312373] qede 0000:42:00.1 hic2: renamed from eth0
Jan 10 03:35:08 localhost kernel: [   43.979425] bond0: Enslaving hic2 as a backup interface with a down link
Jan 10 03:35:08 localhost kernel: [   44.104547] [qede_validate_bond:423(hic2)]RDMA bonding - Can't bond PF1 and PF3
Jan 10 03:35:08 localhost kernel: [   44.273897] device hic2 entered promiscuous mode
Jan 10 03:35:10 localhost kernel: [   45.863791] [qede_link_update:3829(hic2)]Link is up
Jan 10 03:35:10 localhost kernel: [   45.901661] bond0: link status up for interface hic2, enabling it in 0 ms
Jan 10 03:35:10 localhost kernel: [   45.908646] bond0: link status definitely up for interface hic2, 10000 Mbps full duplex

Jan 10 03:34:59 localhost kernel: [    9.398066] qede 0000:42:00.3 hic4: renamed from eth3
Jan 10 03:35:08 localhost kernel: [   44.112259] bond0: Enslaving hic4 as a backup interface with a down link
Jan 10 03:35:08 localhost kernel: [   44.280087] device hic4 entered promiscuous mode
Jan 10 03:35:10 localhost kernel: [   46.077201] [qede_link_update:3829(hic4)]Link is up
Jan 10 03:35:10 localhost kernel: [   46.137659] bond0: link status up for interface hic4, enabling it in 200 ms
Jan 10 03:35:10 localhost kernel: [   46.144587] bond0: invalid new link 3 on slave hic4
Jan 10 03:35:10 localhost kernel: [   46.353923] bond0: link status definitely up for interface hic4, 10000 Mbps full duplex
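
To get a quick per-port tally of the events shown in the excerpts above, the messages can be counted with a short script. The following Python is a minimal sketch, not NetApp tooling: the function name summarize and the SAMPLE list are illustrative, and the regular expressions are derived only from the message formats quoted in this article, so other driver or kernel versions may log differently.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts quoted in this article (assumption:
# the message formats are stable across the logs being scanned).
PATTERNS = {
    "tx_timeout": re.compile(r"\[qede_tx_timeout:\d+\((?P<port>\w+)\)\]Tx timeout!"),
    "watchdog": re.compile(
        r"NETDEV WATCHDOG: (?P<port>\w+) \(qede\): transmit queue \d+ timed out"
    ),
    "link_up": re.compile(
        r"\w+: link status definitely up for interface (?P<port>\w+),"
    ),
}


def summarize(lines):
    """Count Tx timeout, watchdog, and link-up messages per port name."""
    counts = {name: Counter() for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name][match.group("port")] += 1
    return counts


# A few lines copied from the excerpts above, used only to exercise the sketch.
SAMPLE = [
    "Jan 10 03:12:23 localhost kernel: [1456351.753113] [qede_tx_timeout:991(hic2)]Tx timeout!",
    "Jan 10 03:11:57 localhost kernel: [1456325.502522] NETDEV WATCHDOG: hic4 (qede): transmit queue 6 timed out",
    "Jan 10 03:11:58 localhost kernel: [1456326.281083] [qede_tx_timeout:991(hic4)]Tx timeout!",
    "Jan 10 03:35:10 localhost kernel: [   45.908646] bond0: link status definitely up for interface hic2, 10000 Mbps full duplex",
    "Jan 10 03:35:10 localhost kernel: [   46.353923] bond0: link status definitely up for interface hic4, 10000 Mbps full duplex",
]

for event, per_port in summarize(SAMPLE).items():
    for port, count in sorted(per_port.items()):
        print(f"{event:<11} {port}: {count}")
```

To scan a real log instead of SAMPLE, pass an open file object to summarize, since it accepts any iterable of lines. The exact log file to read on a given node is left to the reader.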

