StorageGRID node reboots unexpectedly due to a NIC issue
Applies to
StorageGRID 11.9.x
Issue
A StorageGRID node reboots unexpectedly and raises the "NODE_DOWN-MAJOR" alert.
2025-10-14T13:42:42.846919+00:00 localhost kernel: [8465559.394294] mlx5_core 0000:1c:00.0: mlx5_query_mcia:400:(pid 1451375): query_mcia_reg failed: status: 0x3
2025-10-14T13:42:42.846922+00:00 localhost kernel: [8465559.394302] mlx5_core 0000:1c:00.0 hic3: mlx5e_get_module_eeprom_by_page: mlx5_query_module_eeprom_by_page failed:0xfffffffb
2025-10-14T13:42:42.846924+00:00 localhost kernel: [8465559.394448] mlx5_core 0000:1c:00.0: mlx5_query_mcia:400:(pid 1451375): query_mcia_reg failed: status: 0x3
2025-10-14T13:42:42.846924+00:00 localhost kernel: [8465559.394451] mlx5_core 0000:1c:00.0 hic3: mlx5e_get_module_eeprom_by_page: mlx5_query_module_eeprom_by_page failed:0xfffffffb
2025-10-14T13:43:14.070943+00:00 localhost kernel: [8465590.620036] XFS (dm-0): Unmounting Filesystem
2025-10-14T13:43:15.286913+00:00 localhost kernel: [8465591.832339] XFS (dm-1): Unmounting Filesystem
2025-10-14T13:43:16.406924+00:00 localhost kernel: [8465592.954691] XFS (dm-2): Unmounting Filesystem
2025-10-14T13:43:17.350912+00:00 localhost kernel: [8465593.898064] XFS (dm-3): Unmounting Filesystem
2025-10-14T13:43:18.354918+00:00 localhost kernel: [8465594.900657] XFS (dm-4): Unmounting Filesystem
2025-10-14T13:43:19.430910+00:00 localhost kernel: [8465595.977650] XFS (dm-5): Unmounting Filesystem
2025-10-14T13:43:20.386914+00:00 localhost kernel: [8465596.934014] XFS (dm-6): Unmounting Filesystem
2025-10-14T13:43:21.410914+00:00 localhost kernel: [8465597.956470] XFS (dm-7): Unmounting Filesystem
2025-10-14T13:43:23.078915+00:00 localhost kernel: [8465599.624399] XFS (dm-8): Unmounting Filesystem
2025-10-14T13:43:24.342915+00:00 localhost kernel: [8465600.890799] XFS (dm-9): Unmounting Filesystem
2025-10-14T13:43:25.278918+00:00 localhost kernel: [8465601.825029] XFS (dm-10): Unmounting Filesystem
2025-10-14T13:43:26.558919+00:00 localhost kernel: [8465603.105400] XFS (dm-11): Unmounting Filesystem
2025-10-14T13:43:33.618913+00:00 localhost kernel: [8465610.166974] XFS (dm-12): Unmounting Filesystem
2025-10-14T13:43:34.818912+00:00 localhost kernel: [8465611.364500] XFS (dm-13): Unmounting Filesystem
2025-10-14T13:43:35.842911+00:00 localhost kernel: [8465612.389101] XFS (dm-14): Unmounting Filesystem
2025-10-14T13:43:40.214917+00:00 localhost kernel: [8465616.760851] mlx5_core 0000:af:00.1: mlx5_query_mcia:400:(pid 1460936): query_mcia_reg failed: status: 0x3
2025-10-14T13:43:40.214924+00:00 localhost kernel: [8465616.760860] mlx5_core 0000:af:00.1 hic1: mlx5e_get_module_eeprom_by_page: mlx5_query_module_eeprom_by_page failed:0xfffffffb
2025-10-14T13:43:40.214927+00:00 localhost kernel: [8465616.761016] mlx5_core 0000:af:00.1: mlx5_query_mcia:400:(pid 1460936): query_mcia_reg failed: status: 0x3
2025-10-14T13:43:40.214928+00:00 localhost kernel: [8465616.761019] mlx5_core 0000:af:00.1 hic1: mlx5e_get_module_eeprom_by_page: mlx5_query_module_eeprom_by_page failed:0xfffffffb
2025-10-14T13:43:40.218910+00:00 localhost kernel: [8465616.766324] mlx5_core 0000:1c:00.0: mlx5_query_mcia:400:(pid 1460942): query_mcia_reg failed: status: 0x3
2025-10-14T13:43:40.218913+00:00 localhost kernel: [8465616.766331] mlx5_core 0000:1c:00.0 hic3: mlx5e_get_module_eeprom_by_page: mlx5_query_module_eeprom_by_page failed:0xfffffffb
2025-10-14T13:43:40.218914+00:00 localhost kernel: [8465616.766486] mlx5_core 0000:1c:00.0: mlx5_query_mcia:400:(pid 1460942): query_mcia_reg failed: status: 0x3
2025-10-14T13:43:40.218915+00:00 localhost kernel: [8465616.766489] mlx5_core 0000:1c:00.0 hic3: mlx5e_get_module_eeprom_by_page: mlx5_query_module_eeprom_by_page failed:0xfffffffb
2025-10-14T13:43:49.858922+00:00 localhost kernel: [8465626.406945] XFS (dm-15): Unmounting Filesystem
2025-10-14T13:43:54.054922+00:00 localhost kernel: [8465630.595138] IPMI Watchdog: Unexpected close, not stopping watchdog!
2025-10-14T13:43:54.178911+00:00 localhost kernel: [8465630.723920] sda: sda1 sda2 sda3 sda4 sda126 sda127
2025-10-14T14:10:09.468356+00:00 SG kernel: [ 0.000000] Linux version 6.1.0-25-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3+ntap0 (2024-09-12)
2025-10-14T14:10:09.468409+00:00 SG kernel: [ 0.000000] Command line: netapp_sga_set_console_by_cpu=yes console=ttyS0,115200n8 root=UUID=b49389ba-a921-402e-8031-ffef4ee5fe2e ro intel_idle.max_cstate=2 intel_iommu=off consoleblank=0 elevator=noop panic=5 net.ifnames=0 biosdevname=0 nopti apparmor=0 fsck.repair=yes crashkernel=1500M,high log_buf_len=16M memmap=64K$4K
2025-10-14T14:10:09.468413+00:00 SG kernel: [ 0.000000] BIOS-provided physical RAM map: