StorageGRID appliances SG5700 and SG6000 occasionally reboot and are reported as down in the Grid Manager interface
Applies to
- NetApp StorageGRID appliance SG6000
- NetApp StorageGRID appliance SG5700
Issue
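- StorageGRID Storage Nodes occasionally report a blue status (Administratively Unknown) because the node has rebooted.
- Multiple alarms (such as CASA, NRLY, and SVST) are detected and intermittently cleared (Data Store Down).
- The kernel log of the StorageGRID compute controller (that is, the 5700 controller or the SG6000 1U node), /var/local/log/messages and/or dmesg.txt, shows aborts on one or more of the Fibre Channel links used for inter-appliance communication: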
Aug 7 08:03:46 SG kernel: 4,2160,410123880,-;qla2xxx [0000:14:00.1]-801c:7: Abort command issued nexus=7:0:0 -- 1 2002.
Aug 7 08:03:47 SG kernel: 4,2161,411147898,-;qla2xxx [0000:14:00.1]-801c:7: Abort command issued nexus=7:0:0 -- 1 2002.
Aug 7 08:03:48 SG kernel: 4,2162,412171780,-;qla2xxx [0000:14:00.1]-801c:7: Abort command issued nexus=7:0:0 -- 1 2002.
Aug 7 08:03:43 SG kernel: 4,2136,406224581,-;Call Trace:
Aug 7 08:03:43 SG kernel: 4,2137,406227012,-; [<ffffffffb9e18609>] ? __schedule+0x239/0x6f0
Aug 7 08:03:43 SG kernel: 4,2138,406232464,-; [<ffffffffb9e18af2>] ? schedule+0x32/0x80
Aug 7 08:03:43 SG kernel: 4,2139,406237574,-; [<ffffffffb9e1be17>] ? schedule_timeout+0x167/0x380
Aug 7 08:03:43 SG kernel: 4,2140,406243546,-; [<ffffffffb98e9220>] ? del_timer_sync+0x50/0x50
Aug 7 08:03:43 SG kernel: 4,2141,406249172,-; [<ffffffffb98ea02a>] ? msleep+0x2a/0x40
Aug 7 08:03:43 SG kernel: 4,2142,406254118,-; [<ffffffffc0a88481>] ? qla2x00_eh_wait_on_command+0x41/0x90 [qla2xxx]
Aug 7 08:03:43 SG kernel: 4,2143,406261651,-; [<ffffffffc0a887ab>] ? qla2xxx_eh_abort+0x2db/0x310 [qla2xxx]
Aug 7 08:03:43 SG kernel: 4,2144,406268491,-; [<ffffffffc02ba4c2>] ? scmd_eh_abort_handler+0x72/0x270 [scsi_mod]
Aug 7 08:03:43 SG kernel: 4,2145,406275762,-; [<ffffffffb989486a>] ? process_one_work+0x18a/0x430
Aug 7 08:03:43 SG kernel: 4,2146,406281734,-; [<ffffffffb9894b5d>] ? worker_thread+0x4d/0x490
Aug 7 08:03:43 SG kernel: 4,2147,406287362,-; [<ffffffffb9894b10>] ? process_one_work+0x430/0x430
Aug 7 08:03:43 SG kernel: 4,2148,406293338,-; [<ffffffffb989abc9>] ? kthread+0xd9/0xf0
Aug 7 08:03:43 SG kernel: 4,2149,406298361,-; [<ffffffffb9e1d4f1>] ? __switch_to_asm+0x41/0x70
Aug 7 08:03:43 SG kernel: 4,2150,406304074,-; [<ffffffffb989aaf0>] ? kthread_park+0x60/0x60
Aug 7 08:03:43 SG kernel: 4,2151,406309530,-; [<ffffffffb9e1d577>] ? ret_from_fork+0x57/0x70
- /var/log/syslog in the base-os-logs from the StorageGRID compute controller also shows I/O errors:
Sep 2 20:17:14 localhost kernel: [51563.447905] print_req_error: I/O error, dev sdab, sector 13323229512
Sep 2 20:17:14 localhost kernel: [51563.454244] device-mapper: multipath: Failing path 65:176.
- The major event log of the E-Series storage controller (for example, the E2800 controller) shows Fibre Channel link errors (Event type: 1206):
Event type: 1206
Event category: Error
Description: Fibre channel link errors continue
Note: Not all of the above conditions are necessarily present at the same time.
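
Because the symptoms may not all appear together, it can help to check each collected log for the signatures listed above. The following is a minimal Python sketch, not a NetApp tool: the regular expressions are derived only from the log excerpts in this article, and the names `SIGNATURES`, `scan_lines`, and `scan_file` are hypothetical. Log formats may differ between releases, so treat the patterns as assumptions to adjust.

```python
import re
from collections import Counter

# Patterns inferred from the excerpts in this article (assumptions, not an
# official format specification). Each pattern captures one detail field.
SIGNATURES = {
    # qla2xxx abort on a Fibre Channel port; captures the PCI address
    "fc_abort": re.compile(
        r"qla2xxx \[([0-9a-fA-F:.]+)\]-801c:\d+: Abort command issued nexus="
    ),
    # block-layer I/O error; captures the device name
    "io_error": re.compile(r"print_req_error: I/O error, dev (\w+), sector \d+"),
    # multipath path failure; captures the major:minor device number
    "path_failed": re.compile(r"device-mapper: multipath: Failing path (\d+:\d+)"),
    # E-Series major event log entry; captures the event type
    "mel_1206": re.compile(r"Event type:\s*(1206)\b", re.IGNORECASE),
}


def scan_lines(lines):
    """Count signature hits, keyed by (label, captured detail)."""
    hits = Counter()
    for line in lines:
        for label, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits[(label, match.group(1))] += 1
    return hits


def scan_file(path):
    """Scan one text log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

For example, `scan_file("/var/local/log/messages")` returns a counter whose keys show which signatures were found and on which port or device. An empty result for one file does not rule out the issue, since the conditions may appear in different logs at different times.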