
Dual-carrier disks stuck in an endless SDC loop after upgrading to 9.8

Applies to

  • DS4486 disk shelves
  • ONTAP 9.8

Issue

  • After upgrading to 9.8, multiple disks on DS4486 shelves report shm_setup_for_failure without any specific triggering cause (a log-scanning sketch follows the example messages):
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_17: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.7L1 (S/N ZC1xxxxx) error 40000000h
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_18: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.17L1 (S/N K7Hxxxxx) error 40000000h
Wed Feb 10 09:40:26 -0800 [nodeb: api_dpool_20: scsi.debug:debug]: shm_setup_for_failure disk 3a.21.20L2 (S/N ZC1xxxxx) error 40000000h
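For triage, it may help to pull every shm_setup_for_failure hit out of an EMS dump and group the hits by physical carrier, since each DS4486 carrier holds two disks. The following minimal Python sketch is hypothetical (not NetApp tooling); it assumes the message format shown above and the <stack>.<shelf>.<bay>L<1|2> disk-naming convention:

import re
import sys
from collections import defaultdict

# Matches the debug lines shown above, e.g.:
#   shm_setup_for_failure disk 3a.20.7L1 (S/N ZC1xxxxx) error 40000000h
PATTERN = re.compile(
    r"shm_setup_for_failure disk (?P<disk>\S+) "
    r"\(S/N (?P<serial>\S+)\) error (?P<err>\S+)"
)

def carriers_with_failures(lines):
    """Group shm_setup_for_failure hits by carrier (stack.shelf.bay)."""
    hits = defaultdict(list)
    for line in lines:
        m = PATTERN.search(line)
        if m:
            # The trailing L1/L2 is the disk's slot inside the carrier;
            # everything before it identifies the carrier itself.
            carrier = m["disk"].rsplit("L", 1)[0]
            hits[carrier].append((m["disk"], m["serial"], m["err"]))
    return hits

if __name__ == "__main__":
    for carrier, disks in sorted(carriers_with_failures(sys.stdin).items()):
        print(carrier, disks)

Carriers that report hits for both disk slots may be worth checking first.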
 
  • Subsequently, when one disk in a carrier is evacuated or fails, the other disk in the same carrier goes through an unhealthy disk-copy loop that never progresses past 0% and eventually cancels itself (a carrier-mate sketch follows the output below):
 
RAID Disk Device         HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
--------- ------         --  ----- --- ---- ---- ----- ---- ------------------ ------------------
dparity   0d.23.2L2      0d  23    2   SA:A 0    MSATA 7200 3748319/7676558720 3815447/7814037168
parity    3b.13.22L1     3b  13    22  SA:A 0    MSATA 7200 3748319/7676558720 3815447/7814037168
data      3a.20.19L1     3a  20    19  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168
data      3a.21.20L1     3a  21    20  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168 (evacuating, copy in progress)
-> copy   3a.21.16L2     3a  21    16  SA:B 0    MSATA 7200 3748319/7676558720 3815447/7814037168 (copy 0% completed)
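Per the description above, the disk that enters the copy loop is the carrier mate of the evacuated or failed disk. Assuming the <stack>.<shelf>.<bay>L<1|2> naming visible in the output above, the mate can be derived mechanically; a minimal Python sketch:

def carrier_mate(disk: str) -> str:
    """Return the other disk in the same dual-disk carrier.

    Assumes the <stack>.<shelf>.<bay>L<1|2> convention seen above,
    where L1 and L2 are the two slots sharing one DS4486 carrier.
    """
    base, slot = disk.rsplit("L", 1)
    return f"{base}L{2 if slot == '1' else 1}"

# Example: the mate of the evacuating disk in the output above.
assert carrier_mate("3a.21.20L1") == "3a.21.20L2"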
 
  • The following messages are displayed in the EMS log (a loop-detection sketch follows the examples):
raid_lm: raidlm.carrier.evac.start
config_thread: raid.rg.diskcopy.start
config_thread: raid.rg.diskcopy.progress
raid_lm: raidlm.carrier.evac.abort
config_thread: raid.rg.diskcopy.aborted
  • Example:

[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from  (S/N PCJHxxxx) to  (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '3:13.16', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_vol0/plex0/rg0', 'owner': '', 'aggregate_uuid': 'f0f3c156-b7f6-4344-adce-249752a6fcf4', 'blockNum': '2156224'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_02/plex0/rg1: starting disk copy from 3a.21.20L1 (S/N [K4G5xxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N K4G5xxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).

[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '2:53.00', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_02/plex0/rg1', 'owner': '', 'aggregate_uuid': 'cd3fa773-6ba5-48f6-9872-8c4a7ed5ff6f', 'blockNum': '793024'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N PCJHxxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
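The loop shows up in EMS as repeated start/abort pairs for the same source and target disks. The following minimal Python sketch is hypothetical (not NetApp tooling); it assumes the message formats shown above and flags any (source, target) pair whose copy keeps starting and then aborting:

import re
import sys
from collections import Counter

# Matches the start lines shown above, e.g.:
#   [raid.rg.diskcopy.start:notice]: ...: starting disk copy from
#   3a.21.20L1 (S/N [...]) to 3a.21.16L2 (S/N [...]). ...
START = re.compile(
    r"raid\.rg\.diskcopy\.start.*?from (?P<src>\S+) \(S/N.*?to (?P<dst>\S+) \(S/N"
)
# Matches the aborted params dicts shown above ('target' precedes 'source').
ABORT = re.compile(
    r"raid_rg_diskcopy_aborted.*?'target': '(?P<dst>[^']+)'.*?'source': '(?P<src>[^']+)'"
)

def copy_loops(lines):
    """Count diskcopy start and abort events per (source, target) pair."""
    starts, aborts = Counter(), Counter()
    for line in lines:
        if (m := START.search(line)):
            starts[(m["src"], m["dst"])] += 1
        elif (m := ABORT.search(line)):
            aborts[(m["src"], m["dst"])] += 1
    # A pair that starts more than once and also aborts is a candidate
    # for the endless SDC loop described in this article.
    return {
        pair: (n, aborts[pair])
        for pair, n in starts.items()
        if n > 1 and aborts[pair] > 0
    }

if __name__ == "__main__":
    for (src, dst), (n_start, n_abort) in copy_loops(sys.stdin).items():
        print(f"{src} -> {dst}: {n_start} copy starts, {n_abort} aborts")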

 
