Dual-carrier disks stuck in an infinite SDC loop after upgrading to 9.8
Applies to
- DS4486 disk shelves
- ONTAP 9.8
Issue
- After upgrading to ONTAP 9.8, multiple disks on a DS4486 disk shelf report shm_setup_for_failure without any specific triggering cause:
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_17: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.7L1 (S/N ZC1xxxxx) error 40000000h
Wed Feb 10 09:40:25 -0800 [nodeb: api_dpool_18: scsi.debug:debug]: shm_setup_for_failure disk 3a.20.17L1 (S/N K7Hxxxxx) error 40000000h
Wed Feb 10 09:40:26 -0800 [nodeb: api_dpool_20: scsi.debug:debug]: shm_setup_for_failure disk 3a.21.20L2 (S/N ZC1xxxxx) error 40000000h
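To see how many disks are logging this event, the EMS log can be searched for the scsi.debug messages shown above. This is a minimal sketch rather than an official diagnostic step: it assumes the clustershell event log show command with the message name taken verbatim from the excerpt, and debug-level events may not be retained in the default EMS view (in that case they are only visible in the node log files).
::> event log show -node * -message-name scsi.debug -severity DEBUG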
- Afterward, when one disk in a carrier is evacuated or fails, the other disk in the same carrier goes through an unhealthy disk copy loop that never progresses beyond 0% and eventually cancels on its own:
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0d.23.2L2 0d 23 2 SA:A 0 MSATA 7200 3748319/7676558720 3815447/7814037168
parity 3b.13.22L1 3b 13 22 SA:A 0 MSATA 7200 3748319/7676558720 3815447/7814037168
data 3a.20.19L1 3a 20 19 SA:B 0 MSATA 7200 3748319/7676558720 3815447/7814037168
data 3a.21.20L1 3a 21 20 SA:B 0 MSATA 7200 3748319/7676558720 3815447/7814037168 (evacuating, copy in progress)
-> copy 3a.21.16L2 3a 21 16 SA:B 0 MSATA 7200 3748319/7676558720 3815447/7814037168 (copy 0% completed)
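The RAID layout above is nodeshell-style output. A comparable view of the affected RAID group, including the stalled copy target, can be taken from the clustershell. The commands below are standard ONTAP 9 commands, but the aggregate name is copied from the EMS examples later in this article and the node name is a placeholder; substitute the objects from your own system.
::> storage aggregate show-status -aggregate nodea_aggr_vol0
::> system node run -node nodea -command "aggr status -r"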
- The following messages are displayed in the EMS log:
raid_lm: raidlm.carrier.evac.start
config_thread: raid.rg.diskcopy.start
config_thread: raid.rg.diskcopy.progress
raid_lm: raidlm.carrier.evac.abort
config_thread: raid.rg.diskcopy.aborted
- Example:
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from (S/N PCJHxxxx) to (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '3:13.16', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_vol0/plex0/rg0', 'owner': '', 'aggregate_uuid': 'f0f3c156-b7f6-4344-adce-249752a6fcf4', 'blockNum': '2156224'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_02/plex0/rg1: starting disk copy from 3a.21.20L1 (S/N [K4G5xxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N K4G5xxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
[raid_rg_diskcopy_aborted_1:notice]: params: {'target': '3a.21.16L2', 'duration': '2:53.00', 'source': '3a.21.20L1', 'reason': 'Source disk failed.', 'rg': '/nodea_aggr_02/plex0/rg1', 'owner': '', 'aggregate_uuid': 'cd3fa773-6ba5-48f6-9872-8c4a7ed5ff6f', 'blockNum': '793024'}
[raid.rg.diskcopy.start:notice]: /nodea_aggr_vol0/plex0/rg0: starting disk copy from 3a.21.20L1 (S/N [PCJHxxxx]) to 3a.21.16L2 (S/N [ZC1Dxxxx]). Reason: Disk replace was started..
[raid.rg.diskcopy.progress:debug]: Disk copy progress from 2.21.20.1 (S/N PCJHxxxx) to 2.21.16.2 (S/N ZC1Dxxxx) is on disk block 0 and is 0%% complete after 0:00:00 (HH:MM:SS).
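To confirm that the copy loop keeps repeating, the EMS log can be filtered on the message names listed above. This is a minimal sketch assuming the clustershell event log show command and wildcard matching on -message-name; the debug-level progress events may not appear, but the start and aborted notices should.
::> event log show -message-name raid.rg.diskcopy.*
::> event log show -message-name raidlm.carrier.evac.*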