CONTP-176582: Delayed disk failure during reconstruction causes high latency

Visibility:: Public

Category:: ontap-9

Specialty:: CORE

Issue

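When a disk that is undergoing RAID reconstruction receives excessive medium errors, ONTAP takes longer to fail that disk than it would for a disk that is not part of a RAID reconstruction. This is currently the designed behavior, because ONTAP is trying to avoid a potential multi-disk failure scenario.
The delay at the disk layer results in high read/write latency and long consistency points (CPs).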
This is usually accompanied by the following errors in EMS:
1. [<node_name>: disk_server_0: shm.threshold.mediumErrors:error]: shm: disk 9a.11.10 has crossed the medium error threshold in 10 minutes.
2. [<node_name>: disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 2h
3. [<node_name>: disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 20h
4. [<node_name>: disk_latency_monitor: shm.threshold.highIOLatency:error]: disk 9a.11.10 has exceeded the average IO latency threshold and will be recommended for failure.
5. [<node_name>: disk_latency_monitor: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 200000h
6. [<node_name>: wafl_exempt00: wafl.cp.toolong:error]: Aggregate <aggr_name> experienced a long CP.
7. [<node_name>: disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 40000000h
Note: The following error codes are non-urgent error codes. ONTAP does not fail the disk immediately when it records them.

0x00000002
0x00000020
0x00200000
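
When reviewing EMS output, it can help to check each shm_setup_for_failure code against the non-urgent list above. The following Python snippet is a minimal sketch of that check, not a NetApp tool. It assumes the log lines have the same layout as the sample messages in this article (a hexadecimal code followed by "h" after the word "error"), and the names NON_URGENT_CODES and classify_shm_error are hypothetical helpers introduced only for illustration. The codes are compared as exact values; how ONTAP interprets any other code is not covered here.

```python
import re

# Non-urgent codes copied from the note in this article.
NON_URGENT_CODES = {0x00000002, 0x00000020, 0x00200000}

# Assumed layout: "... shm_setup_for_failure disk <name> (S/N <serial>) error <hex>h"
SHM_ERROR_RE = re.compile(r"shm_setup_for_failure\b.*\berror\s+([0-9A-Fa-f]+)h\b")


def classify_shm_error(line):
    """Return (code, is_non_urgent) for a shm_setup_for_failure line, else None."""
    match = SHM_ERROR_RE.search(line)
    if match is None:
        return None  # not a shm_setup_for_failure message
    code = int(match.group(1), 16)
    return code, code in NON_URGENT_CODES


if __name__ == "__main__":
    samples = [
        "[node1: disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 2h",
        "[node1: disk_latency_monitor: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 200000h",
        "[node1: disk_server_0: scsi.debug:debug]: shm_setup_for_failure disk 9a.11.10 (S/N WBN6BQ3N) error 40000000h",
    ]
    for sample in samples:
        code, non_urgent = classify_shm_error(sample)
        label = "non-urgent" if non_urgent else "not in the non-urgent list"
        print(f"0x{code:08X}: {label}")
```

A code that is absent from the list is reported only as "not in the non-urgent list"; the sketch does not claim anything further about how ONTAP handles it.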