Intermittent latency spikes and high latency from a remote disk
Applies to
- ONTAP 9
- AFF A400
- MetroCluster
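- Disk X4013S17337T6NTE
Issue
- High latency is observed at specific times, and ESXi datastores experience service disruption
- EMS logs show the following messages while the issue is occurring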
[XXXXXX-XX:wafl_exempt18: wafl.cp.toolong:error]: Aggregate XXXX_XXXX_aggr1 experienced a long CP
[XXXXXX-XX:disk_latency_monitor: shm.ssd.threshold.ioLatency:notice]: SSD 0v.i1.2L20 has exceeded the expected block latency in the current timeframe with an average latency of 4670 us and an average utilization of 11 percent. The next highest SSD latency: 110 us. Disk 0v.i1.2L20 Shelf 10 Bay 22 [NETAPP X4013S17337T6NTE NA53] S/N [S60RNA0R900618] UID[36305230:52900618:00253841:00000002:00000000:00000000:00000000:00000000:00000000:00000000
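- Back-to-back CPs are observed, with most of the time spent in the P2 flush phase
- The latency originates from a single disk in the remote plex
- Disks of model X4013S17337T6NTE in the same shelf fail frequently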
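
When many EMS lines need to be reviewed, the SSD latency message shown above can be parsed to find the disk that stands out. The following Python is a minimal sketch, not a NetApp tool: the function name parse_ssd_latency, the returned field names, and the regular expression are assumptions derived only from the sample message in this article, and other ONTAP releases may word the message differently.

```python
import re

# Pattern derived from the shm.ssd.threshold.ioLatency sample above (an assumption,
# not an official format definition). Whitespace is matched loosely because
# copied log text often loses or gains spaces.
SSD_LATENCY_RE = re.compile(
    r"SSD\s+(?P<disk>\S+)\s+has\s*exceeded.*?"
    r"average\s*latency\s+of\s+(?P<avg_us>\d+)\s*us.*?"
    r"average\s+utilization\s+of\s+(?P<util_pct>\d+)\s*percent.*?"
    r"next\s+highest\s*SSD\s+latency:\s*(?P<next_us>\d+)\s*us",
    re.IGNORECASE | re.DOTALL,
)


def parse_ssd_latency(line):
    """Return the latency figures from one EMS line, or None if it does not match."""
    match = SSD_LATENCY_RE.search(line)
    if match is None:
        return None
    avg_us = int(match["avg_us"])
    next_us = int(match["next_us"])
    return {
        "disk": match["disk"],
        "avg_latency_us": avg_us,
        "utilization_pct": int(match["util_pct"]),
        "next_highest_us": next_us,
        # How many times slower this disk is than the next slowest SSD.
        "ratio": avg_us / next_us if next_us else None,
    }


# Sample taken from the EMS message in this article (trailing fields omitted).
SAMPLE = (
    "[XXXXXX-XX:disk_latency_monitor: shm.ssd.threshold.ioLatency:notice]: "
    "SSD 0v.i1.2L20 has exceeded the expected block latency in the current "
    "timeframe with an average latency of 4670 us and an average utilization "
    "of 11 percent. The next highest SSD latency: 110 us."
)

if __name__ == "__main__":
    print(parse_ssd_latency(SAMPLE))
```

A large ratio together with low utilization, as in the sample (4670 us against 110 us at 11 percent utilization), is consistent with one slow disk rather than an overloaded aggregate, which matches the symptoms listed above.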