Aggregate experiences a long CP and reports the "wafl.cp.toolong:error" event
Applies to
- ONTAP 9
- All FAS systems that use HDDs
Issue
wafl.cp.toolong error messages are logged while the aggregate experiences latency issues:
Tue Oct 17 09:34:07 +0000 [node01: wafl_exempt01: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue Oct 17 09:34:47 +0000 [node01: wafl_exempt11: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue Oct 17 09:35:37 +0000 [node01: wafl_exempt05: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
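To see how often each aggregate hits a long CP, the messages above can be tallied from an exported EMS log. The following is a minimal sketch, not a NetApp tool: the regular expression is inferred only from the sample lines shown here, and the function name `count_long_cps` is hypothetical. Other ONTAP releases may format the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the sample messages in this article (an assumption,
# not an official message specification).
CP_TOOLONG = re.compile(
    r"\[(?P<node>[^:\]]+):\s*[^:\]]+:\s*wafl\.cp\.toolong:error\]:\s*"
    r"Aggregate (?P<aggr>\S+) experienced a long CP"
)

def count_long_cps(lines):
    """Count wafl.cp.toolong events per (node, aggregate) pair."""
    counts = Counter()
    for line in lines:
        match = CP_TOOLONG.search(line)
        if match:
            counts[(match.group("node"), match.group("aggr"))] += 1
    return counts

sample = [
    "Tue Oct 17 09:34:07 +0000 [node01: wafl_exempt01: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.",
    "Tue Oct 17 09:34:47 +0000 [node01: wafl_exempt11: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.",
    "Tue Oct 17 09:35:37 +0000 [node01: wafl_exempt05: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.",
]

for (node, aggr), n in count_long_cps(sample).items():
    print(f"{node} {aggr}: {n} long CP event(s)")
```

Repeated events for the same aggregate within a short window, as in the sample, suggest a sustained condition rather than a single transient spike.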
- EMS logs might report disk-related errors for disks in the affected aggregate:
Sun Dec 17 17:33:59 +0000 [Cluster01-01: disk_latency_monitor: shm.threshold.ioLatency:debug]:
Disk 1b.53.47 has exceeded the expected IO latency in the current window with average latency of 50 msecs and average utilization of 77 percent. Highest average IO latency: 1b.53.47: 50 msecs; next highest IO latency: 1b.53.9: 10 msecs.
Disk 1b.53.47 Shelf 53 Drawer 4 Slot 11 Bay 47 [NETAPP X375_SCMNE04TA07 NA00] S/N [1234abcd] UID [ABCD1234:EFGH5678:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
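The disk name, average latency, and average utilization can be pulled out of the shm.threshold.ioLatency message for further review. This is a rough sketch that assumes the message wording matches the sample above; `parse_latency_event` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Wording taken from the sample message in this article (an assumption);
# adjust the pattern if your release words the event differently.
LATENCY = re.compile(
    r"Disk (?P<disk>\S+) has exceeded the expected IO latency in the current window "
    r"with average latency of (?P<latency>\d+) msecs "
    r"and average utilization of (?P<util>\d+) percent"
)

def parse_latency_event(text):
    """Return disk name, latency (ms), and utilization (%) or None if no match."""
    match = LATENCY.search(text)
    if match is None:
        return None
    return {
        "disk": match.group("disk"),
        "latency_ms": int(match.group("latency")),
        "utilization_pct": int(match.group("util")),
    }

message = (
    "Disk 1b.53.47 has exceeded the expected IO latency in the current window "
    "with average latency of 50 msecs and average utilization of 77 percent. "
    "Highest average IO latency: 1b.53.47: 50 msecs; "
    "next highest IO latency: 1b.53.9: 10 msecs."
)

print(parse_latency_event(message))
```

In the sample, the reported disk averages 50 msecs while the next highest disk averages 10 msecs, which is the kind of gap that points at a single slow disk.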
The statit command reports that one or more disks have higher utilization than the other disks in the same RAID group:
1b.03.51 88 29.36 27.66 4.44 33792 1.47 61.25 1232 0.23 25.25 849 0.00 .... . 0.00
9b.10.52 25 31.66 30.00 3.93 2994 1.46 61.85 120 0.20 36.43 308 0.00 .... . 0.00
1a.05.55 1 1.85 0.00 .... . 1.52 60.87 61 0.33 38.13 244 0.00 .... . 0.00
9a.02.54 81 29.30 27.74 4.13 29802 1.33 60.53 1069 0.23 22.25 1689 0.00 .... . 0.00
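One way to spot the outliers in the rows above is to compare each disk's utilization with the median of its RAID group. The sketch below assumes that the first column is the disk name and the second is the utilization percentage (ut%), and that the caller passes only rows from a single RAID group. The 20-point margin is an arbitrary value for illustration, not a NetApp threshold, and both function names are hypothetical.

```python
from statistics import median

def parse_statit_rows(lines):
    """Extract (disk, utilization %) from statit disk rows.

    Assumes column 1 is the disk name and column 2 is ut%.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            rows.append((fields[0], int(fields[1])))
    return rows

def hot_disks(rows, margin=20):
    """Return disks whose ut% exceeds the group median by more than `margin`.

    `margin` is an illustrative value, not a vendor-defined threshold.
    """
    if not rows:
        return []
    group_median = median([util for _, util in rows])
    flagged = [(disk, util) for disk, util in rows if util - group_median > margin]
    return sorted(flagged, key=lambda row: row[1], reverse=True)

sample_rows = [
    "1b.03.51 88 29.36 27.66 4.44 33792 1.47 61.25 1232 0.23 25.25 849 0.00 .... . 0.00",
    "9b.10.52 25 31.66 30.00 3.93 2994 1.46 61.85 120 0.20 36.43 308 0.00 .... . 0.00",
    "1a.05.55 1 1.85 0.00 .... . 1.52 60.87 61 0.33 38.13 244 0.00 .... . 0.00",
    "9a.02.54 81 29.30 27.74 4.13 29802 1.33 60.53 1069 0.23 22.25 1689 0.00 .... . 0.00",
]

for disk, util in hot_disks(parse_statit_rows(sample_rows)):
    print(f"{disk}: {util}% utilization")
```

With the sample rows, the median is 53, so disks 1b.03.51 (88) and 9a.02.54 (81) are flagged while 9b.10.52 (25) and 1a.05.55 (1) are not.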
This issue is different from bug 1444991.