Unhealthy disk impacts performance
Applies to
- ONTAP 9
- ONTAP 8.
- FAS
- AFF
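- Does not apply to a single disk that has already failed (ONTAP fails drives based on error and latency thresholds)
- Applies to disks that have not failed
Issue
- High FlexVol latency is observed.
- In some cases, the high latency may cause NFS disconnects.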
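- Running the qos statistics volume latency show command shows most of the latency in the Disk column.
- The utilization and latency of a single drive are significantly higher than those of the other drives in the RAID group.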
- This can be verified with the nodeshell statit command. In the sample below, disk 0a.10.5 is at 100% utilization (ut%) and its usecs latency values are far higher than those of its peers:
cluster1::> node run -node local -command "priv set -q advanced; statit -e"
...
disk       ut%   xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr1/plex0/rg0:
0a.10.10    31   93.15   0.00  ....      .  54.89 26.94    590  38.26 38.85    155   0.00  ....      .   0.00  ....      .
0a.10.1     33   93.98   0.00  ....      .  55.75 26.55    630  38.23 38.83    183   0.00  ....      .   0.00  ....      .
0a.10.2     19  118.78   9.53  3.50   8515  56.77 10.57    291  52.49  9.60    543   0.00  ....      .   0.00  ....      .
0a.10.3     21  120.65  10.11  3.80   8440  58.10 10.88    362  52.43  9.50    566   0.00  ....      .   0.00  ....      .
0a.10.4     20  119.76   9.21  3.27   9108  57.79 10.54    314  52.76  9.44    552   0.00  ....      .   0.00  ....      .
0a.10.5    100  121.62  10.52  3.22  19375  58.78 10.20   7699  52.32  9.79   4831   0.00  ....      .   0.00  ....      .
0a.10.6     18  119.96   9.57  3.33   8727  57.97 10.73    216  52.42  9.64    541   0.00  ....      .   0.00  ....      .
0a.10.7     18  119.06   9.01  3.53   8786  57.71 10.57    223  52.34  9.56    535   0.00  ....      .   0.00  ....      .
0a.10.8     18  121.28   9.75  3.76   8179  59.29 10.89    235  52.24  9.72    544   0.00  ....      .   0.00  ....      .
0a.10.9     19  121.30  10.90  3.47   8249  58.15 11.07    217  52.26  9.87    526   0.00  ....      .   0.00  ....      .
- EMS logs may report multiple errors and interruptions on the disk before the disk is marked as failed.
scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry #1/#0: cdb 0x28:3b468100:0008 (24080).
scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry #1/#0: cdb 0x28:3b4681a8:0008 (24081).
scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry #1/#0: cdb 0x88:000000020ab11b00:00000008 (24928).
scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry #1/#0: cdb 0x88:00000002b7ff0d00:00000038 (24619).
config_thread: raid.disk.delete.drl:debug]: aggregate Disk /aggr01_node02/plex0/rg0/3b.51.1L1 Shelf 51 Bay 1 [NETAPP X481_SMKRE06TSDB NA03] S/N [S4D12BT0] UID [5000C500:8CE40C44:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] Deleting dirty region log DRL_1.
wafl.cp.toolong:error]: Aggregate fas_01_DATA_AGGR experienced a long CP...
- EMS may also report the message wafl.cp.toolong:error
wafl_exempt08: wafl.cp.toolong:error]: Aggregate fas_01_DATA_AGGR experienced a long CP.
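
The comparison of one drive against its RAID group peers in the statit sample can be sketched in a few lines of Python. This is only an illustrative sketch: the function names parse_rows and find_outliers are hypothetical, and the "more than twice the group median ut%" rule is an assumed heuristic for the example, not an ONTAP threshold. The rows are copied from the statit sample in this article.

```python
from statistics import median

# Disk rows copied from the statit sample in this article.
# Columns: disk, ut%, xfers, then rate/chain/usecs triples per I/O class.
# "...." and "." mean that no data was recorded for that class.
STATIT_ROWS = """\
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs
/aggr1/plex0/rg0:
0a.10.10 31 93.15 0.00 .... . 54.89 26.94 590 38.26 38.85 155
0a.10.1 33 93.98 0.00 .... . 55.75 26.55 630 38.23 38.83 183
0a.10.2 19 118.78 9.53 3.50 8515 56.77 10.57 291 52.49 9.60 543
0a.10.3 21 120.65 10.11 3.80 8440 58.10 10.88 362 52.43 9.50 566
0a.10.4 20 119.76 9.21 3.27 9108 57.79 10.54 314 52.76 9.44 552
0a.10.5 100 121.62 10.52 3.22 19375 58.78 10.20 7699 52.32 9.79 4831
0a.10.6 18 119.96 9.57 3.33 8727 57.97 10.73 216 52.42 9.64 541
0a.10.7 18 119.06 9.01 3.53 8786 57.71 10.57 223 52.34 9.56 535
0a.10.8 18 121.28 9.75 3.76 8179 59.29 10.89 235 52.24 9.72 544
0a.10.9 19 121.30 10.90 3.47 8249 58.15 11.07 217 52.26 9.87 526
"""


def parse_rows(text):
    """Return one dict per disk row: name, ut%, and user-read latency in usecs."""
    disks = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the RAID group label, and blank lines.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        ureads_usecs = None if fields[5] == "." else int(fields[5])
        disks.append(
            {"disk": fields[0], "ut": int(fields[1]), "ureads_usecs": ureads_usecs}
        )
    return disks


def find_outliers(disks, factor=2.0):
    """Flag disks whose ut% exceeds factor x the group median (assumed heuristic)."""
    group_median = median(d["ut"] for d in disks)
    return [d["disk"] for d in disks if d["ut"] > factor * group_median]


if __name__ == "__main__":
    print(find_outliers(parse_rows(STATIT_ROWS)))  # ['0a.10.5']
```

With the sample data, the median ut% of the group is 19.5, so only 0a.10.5 (100%) is flagged. The two parity-side disks at 31% and 33% stay below the assumed cutoff.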