Performance issue caused by a single SSD

Applies to

AFF, ASA, and C-Series systems
ONTAP versions that do not include the fix for bug ID 1479263

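Issue

A single faulty SSD can cause performance problems on an aggregate because of high read/write I/O latency on that drive.
If the disk is partitioned, it can affect both nodes of the HA pair and multiple aggregates at the same time.
One SSD shows much higher latency than its peers (disk 0c.01.5 in this example):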

node> statit -e
Disk Statistics (per second)
ut% is the percent of time the disk was busy.
xfers is the number of data-transfer commands issued per second.
xfers = ureads + writes + cpreads + greads + gwrites
chain is the average number of 4K blocks per command.
usecs is the average disk round-trip time per 4K block.

disk      ut%   xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr1/plex0/rg0:
0a.00.9     2  275.84    0.00  ....     .   95.38 36.71    32  180.46 18.28    41    0.00  ....     .    0.00  ....     .
0a.00.1     2  276.54    0.50  1.40   120   95.88 36.57    31  180.16 18.30    40    0.00  ....     .    0.00  ....     .
0a.00.3     1 2659.57 2030.59  3.70   131  266.35  7.80    89  362.63  2.86   210    0.00  ....     .    0.00  ....     .
3d.00.4     1 2667.07 2047.99  3.79   112  261.65  8.27    56  357.43  2.93   143    0.00  ....     .    0.00  ....     .
0a.00.5     1 2733.05 2096.08  3.72   108  271.35  8.25    89  365.63  2.95   153    0.00  ....     .    0.00  ....     .
3d.00.6     1 2506.70 1916.42  3.43   124  243.45  8.19    66  346.83  2.85   146    0.00  ....     .    0.00  ....     .
0a.00.7     1 2450.61 1897.82  3.47   109  224.46  8.40    84  328.33  2.84   150    0.00  ....     .    0.00  ....     .
3d.00.8     1 2462.91 1902.72  3.58   117  228.55  8.35    69  331.63  2.89   149    0.00  ....     .    0.00  ....     .
3d.00.10    1 2500.00 1913.12  3.45   117  238.25  7.96    78  348.63  2.76   152    0.00  ....     .    0.00  ....     .
3d.00.2     1 2428.81 1839.93  3.54   117  243.75  7.98    88  345.13  2.92   149    0.00  ....     .    0.00  ....     .
3d.00.0     1 2451.11 1877.52  3.44   120  237.35  8.17    97  336.23  2.89   153    0.00  ....     .    0.00  ....     .
0c.01.5    95 2352.92 1538.77  6.53  2579  385.19 12.08  2353  428.96  3.56  2176    0.00  ....     .    0.00  ....     .
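The peer comparison in the statit output above can be automated. The following is a minimal Python sketch, not a NetApp tool: it assumes each disk row has 18 whitespace-separated fields (disk, ut%, xfers, then rate/chain/usecs for each of the five I/O types), and it flags a disk whose user-read usecs value exceeds a multiple of the median across the rows. The function names and the factor of 5 are illustrative assumptions, not documented thresholds.

```python
import statistics

# A subset of rows copied from the statit output in this article.
SAMPLE = """\
/aggr1/plex0/rg0:
0a.00.9 2 275.84 0.00 .... . 95.38 36.71 32 180.46 18.28 41 0.00 .... . 0.00 .... .
0a.00.3 1 2659.57 2030.59 3.70 131 266.35 7.80 89 362.63 2.86 210 0.00 .... . 0.00 .... .
3d.00.4 1 2667.07 2047.99 3.79 112 261.65 8.27 56 357.43 2.93 143 0.00 .... . 0.00 .... .
0a.00.5 1 2733.05 2096.08 3.72 108 271.35 8.25 89 365.63 2.95 153 0.00 .... . 0.00 .... .
0c.01.5 95 2352.92 1538.77 6.53 2579 385.19 12.08 2353 428.96 3.56 2176 0.00 .... . 0.00 .... .
"""


def parse_rows(text):
    """Return one dict per disk row; header and RAID-group lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 18:  # assumed layout: 3 leading fields + 5 triples
            continue
        try:
            ut = int(fields[1])
        except ValueError:
            continue
        usecs = fields[5]  # user-read round-trip time; "." means no samples
        rows.append({
            "disk": fields[0],
            "ut": ut,
            "uread_usecs": None if usecs == "." else int(usecs),
        })
    return rows


def find_outliers(rows, factor=5.0):
    """Flag disks whose user-read usecs exceed factor x the median (assumed rule)."""
    values = [r["uread_usecs"] for r in rows if r["uread_usecs"] is not None]
    if not values:
        return []
    median = statistics.median(values)
    return [
        r["disk"]
        for r in rows
        if r["uread_usecs"] is not None and r["uread_usecs"] > factor * median
    ]


if __name__ == "__main__":
    print(find_outliers(parse_rows(SAMPLE)))  # prints ['0c.01.5']
```

A median is used rather than a mean so that the slow disk itself does not pull the baseline upward. In the sample, 0c.01.5 also stands out on ut% (95 against 1 or 2 for its peers), which a fuller check could take into account.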

The EMS log shows disk errors similar to the following:

Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222800:00000120: Sense Data SCSI:aborted command - (0xb - 0x2f 0x14 0x0)(4509).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222928:00000010: Sense Data SCSI:aborted command - (0xb - 0x2f 0x14 0x0)(4512).
Tue May 17 08:06:00 +0000 [node1: scsi_ecmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222940:000000c0: Sense Data SCSI:aborted command - (0xb - 0x2f 0x14 0x0)(4514).

The drive recovers shortly afterwards:

Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222800:00000120 (5017).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222940:000000c0 (5017).
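To see which disk the check conditions cluster on, and which commands were later retried successfully, the two event types can be paired by CDB. This is a rough Python sketch under the assumption that EMS lines follow the layout shown in this article; the regular expression and the `summarize` helper are hypothetical and not part of ONTAP.

```python
import re
from collections import defaultdict

# Matches the two event types shown in this article and captures
# the event name, the disk name, and the CDB string.
EVENT = re.compile(
    r"(scsi\.cmd\.checkCondition|scsi\.cmd\.retrySuccess):\w+\]: "
    r"Disk device ([\w.]+):.*?cdb (0x[0-9a-f]+:[0-9a-f]+:[0-9a-f]+)",
    re.IGNORECASE,
)

# Log lines copied from this article (shortened to the matched part).
LOG = [
    "[node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222800:00000120: Sense Data SCSI:aborted command",
    "[node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222928:00000010: Sense Data SCSI:aborted command",
    "[node1: scsi_ecmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222940:000000c0: Sense Data SCSI:aborted command",
    "[node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222800:00000120 (5017).",
    "[node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222940:000000c0 (5017).",
]


def summarize(lines):
    """Group CDBs per disk into those that errored and those that were retried OK."""
    summary = defaultdict(lambda: {"errors": set(), "recovered": set()})
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue
        event, disk, cdb = match.groups()
        key = "errors" if event.lower().endswith("checkcondition") else "recovered"
        summary[disk][key].add(cdb.lower())
    return dict(summary)


if __name__ == "__main__":
    for disk, info in summarize(LOG).items():
        no_retry_seen = info["errors"] - info["recovered"]
        print(disk, len(info["errors"]), "check conditions;",
              len(no_retry_seen), "without a retrySuccess in this excerpt")
```

A CDB with no matching retrySuccess line only means that no such line appears in the input given to the script; it does not by itself show that the command failed.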

The disk might be partitioned and therefore affect more than one aggregate, as in the following example:

Tue May 17 08:06:00 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue May 17 08:06:45 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr2 experienced a long CP.
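When reviewing a longer EMS export, it can help to list every aggregate that reported a long CP, since several aggregates reporting one at about the same time is consistent with a shared partitioned disk. The Python sketch below assumes the message wording shown above; the helper name `aggregates_with_long_cp` is hypothetical.

```python
import re

# Captures the aggregate name from wafl.cp.toolong events as worded in this article.
LONG_CP = re.compile(r"wafl\.cp\.toolong:\w+\]: Aggregate (\S+) experienced a long CP")

# Log lines copied from this article.
LOG = [
    "Tue May 17 08:06:00 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.",
    "Tue May 17 08:06:45 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr2 experienced a long CP.",
]


def aggregates_with_long_cp(lines):
    """Return distinct aggregate names in order of first appearance."""
    seen = []
    for line in lines:
        match = LONG_CP.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    print(aggregates_with_long_cp(LOG))  # prints ['aggr1', 'aggr2']
```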