
Performance issue caused by a single SSD

Category:
ontap-9
Specialty:
perf

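Applies to

Issue

  • A single faulty SSD can cause performance problems on an aggregate because of its high read/write I/O latency.
  • If the disk is partitioned, it can affect both nodes of the HA pair and multiple aggregates at the same time.
  • High latency is seen on a single SSD (for example, disk 0c.01.5, which shows 95% utilization and a user-read latency of 2579 usecs, while the other disks in the RAID group stay at 1-2% and 108-131 usecs):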
node> statit -e

Disk Statistics (per second)
        ut% is the percent of time the disk was busy.
        xfers is the number of data-transfer commands issued per second.
        xfers = ureads + writes + cpreads + greads + gwrites
        chain is the average number of 4K blocks per command.
        usecs is the average disk round-trip time per 4K block.

disk      ut%   xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr1/plex0/rg0:
0a.00.9     2  275.84    0.00  ....     .   95.38 36.71    32  180.46 18.28    41    0.00  ....     .    0.00  ....     .
0a.00.1     2  276.54    0.50  1.40   120   95.88 36.57    31  180.16 18.30    40    0.00  ....     .    0.00  ....     .
0a.00.3     1 2659.57 2030.59  3.70   131  266.35  7.80    89  362.63  2.86   210    0.00  ....     .    0.00  ....     .
3d.00.4     1 2667.07 2047.99  3.79   112  261.65  8.27    56  357.43  2.93   143    0.00  ....     .    0.00  ....     .
0a.00.5     1 2733.05 2096.08  3.72   108  271.35  8.25    89  365.63  2.95   153    0.00  ....     .    0.00  ....     .
3d.00.6     1 2506.70 1916.42  3.43   124  243.45  8.19    66  346.83  2.85   146    0.00  ....     .    0.00  ....     .
0a.00.7     1 2450.61 1897.82  3.47   109  224.46  8.40    84  328.33  2.84   150    0.00  ....     .    0.00  ....     .
3d.00.8     1 2462.91 1902.72  3.58   117  228.55  8.35    69  331.63  2.89   149    0.00  ....     .    0.00  ....     .
3d.00.10    1 2500.00 1913.12  3.45   117  238.25  7.96    78  348.63  2.76   152    0.00  ....     .    0.00  ....     .
3d.00.2     1 2428.81 1839.93  3.54   117  243.75  7.98    88  345.13  2.92   149    0.00  ....     .    0.00  ....     .
3d.00.0     1 2451.11 1877.52  3.44   120  237.35  8.17    97  336.23  2.89   153    0.00  ....     .    0.00  ....     .
0c.01.5    95 2352.92 1538.77  6.53  2579  385.19 12.08  2353  428.96  3.56  2176    0.00  ....     .    0.00  ....     .
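
When a RAID group has many disks, the slow one can be picked out mechanically. The following is a minimal sketch, not a NetApp tool: it assumes the statit disk rows have been saved as text with 18 whitespace-separated fields per row, and it flags any disk whose user-read usecs exceed a multiple of the RAID-group median. The function names and the `factor=5.0` threshold are illustrative assumptions, not documented values.

```python
from statistics import median

# Four disk rows copied from the statit output above, plus the header lines.
SAMPLE = """\
disk ut% xfers ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
/aggr1/plex0/rg0:
0a.00.9 2 275.84 0.00 .... . 95.38 36.71 32 180.46 18.28 41 0.00 .... . 0.00 .... .
0a.00.3 1 2659.57 2030.59 3.70 131 266.35 7.80 89 362.63 2.86 210 0.00 .... . 0.00 .... .
3d.00.4 1 2667.07 2047.99 3.79 112 261.65 8.27 56 357.43 2.93 143 0.00 .... . 0.00 .... .
0c.01.5 95 2352.92 1538.77 6.53 2579 385.19 12.08 2353 428.96 3.56 2176 0.00 .... . 0.00 .... .
"""


def num(field):
    """statit prints only dots when a column has no samples; map that to None."""
    return None if set(field) == {"."} else float(field)


def parse_statit_rows(text):
    """Parse disk rows (disk, ut%, xfers, then 5 groups of rate/chain/usecs)."""
    rows = []
    for line in text.splitlines():
        tok = line.split()
        if len(tok) != 18 or "." not in tok[0]:
            continue  # header, RAID-group label, or description line
        try:
            rows.append({
                "disk": tok[0],
                "ut": float(tok[1]),
                "xfers": float(tok[2]),
                "uread_usecs": num(tok[5]),   # usecs column of the ureads group
                "write_usecs": num(tok[8]),   # usecs column of the writes group
            })
        except ValueError:
            continue  # not a numeric disk row
    return rows


def flag_slow_disks(rows, factor=5.0):
    """Return disks whose user-read latency exceeds factor x the group median.

    The factor is an arbitrary illustrative threshold.
    """
    latencies = [r["uread_usecs"] for r in rows if r["uread_usecs"] is not None]
    if not latencies:
        return []
    baseline = median(latencies)
    return [r["disk"] for r in rows
            if r["uread_usecs"] is not None and r["uread_usecs"] > factor * baseline]


print(flag_slow_disks(parse_statit_rows(SAMPLE)))  # ['0c.01.5']
```

A median is used rather than a mean so that the slow disk itself does not pull the baseline upward.
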
  • EMS logs show disk errors similar to the following:

Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222800:00000120: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4509).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222928:00000010: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4512).
Tue May 17 08:06:00 +0000 [node1: scsi_ecmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222940:000000c0: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4514).

  • The drive recovers shortly afterward:

Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222800:00000120 (5017).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222940:000000c0 (5017).
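
To see how often a disk reports errors and which commands later succeeded, the two EMS event types above can be matched by disk and CDB. This is a rough sketch under the assumption that the EMS messages have been exported as plain text in exactly the format shown; the regular expressions and the `summarize` function are illustrative and not part of ONTAP. A CDB listed as unmatched only means that no retrySuccess line for it appears in the text that was parsed.

```python
import re
from collections import Counter

# Log lines copied from the EMS excerpts above.
SAMPLE = """\
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222800:00000120: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4509).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222928:00000010: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4512).
Tue May 17 08:06:00 +0000 [node1: scsi_ecmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0c.01.5: Check Condition: CDB 0x8a:000000019b222940:000000c0: Sense Data SCSI:aborted command -  (0xb - 0x2f 0x14 0x0)(4514).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222800:00000120 (5017).
Tue May 17 08:06:00 +0000 [node1: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0c.01.5: request successful after retry #0/#1: cdb 0x8a:000000019b222940:000000c0 (5017).
"""

CHECK = re.compile(r"Disk device (\S+): Check Condition: CDB (\S+): Sense Data")
RETRY = re.compile(r"Disk device (\S+): request successful after retry \S+: cdb (\S+) ")


def summarize(log_text):
    """Count checkCondition events per disk and list CDBs with no retrySuccess."""
    errors = Counter()
    unmatched = set()
    for line in log_text.splitlines():
        m = CHECK.search(line)
        if m:
            errors[m.group(1)] += 1
            unmatched.add((m.group(1), m.group(2)))
            continue
        m = RETRY.search(line)
        if m:
            # A later successful retry clears the matching (disk, CDB) pair.
            unmatched.discard((m.group(1), m.group(2)))
    return dict(errors), sorted(unmatched)


errors, unmatched = summarize(SAMPLE)
print(errors)     # {'0c.01.5': 3}
print(unmatched)  # [('0c.01.5', '0x8a:000000019b222928:00000010')]
```
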

  • The disk may be partitioned, in which case it affects multiple aggregates, as shown below:

Tue May 17 08:06:00 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue May 17 08:06:45 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr2 experienced a long CP.
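
Long CPs reported on several aggregates of the same node are one hint that a shared, partitioned disk may be involved. The sketch below groups `wafl.cp.toolong` events by node; it assumes the EMS text format shown above, and the function name and regular expression are illustrative. It does not prove that the aggregates share a disk, which still has to be confirmed from the disk and partition layout.

```python
import re

# Log lines copied from the EMS excerpt above.
SAMPLE = """\
Tue May 17 08:06:00 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr1 experienced a long CP.
Tue May 17 08:06:45 +0000 [node1: wafl_exempt00: wafl.cp.toolong:error]: Aggregate aggr2 experienced a long CP.
"""

LONG_CP = re.compile(
    r"\[(\S+): [^\]]*wafl\.cp\.toolong:\w+\]: Aggregate (\S+) experienced a long CP"
)


def long_cp_aggregates(log_text):
    """Map each node name to the sorted list of aggregates that logged a long CP."""
    by_node = {}
    for line in log_text.splitlines():
        m = LONG_CP.search(line)
        if m:
            by_node.setdefault(m.group(1), set()).add(m.group(2))
    return {node: sorted(aggrs) for node, aggrs in by_node.items()}


print(long_cp_aggregates(SAMPLE))  # {'node1': ['aggr1', 'aggr2']}
```
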

 

