
Failing disk causes performance impact

Category:
ontap-9
Specialty:
perf

Applies to

  • Non-failed drives
    • Does not apply to individual drives that have already failed
    • ONTAP fails a drive based on error and latency thresholds

Issue

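  • High volume (FlexVol) latency is observed.
    • In some cases, the high latency may cause NFS disconnects
  • Running the qos statistics volume latency show command shows that most of the latency falls under the Disk column. Example: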

::> qos statistics volume latency show -vserver SVM_name -volume vol_name
Workload            ID    Latency    Network    Cluster       Data       Disk    QoS Max    QoS Min      NVRAM ...
--------------- ------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ...
workload_name    12345   154.92ms   294.00us        0ms  1115.00us   153.36ms        0ms        0ms   157.00us ...
workload_name    12345   117.39ms   376.00us        0ms     1.59ms   115.27ms        0ms        0ms   157.00us ...
workload_name    12345   110.26ms   391.00us        0ms     1.86ms   107.86ms        0ms        0ms   139.00us ...
...
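To quantify how much of the total latency the Disk column accounts for, the rows above can be parsed with a short script. The following is only a minimal sketch: the helper names `to_us` and `disk_share` are hypothetical, and it assumes the column order shown in the example (Workload, ID, Latency, Network, Cluster, Data, Disk, ...) and a workload name without spaces.

```python
import re

# Multipliers to convert a latency string to microseconds.
_UNITS_US = {"us": 1.0, "ms": 1000.0, "s": 1_000_000.0}

def to_us(value: str) -> float:
    """Convert a latency string such as '153.36ms' or '294.00us' to microseconds."""
    m = re.fullmatch(r"([0-9.]+)(us|ms|s)", value)
    if m is None:
        raise ValueError(f"unrecognized latency value: {value!r}")
    return float(m.group(1)) * _UNITS_US[m.group(2)]

def disk_share(row: str) -> float:
    """Return Disk latency as a fraction of total Latency for one data row.

    Assumption: whitespace-separated fields in the order
    Workload, ID, Latency, Network, Cluster, Data, Disk, ...
    """
    fields = row.split()
    total = to_us(fields[2])  # Latency column
    disk = to_us(fields[6])   # Disk column
    return disk / total

# First data row copied from the example output above.
sample = ("workload_name   12345  154.92ms  294.00us     0ms  "
          "1115.00us  153.36ms    0ms     0ms   157.00us ...")
print(f"Disk share of latency: {disk_share(sample):.1%}")
```

A share close to 100% in every sample, as in the rows above, suggests the delay is in the disk layer rather than in the network, cluster interconnect, or data (CPU) layers.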

  • A single drive shows significantly higher utilization and latency than the other drives in its RAID group. Example:

::> system node run -node node_name -command "priv set -q advanced; statit -e"
...
disk        ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs  ...
/aggr1/plex0/rg0:
0a.10.10      31  93.15   0.00   ....    .  54.89  26.94   590  38.26  38.85   155   0.00   ....    .   ...
0a.10.1       33  93.98   0.00   ....    .  55.75  26.55   630  38.23  38.83   183   0.00   ....    .   ...
0a.10.2       19 118.78   9.53   3.50  8515  56.77  10.57   291  52.49   9.60   543   0.00   ....    .   ...
0a.10.3       21 120.65  10.11   3.80  8440  58.10  10.88   362  52.43   9.50   566   0.00   ....    .   ...
0a.10.4       20 119.76   9.21   3.27  9108  57.79  10.54   314  52.76   9.44   552   0.00   ....    .   ...
0a.10.5      100 121.62  10.52   3.22 19375  58.78  10.20  7699  52.32   9.79  4831   0.00   ....    .   ...
0a.10.6       18 119.96   9.57   3.33  8727  57.97  10.73   216  52.42   9.64   541   0.00   ....    .   ...
0a.10.7       18 119.06   9.01   3.53  8786  57.71  10.57   223  52.34   9.56   535   0.00   ....    .   ...
0a.10.8       18 121.28   9.75   3.76  8179  59.29  10.89   235  52.24   9.72   544   0.00   ....    .   ...
...
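In the statit output above, drive 0a.10.5 stands out with 100% utilization while its peers sit between 18% and 33%. A comparison against the RAID group median can be sketched as follows. The function names and the `factor` value are illustrative assumptions, not ONTAP thresholds, and the sketch reads only the first two fields (disk name and ut%) of each row.

```python
from statistics import median

def parse_ut(rows):
    """Map disk name -> ut% using the first two whitespace-separated fields of each row."""
    result = {}
    for row in rows:
        fields = row.split()
        result[fields[0]] = int(fields[1])
    return result

def outliers(ut_by_disk, factor=2.0):
    """Return disks whose ut% is at least `factor` times the group median.

    `factor` is an arbitrary illustrative value, not an ONTAP threshold.
    """
    mid = median(ut_by_disk.values())
    return [disk for disk, ut in ut_by_disk.items() if ut >= factor * mid]

# Disk name, ut% and xfers copied from the rg0 rows of the example above.
rows = [
    "0a.10.10      31  93.15",
    "0a.10.1       33  93.98",
    "0a.10.2       19 118.78",
    "0a.10.3       21 120.65",
    "0a.10.4       20 119.76",
    "0a.10.5      100 121.62",
    "0a.10.6       18 119.96",
    "0a.10.7       18 119.06",
    "0a.10.8       18 121.28",
]
print(outliers(parse_ut(rows)))  # prints ['0a.10.5']
```

The median is used instead of the mean so that the one slow drive does not pull the baseline upward.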

  • ONTAP events (EMS logs) may report:
    • Multiple errors and aborts on the drive before it is marked as failed. Example:

... scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry ...
... scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry ...
... scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry ...
... config_thread: raid.disk.delete.drl:debug]: aggregate Disk /aggr_name/plex0/rg0/ [...] Deleting dirty region log ...
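Repeated scsi.cmd.retrySuccess events for the same device, as in the lines above, are easier to spot when counted per disk. The sketch below is one way to do that on exported EMS text. The function name `count_retries` is hypothetical, and the regular expression assumes the message wording shown in the example.

```python
import re
from collections import Counter

# Matches the device name in a scsi.cmd.retrySuccess event, as worded in the example above.
_RETRY = re.compile(
    r"scsi\.cmd\.retrySuccess:\w+\]: Disk device (\S+): request successful after retry"
)

def count_retries(lines):
    """Count scsi.cmd.retrySuccess events per disk device; other events are ignored."""
    counts = Counter()
    for line in lines:
        m = _RETRY.search(line)
        if m:
            counts[m.group(1)] += 1
    return dict(counts)

# Lines copied from the EMS example above (leading and trailing parts are elided in the source).
lines = [
    "... scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry ...",
    "... scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry ...",
    "... scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 3b.51.1L2: request successful after retry ...",
    "... config_thread: raid.disk.delete.drl:debug]: aggregate Disk /aggr_name/plex0/rg0/ [...] Deleting dirty region log ...",
]
print(count_retries(lines))  # prints {'3b.51.1L2': 3}
```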

 

  • "Long" consistency points (CPs) on the aggregate. Example:

wafl_exempt08: wafl.cp.toolong:error]: Aggregate aggr_name experienced a long CP.

  • Storage Health Monitor IO latency (shm.threshold.ioLatency). Example:

[Cluster-01: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk XX.XX.XX has exceeded the expected IO latency in the current window with average latency of 50 msecs and average utilization of 100 percent. Highest average IO latency: XX.XX.: 50 msecs; next highest IO latency: XX.XX.XX: 6 msecs. Disk XX.XX.XX Shelf X Drawer X Slot X Bay XX [NETAPP   X375_TTCRE04TA07 NA03] S/N [#########] 
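The shm.threshold.ioLatency message above carries the affected drive's average latency and utilization together with the next-highest latency in the group, which allows a quick comparison. The following is a minimal sketch, assuming the message wording shown in the example. The function name `parse_io_latency` and the returned key names are hypothetical.

```python
import re

def parse_io_latency(message: str) -> dict:
    """Extract the latency and utilization figures from a shm.threshold.ioLatency message.

    Assumption: the message follows the wording of the example above.
    """
    avg = re.search(
        r"average latency of (\d+) msecs and average utilization of (\d+) percent",
        message,
    )
    nxt = re.search(r"next highest IO latency: [^:]+: (\d+) msecs", message)
    if avg is None or nxt is None:
        raise ValueError("message does not match the expected wording")
    return {
        "avg_latency_ms": int(avg.group(1)),
        "avg_util_pct": int(avg.group(2)),
        "next_highest_ms": int(nxt.group(1)),
    }

# Message text copied from the example above (disk names are placeholders in the source).
message = (
    "Disk XX.XX.XX has exceeded the expected IO latency in the current window "
    "with average latency of 50 msecs and average utilization of 100 percent. "
    "Highest average IO latency: XX.XX.: 50 msecs; "
    "next highest IO latency: XX.XX.XX: 6 msecs."
)
info = parse_io_latency(message)
print(info)
```

In this example the reported drive averages 50 msecs against 6 msecs for the next-slowest drive, the same single-drive outlier pattern seen in the statit output.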

 

 

 
