
ST16000NM002G drives show a higher failure rate under very specific workload conditions

Applies to

  • E5760
  • GPFS
  • SANtricity OS 11.60.2R1 - 11.70.2
  • Seagate ST16000NM002G drives with firmware NE00 and/or NE01
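
The applicability criteria above can be expressed as a quick scope check. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, the version parser assumes strings shaped like `11.60.2R1`, and it treats a missing `R` suffix as release 0 and the upper bound `11.70.2` as inclusive, which the article does not spell out.

```python
import re

# Values taken from the "Applies to" list.
AFFECTED_MODEL = "ST16000NM002G"
AFFECTED_FIRMWARE = {"NE00", "NE01"}
OS_MIN = "11.60.2R1"
OS_MAX = "11.70.2"


def parse_os_version(text):
    """Turn '11.60.2R1' into (11, 60, 2, 1).

    Assumption: a version without an 'R' suffix sorts as release 0.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:R(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, release = match.groups()
    return (int(major), int(minor), int(patch), int(release or 0))


def is_in_scope(model, firmware, os_version):
    """Return True when model, drive firmware and OS version all match the list."""
    version = parse_os_version(os_version)
    return (
        model == AFFECTED_MODEL
        and firmware in AFFECTED_FIRMWARE
        and parse_os_version(OS_MIN) <= version <= parse_os_version(OS_MAX)
    )
```

A match only means the configuration is one where the issue has been observed; the workload pattern described under Issue still has to be present.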

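Issue

To date, the issue has been seen only under specific workload conditions, with IBM GPFS file systems spanning multiple E5760 E-Series storage arrays.
In this specific instance, the drive vendor's analysis shows that 99.99% of the writes land in .01% of the drive, within a 1.6 GB range.
The writes hit certain hot spots in the low LBA range at up to 106 MB/s.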
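
The vendor's figures are consistent with each other, which a short calculation makes visible. This sketch assumes a nominal drive capacity of 16 TB (decimal), inferred from the model number rather than stated in the article, and decimal megabytes for the write rate. The per-pass time is a lower bound, since 106 MB/s is quoted as a peak.

```python
# Assumption: 16 TB nominal capacity, inferred from the model number ST16000NM002G.
DRIVE_CAPACITY_BYTES = 16 * 10**12

# ".01%" of the drive is one ten-thousandth of its capacity.
HOT_REGION_DIVISOR = 10_000

# "up to 106 MB/s", taken as decimal megabytes (assumption).
WRITE_RATE_BYTES_PER_S = 106 * 10**6

hot_region_bytes = DRIVE_CAPACITY_BYTES // HOT_REGION_DIVISOR
seconds_per_pass = hot_region_bytes / WRITE_RATE_BYTES_PER_S
passes_per_day = 86_400 / seconds_per_pass

print(f"hot region: {hot_region_bytes / 10**9:.1f} GB")
print(f"time to rewrite it once at peak rate: {seconds_per_pass:.1f} s")
print(f"rewrites per day if the peak rate were sustained: {passes_per_day:.0f}")
```

Under these assumptions, .01% of the drive works out to the 1.6 GB range quoted above, and the whole hot region could be rewritten in roughly 15 seconds at the peak rate.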
 
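Symptoms may include:
  • Drive-side timeouts that cause the drive channel to become degraded
  • Write timeouts on multiple drives (IOP_FAST_TIMEOUT_ERROR)
  • PI errors
  • Unreadable sectors reported (URS / data loss)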

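The normal troubleshooting steps detailed in the KB "E-Series drive channel degraded and multiple individual drive degraded paths" do not resolve the issue.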

 
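The issue occurs across different shelves, drawers and drive bays, and there is no identifiable faulty component common to the chain.
Reseating all drives and snake cables (or the other troubleshooting steps in the KB above) brings no improvement.
The drives are less than a year old (well below the 5-year limit), and replacement drives installed in the same slot show the same symptoms and failures.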
 
The Major Event Log shows events similar to the following:
 
A:11/30/21, 3:31:03 AM (03:31:03) 2206 1209 Drive channel set to Degraded - Drive-side: channel 3 <--CRITICAL
A:11/30/21, 3:31:03 AM (03:31:03) 2205 1513 Individual drive - Degraded path - Drive-side: channel 3 <--CRITICAL
A:11/30/21, 3:30:55 AM (03:30:55) 2204 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
A:11/30/21, 3:30:46 AM (03:30:46) 2203 2014 VDD logged an error - Shelf 40, Bay A - SSID: 6, Devnum: 0x010005 LBA: 0x189fae400, Blocks: 0x400 - Recovered
----> Flags: 0x40202001 = READ: Read Operation, NOLOCK: Prevent lock during read err., PI: Error coding in effect, NOCACHE: CDB DPO cache lowest retention
----> Recovery: 0x2 = Reconstruction used, ASC: 0x1f = IOP_FAST_TIMEOUT_ERROR, Detection: 0xf80b0328
A:11/30/21, 3:30:43 AM (03:30:43) 2202 2014 VDD logged an error - Shelf 40, Bay A - SSID: 6, Devnum: 0x010005 LBA: 0xb49c7358, Blocks: 0x8 - Recovered
----> Flags: 0x40202001 = READ: Read Operation, NOLOCK: Prevent lock during read err., PI: Error coding in effect, NOCACHE: CDB DPO cache lowest retention
----> Recovery: 0x2 = Reconstruction used, ASC: 0x1f = IOP_FAST_TIMEOUT_ERROR, Detection: 0xf80b0328
A:11/30/21, 3:30:43 AM (03:30:43) 2201 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
A:11/30/21, 3:30:06 AM (03:30:06) 2200 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
A:11/30/21, 3:29:49 AM (03:29:49) 2199 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
A:11/30/21, 3:29:41 AM (03:29:41) 2198 2014 VDD logged an error - Shelf 40, Bay A - SSID: 6, Devnum: 0x010005 LBA: 0x21639800, Blocks: 0x400 - Recovered
----> Flags: 0x40202081 = READ: Read Operation, PARITY: Parity data, NOLOCK: Prevent lock during read err., PI: Error coding in effect, NOCACHE: CDB DPO cache lowest retention
----> Recovery: 0x2 = Reconstruction used, ASC: 0x1f = IOP_FAST_TIMEOUT_ERROR, Detection: 0xf80b0328
A:11/30/21, 3:29:39 AM (03:29:39) 2197 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
A:11/30/21, 3:29:38 AM (03:29:38) 2196 2014 VDD logged an error - Shelf 40, Bay A - SSID: 6, Devnum: 0x010005 LBA: 0x1538dcec0, Blocks: 0x10 - Recovered
----> Flags: 0x40202081 = READ: Read Operation, PARITY: Parity data, NOLOCK: Prevent lock during read err., PI: Error coding in effect, NOCACHE: CDB DPO cache lowest retention
----> Recovery: 0x2 = Reconstruction used, ASC: 0x1f = IOP_FAST_TIMEOUT_ERROR, Detection: 0xf80b0328
A:11/30/21, 3:29:35 AM (03:29:35) 2195 2014 VDD logged an error - Shelf 40, Bay A - SSID: 6, Devnum: 0x010005 LBA: 0x1266587a0, Blocks: 0x8 - Recovered
----> Flags: 0x40202081 = READ: Read Operation, PARITY: Parity data, NOLOCK: Prevent lock during read err., PI: Error coding in effect, NOCACHE: CDB DPO cache lowest retention
----> Recovery: 0x2 = Reconstruction used, ASC: 0x1f = IOP_FAST_TIMEOUT_ERROR, Detection: 0xf80b0328
 
A:12/31/21, 9:31:45 AM (09:31:45) 52721 6700 Unreadable sector(s) detected data loss occurred - Volume DDP06_04 - LBA: 0x12c239814b <--CRITICAL
----> Physical Drive in Tray 0 Slot 0, LBA: 0x84047314b
A:12/31/21, 9:31:44 AM (09:31:44) 52720 6700 Unreadable sector(s) detected data loss occurred - Volume DDP06_04 - LBA: 0x12c239814a <--CRITICAL
----> Physical Drive in Tray 0 Slot 0, LBA: 0x84047314a
A:12/31/21, 9:31:42 AM (09:31:42) 52719 6700 Unreadable sector(s) detected data loss occurred - Volume DDP06_04 - LBA: 0x12c2398149 <--CRITICAL
----> Physical Drive in Tray 0 Slot 0, LBA: 0x840473149
A:12/31/21, 9:31:41 AM (09:31:41) 52718 6700 Unreadable sector(s) detected data loss occurred - Volume DDP06_04 - LBA: 0x12c2398148 <--CRITICAL
----> Physical Drive in Tray 0 Slot 0, LBA: 0x840473148
A:12/31/21, 9:31:41 AM (09:31:41) 52717 201e VDD repair started - Shelf 30, Bay A - SSID: 33, Devnum: 0xffffff
A:12/31/21, 9:31:41 AM (09:31:41) 52716 201f VDD repair completed - Shelf 30, Bay A - SSID: 33, Devnum: 0x010217 LBA: 0x12c2399800
----> Flags: 0x202005 = READ: Read Operation, ERROR: IO Compl. w. Err, NOLOCK: Prevent lock during read err., PI: Error coding in effect - Error: 0x844 = UA_MISCORRECTED_DATA_ERROR
A:12/31/21, 9:31:40 AM (09:31:40) 52715 6700 Unreadable sector(s) detected data loss occurred - Volume DDP06_04 - LBA: 0x12c239994f <--CRITICAL
----> Physical Drive in Tray 32 Slot 24, LBA: 0x3eed7314f
A:12/31/21, 9:31:40 AM (09:31:40) 52714 1012 Destination driver error - Shelf 32, Drawer 2, Bay 11
A:12/31/21, 9:31:40 AM (09:31:40) 52713 1016 Drive returned unrecoverable media error - Shelf 32, Drawer 2, Bay 11
----> Sense 3/11/0 = Medium Error - Unrecovered read error - CDB: 0x7f(0x9) = Read(32) - LBA: ~0x3eed7314f
A:12/31/21, 9:31:37 AM (09:31:37) 52712 1016 Drive returned unrecoverable media error - Shelf 32, Drawer 2, Bay 11
----> Sense 3/11/0 = Medium Error - Unrecovered read error - CDB: 0x7f(0x9) = Read(32) - LBA: ~0x3eed7314f
 
A:12/25/21, 7:58:16 AM (07:58:16) 47154 100d Timeout on drive side of controller - Shelf 33, Drawer 4, Bay 5
B:12/25/21, 7:58:40 AM (07:58:40) 47153 2215 Drive marked failed - Shelf 33, Drawer 4, Bay 5
B:12/25/21, 7:58:40 AM (07:58:40) 47152 226c Drive failure - Shelf 33, Drawer 4, Bay 5 - Cause: 3 = Write failure; Drive WWN: 5000c500cadc69b7; SN: ZL29F9KB0000C107BKS5 <--CRITICAL
B:12/25/21, 7:58:40 AM (07:58:40) 47151 2226 Drive spun down - Shelf 33, Drawer 4, Bay 5
B:12/25/21, 7:58:40 AM (07:58:40) 47150 7e05 Drive recovery criteria not met - Shelf 33, Drawer 4, Bay 5
B:12/25/21, 7:58:39 AM (07:58:39) 47149 100d Timeout on drive side of controller - Shelf 33, Drawer 4, Bay 5
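
Because the failures are spread across shelves, drawers and bays, it can help to tally the drive-side timeouts per drive location. The following is a minimal sketch that parses event lines shaped like the excerpts above. The line format is inferred from those excerpts only, and the function names are hypothetical; it is not an official log parser.

```python
import re
from collections import Counter

# Header line of an event: controller, date, time, sequence number, hex event code, text.
MEL_LINE = re.compile(
    r"^(?P<controller>[AB]):(?P<date>\d{1,2}/\d{1,2}/\d{2}), "
    r"(?P<time>\d{1,2}:\d{2}:\d{2} [AP]M) \(\d{2}:\d{2}:\d{2}\) "
    r"(?P<sequence>\d+) (?P<code>[0-9a-f]{4}) (?P<text>.*)$"
)
DRIVE_LOCATION = re.compile(r"Shelf (\d+), Drawer (\d+), Bay (\d+)")


def parse_mel(lines):
    """Parse event header lines; '---->' continuation lines and blanks are skipped."""
    events = []
    for line in lines:
        match = MEL_LINE.match(line.strip())
        if match is None:
            continue
        event = match.groupdict()
        location = DRIVE_LOCATION.search(event["text"])
        event["location"] = (
            tuple(int(n) for n in location.groups()) if location else None
        )
        event["critical"] = event["text"].endswith("<--CRITICAL")
        events.append(event)
    return events


def timeouts_by_drive(events):
    """Count event code 100d (drive-side timeout in the excerpts) per (shelf, drawer, bay)."""
    return Counter(
        e["location"] for e in events if e["code"] == "100d" and e["location"]
    )


# A few lines copied from the excerpts above.
SAMPLE = """\
A:11/30/21, 3:31:03 AM (03:31:03) 2206 1209 Drive channel set to Degraded - Drive-side: channel 3 <--CRITICAL
A:11/30/21, 3:30:55 AM (03:30:55) 2204 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
A:11/30/21, 3:30:43 AM (03:30:43) 2201 100d Timeout on drive side of controller - Shelf 40, Drawer 1, Bay 5
----> Recovery: 0x2 = Reconstruction used, ASC: 0x1f = IOP_FAST_TIMEOUT_ERROR, Detection: 0xf80b0328
A:12/25/21, 7:58:16 AM (07:58:16) 47154 100d Timeout on drive side of controller - Shelf 33, Drawer 4, Bay 5
B:12/25/21, 7:58:40 AM (07:58:40) 47152 226c Drive failure - Shelf 33, Drawer 4, Bay 5 - Cause: 3 = Write failure; Drive WWN: 5000c500cadc69b7; SN: ZL29F9KB0000C107BKS5 <--CRITICAL
""".splitlines()

events = parse_mel(SAMPLE)
counts = timeouts_by_drive(events)
for location, count in counts.most_common():
    print(location, count)
```

Timeouts that cluster on one location regardless of which physical drive is installed there would match the behavior described earlier, where a replacement drive in the same slot shows the same symptoms.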

