
Multi-disk failure due to loss of back-end FlexArray disks

Category: ontap-9
Specialty: hw

Applies to

  • ONTAP 9
  • FlexArray

Issue

  • A single node reboots because of a multi-disk failure:

Thu May 15 05:04:39 -0400 [Node-01: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner

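  • The issue is isolated to a single storage port.
  • EMS messages show disk I/O on that storage port being aborted by the host adapter and then successfully retried through the partner switch: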

Thu May 15 00:23:37 -0400 [Node-02: slifc_timeout_1: fci.device.quiesce:debug]: Adapter 2c encountered a command timeout on Disk device Switch-1:21.126 (0x010b1500) LUN 2 cdb 0x2a:0d3619d3:019b retry: 0 Quiescing the device.
Thu May 15 00:23:40 -0400 [Node-02: slifc_timeout_1: fci.device.timeout:debug]: HBA 2c encountered a device timeout on Disk device Switch-1:21.126 (0x010b1500) LUN 2 cdb 0x2a:0d3619d3:019b retry: 0
Thu May 15 00:23:46 -0400 [Node-02: slifc_intrd: scsi.cmd.abortedByHost:error]: Disk device Switch-1:21.126L42: Command aborted by host adapter: HA status 0x4: cdb 0x2a:0d3619d3:019b. 
Thu May 15 00:23:46 -0400 [Node-02: slifc_intrd: scsi.cmd.retrySuccess:debug]: Disk device Switch-2:21.126L42: request successful after retry #1/#0: cdb 0x2a:0d3619d3:019b (24266).
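The abort-then-retry pattern in these messages can also be spotted programmatically. The Python sketch below is a hypothetical triage helper, not a NetApp tool. It assumes EMS lines in exactly the format shown here, and its regular expressions are fitted to these samples rather than to any official EMS grammar. It pairs a `scsi.cmd.abortedByHost` event with a later `scsi.cmd.retrySuccess` event for the same CDB and reports cases where the retry completed on a different switch path.

```python
import re

# Assumed layout, fitted to the sample lines above:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_RE = re.compile(
    r"\[(?P<node>[^:]+): (?P<proc>[^:]+): (?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<msg>.*)"
)
# "Disk device Switch-1:21.126L42" -> switch "Switch-1", port "21.126"
DEVICE_RE = re.compile(r"Disk device (?P<switch>[\w-]+):(?P<port>[\d.]+)")
# "cdb 0x2a:0d3619d3:019b" -> "0x2a:0d3619d3:019b"
CDB_RE = re.compile(r"cdb (?P<cdb>0x[0-9a-f]+:[0-9a-f]+:[0-9a-f]+)")


def find_path_failovers(lines):
    """Return (cdb, failed_path, retry_path) tuples for commands that the host
    aborted on one switch path and that later succeeded on a different path."""
    aborted = {}  # CDB -> "switch:port" path on which the command was aborted
    failovers = []
    for line in lines:
        m = EMS_RE.search(line)
        if not m:
            continue
        dev = DEVICE_RE.search(m["msg"])
        cdb = CDB_RE.search(m["msg"])
        if not dev or not cdb:
            continue
        path = f"{dev['switch']}:{dev['port']}"
        if m["event"] == "scsi.cmd.abortedByHost":
            aborted[cdb["cdb"]] = path
        elif m["event"] == "scsi.cmd.retrySuccess" and cdb["cdb"] in aborted:
            failed_path = aborted.pop(cdb["cdb"])
            if failed_path != path:
                failovers.append((cdb["cdb"], failed_path, path))
    return failovers


SAMPLE_LOG = [
    "Thu May 15 00:23:46 -0400 [Node-02: slifc_intrd: scsi.cmd.abortedByHost:error]: "
    "Disk device Switch-1:21.126L42: Command aborted by host adapter: HA status 0x4: "
    "cdb 0x2a:0d3619d3:019b.",
    "Thu May 15 00:23:46 -0400 [Node-02: slifc_intrd: scsi.cmd.retrySuccess:debug]: "
    "Disk device Switch-2:21.126L42: request successful after retry #1/#0: "
    "cdb 0x2a:0d3619d3:019b (24266).",
]

if __name__ == "__main__":
    for cdb, failed, retried in find_path_failovers(SAMPLE_LOG):
        print(f"cdb {cdb}: aborted via {failed}, succeeded via {retried}")
```

For the two sample lines, the helper reports a single failover for CDB `0x2a:0d3619d3:019b` from `Switch-1:21.126` to `Switch-2:21.126`. A recurring pattern of this kind that always involves the same first path points at that one storage port.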

  • In some cases the I/O is not aborted but fails outright, and the disk is marked as not responding:

Thu May 15 05:04:39 -0400 [Node-02: slifc_intrd: scsi.cmd.pastTimeToLive:error]: Disk device Switch-1:21.126L42: request failed after try #1: cdb 0x8a:00000001cfccd24a:00000249. 
Thu May 15 05:04:39 -0400 [Node-02: config_thread: raid.config.filesystem.disk.not.responding:notice]: File system Disk /aggr1/plex0/rg0/Switch-1:21.126L42 Shelf - Bay - [HITACHI  OPEN-V 8301] S/N [XXXXXXXXXXXX] UID [xx...xx] is not responding.
Thu May 15 05:04:39 -0400 [Node-02: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr1: raid volfsm, fatal disk error in RAID group with no parity disk..  Raid type - raid0 Group name plex0/rg0 state NORMAL. 1 disk failed in the group. Disk Switch-1:21.126L19 Shelf - Bay - [HITACHI  OPEN-V 8301] S/N [XXXXXXXXXXXX] UID [xx..xx] error: disk operation timed out..
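To check whether a log window shows the escalation described in this article (command past its time to live, file system disk not responding, fatal multidisk error, partner takeover), the relevant EMS events could be grouped by node. The Python sketch below is a minimal illustration. The event names come from the sample output in this article, but the helper name, the choice of event set, and the regex are assumptions.

```python
import re

# Events from the sample output in this article that mark escalation from a
# single-path timeout to a partner takeover. Treating exactly these as
# "escalation" is an assumption made for illustration.
ESCALATION_EVENTS = {
    "scsi.cmd.pastTimeToLive",
    "raid.config.filesystem.disk.not.responding",
    "cf.multidisk.fatalProblem",
    "cf.fsm.takeover.mdp",
}
# Captures node and event name from "[<node>: <process>: <event>:<severity>]"
EVENT_RE = re.compile(r"\[(?P<node>[^:]+): [^:]+: (?P<event>[\w.]+):\w+\]")


def escalation_summary(lines):
    """Map each node name to the escalation events it logged, in log order."""
    summary = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m and m["event"] in ESCALATION_EVENTS:
            summary.setdefault(m["node"], []).append(m["event"])
    return summary
```

A node whose list ends in `cf.multidisk.fatalProblem` while its partner logs `cf.fsm.takeover.mdp` matches the sequence shown above. Debug-level events such as `fci.device.quiesce` are deliberately ignored.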

  • After the reboot, all disks are visible and the aggregate is healthy.



NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

 
