跳转到主内容

由多个磁盘故障导致的接管

Views:
27
Visibility:
Public
Votes:
0
Category:
metrocluster
Specialty:
7dot
Last Updated:

适用场景

  • Data ONTAP ( 7- 模式) 8.2.5P5
  • FAS6250
  • 双节点光纤连接 MetroCluster

问题描述

在以下消息中观察到多个与磁盘相关的错误:
  • " 写入操作期间校验和条目无效 " 位于多个磁盘上
  • "orphaning disk because not in consistent label set ( CLS) " on 多个磁盘
  • ' 正在孤立磁盘,因为它比计算的磁盘更新 丛一致的标签集 "
  • 已触发 SyncMirror 丛失败的 AutoSupport
  • "iskown.ownerReservationMismatch" 错误
 
示例
Sat May 15 04:50:41 UTC [Node01:raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr_Node01_data/plex1/rg1/Site01-sw1:2.126L36 Shelf 31 Bay 9 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber], block #60799576, during write operation.  
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, because not in consistent label set (CLS). 
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.moreRecent:error]: Orphaning disk Site01-sw2:2.126L14 in plex aggr_Node01_data/0, because it is more recent (146175/1789746823, 146175/1789746823) than the calculated plex consistent label set (146174/1789745659).
Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 valid children, expected 22.  
Sat May 15 04:51:16 UTC [Node01:raid.assim.plex.missingChild:error]: Aggregate aggr_Node01_data, plexobj_verify: Plex 1 only has 1 working RAID groups (2 total) and is being taken offline  
Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED 
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk /aggr_Node01_data/plex1/rg0/Site01-sw1:2.126L54 Shelf 32 Bay 1 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber] has failed. The system will correct the problem.  
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk Site01-sw1:2.126L14 Shelf 30 Bay 13 [NETAPP   X422_SCOMP600A10 NA03] S/N [SerialNumber] has failed. The system will correct the problem.  
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
Sat May 15 04:51:39 UTC [Node01:diskown.ownerReservationMismatch:warning]: disk Site01-sw2:2.126L12 (S/N SerialNumber) is supposed to be owned by this node but has a persistent reservation placed by node ?? (ID 28600)
 
首次启动这些错误后不久,由于节点处于降级状态,配对节点将接管该节点。
 
示例:
 A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: ddhhmmss
System rebooting...
 

 

Sign in to view the entire content of this KB article.

New to NetApp?

Learn more about our award-winning Support

NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.