Takeover caused by multiple disk failures

Category:: metrocluster

Specialty:: 7dot

Applies to

Data ONTAP 8.2.5P5 (7-Mode)
FAS6250
Two-node fabric-attached MetroCluster

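Issue

Multiple disk-related errors are reported in the following messages:

"Invalid checksum entry ... during write operation" on multiple disks
"Orphaning disk ... because not in consistent label set (CLS)" on multiple disks
"Orphaning disk ... because it is more recent ... than the calculated plex consistent label set"
An AutoSupport message is triggered for SYNCMIRROR PLEX FAILED
"diskown.ownerReservationMismatch" errors

Example: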

Sat May 15 04:50:41 UTC [Node01:raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr_Node01_data/plex1/rg1/Site01-sw1:2.126L36 Shelf 31 Bay 9 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber], block #60799576, during write operation.  
 Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, because not in consistent label set (CLS). 
 Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.moreRecent:error]: Orphaning disk Site01-sw2:2.126L14 in plex aggr_Node01_data/0, because it is more recent (146175/1789746823, 146175/1789746823) than the calculated plex consistent label set (146174/1789745659).
 Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 valid children, expected 22.  
 Sat May 15 04:51:16 UTC [Node01:raid.assim.plex.missingChild:error]: Aggregate aggr_Node01_data, plexobj_verify: Plex 1 only has 1 working RAID groups (2 total) and is being taken offline  
 Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED 
 Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
 Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk /aggr_Node01_data/plex1/rg0/Site01-sw1:2.126L54 Shelf 32 Bay 1 [NETAPP   X422_SLTNG600A10 NA02] S/N [SerialNumber] has failed. The system will correct the problem.  
 Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk Site01-sw1:2.126L14 Shelf 30 Bay 13 [NETAPP   X422_SCOMP600A10 NA03] S/N [SerialNumber] has failed. The system will correct the problem.  
 Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.  
 Sat May 15 04:51:39 UTC [Node01:diskown.ownerReservationMismatch:warning]: disk Site01-sw2:2.126L12 (S/N SerialNumber) is supposed to be owned by this node but has a persistent reservation placed by node ?? (ID 28600)
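When many of these messages arrive within a few seconds, it can help to tally them by event name and severity before reading them one by one. The following is a minimal sketch, not a NetApp tool: the function names `summarize_ems` and `missing_children` are hypothetical, and the regular expression only assumes the `[node:event.name:severity]:` header layout visible in the sample above. The sample lines are copied from that log.

```python
import re
from collections import Counter

# Assumed header layout, taken from the sample log: "[node:event.name:severity]:"
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+):(?P<event>[A-Za-z0-9_.]+):(?P<severity>[A-Za-z]+)\]:"
)

# Wording of the raid.assim.rg.missingChild message in the sample log.
MISSING_CHILD = re.compile(r"has only (\d+) valid children, expected (\d+)")


def summarize_ems(lines):
    """Count messages by (event name, lower-cased severity)."""
    counts = Counter()
    for line in lines:
        match = EMS_HEADER.search(line)
        if match:
            key = (match.group("event"), match.group("severity").lower())
            counts[key] += 1
    return counts


def missing_children(line):
    """Return (valid, expected) child counts, or None if the line has none."""
    match = MISSING_CHILD.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Lines copied from the example above.
SAMPLE = [
    "Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: "
    "Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 "
    "valid children, expected 22.",
    "Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: "
    "Call home for SYNCMIRROR PLEX FAILED",
    "Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: "
    "Plex /aggr_Node01_data/plex1 has failed.",
    "Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: "
    "Plex /aggr_Node01_data/plex1 has failed.",
]

if __name__ == "__main__":
    for (event, severity), count in sorted(summarize_ems(SAMPLE).items()):
        print(f"{count:3d}  {severity:8s}  {event}")
    valid, expected = missing_children(SAMPLE[0])
    print(f"RAID group is missing {expected - valid} children")
```

In the sample, the RAID group reports 18 valid children out of an expected 22, so four disks are absent from that group at the moment the plex is taken offline.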

Shortly after these errors first appear, the partner node takes over the node because the node is in a degraded state.

Example:

 A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
 Ordinarily, this will only occur if the partner node has taken over.
 This node will be shutdown.
 HALT: HA partner has taken over disk reservations
 Uptime: ddhhmmss
 System rebooting...
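To confirm from a saved console capture that a halt was caused by the partner taking over disk reservations rather than by a panic, the console text can be checked for the messages shown above. This is a rough sketch under stated assumptions: `find_takeover_halt` is a hypothetical helper, and it relies only on the exact message wording in this example, which may differ in other releases.

```python
import re

# Message wording taken from the console example above.
HALT_MARKER = "HALT: HA partner has taken over disk reservations"
RESERVATION = re.compile(r"A disk reservation was detected on disk (\S+) at")


def find_takeover_halt(console_text):
    """Return the disk named in the reservation message if the console shows
    a halt caused by partner takeover, otherwise None."""
    lines = [line.strip() for line in console_text.splitlines()]
    if HALT_MARKER not in lines:
        return None
    for line in lines:
        match = RESERVATION.search(line)
        if match:
            return match.group(1)
    # Halt marker present but no reservation line captured.
    return ""


# Console text copied from the example above.
CONSOLE = """\
 A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
 Ordinarily, this will only occur if the partner node has taken over.
 This node will be shutdown.
 HALT: HA partner has taken over disk reservations
"""

if __name__ == "__main__":
    print(find_takeover_halt(CONSOLE))
```

Returning the disk name keeps the result easy to cross-check against the disks listed in the earlier RAID and diskown messages.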