由多个磁盘故障导致的接管
适用场景
- Data ONTAP ( 7- 模式) 8.2.5P5
- FAS6250
- 双节点光纤连接 MetroCluster
问题描述
在以下消息中观察到多个与磁盘相关的错误:
- " 写入操作期间校验和条目无效 " 位于多个磁盘上
- "orphaning disk because not in consistent label set ( CLS) " on 多个磁盘
- ' 正在孤立磁盘,因为它比计算的磁盘更新 丛一致的标签集 "
- 已触发 SyncMirror 丛失败的 AutoSupport
- "iskown.ownerReservationMismatch" 错误
示例
Sat May 15 04:50:41 UTC [Node01:raid.tetris.cksum.embed:CRITICAL]: Invalid checksum entry on Disk /aggr_Node01_data/plex1/rg1/Site01-sw1:2.126L36 Shelf 31 Bay 9 [NETAPP X422_SLTNG600A10 NA02] S/N [SerialNumber], block #60799576, during write operation.
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.notInCls:error]: Orphaning disk Site02-sw1:2.126L14 in plex aggr_Node01_data/1, because not in consistent label set (CLS).
Sat May 15 04:51:16 UTC [Node01:raid.assim.cls.moreRecent:error]: Orphaning disk Site01-sw2:2.126L14 in plex aggr_Node01_data/0, because it is more recent (146175/1789746823, 146175/1789746823) than the calculated plex consistent label set (146174/1789745659).
Sat May 15 04:51:16 UTC [Node01:raid.assim.rg.missingChild:error]: Aggregate aggr_Node01_data, rgobj_verify: RAID object 0 has only 18 valid children, expected 22.
Sat May 15 04:51:16 UTC [Node01:raid.assim.plex.missingChild:error]: Aggregate aggr_Node01_data, plexobj_verify: Plex 1 only has 1 working RAID groups (2 total) and is being taken offline
Sat May 15 04:51:16 UTC [Node01:callhome.syncm.plex:CRITICAL]: Call home for SYNCMIRROR PLEX FAILED
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk /aggr_Node01_data/plex1/rg0/Site01-sw1:2.126L54 Shelf 32 Bay 1 [NETAPP X422_SLTNG600A10 NA02] S/N [SerialNumber] has failed. The system will correct the problem.
Sat May 15 04:51:17 UTC [Node01:monitor.diskLabelCheckFailed:warning]: Periodic check of RAID Disk Site01-sw1:2.126L14 Shelf 30 Bay 13 [NETAPP X422_SCOMP600A10 NA03] S/N [SerialNumber] has failed. The system will correct the problem.
Sat May 15 04:51:17 UTC [Node01:raid.config.check.failedPlex:error]: Plex /aggr_Node01_data/plex1 has failed.
Sat May 15 04:51:39 UTC [Node01:diskown.ownerReservationMismatch:warning]: disk Site01-sw2:2.126L12 (S/N SerialNumber) is supposed to be owned by this node but has a persistent reservation placed by node ?? (ID 28600)
首次启动这些错误后不久,由于节点处于降级状态,配对节点将接管该节点。
示例:
A disk reservation was detected on disk Site01-sw1:2.126L8 at DDMMMYYYY 04:53:51
Ordinarily, this will only occur if the partner node has taken over.
This node will be shutdown.
HALT: HA partner has taken over disk reservations
Uptime: ddhhmmss
System rebooting...