AWS or GCP CVO rebooted due to multiple missing disks
Applies to
- Cloud Volumes ONTAP (CVO)
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
Issue
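- An AWS / GCP CVO node is rebooted by its surviving HA partner, and the following AutoSupport message is received:
HA Group Notification (MULTIPLE DISKS MISSING) ERROR
- The EMS log of the surviving node shows that it lost access to the mirrored Pool1 disks attached to the failed node: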
Mon Jun 03 16:23:02 +0000 [CVO-01: monitor: monitor.globalStatus.critical:EMERGENCY]: This node has taken over CVO-02. One or more mirrored aggregates are degraded.
Mon Jun 03 16:22:35 +0000 [CVO-01: dmgr_thread: raid.disk.missing:info]: Disk /aggr1/plex1/rg0/0d.10 S/N [00000000V9NeubcHXfRG] UID [00000000V9NeubcHXfRG] is missing from the system
Mon Jun 03 16:22:35 +0000 [CVO-01: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /aggr1/plex1/rg0/0d.10 S/N [00000000V9NeubcHXfRG] UID [00000000V9NeubcHXfRG] is missing.
Note: The errors above are reported for every disk owned by the affected node, CVO-02.
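When many such lines are logged, it can help to group the missing disks by aggregate and plex to confirm that only one side of the mirror is affected. The following is a minimal sketch, assuming the EMS line layout shown above; the function name `missing_disks_by_plex` and the regular expressions are illustrative and not part of any NetApp tooling.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the sample lines above:
#   <timestamp> [<node>: <process>: <event>:<severity>]: <message>
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:]+): (?P<event>[\w.]+):(?P<sev>\w+)\]: (?P<msg>.*)"
)
# Disk paths in the samples look like /aggr1/plex1/rg0/0d.10
DISK_PATH = re.compile(r"Disk (?P<path>/\S+)")


def missing_disks_by_plex(lines):
    """Group disks from raid.disk.missing events by (aggregate, plex)."""
    result = defaultdict(set)
    for line in lines:
        m = EMS_LINE.search(line)
        if not m or m.group("event") != "raid.disk.missing":
            continue
        d = DISK_PATH.search(m.group("msg"))
        if not d:
            continue
        parts = d.group("path").strip("/").split("/")
        if len(parts) >= 4:
            # parts: aggregate, plex, RAID group, disk name
            result[(parts[0], parts[1])].add(parts[-1])
    return dict(result)
```

If every key returned refers to the same plex of each aggregate, that is consistent with the surviving node having lost only the partner's side of the mirror.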
- The storage failover show output reports Previous giveback failed in module: raid, as shown below:
::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
CVO-01         CVO-02         false    Previous giveback failed in module:
                                       raid
CVO-02         CVO-01         -        Waiting for giveback
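To check this state across saved command output, the table can be parsed into rows. The sketch below assumes the four-column layout shown above and treats any indented line, or any line with fewer than four fields, as a wrapped continuation of the previous State Description. That heuristic and the name `parse_failover_show` are assumptions made for illustration; a wrapped line with four or more words and no indentation would be misread as a new row.

```python
def parse_failover_show(text):
    """Parse saved 'storage failover show' output into a list of dicts."""
    rows = []
    in_body = False
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not in_body:
            # Data rows begin after the dashed separator line.
            if raw.strip().startswith("---"):
                in_body = True
            continue
        fields = raw.split(None, 3)
        is_continuation = raw[0].isspace() or len(fields) < 4
        if is_continuation and rows:
            # Wrapped text belongs to the previous row's description.
            rows[-1]["state"] += " " + raw.strip()
        elif len(fields) == 4:
            rows.append({
                "node": fields[0],
                "partner": fields[1],
                "takeover_possible": fields[2],
                "state": fields[3].strip(),
            })
    return rows


def nodes_with_failed_giveback(rows):
    """Return node names whose state mentions a failed giveback."""
    return [r["node"] for r in rows if "giveback failed" in r["state"]]
```

For the output above, `nodes_with_failed_giveback` would report CVO-01, the node that is holding the partner's resources.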
- EMS log (the following errors may repeat until the RAID resync completes):
Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: gb.cfo.abort.raid.fm:error]: Aggregate local:aggr8 is being resynced; canceling giveback.
Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: cf.rsrc.givebackVeto:alert]: Failover monitor: raid: giveback canceled due to active state.
Sat Jul 19 04:15:20 +0000 [CVO-01: cf_main: cf.fsm.autoGivebackVetoed:error]: Failover monitor: Automatic giveback has been deferred due to long running operations
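Because these messages repeat while the resync is running, counting them per aggregate shows which aggregates are still vetoing giveback. This is a minimal sketch that assumes the gb.cfo.abort.raid.fm message wording shown above; the function name `aggregates_blocking_giveback` is hypothetical.

```python
import re

# Matches the gb.cfo.abort.raid.fm message text from the sample above.
RESYNC_VETO = re.compile(
    r"gb\.cfo\.abort\.raid\.fm:\w+\]: Aggregate (?P<aggr>\S+) is being resynced"
)


def aggregates_blocking_giveback(lines):
    """Count resync-related giveback cancellations per aggregate."""
    counts = {}
    for line in lines:
        m = RESYNC_VETO.search(line)
        if m:
            aggr = m.group("aggr")
            counts[aggr] = counts.get(aggr, 0) + 1
    return counts
```

An aggregate that stops appearing in newer log entries has presumably finished resyncing, although the aggregate status itself remains the authoritative check.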
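- Shortly after this event, the following AutoSupport alerts may be generated as residual symptoms of the missing disks:
HA Group Notification (SYNCMIRROR PLEX FAILED) ALERT
NODEOQ: HA Group Notification from CVO-02 (NODE(S) OUT OF CLUSTER QUORUM) EMERGENCY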
- After the node reboots, it re-establishes its connection to the provisioned AWS / GCP disks and giveback completes successfully.