SyncMirror Plex Failed - AutoSupport Message
Applies to
- MetroCluster
- Data ONTAP 8
- ONTAP 9
- SyncMirror
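Event Summary
The AutoSupport message SYNCMIRROR PLEX FAILED indicates that a SyncMirror plex has failed and that the SyncMirror relationship is in a degraded state.
Validation
Identify the aggregate that reports a failed plex: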
storage aggregate show
  
 Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
 --------- -------- --------- ----- ------- ------ ---------------- ------------
 aggr0_siteA_02
            953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                    mirrored,
                                                                   normal
 aggr1_siteA_02
             2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                    mirror
                                                                   degraded
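On systems with many aggregates, the degraded ones can be picked out of the listing programmatically. The following is a minimal Python sketch, not a NetApp tool. It assumes that the output of storage aggregate show has been captured as text in the layout shown above. The function name find_degraded_aggregates and the indentation threshold of four spaces are illustrative assumptions.

```python
# Sketch: list aggregates whose RAID status contains "degraded".
# Assumes the text layout of "storage aggregate show" shown in this article.

SAMPLE = """
 Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
 --------- -------- --------- ----- ------- ------ ---------------- ------------
 aggr0_siteA_02
            953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                    mirrored,
                                                                   normal
 aggr1_siteA_02
             2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                    mirror
                                                                   degraded
"""


def find_degraded_aggregates(output: str) -> list[str]:
    """Return the names of aggregates whose status text mentions 'degraded'."""
    records: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header, and the dashed separator row.
        if not stripped or stripped.startswith("Aggregate") or set(stripped) <= {"-", " "}:
            continue
        indent = len(line) - len(line.lstrip())
        if indent < 4:
            # Assumption: a lightly indented line starts a new aggregate record.
            current, _, rest = stripped.partition(" ")
            records[current] = [rest]
        elif current is not None:
            # Deeply indented lines continue the current record (size, RAID status).
            records[current].append(stripped)
    return [name for name, parts in records.items() if "degraded" in " ".join(parts)]


if __name__ == "__main__":
    print(find_degraded_aggregates(SAMPLE))  # ['aggr1_siteA_02']
```

The sketch keys on indentation because the RAID status wraps across several lines. If the output layout differs on your release, adjust the threshold or parse a machine-readable format instead.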
Resolution
ONTAP 9
- Identify the aggregate and the failed plex:
siteA::>storage aggregate show
  
 Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
 --------- -------- --------- ----- ------- ------ ---------------- ------------
 aggr0_siteA_02
            953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                    mirrored,
                                                                   normal
 aggr1_siteA_02
             2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                    mirror
                                                                   degraded
  
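A RAID status of mirror degraded indicates that a plex of the aggregate is not operational.
- Determine the cause of the plex failure:
  - Disk failure
  - Shelf failure
  - Switch ISL failure
  - Site failure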
 
SiteA::>storage aggregate show-status -aggregate aggr2_siteA_02
Owner Node: SiteA-02
  Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
   Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
    RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
                                                               Usable Physical
      Position Disk                        Pool Type     RPM     Size     Size Status
      -------- --------------------------- ---- ----- ------ -------- -------- ----------
      dparity  3.11.4                       0   SAS    10000   1.09TB   1.09TB (normal)
      parity   3.11.5                       0   SAS    10000   1.09TB   1.09TB (normal)
      data     3.11.6                       0   SAS    10000   1.09TB   1.09TB (normal)
      data     3.11.18                      0   SAS    10000   1.09TB   1.09TB (normal)
      data     3.11.19                      0   SAS    10000   1.09TB   1.09TB (normal)
  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
    RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
                                                               Usable Physical
      Position Disk                        Pool Type     RPM     Size     Size Status
      -------- --------------------------- ---- ----- ------ -------- -------- ----------
      dparity  FAILED                       -   -          -   1.09TB       0B (failed)
      parity   FAILED                       -   -          -   1.09TB       0B (failed)
      data     FAILED                       -   -          -   1.09TB       0B (failed)
      data     FAILED                       -   -          -   1.09TB       0B (failed)
      data     FAILED                       -   -          -   1.09TB       0B (failed)
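The plex state can also be read from this output by a script, which is convenient when several aggregates must be checked. The following Python sketch assumes that the output of storage aggregate show-status has been captured as text and that plex lines have the form "Plex: /<aggregate>/<plex> (state, ...)" as shown above. The function name failed_plexes and the rule that either "failed" or "offline" marks a problem plex are illustrative assumptions.

```python
# Sketch: extract (aggregate, plex) pairs for plexes reported as failed or offline.
import re

# Matches lines such as: "Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)"
PLEX_LINE = re.compile(r"^\s*Plex:\s+(\S+)\s+\(([^)]*)\)")

SAMPLE = """
Owner Node: SiteA-02
  Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
   Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
    RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
    RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
"""


def failed_plexes(output: str) -> list[tuple[str, str]]:
    """Return (aggregate, plex) for every plex whose states include failed or offline."""
    result = []
    for line in output.splitlines():
        match = PLEX_LINE.match(line)
        if not match:
            continue
        path = match.group(1)
        states = [s.strip() for s in match.group(2).split(",")]
        if "failed" in states or "offline" in states:
            # "/aggr/plex" -> ("aggr", "plex")
            aggregate, plex = path.strip("/").split("/", 1)
            result.append((aggregate, plex))
    return result


if __name__ == "__main__":
    print(failed_plexes(SAMPLE))  # [('aggr2_siteA_02', 'plex1')]
```

The returned pair corresponds to the -aggregate and -plex values that the plex commands in this article expect.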
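Resolving the root cause might return the disks to service and bring the plex back to an operational state. If the plex returns to an operational state, resynchronization should start automatically, and no further action is required. You can monitor the resynchronization with the following command:
SiteA::>storage aggregate plex show
- If the root cause cannot be fixed, for example because of an actual disk (hardware) failure, a site failure, a power surge, or a similar event, and not enough disks can be returned to service to bring the plex online, the plex cannot be repaired and must be destroyed and re-created.
Note: Destroying and re-creating a plex requires a full mirror baseline. Make sure that the pool has enough spare disks to re-create the mirror.
To destroy and re-create the mirror, perform the following steps: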
- storage aggregate plex delete -aggregate <aggr_name> -plex <degraded_plex_name>
- storage aggregate mirror -aggregate <aggr_name>
Data ONTAP 8.2
- Identify the aggregate and the failed plex from the output of 'aggr status -v':
>aggr status -v
            Aggr State           Status                Options
           aggr1 online          raid_dp, aggr         nosnap=on, raidtype=raid_dp, raidsize=14,
                                 mirrored              ignore_inconsistent=off, snapmirrored=off,
                                 64-bit                resyncsnaptime=60, fs_size_fixed=off,
                                                       lost_write_protect=on, ha_policy=cfo,
                                                       hybrid_enabled=off, percent_snapshot_space=15%,
                                                       free_space_realloc=off
                 Volumes: vol1, vol2, vol3
                Plex /aggr1/plex0: online, normal, active
                     RAID group /aggr1/plex0/rg0: normal, block checksums
                 Plex /aggr1/plex1: offline, failed, inactive
                     RAID group /aggr1/plex1/rg0: partial, block checksums
A status of "partial" indicates that the plex is not operational.
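The 7-Mode output can be scanned in a similar way. The following Python sketch assumes that the output of aggr status -v has been captured as text and that plex lines have the form "Plex /<aggr>/<plex>: state, ..." as shown above. The function name failed_plexes_7mode is hypothetical, and the returned "aggr/plex" form is chosen only because it matches how plexes are named in 7-Mode aggr commands.

```python
# Sketch: find failed plexes in captured "aggr status -v" output (Data ONTAP 8.2 7-Mode).
import re

# Matches lines such as: "Plex /aggr1/plex1: offline, failed, inactive"
PLEX_LINE_7MODE = re.compile(r"^\s*Plex\s+(/\S+):\s*(.*)$")

SAMPLE_7MODE = """
                 Volumes: vol1, vol2, vol3
                Plex /aggr1/plex0: online, normal, active
                     RAID group /aggr1/plex0/rg0: normal, block checksums
                 Plex /aggr1/plex1: offline, failed, inactive
                     RAID group /aggr1/plex1/rg0: partial, block checksums
"""


def failed_plexes_7mode(output: str) -> list[str]:
    """Return plex names in 'aggr/plex' form for plexes whose states include 'failed'."""
    result = []
    for line in output.splitlines():
        match = PLEX_LINE_7MODE.match(line)
        if not match:
            continue
        states = [s.strip() for s in match.group(2).split(",")]
        if "failed" in states:
            result.append(match.group(1).lstrip("/"))
    return result


if __name__ == "__main__":
    print(failed_plexes_7mode(SAMPLE_7MODE))  # ['aggr1/plex1']
```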
- Determine the cause of the plex failure:
  - Disk failure
  - Shelf failure
  - Switch ISL failure
  - Site failure
 
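Resolving the root cause might return the disks to service and bring the plex back to an operational state. If the plex returns to an operational state, resynchronization should start automatically, and no further action is required. You can monitor the resynchronization with the following command:
>sysconfig -r
- If the root cause cannot be fixed, for example because of an actual disk (hardware) failure, a site failure, a power surge, or a similar event, and not enough disks can be returned to service to bring the plex online, the plex cannot be repaired and must be destroyed and re-created.
Note: Destroying and re-creating a plex requires a full mirror baseline. Make sure that the pool has enough spare disks to re-create the mirror.
To destroy and re-create the mirror, perform the following steps: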
- aggr destroy aggr1/plex1
- aggr mirror aggr1
Additional Information
If you need help troubleshooting a plex failure or any further assistance, contact NetApp Technical Support.