
SyncMirror plex failed: AutoSupport message


Applies to

  • MetroCluster
  • Data ONTAP 8
  • ONTAP 9
  • SyncMirror

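Event summary

The AutoSupport message SYNCMIRROR PLEX FAILED indicates that a SyncMirror plex has failed and that the SyncMirror relationship is in a degraded state.

Validation

Identify the aggregate that reports a failed plex: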

storage aggregate show
 
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                  normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                  degraded
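
When many aggregates are listed, this check can be scripted. The following is a minimal Python sketch, not a NetApp tool. It assumes the output of storage aggregate show has been captured as text in the wrapped layout shown above (aggregate name on its own line, detail columns indented below it). The function name degraded_mirrors and the SAMPLE constant are hypothetical.

```python
SAMPLE = """\
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                  normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                  degraded
"""


def degraded_mirrors(output):
    """Return the names of aggregates whose RAID status reads 'mirror degraded'."""
    details = {}
    current = None
    for line in output.splitlines():
        # Skip blank lines, the column header, and the dashed separator.
        if not line.strip() or line.startswith(("Aggregate", "-")):
            continue
        if not line[0].isspace():
            # Assumption: an unindented line is an aggregate name.
            current = line.strip()
            details[current] = []
        elif current is not None:
            # Indented lines carry the (wrapped) detail columns.
            details[current].extend(line.split())
    return [name for name, words in details.items()
            if "mirror degraded" in " ".join(words)]


if __name__ == "__main__":
    print(degraded_mirrors(SAMPLE))  # ['aggr1_siteA_02']
```

The wrapped RAID Status column is rejoined with single spaces before matching, so "mirror" and "degraded" are found even though they are printed on separate lines.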

Solution

ONTAP 9
  1. Identify the aggregate and the failed plex:

siteA::>storage aggregate show
 
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
aggr0_siteA_02
           953.8GB   46.22GB   95% online       1 siteA-02         raid_dp,
                                                                   mirrored,
                                                                  normal
aggr1_siteA_02
            2.79TB    2.78TB    0% online       2 siteA-02         raid_dp,
                                                                   mirror
                                                                  degraded

 

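A RAID status of mirror degraded indicates that one plex of the aggregate is not operational.

  2. Determine the cause of the plex failure:
    • Disk
    • Shelf
    • Switch ISL failure
    • Site failure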

SiteA::>storage aggregate show-status -aggregate aggr2_siteA_02

Owner Node: SiteA-02
 Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
  Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  3.11.4                       0   SAS    10000   1.09TB   1.09TB (normal)
     parity   3.11.5                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.6                       0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.18                      0   SAS    10000   1.09TB   1.09TB (normal)
     data     3.11.19                      0   SAS    10000   1.09TB   1.09TB (normal)

  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
   RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  FAILED                       -   -          -   1.09TB       0B (failed)
     parity   FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
     data     FAILED                       -   -          -   1.09TB       0B (failed)
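
To pick the failed plex out of a long show-status listing, the Plex: lines can be filtered programmatically. This is a hedged sketch that assumes the text layout shown above. The names PLEX_LINE, plex_states, and failed_plexes are illustrative and are not part of ONTAP.

```python
import re

# Matches lines such as:
#   Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
PLEX_LINE = re.compile(r"^\s*Plex:\s+(\S+)\s+\(([^)]*)\)")

SAMPLE = """\
 Aggregate: aggr2_siteA_02 (online, raid_dp, mirror degraded) (block checksums)
  Plex: /aggr2_siteA_02/plex0 (online, normal, active, pool0)
   RAID Group /aggr2_siteA_02/plex0/rg0 (normal, block checksums)
  Plex: /aggr2_siteA_02/plex1 (offline, failed, inactive, pool1)
   RAID Group /aggr2_siteA_02/plex1/rg0 (partial, none checksums)
"""


def plex_states(output):
    """Map each plex path to the list of state words printed after it."""
    states = {}
    for line in output.splitlines():
        match = PLEX_LINE.match(line)
        if match:
            states[match.group(1)] = [w.strip() for w in match.group(2).split(",")]
    return states


def failed_plexes(output):
    """Return the paths of plexes whose state list contains 'failed'."""
    return [path for path, words in plex_states(output).items() if "failed" in words]


if __name__ == "__main__":
    print(failed_plexes(SAMPLE))  # ['/aggr2_siteA_02/plex1']
```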

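Resolving the root cause may return the disks to service and restore the plex to an operational state. If the plex returns to an operational state, resynchronization should start automatically, and no further action is required. You can monitor the resynchronization with the following command:

SiteA::>storage aggregate plex show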

 

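  3. If the root cause cannot be repaired (for example, an actual disk hardware failure, a site failure, a power surge, or a similar event) and not enough disks can be returned to service to bring the plex online, the plex cannot be repaired and must be destroyed and re-created.

Note: Destroying and re-creating a plex requires a full mirror baseline. Make sure that the pool contains enough spare disks to re-create the mirror.

To destroy and re-create the mirror, perform the following steps: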

  • storage aggregate plex delete -aggregate <aggr_name> -plex <degraded_plex_name>
  • storage aggregate mirror -aggregate <aggr_name>
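
Because the delete step is destructive, it can help to generate the two commands from the plex path and review them before typing anything. The sketch below only builds strings and never contacts a cluster. It assumes a plex path of the form /<aggr_name>/<plex_name>, as printed by storage aggregate show-status, and assumes that -plex takes the short plex name. The function name recreate_mirror_commands is hypothetical.

```python
def recreate_mirror_commands(plex_path):
    """Build the two ONTAP 9 commands from this article for one failed plex.

    plex_path is expected to look like '/aggr2_siteA_02/plex1'.
    The commands are returned as text for review; nothing is executed.
    """
    parts = plex_path.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected /<aggr_name>/<plex_name>, got {plex_path!r}")
    aggr_name, plex_name = parts
    return [
        f"storage aggregate plex delete -aggregate {aggr_name} -plex {plex_name}",
        f"storage aggregate mirror -aggregate {aggr_name}",
    ]


if __name__ == "__main__":
    for command in recreate_mirror_commands("/aggr2_siteA_02/plex1"):
        print(command)
```

Confirm that the generated plex name is the failed plex, not the surviving one, before running the delete command.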
Data ONTAP 8.2
  1. Identify the aggregate and the failed plex from the output of aggr status -v:

>aggr status -v
           Aggr State           Status                Options
          aggr1 online          raid_dp, aggr         nosnap=on, raidtype=raid_dp, raidsize=14,
                                mirrored              ignore_inconsistent=off, snapmirrored=off,
                                64-bit                resyncsnaptime=60, fs_size_fixed=off,
                                                      lost_write_protect=on, ha_policy=cfo,
                                                      hybrid_enabled=off, percent_snapshot_space=15%,
                                                      free_space_realloc=off
                Volumes: vol1, vol2, vol3

                Plex /aggr1/plex0: online, normal, active
                    RAID group /aggr1/plex0/rg0: normal, block checksums
                Plex /aggr1/plex1: offline, failed, inactive
                    RAID group /aggr1/plex1/rg0: partial, block checksums
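
The same check can be sketched for the Data ONTAP 8.2 output format. The example below is an illustration under assumptions, not a supported utility: it expects the Plex and RAID group lines in the layout shown above, and the names PLEX, RAID_GROUP, split_states, and failed_plexes_7mode are made up for this sketch.

```python
import re

# Matches "Plex /aggr1/plex1: offline, failed, inactive"
PLEX = re.compile(r"^\s*Plex\s+(/\S+):\s*(.*)$")
# Matches "RAID group /aggr1/plex1/rg0: partial, block checksums"
RAID_GROUP = re.compile(r"^\s*RAID group\s+(/\S+):\s*(.*)$")

SAMPLE = """\
                Plex /aggr1/plex0: online, normal, active
                    RAID group /aggr1/plex0/rg0: normal, block checksums
                Plex /aggr1/plex1: offline, failed, inactive
                    RAID group /aggr1/plex1/rg0: partial, block checksums
"""


def split_states(text):
    """Split 'offline, failed, inactive' into ['offline', 'failed', 'inactive']."""
    return [word.strip() for word in text.split(",") if word.strip()]


def failed_plexes_7mode(output):
    """Map each failed plex to the RAID groups under it that report 'partial'."""
    failed = {}
    current = None  # path of the failed plex being read, if any
    for line in output.splitlines():
        plex = PLEX.match(line)
        group = RAID_GROUP.match(line)
        if plex:
            is_failed = "failed" in split_states(plex.group(2))
            current = plex.group(1) if is_failed else None
            if current is not None:
                failed[current] = []
        elif group and current is not None:
            if "partial" in split_states(group.group(2)):
                failed[current].append(group.group(1))
    return failed


if __name__ == "__main__":
    print(failed_plexes_7mode(SAMPLE))  # {'/aggr1/plex1': ['/aggr1/plex1/rg0']}
```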

A state of "partial" indicates that the plex is not operational.

  2. Determine the cause of the plex failure:
    • Disk
    • Shelf
    • Switch ISL failure
    • Site failure

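Resolving the root cause may return the disks to service and restore the plex to an operational state. If the plex returns to an operational state, resynchronization should start automatically, and no further action is required. You can monitor the resynchronization with the following command: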

>sysconfig -r

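  3. If the root cause cannot be repaired (for example, an actual disk hardware failure, a site failure, a power surge, or a similar event) and not enough disks can be returned to service to bring the plex online, the plex cannot be repaired and must be destroyed and re-created.

Note: Destroying and re-creating a plex requires a full mirror baseline. Make sure that the pool contains enough spare disks to re-create the mirror.

To destroy and re-create the mirror, perform the following steps: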

  • aggr destroy aggr1/plex1
  • aggr mirror aggr1

Additional information

If you need help troubleshooting a failed plex, or any further assistance, contact NetApp Technical Support.

