
Solaris host support considerations in a MetroCluster configuration


Applies to

  • Solaris host support considerations in a MetroCluster configuration
  • MetroCluster
  • ONTAP 9

Answer

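By default, the Solaris OS can survive an "all paths down" (APD) condition for up to 20 seconds; this is controlled by the fcp_offline_delay parameter.
For Solaris hosts to continue without disruption during all MetroCluster workflows (such as negotiated switchover, switchback, tiebreaker unplanned switchover, and automatic unplanned switchover), it is recommended to set fcp_offline_delay to 120 seconds.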

Important considerations for MetroCluster support:

Host response to local HA failover

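When the fcp_offline_delay value is increased, application service resumption time increases during a local HA failover (for example, a node panic followed by the surviving node taking over the panicked node).
For example, with fcp_offline_delay = 120s, the Solaris client can take up to 120 seconds to resume application service.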

FCP error handling

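With the default value of fcp_offline_delay, the FCP driver takes 110 seconds to notify the upper layer (MPxIO) when an initiator port connection fails. Once fcp_offline_delay is increased to 120 seconds, the total time the driver takes to notify the upper layer (MPxIO) is 210 seconds; this might cause I/O delays. See Oracle Doc ID: 1018952.1. When a Fibre Channel port fails, an additional delay of 110 seconds might be seen before the device is taken offline.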
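The two figures above (110 seconds at the default delay of 20 seconds, 210 seconds at a delay of 120 seconds) both work out to the delay plus 90 seconds. As a rough sketch, assuming that relationship also holds for other values (an inference from the numbers in this article, not a formula documented by Oracle), the notification time can be estimated as follows. The function name and the 90-second constant are illustrative, and the snippet assumes a POSIX shell such as bash or ksh.

```shell
# Estimate how long the FCP driver takes to notify MPxIO after an
# initiator port connection failure.
# ASSUMPTION: a fixed overhead of 90 s, inferred from the two data points
# in this article (20 -> 110 s, 120 -> 210 s). Not a documented formula.
FCP_FIXED_OVERHEAD=90

estimate_notify_time() {
    # $1 = fcp_offline_delay in seconds
    echo $(( FCP_FIXED_OVERHEAD + $1 ))
}

estimate_notify_time 20     # prints 110
estimate_notify_time 120    # prints 210
```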

Coexistence with third-party arrays

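Because fcp_offline_delay is a global parameter, changing it can affect the interaction with all storage connected to the FCP driver, including third-party arrays.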

How to modify the fcp_offline_delay setting

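For Solaris 10u8, 10u9, 10u10, and 10u11:
Set fcp_offline_delay in the /kernel/drv/fcp.conf file. Adding the following line changes the timer to 120 seconds.
fcp_offline_delay=120;
Reboot the host for the setting to take effect.
After the host is up, check that the kernel has the parameter set: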
# mdb -k
> fcp_offline_delay/D
fcp_offline_delay:
fcp_offline_delay:      120
> (press Ctrl-D to exit mdb)

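For Solaris 11:
Set fcp_offline_delay in the /etc/driver/drv/fcp.conf file. Adding the following line changes the timer to 120 seconds.
fcp_offline_delay=120;
Reboot the host for the setting to take effect.
After the host is up, check that the kernel has the parameter set: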
# mdb -k
> fcp_offline_delay/D
fcp_offline_delay:
fcp_offline_delay:      120
> (press Ctrl-D to exit mdb)
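Before rebooting, it can be useful to confirm that the configuration file actually contains the intended value. The following is a minimal sketch, assuming the parameter is written as a decimal number on its own line and terminated with a semicolon; the helper name get_fcp_offline_delay is hypothetical. It assumes a POSIX shell such as bash or ksh.

```shell
# Print the fcp_offline_delay value found in a driver .conf file, or
# nothing if the parameter is not set. Whitespace around "=" and ";" is
# ignored, and commented-out lines (starting with "#") do not match.
# ASSUMPTION: the value is decimal; hex values are not handled.
get_fcp_offline_delay() {
    # $1 = path to fcp.conf
    tr -d ' \011' < "$1" |
        sed -n 's/^fcp_offline_delay=\([0-9][0-9]*\);.*$/\1/p' |
        sed -n '$p'    # keep only the last matching line
}

# Usage with the paths named in this article:
#   get_fcp_offline_delay /kernel/drv/fcp.conf        # Solaris 10
#   get_fcp_offline_delay /etc/driver/drv/fcp.conf    # Solaris 11
# After the reboot, the live kernel value can also be read without an
# interactive session:
#   echo "fcp_offline_delay/D" | mdb -k
```

If the function prints nothing, the parameter is not set in that file and the driver default applies.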

Host recovery examples:

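If a disaster failover or unplanned switchover takes abnormally long (more than 120 seconds) and host applications fail as a result, refer to the following examples before remediating the host applications: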

zpool recovery:

Ensure that all LUNs are online.

Run the following commands:

# zpool list
NAME             SIZE  ALLOC   FREE  CAP  HEALTH  ALTROOT
n_zpool_site_a  99.4G  1.31G  98.1G   1%  OFFLINE  -
n_zpool_site_b   124G  2.28G   122G   1%  OFFLINE  -
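On a host with many pools, the ones that need attention can be picked out with a small filter. This is a sketch rather than a definitive procedure: the function name unhealthy_pools is hypothetical, and it assumes input in the tab-separated form produced by zpool list -H -o name,health. It assumes a POSIX shell such as bash or ksh.

```shell
# Print every pool whose health is anything other than ONLINE.
# Reads "name<TAB>health" lines from standard input.
unhealthy_pools() {
    while read -r name health; do
        [ -n "$name" ] && [ "$health" != "ONLINE" ] && echo "$name $health"
    done
    return 0
}

# Usage on the host:
#   zpool list -H -o name,health | unhealthy_pools
```

Each pool that is printed can then be inspected individually with zpool status.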
 
Check the individual pool status:
# zpool status n_zpool_site_b
  pool: n_zpool_site_b
 state: SUSPENDED    <== pool is suspended
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://www.sun.com/msg/ZFS-8000-HC
scan: none requested
config:
 
        NAME                                     STATE     READ WRITE CKSUM
        n_zpool_site_b                           UNAVAIL      1 1.64K     0  experienced I/O failures
          c0t600A098051764656362B45346144764Bd0  UNAVAIL      1     0     0  experienced I/O failures
          c0t600A098051764656362B453461447649d0  UNAVAIL      1    40     0  experienced I/O failures
          c0t600A098051764656362B453461447648d0  UNAVAIL      0    38     0  experienced I/O failures
          c0t600A098051764656362B453461447647d0  UNAVAIL      0    28     0  experienced I/O failures
          c0t600A098051764656362B453461447646d0  UNAVAIL      0    34     0  experienced I/O failures
          c0t600A09805176465657244536514A7647d0  UNAVAIL      0 1.03K     0  experienced I/O failures
          c0t600A098051764656362B453461447645d0  UNAVAIL      0    32     0  experienced I/O failures
          c0t600A098051764656362B45346144764Ad0  UNAVAIL      0    34     0  experienced I/O failures
          c0t600A09805176465657244536514A764Ad0  UNAVAIL      0 1.03K     0  experienced I/O failures
          c0t600A09805176465657244536514A764Bd0  UNAVAIL      0 1.04K     0  experienced I/O failures
          c0t600A098051764656362B45346145464Cd0  UNAVAIL      1     2     0  experienced I/O failures
 
The pool above is suspended, and all of its devices are UNAVAIL after I/O failures.

Run the following command to clear the pool state:

# zpool clear n_zpool_site_b

Check the pool again:

# zpool status n_zpool_site_b
  pool: n_zpool_site_b
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scan: none requested
config:
 
        NAME                                     STATE     READ WRITE CKSUM
        n_zpool_site_b                           ONLINE       0     0     0
          c0t600A098051764656362B45346144764Bd0  ONLINE       0     0     0
          c0t600A098051764656362B453461447649d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447648d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447647d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447646d0  ONLINE       0     0     0
          c0t600A09805176465657244536514A7647d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447645d0  ONLINE       0     0     0
          c0t600A098051764656362B45346144764Ad0  ONLINE       0     0     0
          c0t600A09805176465657244536514A764Ad0  ONLINE       0     0     0
          c0t600A09805176465657244536514A764Bd0  ONLINE       0     0     0
          c0t600A098051764656362B45346145464Cd0  ONLINE       0     0     0
 
errors: 1679 data errors, use '-v' for a list
 

Check the pool status again; here, one disk in the pool is degraded.

# zpool status -v n_zpool_site_b
  pool: n_zpool_site_b
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scan: scrub repaired 0 in 0h0m with 0 errors on Fri Dec  4 05:44:17 2015
config:
 
        NAME                                     STATE     READ WRITE CKSUM
        n_zpool_site_b                           DEGRADED     0     0     0
          c0t600A098051764656362B45346144764Bd0  ONLINE       0     0     0
          c0t600A098051764656362B453461447649d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447648d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447647d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447646d0  ONLINE       0     0     0
          c0t600A09805176465657244536514A7647d0  DEGRADED     0     0     0  too many errors
          c0t600A098051764656362B453461447645d0  ONLINE       0     0     0
          c0t600A098051764656362B45346144764Ad0  ONLINE       0     0     0
          c0t600A09805176465657244536514A764Ad0  ONLINE       0     0     0
          c0t600A09805176465657244536514A764Bd0  ONLINE       0     0     0
          c0t600A098051764656362B45346145464Cd0  ONLINE       0     0     0
 
errors: No known data errors

Run the following command to clear the disk error:

# zpool clear n_zpool_site_b c0t600A09805176465657244536514A7647d0
 
# zpool status -v n_zpool_site_b
  pool: n_zpool_site_b
 state: ONLINE
 scan: scrub repaired 0 in 0h0m with 0 errors on Fri Dec  4 05:44:17 2015
config:
 
        NAME                                     STATE     READ WRITE CKSUM
        n_zpool_site_b                           ONLINE       0     0     0
          c0t600A098051764656362B45346144764Bd0  ONLINE       0     0     0
          c0t600A098051764656362B453461447649d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447648d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447647d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447646d0  ONLINE       0     0     0
          c0t600A09805176465657244536514A7647d0  ONLINE       0     0     0
          c0t600A098051764656362B453461447645d0  ONLINE       0     0     0
          c0t600A098051764656362B45346144764Ad0  ONLINE       0     0     0
          c0t600A09805176465657244536514A764Ad0  ONLINE       0     0     0
          c0t600A09805176465657244536514A764Bd0  ONLINE       0     0     0
          c0t600A098051764656362B45346145464Cd0  ONLINE       0     0     0
 
errors: No known data errors
 
Alternatively, export and import the zpool:
 
# zpool export n_zpool_site_b
# zpool import n_zpool_site_b

The pool is now online.
If the above steps do not recover the pool, reboot the host.

Solaris Volume Manager (SVM) (metaset)
Ensure that all LUNs are online, reboot the system, and mount the Solaris Volume Manager (SVM) volumes.


 
