NVDIMM failure triggers elevated write latency on MetroCluster

Category:
metrocluster
Specialty:
hw

Applies to

  • ONTAP 9
  • MetroCluster

Issue

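  • During an NVDIMM (non-volatile DIMM) failure, a sudden spike in write latency is observed on the cluster. The issue coincides with the following sequence of events: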
    [node-01:cf_main:cf.fsm.takeover.panic:alert]: Failover monitor: takeover attempted after partner panic.
    [node-01:cf_takeover:cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed
    [node-01:cf_main:cf.fsm.autoGivebackStarted:info]: Failover monitor: Automatic giveback started
    [node-01:cf_giveback:cf.fm.givebackComplete:notice]: Failover monitor: giveback completed
    [node-02:nphmd:hm.alert.cleared:notice]: AlertId=CriticalCECCCountMemErrAlert, AlertingResource=NVDIMM-11 cleared by monitor controller
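  • Node-02 panics because of degraded NVRAM, which triggers an automatic takeover by the partner node (Node-01).
  • After the takeover completes, ONTAP performs an automatic giveback and returns the aggregates to the affected node.
  • After the giveback completes, Node-02 continues to run with degraded NVRAM, which raises write latency across the entire MetroCluster.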
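
The event sequence above can be checked for mechanically in a saved EMS log. The following is a minimal sketch, not a NetApp tool: it assumes the log has been exported as plain text lines in the bracketed `[node:process:event:severity]` form shown in the excerpt, and the names `parse_events`, `matches_pattern`, and `SAMPLE_LOG` are hypothetical. Treating any `hm.alert` event that mentions NVDIMM as the hardware indicator is also an assumption made for illustration.

```python
import re

# Bracketed EMS header as it appears in the excerpt: [node:process:event:severity]
EMS_HEADER = re.compile(
    r"\[(?P<node>[^:\]]+):(?P<proc>[^:\]]+):(?P<event>[^:\]]+):(?P<sev>[^:\]]+)\]"
)

# Failover event names copied from the excerpt, in the order they were logged.
EXPECTED_SEQUENCE = [
    "cf.fsm.takeover.panic",
    "cf.fm.takeoverComplete",
    "cf.fsm.autoGivebackStarted",
    "cf.fm.givebackComplete",
]

# Sample input: the log lines quoted in this article.
SAMPLE_LOG = [
    "[node-01:cf_main:cf.fsm.takeover.panic:alert]: Failover monitor: takeover attempted after partner panic.",
    "[node-01:cf_takeover:cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed",
    "[node-01:cf_main:cf.fsm.autoGivebackStarted:info]: Failover monitor: Automatic giveback started",
    "[node-01:cf_giveback:cf.fm.givebackComplete:notice]: Failover monitor: giveback completed",
    "[node-02:nphmd:hm.alert.cleared:notice]: AlertId=CriticalCECCCountMemErrAlert, AlertingResource=NVDIMM-11 cleared by monitor controller",
]


def parse_events(lines):
    """Return (node, event_name, raw_line) for every line with an EMS header."""
    events = []
    for line in lines:
        match = EMS_HEADER.search(line)
        if match:
            events.append((match.group("node"), match.group("event"), line))
    return events


def matches_pattern(lines):
    """True if the failover events occur in order and an NVDIMM health alert is present."""
    events = parse_events(lines)
    names = iter(name for _, name, _ in events)
    # Subsequence check: each expected event must appear after the previous one.
    in_order = all(expected in names for expected in EXPECTED_SEQUENCE)
    # Assumption: any hm.alert.* event mentioning NVDIMM counts as the hardware indicator.
    nvdimm_alert = any(
        name.startswith("hm.alert") and "NVDIMM" in raw for _, name, raw in events
    )
    return in_order and nvdimm_alert


if __name__ == "__main__":
    print(matches_pattern(SAMPLE_LOG))
```

A match only indicates that the log resembles the pattern described here. It does not confirm the state of the NVDIMM hardware.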

