
NetApp Element software may misreport memory errors, causing a memoryEccThreshold cluster fault on MemCtlr0

Category: element-software (memoryEccThreshold, PE-12065)
Specialty: solidfire
Applies to

  • NetApp Element software 12.0 and 12.2
  • NetApp SolidFire SF-series product line
  • NetApp H-Series storage nodes

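Issue

  • NetApp Element software may misreport a correctable error on a DIMM as a correctable error on the node's memory controller.
  • The default setting for ECC errors on the node memory controller is too aggressive: even a single error raises a persistent cluster fault with severity Error.
  • The following cluster fault is displayed in NetApp SolidFire Active IQ and the cluster UI:
    • Error code: memoryEccThreshold
    • Details: Correctable ECC memory error count crossed threshold on Memory controller: MemCtlr0
  • In fact, the node's BMC System Event Log (SEL) reports an error on a DIMM at the same time the cluster fault occurs: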
    • [Information]  [Memory Error]   [Memory]            Correctable ECC (CPU_A0) - Asserted
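
To check whether a memoryEccThreshold fault lines up with a DIMM-level entry in the SEL, the two strings quoted above can be parsed and compared. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, and the regular expressions assume that real fault details and SEL lines follow the same layout as the two samples in this article, which may not hold for every node or BMC firmware version.

```python
import re

# Patterns are derived only from the two sample strings quoted in this
# article; real output may differ in spacing or wording (assumption).
FAULT_DETAIL_RE = re.compile(
    r"Correctable ECC memory error count crossed threshold on "
    r"Memory controller:\s*(?P<controller>\S+)"
)
SEL_FIELD_RE = re.compile(r"\[([^\]]*)\]")
SEL_ECC_RE = re.compile(
    r"Correctable ECC \((?P<dimm>[^)]+)\)\s*-\s*(?P<state>\w+)"
)


def parse_fault_detail(detail):
    """Return the memory controller named in a fault detail string, or None."""
    match = FAULT_DETAIL_RE.search(detail)
    return match.group("controller") if match else None


def parse_sel_line(line):
    """Parse one SEL line; return None if it is not a correctable-ECC entry."""
    ecc = SEL_ECC_RE.search(line)
    if ecc is None:
        return None
    # Bracketed fields, e.g. severity, event type, sensor type.
    fields = [field.strip() for field in SEL_FIELD_RE.findall(line)]
    return {
        "fields": fields,
        "dimm": ecc.group("dimm"),
        "state": ecc.group("state"),
    }


def dimm_slots_with_correctable_ecc(sel_lines):
    """List DIMM slots with an asserted correctable ECC event, in first-seen order."""
    slots = []
    for line in sel_lines:
        entry = parse_sel_line(line)
        if entry and entry["state"] == "Asserted" and entry["dimm"] not in slots:
            slots.append(entry["dimm"])
    return slots
```

If `parse_fault_detail` returns a controller name and `dimm_slots_with_correctable_ecc` returns a non-empty list for SEL entries logged around the same time, that matches the pattern described in this article: the fault names the memory controller while the SEL names a DIMM. Comparing timestamps is left out here because the sample SEL line does not show one.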

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.