
Element software may misreport memory errors and raise a memoryEccThreshold cluster fault on MemCtlr0

Category:
element-software
Specialty:
solidfire

Applies to

  • NetApp Element software 12.0 and 12.2
  • NetApp SolidFire SF-series product line
  • NetApp H-Series storage nodes

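Issue

  • NetApp Element software may misreport a correctable error on a DIMM as a correctable error on the node's memory controller.
  • The default setting for ECC errors on the node memory controller is too aggressive: even a single error raises a persistent cluster fault of severity Error.
  • The following cluster fault is displayed in NetApp SolidFire Active IQ and in the cluster UI:
    • Error code: memoryEccThreshold
    • Details: Correctable ECC memory error count crossed threshold on Memory controller: MemCtlr0
  • In reality, the node's BMC System Event Log (SEL) reports an error on a DIMM at the same time as the cluster fault:
    • [Information]  [Memory Error]   [Memory]            Correctable ECC (CPU_A0) - Asserted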
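
To illustrate the mismatch described above, here is a minimal Python sketch that compares the cluster fault details with the BMC SEL text. It is not a NetApp tool. The function names are hypothetical, the two regular expressions are assumptions derived only from the sample lines quoted in this article, and the token in parentheses (CPU_A0) is assumed to be the DIMM location. Real fault and SEL text may differ.

```python
import re

# Assumption: patterns modelled only on the two sample lines quoted in this article.
FAULT_RE = re.compile(
    r"Correctable ECC memory error count crossed threshold on Memory controller:\s*(\S+)"
)
SEL_RE = re.compile(r"\[Memory Error\].*?Correctable ECC \(([^)]+)\)\s*-\s*Asserted")


def fault_controller(details):
    """Return the memory controller named in the fault details, or None."""
    match = FAULT_RE.search(details)
    return match.group(1) if match else None


def sel_dimm_slots(sel_lines):
    """Return the locations named by correctable ECC entries in the SEL text."""
    slots = []
    for line in sel_lines:
        match = SEL_RE.search(line)
        if match:
            # Assumption: the parenthesized token identifies the DIMM location.
            slots.append(match.group(1))
    return slots


def likely_misreported(details, sel_lines):
    """True when the fault names a memory controller and the SEL names a DIMM."""
    return fault_controller(details) is not None and bool(sel_dimm_slots(sel_lines))
```

Passing the fault details string and the SEL lines captured around the same time to likely_misreported gives a quick indication of whether the pattern in this article applies. It only matches text and does not compare timestamps.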

 
