
Element software might misreport memory errors and cause a memoryEccThreshold cluster fault on MemCtlr0

Category: element-software
Specialty: solidfire

Applies to

  • NetApp Element software 12.0 and 12.2
  • NetApp SolidFire SF-series product line
  • NetApp H-Series storage nodes

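Issue

  • NetApp Element software might misreport correctable errors on a DIMM as correctable errors on the node's memory controller.
  • The default threshold for ECC errors on the node memory controller is too aggressive: even a single error raises a persistent cluster fault with Error severity.
  • NetApp SolidFire Active IQ and the cluster UI display the following cluster fault:
    • Error code: memoryEccThreshold
    • Details: Correctable ECC memory error count crossed threshold on Memory controller: MemCtlr0
  • In reality, the node's BMC System Event Log (SEL) reports the errors on the DIMM at the same time the cluster fault occurs:
    • [Information]  [Memory Error]   [Memory]            Correctable ECC (CPU_A0) - Asserted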
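To compare what the cluster fault and the BMC SEL each report, the relevant fields can be pulled out of both messages. The following is a minimal Python sketch, not a NetApp tool: it assumes the fault details and the SEL entry are already available as plain text in exactly the formats quoted above, and the function names `fault_controller` and `sel_ecc_location` are hypothetical.

```python
import re
from typing import Optional

# Example strings copied verbatim from the article: the cluster fault
# details shown in Active IQ / the cluster UI, and the matching BMC SEL entry.
FAULT_DETAILS = (
    "Correctable ECC memory error count crossed threshold "
    "on Memory controller: MemCtlr0"
)
SEL_ENTRY = (
    "[Information]  [Memory Error]   [Memory]            "
    "Correctable ECC (CPU_A0) - Asserted"
)


def fault_controller(details: str) -> Optional[str]:
    """Return the memory controller named in a memoryEccThreshold fault, if any."""
    # Assumes the "Memory controller: <name>" wording quoted in the article.
    match = re.search(r"Memory controller:\s*(\S+)", details)
    return match.group(1) if match else None


def sel_ecc_location(entry: str) -> Optional[str]:
    """Return the location label of an asserted correctable ECC SEL entry, if any."""
    # Assumes the "Correctable ECC (<label>) - Asserted" wording quoted in the article.
    match = re.search(r"Correctable ECC \(([^)]+)\)\s*-\s*Asserted", entry)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("Cluster fault reports:", fault_controller(FAULT_DETAILS))
    print("BMC SEL reports:", sel_ecc_location(SEL_ENTRY))
```

Run against the two example strings, the script prints `MemCtlr0` for the cluster fault and `CPU_A0` for the SEL entry, which puts the memory-controller attribution next to the location the BMC actually logged.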
