
CriticalCECCCountMemErrAlert and BootDimmDisableAlert observed on AFF A1K

Category:
aff-series
Specialty:
hw

Applies to

  • AFF A1K
  • System DIMM modules

Issue

  • ONTAP raises a CriticalCECCCountMemErrAlertMessage alert in EMS for a single DIMM module, as shown below:

[CLUSTER-01: mgwd: callhome.hm.alert.critical:alert]: Call home for Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-32].

  • The output of the command ::*> memory dimm show -node <node_name> shows a single DIMM as "degraded":

::*> memory dimm show -node CLUSTER-01
  (system controller memory dimm show)
        DIMM    UECC  CECC  Alert   CPU       Slot      Failure
Node      Name   Count Count Method Socket Channel Number Status   Reason
------------- ------- ----- ----- ------ ------ ------- ------ ------- --------
NAS3_APP_A
        DIMM-1    0    0 bucket    1     7    0 ok      none
        ...
        ...
       DIMM-32    0 151597 bucket    0     3    0 degraded   none<<<<<<<
16 entries were displayed.
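
When this output has been saved from several nodes, the non-ok rows can be picked out with a short script. The following is a minimal Python sketch, not a NetApp tool. It assumes the output was captured as plain text and that each DIMM row follows the column order shown above (name, UECC count, CECC count, alert method, CPU socket, channel, slot number, status, failure reason). The function name `find_unhealthy_dimms` is hypothetical.

```python
import re

# Assumed row layout, taken from the sample output above:
# DIMM-<n> <UECC> <CECC> <method> <socket> <channel> <slot> <status> <reason>
ROW = re.compile(
    r"^\s*(DIMM-\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\w+)\s+(\w+)"
)


def find_unhealthy_dimms(output):
    """Return the DIMM rows whose status is anything other than 'ok'."""
    unhealthy = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, separator, node name, or "..." line
        name, uecc, cecc, _method, socket, channel, _slot, status, _reason = m.groups()
        if status != "ok":
            unhealthy.append({
                "name": name,
                "uecc": int(uecc),
                "cecc": int(cecc),
                "socket": int(socket),
                "channel": int(channel),
                "status": status,
            })
    return unhealthy


# Two rows copied from the output above.
sample = """\
NAS3_APP_A
        DIMM-1    0    0 bucket    1     7    0 ok      none
       DIMM-32    0 151597 bucket    0     3    0 degraded   none<<<<<<<
"""

for dimm in find_unhealthy_dimms(sample):
    print(dimm["name"], dimm["status"], "CECC count:", dimm["cecc"])
```

The pattern is matched from the start of each line only, so trailing annotations such as the "<<<<<<<" marker do not affect parsing.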

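  • Replacing the affected DIMM does not resolve the issue:
    • DIMMs are reported as failed during the boot sequence
    • Additional DIMMs fail
    • Multiple DIMM modules are disabled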

DIMM in slot 1 is disabled
DIMM in slot 5 is disabled
DIMM in slot 7 is disabled
DIMM in slot 12 is disabled
DIMM in slot 14 is disabled
DIMM in slot 16 is disabled
DIMM in slot 17 is disabled
DIMM in slot 21 is disabled
DIMM in slot 23 is disabled
DIMM in slot 28 is disabled
DIMM in slot 30 failed <<<<<< New failed
DIMM in slot 32 failed
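
To compare the set of affected slots before and after a DIMM replacement, the boot messages above can be grouped by state. This is a minimal Python sketch that assumes the console log is available as text and that the messages have exactly the "DIMM in slot N is disabled" and "DIMM in slot N failed" wording shown above. The function name `slots_by_state` is hypothetical.

```python
import re

# Matches both "DIMM in slot 1 is disabled" and "DIMM in slot 30 failed".
SLOT_LINE = re.compile(r"DIMM in slot (\d+) (?:is )?(disabled|failed)")


def slots_by_state(console_text):
    """Group slot numbers from boot messages into 'disabled' and 'failed'."""
    states = {"disabled": [], "failed": []}
    for m in SLOT_LINE.finditer(console_text):
        states[m.group(2)].append(int(m.group(1)))
    return states


# Three lines copied from the boot output above.
console = """\
DIMM in slot 28 is disabled
DIMM in slot 30 failed <<<<<< New failed
DIMM in slot 32 failed
"""

print(slots_by_state(console))
```

Running the function on logs captured before and after the replacement and comparing the two results shows which slots are newly reported.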

  • The following error is observed during the boot sequence:

Apr 13 21:59:46 [CLUSTER-01:platform.reducedMemory:ALERT]: System memory (255 GB) is less than expected (1024 GB). Check DIMMs slots 1, 5, 7, 12, 14, 16, 17, 21, 23, 28, 30, 32.
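
The reported and expected memory sizes and the slot list can be extracted from this message for tracking. The following is a minimal Python sketch that assumes the platform.reducedMemory message text has exactly the wording shown above. The function name `parse_reduced_memory` is hypothetical, and the sketch makes no assumption about per-DIMM capacity.

```python
import re

# Pattern built from the wording of the message shown above.
REDUCED = re.compile(
    r"System memory \((\d+) GB\) is less than expected \((\d+) GB\)\. "
    r"Check DIMMs slots ([\d, ]+)\."
)


def parse_reduced_memory(message):
    """Return reported/expected memory in GB, the shortfall, and the listed slots."""
    m = REDUCED.search(message)
    if m is None:
        return None
    actual, expected = int(m.group(1)), int(m.group(2))
    slots = [int(s) for s in m.group(3).split(",")]
    return {
        "actual_gb": actual,
        "expected_gb": expected,
        "missing_gb": expected - actual,
        "slots": slots,
    }


message = (
    "Apr 13 21:59:46 [CLUSTER-01:platform.reducedMemory:ALERT]: "
    "System memory (255 GB) is less than expected (1024 GB). "
    "Check DIMMs slots 1, 5, 7, 12, 14, 16, 17, 21, 23, 28, 30, 32."
)

info = parse_reduced_memory(message)
print(info["missing_gb"], "GB missing across", len(info["slots"]), "listed slots")
```

For the message above, the shortfall is 1024 - 255 = 769 GB and 12 slots are listed.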

  • Moving the DIMM module to a different slot does not resolve the issue:

Initializing System Memory ...
DIMM:32 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x46 / 0x03
Complete channel mapped out.
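
When opening a support case, it can help to collect the DIMM number and the major and minor error codes from every "mapped out" line in the boot log. This is a minimal Python sketch that assumes each such line has the same layout as the one shown above. It only extracts the values and does not attempt to interpret the codes. The function name `mapped_out_dimms` is hypothetical.

```python
import re

# Pattern built from the BIOS line shown above.
MAPPED_OUT = re.compile(
    r"DIMM:(\d+) mapped out\..*?Major / Minor Error Code:\s*"
    r"(0x[0-9A-Fa-f]+)\s*/\s*(0x[0-9A-Fa-f]+)"
)


def mapped_out_dimms(console_text):
    """Return (dimm, major_code, minor_code) tuples for each mapped-out DIMM."""
    return [
        (int(dimm), int(major, 16), int(minor, 16))
        for dimm, major, minor in MAPPED_OUT.findall(console_text)
    ]


boot_log = """\
Initializing System Memory ...
DIMM:32 mapped out. BIOS MRC mapped out DIMM. Major / Minor Error Code: 0x46 / 0x03
Complete channel mapped out.
"""

for dimm, major, minor in mapped_out_dimms(boot_log):
    print(f"DIMM {dimm}: major=0x{major:02X} minor=0x{minor:02X}")
```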

  • The system boots, but a new "BootDimmDisableAlert" alert is raised for each disabled DIMM.
