AFF A700s CECC: Correctable machine check errors reported against the wrong DIMM

Visibility:
Public
Category:
aff-series
Specialty:
hw
Applies to

  • AFF A700s
  • ONTAP 9
    • ONTAP 9.1P17 and earlier
    • ONTAP 9.3P11 and earlier
    • ONTAP 9.4P6 and earlier
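
The release list above can be turned into a quick check. The following is a minimal sketch, not a NetApp tool: the function name is_listed_as_affected and the lookup table are hypothetical, and the code assumes version strings of the form "major.minor" or "major.minorPn" only. It also assumes that each "and earlier" bound applies within its own release family, and that a version without a P suffix counts as patch level 0.

```python
import re

# Highest listed patch level per release family, copied from the list above.
AFFECTED_MAX_PATCH = {(9, 1): 17, (9, 3): 11, (9, 4): 6}

# Assumed version format: "9.3" or "9.3P11". Other formats are rejected.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:P(\d+))?$")


def is_listed_as_affected(version: str) -> bool:
    """Return True if `version` falls inside one of the listed ranges."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: a release without a P suffix is treated as patch level 0.
    patch = int(match.group(3) or 0)
    limit = AFFECTED_MAX_PATCH.get((major, minor))
    # Release families that are not listed are reported as not affected.
    return limit is not None and patch <= limit
```

For example, is_listed_as_affected("9.3P11") is true and is_listed_as_affected("9.3P12") is false. A release family that the list does not mention, such as 9.5, returns false, which means "not listed here" rather than "confirmed unaffected".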

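Issue

CECC errors continue to be reported against the same DIMM even after it has been replaced:

The system health alert show command reports an alert on the cluster similar to the following: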

Node           xxxxxx
Monitor         controller
Alert ID         CriticalCECCCountMemErrAlert
Alerting Resource    DIMM-x
Subsystem        Memory
Indication Time     Tue Oct 09 12:24:36 2018
Perceived Severity    Critical
Probable Cause      DIMM_Degraded
Description       The DIMM has degraded, leading to memory errors.

The following are corrective actions:

1. Contact technical support to obtain a new DIMM of the same specification
2. If possible, perform a takeover of this node and bring the node down for maintenance
3. Refer to the DIMM replacement guide for your given hardware platform to replace the DIMM
4. Bring the storage system online

Possible Effect:
Memory issues can lead to a catastrophic system panic, which can lead to data downtime on the node.


The EMS log shows messages similar to the following, reporting CECC errors on a specific DIMM:

[?] Tue Oct 09 12:24:36 IST [xxxx: mgwd: callhome.hm.alert.critical:alert]: Call home for Health Monitor process nphm: CriticalCECCCountMemErrAlert[DIMM-x].
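
When reviewing many EMS lines, it can help to extract the alert ID and the DIMM name from each message. The following is a minimal sketch: the function name parse_cecc_alert is hypothetical, and the regular expression is inferred only from the single sample message above, so other ONTAP releases might format the line differently.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, derived only from the sample EMS message above:
# "...callhome.hm.alert.critical:alert]: ... <AlertID>[DIMM-<slot>]."
_ALERT_RE = re.compile(
    r"callhome\.hm\.alert\.critical:alert\]:.*?\b(\w+)\[(DIMM-[^\]]+)\]"
)


def parse_cecc_alert(line: str) -> Optional[Tuple[str, str]]:
    """Return (alert_id, dimm) for a matching EMS line, else None."""
    match = _ALERT_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Collecting the returned DIMM names before and after a replacement makes it easy to see whether the same slot keeps appearing in new alerts.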

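Replacing the reported DIMM is the usual recommendation.
However, the cluster might continue to report errors against the same DIMM even after it has been replaced.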
