
H610S node offline and in a boot loop due to an uncorrectable error on an NVDIMM

Category: element-software
Specialty: solidfire

Applies to

  • NetApp SolidFire H610S with BIOS [B06
  • NetApp Element software 12.3.x and earlier

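Issue

  • One or more nodes are offline and in a boot loop
    • The node attempts to boot but fails before Element loads
    • The node reboots immediately after the NetApp splash screen appears
  • The BMC System Event Log (SEL) shows the following:
    • [CATERR] Machine Check Exception (MCERR)
    • [MCERR] Uncorrectable Error - Machine Check Error
    • [Memory Error] Uncorrectable ECC(CPU0_<xx>)
  • Volume offline or degraded messages may be displayed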
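
As an illustration only, the three SEL signatures listed above can be checked for in exported log text with a short script. This is a minimal sketch, not a NetApp tool: the function name `scan_sel`, the regular expressions, and the assumption that the slot name has the form `CPU<n>_<letter><digit>` are all hypothetical choices made for this example.

```python
import re

# Signatures taken from the SEL bullets in this article.
# The regular expressions themselves are assumptions made for this sketch.
SIGNATURES = {
    "caterr": re.compile(r"\[CATERR\].*Machine Check Exception \(MCERR\)"),
    "mcerr": re.compile(r"\[MCERR\].*Uncorrectable Error - Machine Check Error"),
    "ecc": re.compile(r"\[Memory Error\].*Uncorrectable ECC\((CPU\d+_[A-Z]\d+)\)"),
}


def scan_sel(lines):
    """Return which signatures were seen and which DIMM slots reported ECC errors."""
    found = {name: False for name in SIGNATURES}
    slots = set()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                found[name] = True
                if name == "ecc":
                    slots.add(match.group(1))
    return found, sorted(slots)


# Sample lines copied from the BMC web GUI example in this article.
sample = [
    "1156 Sep/8/2022 20:16:35 [Critical] [CATERR] [Processor] Machine Check Exception (MCERR) - Asserted",
    "1155 Sep/8/2022 20:16:35 [Critical] [MCERR] [Processor] Uncorrectable Error - Machine Check Error: Bank 1/CPU 0/Core 2 - Asserted",
    "1154 Sep/8/2022 20:16:35 [Critical] [Memory Error] [Memory] Uncorrectable ECC(CPU0_F1) - Asserted",
]
found, slots = scan_sel(sample)
print(found, slots)
```

A result in which all three signatures are present and the reported slot is one of the NVDIMM slots is consistent with the symptom described here; it is not by itself a diagnosis.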

Example: Active IQ error alerts raised when multiple nodes are affected

The following volumes are offline. [X, X, X, X, X, X]

The SolidFire Application cannot communicate with Storage node having node ID 11.

Cluster Block Data is in a degraded state, and the auto-heal process to restore full block data redundancy cannot proceed. Either too many nodes or block services are offline, or the cluster block services are too full.

Example: SEL in the BMC web GUI

1160 Sep/8/2022 20:16:41 [Information] [Power Unit] [Power Unit] Power Off / Power Down - Deasserted
1159 Sep/8/2022 20:16:36 [Critical] [CATERR] [Processor] Machine Check Exception (MCERR) - Asserted
1158 Sep/8/2022 20:16:36 [Information] [Power Unit] [Power Unit] Power Off / Power Down - Asserted
1157 Sep/8/2022 20:16:35 [Warning] [Additional MCE Error] [OEM Record C2] ManufacturerID:001C4C, Extra Information : 0 MSCOD:0010 MCACOD:0134
1156 Sep/8/2022 20:16:35 [Critical] [CATERR] [Processor] Machine Check Exception (MCERR) - Asserted
1155 Sep/8/2022 20:16:35 [Critical] [MCERR] [Processor] Uncorrectable Error - Machine Check Error: Bank 1/CPU 0/Core 2 - Asserted
1154 Sep/8/2022 20:16:35 [Critical] [Memory Error] [Memory] Uncorrectable ECC(CPU0_F1) - Asserted

Note: On H610S models, the NVDIMMs are located in specific slots.
  • H610S1/S2: CPU0_C0 and CPU0_F0
  • H610S4: CPU0_C1 and CPU0_F1
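
The slot mapping in the note above can be written as a small lookup, which makes it easy to check whether the slot named in an "Uncorrectable ECC" event is an NVDIMM slot. This is a sketch under assumptions: the model keys `H610S1`, `H610S2`, and `H610S4` are spelled as they appear in the note, "H610S1/S2" is read as two models sharing the same slots, and the helper name `is_nvdimm_slot` is hypothetical.

```python
# Slot map copied from the note in this article.
# Assumption: "H610S1/S2" means the H610S1 and H610S2 models share the same slots.
NVDIMM_SLOTS = {
    "H610S1": {"CPU0_C0", "CPU0_F0"},
    "H610S2": {"CPU0_C0", "CPU0_F0"},
    "H610S4": {"CPU0_C1", "CPU0_F1"},
}


def is_nvdimm_slot(model, slot):
    """Return True if `slot` is one of the NVDIMM slots listed for `model`."""
    if model not in NVDIMM_SLOTS:
        raise ValueError(f"unknown model: {model!r}")
    return slot in NVDIMM_SLOTS[model]


# The SEL example in this article reports Uncorrectable ECC(CPU0_F1).
print(is_nvdimm_slot("H610S4", "CPU0_F1"))
print(is_nvdimm_slot("H610S1", "CPU0_F1"))
```

In the SEL example, the failing slot CPU0_F1 is an NVDIMM slot only if the node is an H610S4, so the node model has to be known before drawing a conclusion from the slot name.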

Example: SEL output from ipmitool

SEL Record ID : 0482
Record Type : 02
Timestamp : 09/08/2022 20:16:35
Generator ID : 0001
EvM Revision : 04
Sensor Type : Memory
Sensor Number : 87
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : a1ff29
Description : Uncorrectable ECC

SEL Record ID : 0483
Record Type : 02
Timestamp : 09/08/2022 20:16:35
Generator ID : 0001
EvM Revision : 04
Sensor Type : Processor
Sensor Number : a8
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : ab0102
Description : Uncorrectable machine check exception

SEL Record ID : 0484
Record Type : 02
Timestamp : 09/08/2022 20:16:35
Generator ID : 0020
EvM Revision : 04
Sensor Type : Processor
Sensor Number : 74
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : 0bffff
Description : Uncorrectable machine check exception

SEL Record ID : 0485
Record Type : c2 (OEM timestamped)
Timestamp : 09/08/2022 20:16:35
Manufactacturer ID : 001c4c
OEM Defined : 000010003401
[......]

SEL Record ID : 0486
Record Type : 02
Timestamp : 09/08/2022 20:16:36
Generator ID : 0020
EvM Revision : 04
Sensor Type : Power Unit
Sensor Number : 77
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : 00ffff
Description : Power off/down

SEL Record ID : 0487
Record Type : 02
Timestamp : 09/08/2022 20:16:36
Generator ID : 0020
EvM Revision : 04
Sensor Type : Processor
Sensor Number : 74
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : 0bffff
Description : Uncorrectable machine check exception

SEL Record ID : 0488
Record Type : 02
Timestamp : 09/08/2022 20:16:41
Generator ID : 0020
EvM Revision : 04
Sensor Type : Power Unit
Sensor Number : 77
Event Type : Sensor-specific Discrete
Event Direction : Deassertion Event
Event Data : 00ffff
Description : Power off/down
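
For triage across many nodes, verbose SEL text like the example above can be split into records and filtered for uncorrectable events. The following is a rough sketch, assuming the text has one `Field : value` pair per line and that each record starts with a `SEL Record ID` line; the function names `parse_sel_records` and `uncorrectable` are hypothetical, and the sample is abbreviated from the records in this article.

```python
def parse_sel_records(text):
    """Split verbose SEL text into a list of {field: value} dicts."""
    records = []
    for raw in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # skip blank lines and markers such as "[......]"
        key, value = key.strip(), value.strip()
        if key == "SEL Record ID":
            records.append({})  # a new record begins here
        if records:
            records[-1][key] = value
    return records


def uncorrectable(records):
    """Keep only records whose Description mentions an uncorrectable error."""
    return [r for r in records if "Uncorrectable" in r.get("Description", "")]


# Abbreviated sample built from the records shown in this article.
sample = """\
SEL Record ID : 0482
Record Type : 02
Timestamp : 09/08/2022 20:16:35
Sensor Type : Memory
Description : Uncorrectable ECC

SEL Record ID : 0486
Record Type : 02
Timestamp : 09/08/2022 20:16:36
Sensor Type : Power Unit
Description : Power off/down
"""
records = parse_sel_records(sample)
hits = uncorrectable(records)
print([(r["SEL Record ID"], r["Sensor Type"]) for r in hits])
```

Note that the verbose records shown here do not name the DIMM slot in plain text, so the slot still has to be read from the BMC web GUI entry or decoded from the event data.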

 

