NS224 NSM100 shelf module errors and health alerts after an ONTAP upgrade

Category:
ontap-9
Specialty:
core

Applies to

  • ONTAP 9
  • AFF systems with NS224 shelves
  • NSM100 shelf modules

Issue

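  • An automated nondisruptive ONTAP upgrade (ANDU) is started from System Manager
  • The ONTAP upgrade completes successfully with no errors, and the cluster is healthy
  • A few minutes later, the system raises a health alert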

Sat Sep 03 14:57:43 +0100 [cluster1-node2: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process nchm: NoPathToNSMA_Alert[7867034284049604608].
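The event log lines in this article share one layout: a timestamp, then the node, process, event name, and severity in square brackets, then the message. The following Python sketch splits such a line into its fields. The pattern is derived only from the sample lines shown here, not from an official format specification, and the name `parse_ems_line` is hypothetical.

```python
import re

# Layout assumed from the samples in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)


def parse_ems_line(line):
    """Return a dict of fields for one event log line, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    # The timestamp is kept as a string because the log line carries no year.
    return match.groupdict()
```

Filtering the parsed records on `event` or `severity` is one way to separate the alert-level entries from the debug entries in the excerpts that follow.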

  • Errors for shelf module A appear in the event log

Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:57:58 2022 (   0+00:00:39.013); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:03 2022 (   0+00:00:44.016); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:58:03 2022 (   0+00:00:44.016); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:28 2022 (   0+00:01:09.341); 03140023; S0; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}

Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0x.0.0.99.0, log: Sat Sep  3 13:57:58 2022 (   0+00:00:39.008); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:02 2022 (   0+00:00:43.510); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module A in shelf: 0x.0.0.99.0, log: Sat Sep  3 13:58:02 2022 (   0+00:00:43.510); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:35 2022 (   0+00:01:16.667); 03140023; S0; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}

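  • After 15 minutes, the system logs errors reporting a firmware mismatch between modules A and B of the same shelf, and the system transitions to single-path HA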

Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.0 are running two different firmware versions. Disk shelf module A is running 0163, and disk shelf module B is running 0141.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0x.shelf0 has downrev firmware.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.1 are running two different firmware versions. Disk shelf module A is running 0163, and disk shelf module B is running 0141.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0x.shelf1 has downrev firmware.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: shelf.config.tospha:info]: System has transitioned to single path HA attached storage
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: shelf.config.spha:info]: System is using single path HA attached storage only.
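To see at a glance which shelves report mismatched module firmware, the `ses.mismatch.fw.version` messages can be reduced to a per-shelf summary. This is a minimal sketch that assumes the message wording matches the samples above exactly; `find_firmware_mismatches` is a hypothetical helper name.

```python
import re

# Matches the ses.mismatch.fw.version message text as it appears in this article.
MISMATCH = re.compile(
    r"ses\.mismatch\.fw\.version:\w+\]: The disk shelf modules on disk shelf "
    r"(?P<shelf>\S+) are running two different firmware versions\. "
    r"Disk shelf module A is running (?P<a>\w+), "
    r"and disk shelf module B is running (?P<b>\w+)\."
)


def find_firmware_mismatches(lines):
    """Map each shelf ID to its (module A version, module B version) pair."""
    mismatches = {}
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            mismatches[match["shelf"]] = (match["a"], match["b"])
    return mismatches
```

Versions are kept as strings so that leading zeros in values such as `0141` are preserved.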

  • After approximately 25 minutes, shelf module B reports unexpected-reboot errors similar to those seen for shelf module A

Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:20:58 2022 (   0+00:00:39.244); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:21:03 2022 (   0+00:00:44.246); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:21:03 2022 (   0+00:00:44.246); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:21:29 2022 (  395+02:40:24.173); 03140023; S1; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}

Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module B in shelf: 0x.1.3.99.1, log: Sat Sep  3 14:22:33 2022 (   0+00:00:39.335); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:22:38 2022 (   0+00:00:43.837); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module B in shelf: 0x.1.3.99.1, log: Sat Sep  3 14:22:38 2022 (   0+00:00:43.837); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:23:01 2022 (  395+02:45:37.461); 03140023; S1; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}
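The unexpected-reboot entries for modules A and B follow the same pattern, so they can be collected into one list to confirm which modules on which shelves restarted and with what startup type. The sketch below assumes the `sla.shelf.mod.reboot.unexp` message format shown in the excerpts; `list_unexpected_reboots` is a hypothetical name.

```python
import re

# Matches only the "unexpected reboot" event, not the plain reboot notice.
UNEXPECTED_REBOOT = re.compile(
    r"sla\.shelf\.mod\.reboot\.unexp:\w+\]: Unexpected reboot event reported by "
    r"module (?P<module>[AB]) in shelf: (?P<shelf>[^,]+), log: .*"
    r"Startup type (?P<startup>.+?) \(regVal:(?P<regval>0x[0-9a-fA-F]+)\)"
)


def list_unexpected_reboots(lines):
    """Return (module, shelf, startup type, register value) for each unexpected reboot."""
    reboots = []
    for line in lines:
        match = UNEXPECTED_REBOOT.search(line)
        if match:
            reboots.append(
                (match["module"], match["shelf"], match["startup"], match["regval"])
            )
    return reboots
```

Run against the excerpts in this article, the list would contain one entry per module per shelf, which makes it easy to check that both modules of every shelf went through the restart.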

  • At this point, additional shelf module errors might appear

Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 25 C (77 F). This element is on the unknown location.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.electronicsWarn:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x environmental monitoring warning for SES electronics 2: communication error. ; enclosure services hardware failed This element is on the rear of the shelf at the bottom, on module B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.ModuleWarn:alert]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x PCI switch warning for PCI Switch 2: communication error. This element is on the rear of the shelf at the bottom, on module B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.ACPWarn:error]: NS224NSM100 (S/N SHFHU212200xxx) shelf 1 on channel 0x ACP Processor warning for shelf ACP processor 2: communication error. ; Alternate Control Path hardware failed e B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 5: not installed or failed. This element is on the DIMM slot 1 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 6: not installed or failed. This element is on the DIMM slot 2 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 7: not installed or failed. This element is on the DIMM slot 3 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 8: not installed or failed. This element is on the DIMM slot 4 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.battery.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x battery failure error for Coin Battery 2: not installed or hardware failure. This element is on the rear of the shelf, in bottom module (B).

  • The shelf module errors clear later, after the module reboots

Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.ModuleInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x PCI switch information for PCI Switch 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.ACPInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x ACP Processor information for shelf ACP processor 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 5: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 6: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 7: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 8: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.battery.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x battery information for Coin Battery 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x Ethernet connector information for port e0a: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x Ethernet connector information for port e0b: normal status.
Sat Sep 03 15:23:38 +0100 [cluster1-node1: dsa_worker0: ses.status.bootDv.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x boot device notification for Boot device 2: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 12: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 13: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 14: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 15: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 16: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 17: normal status.

  • After shelf modules A and B have rebooted, the cluster alerts clear and the system returns to a healthy multipath state

Sat Sep 03 15:26:23 +0100 [cluster1-node1: nchmd: hm.alert.cleared:notice]: Alert Id = NoPathToNSMA_Alert , Alerting Resource = 7867034284049604608 cleared by monitor node-connect
Sat Sep 03 15:26:23 +0100 [cluster1-node1: nchmd: hm.alert.cleared:notice]: Alert Id = NoPathToNSMA_Alert , Alerting Resource = 8299379848277172224 cleared by monitor node-connect
Sat Sep 03 15:33:41 +0100 [cluster1-node1: start_asup_collector_thread: shelf.config.tompha:info]: System has transitioned to multi-path HA attached storage
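To confirm from a saved event log that every health alert raised during this sequence was later cleared, the raise and clear messages can be paired by alert ID and alerting resource. The following is a sketch under the assumption that the two message formats match the samples in this article; `outstanding_alerts` is a hypothetical helper.

```python
import re

# Alert raised, e.g. "... NoPathToNSMA_Alert[7867034284049604608]."
RAISED = re.compile(
    r"callhome\.hm\.alert\.\w+:\w+\]: Call home for Health Monitor process \w+: "
    r"(?P<alert>\w+)\[(?P<resource>\d+)\]"
)
# Alert cleared, e.g. "Alert Id = NoPathToNSMA_Alert , Alerting Resource = ... cleared by ..."
CLEARED = re.compile(
    r"hm\.alert\.cleared:\w+\]: Alert Id = (?P<alert>\w+) , "
    r"Alerting Resource = (?P<resource>\d+) cleared"
)


def outstanding_alerts(lines):
    """Return the set of (alert ID, resource) pairs raised but not yet cleared."""
    pending = set()
    for line in lines:  # lines are assumed to be in chronological order
        raised = RAISED.search(line)
        if raised:
            pending.add((raised["alert"], raised["resource"]))
            continue
        cleared = CLEARED.search(line)
        if cleared:
            pending.discard((cleared["alert"], cleared["resource"]))
    return pending
```

An empty result indicates that each raised alert has a matching clear message; any remaining pair identifies an alert that is still outstanding in the log.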
