
NS224 NSM100 disk shelf module errors and health alerts after an ONTAP upgrade

Category:
ontap-9
Specialty:
hw

Applies to

  • ONTAP 9
  • AFF systems with NS224 disk shelves
  • NSM100 disk shelf modules

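Issue

  • An ONTAP automated nondisruptive upgrade (ANDU) is started from System Manager
  • The ONTAP upgrade completes successfully without errors, and the cluster is operating normally
  • A few minutes later, a health alert is generated: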

Sat Sep 03 14:57:43 +0100 [cluster1-node2: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process nchm: NoPathToNSMA_Alert[7867034284049604608].
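
The log excerpts in this article all share one layout: timestamp, then node, process, event name and severity in square brackets, then the message. As a minimal sketch for triaging such output offline, the Python below splits one line into those fields. The regular expression and the function name `parse_ems_line` are illustrative assumptions based only on the lines shown here, not an official ONTAP parser, and the year 2022 is assumed from the embedded shelf log entries because the outer timestamp omits it.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this article:
# "<timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)$"
)


def parse_ems_line(line, year=2022):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The outer timestamp has no year, so one is supplied by the caller.
    fields["time"] = datetime.strptime(
        f"{fields['ts']} {year}", "%a %b %d %H:%M:%S %z %Y"
    )
    return fields


sample = (
    "Sat Sep 03 14:57:43 +0100 [cluster1-node2: mgwd: "
    "callhome.hm.alert.major:alert]: Call home for Health Monitor process "
    "nchm: NoPathToNSMA_Alert[7867034284049604608]."
)
record = parse_ems_line(sample)
print(record["node"], record["event"], record["severity"])
```

Returning `None` for non-matching lines lets the function be applied to a mixed file (bullets, blank lines, log lines) without raising.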

  • Errors relating to disk shelf module A are recorded in the event log:

Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:57:58 2022 (   0+00:00:39.013); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:03 2022 (   0+00:00:44.016); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:58:03 2022 (   0+00:00:44.016); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:28 2022 (   0+00:01:09.341); 03140023; S0; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}

Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module A in shelf: 0x.0.0.99.0, log: Sat Sep  3 13:57:58 2022 (   0+00:00:39.008); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:02 2022 (   0+00:00:43.510); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module A in shelf: 0x.0.0.99.0, log: Sat Sep  3 13:58:02 2022 (   0+00:00:43.510); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 14:59:28 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 13:58:35 2022 (   0+00:01:16.667); 03140023; S0; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}

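  • 15 minutes later, errors are logged indicating that module A and module B in the same disk shelf are running different firmware versions, and the system transitions to a single-path HA state: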

Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.0 are running two different firmware versions. Disk shelf module A is running 0163, and disk shelf module B is running 0141.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0x.shelf0 has downrev firmware.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.1 are running two different firmware versions. Disk shelf module A is running 0163, and disk shelf module B is running 0141.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: sfu.firmwareDownrev.shelf:error]: Shelf 0x.shelf1 has downrev firmware.
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: shelf.config.tospha:info]: System has transitioned to single path HA attached storage
Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: shelf.config.spha:info]: System is using single path HA attached storage only.
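
To see at a glance which shelves report mismatched module firmware, the `ses.mismatch.fw.version` messages above can be reduced to a small table. This is a hedged sketch: the pattern is derived only from the wording of the two messages quoted in this article, and `firmware_mismatches` is a hypothetical helper name.

```python
import re

# Pattern assumed from the ses.mismatch.fw.version wording shown above.
MISMATCH = re.compile(
    r"disk shelf (?P<shelf>\S+) are running two different firmware versions\. "
    r"Disk shelf module A is running (?P<fw_a>\d+), "
    r"and disk shelf module B is running (?P<fw_b>\d+)\."
)


def firmware_mismatches(lines):
    """Map shelf ID -> (module A firmware, module B firmware)."""
    found = {}
    for line in lines:
        match = MISMATCH.search(line)
        if match:
            found[match.group("shelf")] = (match.group("fw_a"), match.group("fw_b"))
    return found


log_lines = [
    "Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: "
    "The disk shelf modules on disk shelf 0x.0 are running two different firmware versions. "
    "Disk shelf module A is running 0163, and disk shelf module B is running 0141.",
    "Sat Sep 03 15:13:35 +0100 [cluster1-node1: dsa_disc: ses.mismatch.fw.version:error]: "
    "The disk shelf modules on disk shelf 0x.1 are running two different firmware versions. "
    "Disk shelf module A is running 0163, and disk shelf module B is running 0141.",
]
mismatches = firmware_mismatches(log_lines)
for shelf, (fw_a, fw_b) in sorted(mismatches.items()):
    print(f"shelf {shelf}: module A={fw_a}, module B={fw_b}")
```

Versions are kept as strings so that leading zeros such as `0141` are preserved.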

  • About 25 minutes later, the same "unexpected reboot" errors seen for disk shelf module A are reported for disk shelf module B:

Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:20:58 2022 (   0+00:00:39.244); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:21:03 2022 (   0+00:00:44.246); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:21:03 2022 (   0+00:00:44.246); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:21:29 2022 (  395+02:40:24.173); 03140023; S1; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}

Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot:notice]: Reboot event reported by module B in shelf: 0x.1.3.99.1, log: Sat Sep  3 14:22:33 2022 (   0+00:00:39.335); 02000233; U?; HAL; hal; 04; +++ Application version 0165 launching +++
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:22:38 2022 (   0+00:00:43.837); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)'}
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by module B in shelf: 0x.1.3.99.1, log: Sat Sep  3 14:22:38 2022 (   0+00:00:43.837); 02000093; U?; HAL; hal; 04; Module Reboot: Startup type 5-Crash reset (regVal:0x40)
Sat Sep 03 15:23:32 +0100 [cluster1-node2: storlog_admin: sla.shelf.message:debug]: params: {'type': 'SEVERITY', 'log': 'Sat Sep  3 14:23:01 2022 (  395+02:45:37.461); 03140023; S1; ENC_MGT; BrdgMgr; 02; BrdgMgr: BridgeIO log: Tahiti Bridge IO v1.6.4 is running'}
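
The `sla.shelf.mod.reboot.unexp` messages for modules A and B follow the same wording, so the module, shelf identifier, and reported startup type can be pulled out to confirm that each module rebooted once per shelf. The sketch below assumes the message wording shown in this article; the helper name `unexpected_reboots` is hypothetical.

```python
import re

# Pattern assumed from the sla.shelf.mod.reboot.unexp wording shown above.
UNEXPECTED = re.compile(
    r"Unexpected reboot event reported by module (?P<module>[AB]) "
    r"in shelf: (?P<shelf>[\w.]+), log: .*"
    r"Startup type (?P<startup>[^()]+?) \(regVal:(?P<reg>0x[0-9a-fA-F]+)\)"
)


def unexpected_reboots(lines):
    """Return one dict per unexpected-reboot message found in lines."""
    return [m.groupdict() for m in map(UNEXPECTED.search, lines) if m]


log_lines = [
    "Sat Sep 03 14:59:01 +0100 [cluster1-node2: storlog_admin: "
    "sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by "
    "module A in shelf: 0x.1.0.99.1, log: Sat Sep  3 13:58:03 2022 "
    "(   0+00:00:44.016); 02000093; U?; HAL; hal; 04; Module Reboot: "
    "Startup type 5-Crash reset (regVal:0x40)",
    "Sat Sep 03 15:22:02 +0100 [cluster1-node2: storlog_admin: "
    "sla.shelf.mod.reboot.unexp:error]: Unexpected reboot event reported by "
    "module B in shelf: 0x.0.3.99.0, log: Sat Sep  3 14:21:03 2022 "
    "(   0+00:00:44.246); 02000093; U?; HAL; hal; 04; Module Reboot: "
    "Startup type 5-Crash reset (regVal:0x40)",
]
reboots = unexpected_reboots(log_lines)
for entry in reboots:
    print(entry["module"], entry["shelf"], entry["startup"], entry["reg"])
```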

  • Around this time, other disk shelf module errors are also logged:

Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureWarning:alert]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature warning for Temperature sensor 12: not installed or failed. Current temperature: 25 C (77 F). This element is on the unknown location.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.electronicsWarn:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x environmental monitoring warning for SES electronics 2: communication error. ; enclosure services hardware failed This element is on the rear of the shelf at the bottom, on module B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.ModuleWarn:alert]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x PCI switch warning for PCI Switch 2: communication error. This element is on the rear of the shelf at the bottom, on module B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.ACPWarn:error]: NS224NSM100 (S/N SHFHU212200xxx) shelf 1 on channel 0x ACP Processor warning for shelf ACP processor 2: communication error. ; Alternate Control Path hardware failed e B.
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 5: not installed or failed. This element is on the DIMM slot 1 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 6: not installed or failed. This element is on the DIMM slot 2 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 7: not installed or failed. This element is on the DIMM slot 3 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for Dimm Element 8: not installed or failed. This element is on the DIMM slot 4 in the bottom shelf module (B).
Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.battery.error:error]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x battery failure error for Coin Battery 2: not installed or hardware failure. This element is on the rear of the shelf, in bottom module (B).

  • The disk shelf module errors clear after the module reboots:

Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.ModuleInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x PCI switch information for PCI Switch 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.ACPInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x ACP Processor information for shelf ACP processor 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 5: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 6: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 7: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for Dimm Element 8: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.battery.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x battery information for Coin Battery 2: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x Ethernet connector information for port e0a: normal status.
Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.etherConn.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x Ethernet connector information for port e0b: normal status.
Sat Sep 03 15:23:38 +0100 [cluster1-node1: dsa_worker0: ses.status.bootDv.info:notice]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x boot device notification for Boot device 2: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 12: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 13: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 14: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 15: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 16: normal status.
Sat Sep 03 15:23:56 +0100 [cluster1-node1: dsa_worker4: ses.status.temperatureInfo:info]: NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x temperature information for Temperature sensor 17: normal status.
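
One way to check that every element that raised a warning later returned to normal is to keep only the last status reported per element. The sketch below assumes the "... for <element>: <status>." wording of the `ses.status.*` messages quoted in this article; `latest_status` is a hypothetical helper, and it keys on the element name only, so lines from several shelves would need the shelf added to the key.

```python
import re

# "<description> for <element>: <status>." as seen in the ses.status.* messages.
STATUS = re.compile(r" for (?P<element>[^:]+): (?P<status>[^.;]+)")


def latest_status(lines):
    """Map each shelf element to the last status reported for it."""
    status = {}
    for line in lines:
        # Skip the "[node: process: event:severity]: " prefix before searching.
        _, _, message = line.partition("]: ")
        match = STATUS.search(message)
        if match:
            status[match.group("element")] = match.group("status").strip()
    return status


log_lines = [
    "Sat Sep 03 15:17:19 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.error:error]: "
    "NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM failure for "
    "Dimm Element 5: not installed or failed. This element is on the DIMM slot 1 "
    "in the bottom shelf module (B).",
    "Sat Sep 03 15:23:29 +0100 [cluster1-node1: dsa_worker4: ses.status.dimm.info:notice]: "
    "NS224NSM100 (S/N SHFHU212200xxxx) shelf 1 on channel 0x DIMM notification for "
    "Dimm Element 5: normal status.",
]
before = latest_status(log_lines[:1])
after = latest_status(log_lines)
print(before)
print(after)
```

Any element whose final status is not "normal status" after both modules have rebooted would be a candidate for further investigation.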

  • After both disk shelf modules A and B have rebooted, the cluster alerts clear and the system returns to a normal multipath HA state:

Sat Sep 03 15:26:23 +0100 [cluster1-node1: nchmd: hm.alert.cleared:notice]: Alert Id = NoPathToNSMA_Alert , Alerting Resource = 7867034284049604608 cleared by monitor node-connect
Sat Sep 03 15:26:23 +0100 [cluster1-node1: nchmd: hm.alert.cleared:notice]: Alert Id = NoPathToNSMA_Alert , Alerting Resource = 8299379848277172224 cleared by monitor node-connect
Sat Sep 03 15:33:41 +0100 [cluster1-node1: start_asup_collector_thread: shelf.config.tompha:info]: System has transitioned to multi-path HA attached storage
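
The timestamps quoted above are enough to work out how long the system stayed in each degraded state. As a small worked example, the Python below takes four timestamps from the log lines in this article (alert raised, transition to single path, alert cleared, transition back to multipath) and computes the two windows. The year 2022 is an assumption taken from the embedded shelf log entries, and the dictionary keys are illustrative labels.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %z %Y"


def parse_ts(text, year=2022):
    """Parse a log timestamp such as 'Sat Sep 03 14:57:43 +0100'."""
    return datetime.strptime(f"{text} {year}", FMT)


# Timestamps copied from the log lines quoted in this article.
events = {
    "alert_raised": "Sat Sep 03 14:57:43 +0100",   # callhome.hm.alert.major
    "single_path": "Sat Sep 03 15:13:35 +0100",    # shelf.config.tospha
    "alert_cleared": "Sat Sep 03 15:26:23 +0100",  # hm.alert.cleared
    "multi_path": "Sat Sep 03 15:33:41 +0100",     # shelf.config.tompha
}
times = {name: parse_ts(value) for name, value in events.items()}

single_path_window = times["multi_path"] - times["single_path"]
alert_window = times["alert_cleared"] - times["alert_raised"]

print("single-path HA window:", single_path_window)  # 0:20:06
print("health alert window:", alert_window)          # 0:28:40
```

In this example the system reported single-path HA for about 20 minutes, and the NoPathToNSMA alert was active for just under 29 minutes.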
