SHUTDOWN PENDING (degraded mode) critical - AutoSupport message

Applies to

  • ONTAP 9
  • callhome.shutdown.pending
  • monitor.shutdown.brokenDisk
  • HA Group Notification from NODE_name (SHUTDOWN PENDING (degraded mode)) alert

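Event summary

callhome.shutdown.pending

This message occurs when an automatic shutdown sequence has been initiated because a RAID group is degraded and cannot be reconstructed, since not enough suitable spare disks are available. In other words, the RAID group is fully degraded.

  • The definition of "fully degraded" depends on the RAID group type used by the aggregate:
    • RAID4: the RAID group has one missing or failed disk
    • RAID-DP: the RAID group has two missing or failed disks
    • RAID-TEC: the RAID group has three missing or failed disks
    • A mirrored aggregate is considered "fully degraded" if both plexes have missing or failed disks in the identically positioned RAID group.
  • In ONTAP versions earlier than 9.12.1, if the system runs in fully degraded mode for the defined timeout interval, it halts automatically to prevent a RAID group integrity failure and possible data loss.
    • The default timeout is 24 hours.
  • If a spare drive becomes available while the system is running in degraded mode, the system immediately begins reconstructing the failed drive.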
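
The per-RAID-type thresholds listed above can be summarized as a small lookup. The following Python sketch is illustrative only: the names FULLY_DEGRADED_THRESHOLD and is_fully_degraded are hypothetical and not part of ONTAP, and the mirrored-aggregate case is deliberately left out.

```python
# Number of missing or failed disks at which a RAID group counts as
# "fully degraded", taken from the definitions in this article.
# The dictionary and function names are hypothetical, not ONTAP APIs.
FULLY_DEGRADED_THRESHOLD = {
    "raid4": 1,
    "raid_dp": 2,
    "raid_tec": 3,
}


def is_fully_degraded(raid_type: str, failed_disks: int) -> bool:
    """Return True when the failed-disk count reaches the threshold."""
    # Accept spellings such as "RAID-DP" or "raid_dp".
    key = raid_type.lower().replace("-", "_")
    return failed_disks >= FULLY_DEGRADED_THRESHOLD[key]


if __name__ == "__main__":
    for raid_type, failed in [("RAID4", 1), ("RAID-DP", 1), ("RAID-TEC", 3)]:
        print(raid_type, failed, is_fully_degraded(raid_type, failed))
```

An unknown RAID type raises KeyError rather than returning a guess, which seems the safer behavior for a check of this kind.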

Validate

Event log

event log show -severity * -message-name callhome*

[node1: statd: callhome.shutdown.pending:alert]: Call home for SHUTDOWN PENDING (degraded mode)

event log show -severity * -message-name monitor.brokenDisk*

[node1: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)

[node1: statd: monitor.shutdown.brokenDisk.pending:notice]: two data disks in RAID group "/aggregate_name/plex0/rg0" are broken. Halting system in 24 hours.
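
When many log lines have to be reviewed, the affected RAID group and the remaining time can be extracted from the pending-shutdown message. This is a minimal Python sketch that assumes the message wording matches the sample above; the wording may differ between ONTAP releases, and parse_pending_shutdown is a hypothetical helper name.

```python
import re

# Pattern derived only from the sample message shown in this article.
PENDING_RE = re.compile(
    r'RAID group "(?P<raid_group>[^"]+)" are broken\. '
    r"Halting system in (?P<hours>\d+) hours"
)


def parse_pending_shutdown(line):
    """Return (raid_group, hours) from a pending-shutdown line, or None."""
    match = PENDING_RE.search(line)
    if match is None:
        return None
    return match.group("raid_group"), int(match.group("hours"))


SAMPLE = (
    "[node1: statd: monitor.shutdown.brokenDisk.pending:notice]: "
    'two data disks in RAID group "/aggregate_name/plex0/rg0" are broken. '
    "Halting system in 24 hours."
)

if __name__ == "__main__":
    print(parse_pending_shutdown(SAMPLE))
```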

Command line

To verify the aggregate status, run storage aggregate show-status

RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)

RAID Disk Device   HA  SHELF BAY CHAN Pool Type RPM   Used (MB/blks)     Phys (MB/blks)
--------- ------   ------------- ---- ---- ---- ----- --------------     --------------
dparity   0b.07.12 0b  7     12  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
parity    0b.07.13 0b  7     13  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
data      FAILED   N/A                                1713523/ -
data      0b.07.15 0b  7     15  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
data      FAILED   N/A                                1713523/ -
data      0b.07.21 0b  7     21  SA:B 0    SAS  10000 1713523/3509295616 1716957/3516328368
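
Counting the FAILED rows per RAID group in captured output can be scripted. The Python sketch below is an assumption-laden example, not a NetApp tool: it expects one disk row per line, uses an abbreviated copy of the sample output above, and count_failed_disks is a hypothetical name.

```python
import re

# Matches a header such as:
# RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)
GROUP_RE = re.compile(r"RAID group (\S+) \(([^)]*)\)")


def count_failed_disks(output: str) -> dict:
    """Map each RAID group path to its number of FAILED disk rows."""
    counts = {}
    current = None
    for line in output.splitlines():
        header = GROUP_RE.search(line)
        if header:
            current = header.group(1)
            counts[current] = 0
        elif current is not None and "FAILED" in line.split():
            counts[current] += 1
    return counts


# Abbreviated copy of the sample output, one row per line.
SAMPLE = """\
RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)
dparity 0b.07.12 0b 7 12 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
parity  0b.07.13 0b 7 13 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
data    FAILED   N/A 1713523/ -
data    0b.07.15 0b 7 15 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
data    FAILED   N/A 1713523/ -
data    0b.07.21 0b 7 21 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
"""

if __name__ == "__main__":
    print(count_failed_disks(SAMPLE))
```

For the sample, the group /aggregate_name/plex0/rg1 has two FAILED rows, which matches the "double degraded" label in its header.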

Run storage failover show to verify whether the aggregate that contains the disks needing reconstruction or replacement is in a partial giveback state

::>storage failover show
                              Takeover
Node             Partner        Possible State Description
--------------   -------------- -------- -------------------------------------
Node-1           Node-2      true     Connected to Node-2, Partial giveback
Node-2           Node-1      true     Connected to Node-1.
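
The partial giveback check can also be done on captured output. This Python sketch assumes the state description appears on the same line as the node name, as in the sample above; real output may wrap the description onto a continuation line, which this sketch does not handle. The function name nodes_in_partial_giveback is hypothetical.

```python
def nodes_in_partial_giveback(output: str) -> list:
    """Return node names whose state description mentions partial giveback."""
    nodes = []
    for line in output.splitlines():
        if "partial giveback" in line.lower():
            fields = line.split()
            if fields:
                # The first column is the node name in the sample layout.
                nodes.append(fields[0])
    return nodes


SAMPLE = """\
                              Takeover
Node             Partner        Possible State Description
--------------   -------------- -------- -------------------------------------
Node-1           Node-2      true     Connected to Node-2, Partial giveback
Node-2           Node-1      true     Connected to Node-1.
"""

if __name__ == "__main__":
    print(nodes_in_partial_giveback(SAMPLE))
```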

 

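Resolution

  1. Check for unassigned disks. Assign them to the node that needs spares so that reconstruction can begin (the state should clear once reconstruction starts):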

::>storage disk show -container-type unassigned

::>storage disk assign -disk <stackID>.<shelfID>.<bayID> -owner <node name>

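  2. If the node is in a partial giveback state, complete the giveback. See the KB article on disks that do not reconstruct or zero in a partial giveback state.
  3. Replace any failed drives. To check the status of your replacement part, see the KB article on the disk failure AutoSupport message.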
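
For the disk assignment step, the commands can be generated from a list of disk names for review before anyone pastes them into the cluster shell. This Python sketch only formats strings from the command template shown above and does not contact a cluster; the disk names and owner in the example are hypothetical placeholders.

```python
def build_assign_commands(disks, owner):
    """Return one 'storage disk assign' command string per disk name."""
    return [
        f"storage disk assign -disk {disk} -owner {owner}"
        for disk in disks
    ]


if __name__ == "__main__":
    # Hypothetical disk names in <stackID>.<shelfID>.<bayID> form.
    for command in build_assign_commands(["1.0.5", "1.0.6"], "Node-1"):
        print(command)
```

Printing the commands instead of running them keeps an operator in the loop, which seems appropriate when disk ownership is being changed.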

Workaround

For further assistance:

Please contact NetApp Technical Support or log into the NetApp Support Site to create a case. Reference this article for further assistance.

Additional information

 

