
shutdown pending (degraded mode) critical - AutoSupport message

Applies to

  • ONTAP 9
  • callhome.shutdown.pending
  • monitor.shutdown.brokenDisk
  • HA Group Notification from node_name (SHUTDOWN PENDING (degraded mode)) ALERT

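Event summary

  • This message occurs when a disk drive fails and no suitable spare disk is available for reconstruction. The failure also leaves a RAID group in the aggregate with no remaining protection against another disk failure; that is, the RAID group is fully degraded.
  • The definition of "fully degraded" depends on the RAID group type used by the aggregate:
    • RAID4 - the RAID group has one missing or failed disk
    • RAID-DP - the RAID group has two missing or failed disks
    • RAID-TEC - the RAID group has three missing or failed disks
    • A mirrored aggregate is considered "fully degraded" if both of its plexes have missing or failed disks in the same-positioned RAID group.
  • To protect your data, the system enters "degraded mode".
  • In ONTAP versions earlier than 9.12.1, a system that runs in fully degraded mode for the defined timeout interval halts automatically to prevent RAID group integrity failure and possible data loss.
    • The default timeout is 24 hours.
  • If a spare drive becomes available while the system is running in degraded mode, the system immediately begins reconstructing the failed drive.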
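
The "fully degraded" thresholds above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names and the RAID type labels used as dictionary keys are assumptions made for this example, not ONTAP identifiers, and the mirrored check simply restates the rule in the list above.

```python
# Number of missing or failed disks at which a RAID group has no protection
# left, per the definitions in this article. The keys are hypothetical labels.
FAILURES_TO_FULLY_DEGRADE = {"raid4": 1, "raid_dp": 2, "raid_tec": 3}


def is_fully_degraded(raid_type, failed_disks):
    """Return True when an unmirrored RAID group is fully degraded."""
    return failed_disks >= FAILURES_TO_FULLY_DEGRADE[raid_type]


def mirrored_is_fully_degraded(failed_in_plex0, failed_in_plex1):
    """Return True when both plexes have a missing or failed disk
    in the same-positioned RAID group."""
    return failed_in_plex0 > 0 and failed_in_plex1 > 0


if __name__ == "__main__":
    print(is_fully_degraded("raid_dp", 1))
    print(is_fully_degraded("raid_dp", 2))
    print(mirrored_is_fully_degraded(1, 0))
```

For example, a RAID-DP group with one failed disk still has one parity disk of protection, so the sketch reports it as not fully degraded; a second failure crosses the threshold.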

Validate

Event log

event log show -severity * -message-name callhome*

[node1: statd: callhome.shutdown.pending:alert]: Call home for SHUTDOWN PENDING (degraded mode)

event log show -severity * -message-name monitor.brokenDisk*

[node1: statd: monitor.brokenDisk.notice:info]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system it will run for another 24 hours before shutting down. (The 24 hour timeout may be increased by altering the "raid.timeout" value using the "options" command.)

[node1: statd: monitor.shutdown.brokenDisk.pending:notice]: two data disks in RAID group "/aggregate_name/plex0/rg0" are broken. Halting system in 24 hours.
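
As a rough aid for triage, the pending-shutdown message above can be parsed to estimate when the halt would occur. This is a minimal Python sketch that assumes the message text matches the sample shown in this article; the function name and the timestamp passed in are hypothetical, and the event time must come from your own event log.

```python
import re
from datetime import datetime, timedelta

# Pattern based only on the sample message text shown in this article.
PENDING_RE = re.compile(
    r'RAID group "(?P<raid_group>[^"]+)" are broken\. '
    r"Halting system in (?P<hours>\d+) hours\."
)


def projected_halt(message, logged_at):
    """Return (raid_group, projected_halt_time), or None if the text does not match."""
    match = PENDING_RE.search(message)
    if match is None:
        return None
    hours = int(match["hours"])
    return match["raid_group"], logged_at + timedelta(hours=hours)


sample = (
    "[node1: statd: monitor.shutdown.brokenDisk.pending:notice]: two data disks in "
    'RAID group "/aggregate_name/plex0/rg0" are broken. Halting system in 24 hours.'
)

if __name__ == "__main__":
    # The timestamp below is a made-up placeholder, not taken from a real system.
    print(projected_halt(sample, datetime(2024, 1, 1, 8, 0)))
```

The projected time is only an estimate: it assumes the timeout has not been changed and that the system is not rebooted in the meantime.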

Command line

To verify the aggregate status, run storage aggregate show-status

RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)

RAID Disk Device    HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)     Phys (MB/blks)
--------- ------    ------------- ---- ---- ---- ----- --------------     --------------
dparity   0b.07.12  0b    7   12  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
parity    0b.07.13  0b    7   13  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
data      FAILED            N/A                        1713523/ -
data      0b.07.15  0b    7   15  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
data      FAILED            N/A                        1713523/ -
data      0b.07.21  0b    7   21  SA:B   0   SAS 10000 1713523/3509295616 1716957/3516328368
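
When the status output is long, counting the FAILED rows per RAID group can help identify which groups need attention. The Python sketch below is an illustration that assumes the output resembles the sample in this article (a "RAID group ..." header line followed by one row per disk, with FAILED in the device column); the function name is hypothetical and real output may be laid out differently.

```python
import re

# Matches a RAID group header line such as the one in the sample output.
GROUP_RE = re.compile(r"^\s*RAID group (\S+) \(")


def failed_disks_by_group(output):
    """Map each RAID group path to the number of rows marked FAILED."""
    counts = {}
    current = None
    for line in output.splitlines():
        header = GROUP_RE.match(line)
        if header:
            current = header.group(1)
            counts[current] = 0
            continue
        fields = line.split()
        # A failed member shows FAILED where the device name would be.
        if current is not None and len(fields) >= 2 and fields[1] == "FAILED":
            counts[current] += 1
    return counts


sample_status = """\
RAID group /aggregate_name/plex0/rg1 (double degraded, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
dparity 0b.07.12 0b 7 12 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
parity 0b.07.13 0b 7 13 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
data FAILED N/A 1713523/ -
data 0b.07.15 0b 7 15 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368
data FAILED N/A 1713523/ -
"""

if __name__ == "__main__":
    print(failed_disks_by_group(sample_status))
```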

Run storage failover show to check whether the aggregate containing the disks that need reconstruction or replacement is in a partial giveback state

::>storage failover show
                              Takeover
Node             Partner        Possible State Description
--------------   -------------- -------- -------------------------------------
Node-1           Node-2         true     Connected to Node-2, Partial giveback
Node-2           Node-1         true     Connected to Node-1.
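
If the failover output is captured by a script, the partial giveback condition can be flagged with a simple text check. This Python sketch assumes each node's state fits on a single line, as in the sample above; the function name is hypothetical, and output that wraps the state description across lines would need extra handling.

```python
def nodes_in_partial_giveback(output):
    """Return the node names whose row mentions 'Partial giveback'."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have the form: node, partner, true/false, description.
        is_data_row = len(fields) >= 4 and fields[2] in ("true", "false")
        if is_data_row and "partial giveback" in line.lower():
            nodes.append(fields[0])
    return nodes


sample_failover = """\
Node             Partner        Possible State Description
--------------   -------------- -------- -------------------------------------
Node-1           Node-2         true     Connected to Node-2, Partial giveback
Node-2           Node-1         true     Connected to Node-1.
"""

if __name__ == "__main__":
    print(nodes_in_partial_giveback(sample_failover))
```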

 

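Resolution

  1. Check for unassigned disks. Assign them to the node that needs a spare so that reconstruction can begin (the condition should clear once reconstruction starts):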

::>storage disk show -container-type unassigned

::>storage disk assign -disk <stackID>.<shelfID>.<bayID> -owner <node name>

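  2. If the node is in a partial giveback state, complete the giveback. See the knowledge base article on disks that do not reconstruct or zero while in a partial giveback state.
  3. Replace all failed drives. To check the status of your replacement part, see the knowledge base article Disk failure - AutoSupport message.

Workaround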

For further assistance:

Please contact NetApp Technical Support or log into the NetApp Support Site to create a case. Reference this article for further assistance.

Additional information

 

