
Active IQ Unified Manager alert: Aggregate RAID status is reconstructing because of broken disks

Applies to

  • Active IQ Unified Manager
  • OnCommand Unified Manager (UM)
  • Data ONTAP 8

Issue

  • OnCommand Unified Manager reports that an aggregate's RAID status is reconstructing because of a broken disk (a clustershell check is sketched after the alert below).

-------------------------------------
Alert from OnCommand Unified Manager: Aggregate Reconstructing

A risk was generated by XXXXXXXXXX that requires your attention.

Risk              - Aggregate Reconstructing
Impact Area       - Availability
Severity          - Warning
Source            - node-1:aggr01
Trigger Condition - Aggregate aggr01's RAID status is reconstructing because of broken disks - .
-------------------------------------
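
To verify the condition behind the alert, the aggregate's RAID status can be checked from the clustershell. A minimal sketch, assuming the aggregate name aggr01 from the alert above; field names and output formatting can vary by ONTAP release:

::> storage aggregate show -aggregate aggr01 -fields raidstatus
aggregate raidstatus
--------- --------------------
aggr01    raid_dp, reconstruct

While the rebuild is in progress, storage aggregate show-status -aggregate aggr01 should also list the replacement disk along with the reconstruction percentage.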

  • The EMS logs show command timeouts and high I/O latency on the disk, after which the disk is failed for exceeding the latency threshold and placed into maintenance testing (an event log query is sketched after the messages below).

Thu Dec 15 09:51:06 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:51:32 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:52:18 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 09:59:14 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:29:53 JST [node-1: pmcsas_timeout_1: sas.device.quiesce:info]: Adapter 0c encountered a command timeout on disk device 0d.14.9. Quiescing the device.
Thu Dec 15 10:31:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 60 msecs and average utilization of 37 percent. Highest average IO latency: 0d.14.9: 60 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:31:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 77 msecs and average utilization of 46 percent. Highest average IO latency: 0d.14.9: 77 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 88 msecs and average utilization of 54 percent. Highest average IO latency: 0d.14.9: 88 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:32:32 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 94 msecs and average utilization of 62 percent. Highest average IO latency: 0d.14.9: 94 msecs; next highest IO latency: 1d.22.6: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.ioLatency:debug]: Disk 0d.14.9 has exceeded the expected IO latency in the current window with average latency of 103 msecs and average utilization of 68 percent. Highest average IO latency: 0d.14.9: 103 msecs; next highest IO latency: 1d.22.5: 9 msecs
Thu Dec 15 10:33:02 JST [node-1: disk_latency_monito: shm.threshold.highIOLatency:error]: Disk 0d.14.9 exceeds the average IO latency threshold and will be recommended for failure.
Thu Dec 15 10:33:03 JST [node-1: config_thread: raid.disk.maint.start:notice]: Disk /aggr_sas_01/plex0/rg2/0d.14.9 Shelf 14 Bay 9 [NETAPP   X422_HCOBE600A10 NA00] S/N [XXXXXXXX] will be tested.
Thu Dec 15 10:33:03 JST [node-1: disk_admin: disk.failmsg:error]: Disk 0d.14.9 (XXXXXXXX): exceeded latency threshold.
Thu Dec 15 10:33:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:34:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:35:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
Thu Dec 15 10:36:30 JST [node-1: cfdisk_config: cf.disk.skipped:info]: params: {'status': '23', 'diskname': '0d.14.9'}
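
The same storage health monitor (shm) events can be retrieved after the fact from the cluster event log. A sketch, assuming the node name node-1 from the messages above; debug-level events such as shm.threshold.ioLatency are only visible if they are retained at that severity:

::> event log show -node node-1 -message-name shm.threshold.highIOLatency
::> event log show -node node-1 -message-name disk.failmsg

This lists each matching EMS occurrence with its timestamp, which helps correlate the disk failure recommendation with the start of the aggregate reconstruction.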

  • After the maintenance test completes, the disk is unfailed and returned to the spare pool (a spare-pool check is sketched after the messages below).

Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 invalidate debounce - 40', 'adapterName': '0c'}
Thu Dec 15 12:21:54 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 invalidate debounce - 40', 'adapterName': '1a'}
Thu Dec 15 12:21:55 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Device 0d.14.9 came back.', 'adapterName': '0c'}
Thu Dec 15 12:21:59 JST [midst03-02: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '0c'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Device 1a.14.9 came back.', 'adapterName': '1a'}
Thu Dec 15 12:22:01 JST [midst03-02: pmcsas_asyncd_3: sas.adapter.debug:info]: params: {'debug_string': 'Asyncd device scan done.', 'adapterName': '1a'}
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: ems.engine.suppressed:debug]: Event 'od.rdb.mbox.debug' suppressed 4 times in last 897 seconds.
Thu Dec 15 12:25:22 JST [midst03-02: api_dpool_04: od.rdb.mbox.debug:debug]: params: {'message': 'RDB-HA readPSlot: Read blob_type 3, (pslot 0), instance 0.'}
Thu Dec 15 12:25:22 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:26:09 JST [midst03-02: mgwd: rdb.ha.verified:notice]: Verified that cluster high availability (HA) is configured correctly, and that on-disk mailboxes are intact.
Thu Dec 15 12:27:01 JST [midst03-02: disk_admin: disk.partner.diskUnfail:info]: The partner has unfailed 0d.14.9.
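
Once the partner logs disk.partner.diskUnfail, the disk should appear as a spare again. A quick check, assuming the disk name 0d.14.9 from the logs above; depending on the release the disk may be shown with a cluster-style name such as node-1:0d.14.9:

::> storage disk show -disk *.14.9
::> storage disk show -container-type spare

The first command shows the disk's current container type and owner; the second lists all spares, where the tested disk should now appear.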

