WAFL inconsistency due to a lack of spare disks and an aggregate left degraded for a long time

Applies to

AFF/FAS systems
ONTAP 9

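Issue

The EMS log shows the following sequence of events:

The system has used up all suitable spare disks and reports that it is low on spares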

Sat Mar 12 05:17:26 +0200 [node_2: config_thread: raid.rg.spares.low:error]: /aggr2_2/plex0/rg0

Sat Mar 12 05:17:26 +0200 [node_2: config_thread: callhome.spares.low:error]: Call home for SPARES_LOW
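
The EMS entries in this article share one layout: a timestamp, then a bracketed `[node: process: event.name:severity]` header, then the message. As a rough aid for sorting through a longer log, the sketch below splits raw text into individual entries, including several entries pasted back to back. The layout is inferred only from the samples in this article, and `EmsEvent`, `EMS_RE`, and `parse_ems` are hypothetical names, not part of any ONTAP tool.

```python
import re
from typing import NamedTuple


class EmsEvent(NamedTuple):
    timestamp: str
    node: str
    process: str
    event: str
    severity: str
    message: str


# Assumed layout, inferred from the samples in this article:
# "<Dow> <Mon> <DD> <HH:MM:SS> <+ZZZZ> [<node>: <process>: <event>:<severity>]: <message>"
EMS_RE = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<process>[^:\]]+): "
    r"(?P<event>[^:\]]+):(?P<severity>[a-z]+)\]: "
)


def parse_ems(text: str) -> list[EmsEvent]:
    """Split raw EMS text into events; each message runs up to the next header."""
    matches = list(EMS_RE.finditer(text))
    events = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        events.append(
            EmsEvent(
                m["timestamp"], m["node"], m["process"],
                m["event"], m["severity"], text[m.end():end].strip(),
            )
        )
    return events


sample = (
    "Sat Mar 12 05:17:26 +0200 [node_2: config_thread: raid.rg.spares.low:error]: "
    "/aggr2_2/plex0/rg0 "
    "Sat Mar 12 05:17:26 +0200 [node_2: config_thread: callhome.spares.low:error]: "
    "Call home for SPARES_LOW"
)
for ev in parse_ems(sample):
    print(ev.event, ev.severity, ev.message)
```

Filtering the result on `event` names such as `raid.rg.spares.low` is one way to spot the first warning in this sequence early.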

After the next disk failure, the RAID group is in a degraded state

Mon Apr 04 02:00:00 +0200 [node_2: statd: monitor.raiddp.vol.singleDegraded:error]: data disk in RAID group "/aggr2_2/plex0/rg0" is broken.

Disk failures continue

Thu May 05 21:03:07 +0200 [node_2: config_thread: raid.rg.recons.cantStart:error]: The reconstruction cannot start in RAID group /aggr2_2/plex0/rg0: No matching disks available in spare pool, targeting any spare pool

Wed May 04 03:00:00 +0200 [node_2: statd: monitor.brokenDisk.notice:notice]: When two disks are broken in raid_dp volume, the system shuts down automatically every 24 hours to encourage you to replace the disk. If you reboot the system, it will run for another 24 hours before shutting down.

Wed May 04 03:00:00 +0200 [node_2: statd: monitor.shutdown.brokenDisk.pending:notice]: two data disks in RAID group "/aggr2_2/plex0/rg0" are broken. Halting system in 24 hours.
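
The timestamps in the excerpts above show how long the RAID group ran without its protection being restored. The sketch below computes the gaps between the low-spares warning, the single-degraded report, and the first report of two broken disks. EMS timestamps carry no year, so `YEAR` is an assumption (the weekday and date pairs are consistent with 2022). `ems_time` is a hypothetical helper, and the report times are those of the monitor messages, not necessarily the moments the disks failed.

```python
from datetime import datetime, timedelta, timezone

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def ems_time(stamp: str, year: int) -> datetime:
    """Convert an EMS timestamp such as 'Sat Mar 12 05:17:26 +0200' to a datetime."""
    _weekday, month, day, clock, offset = stamp.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    sign = -1 if offset.startswith("-") else 1
    tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5])))
    return datetime(year, MONTHS[month], int(day), hours, minutes, seconds, tzinfo=tz)


YEAR = 2022  # assumption: the log excerpts do not state the year

spares_low = ems_time("Sat Mar 12 05:17:26 +0200", YEAR)
single_degraded = ems_time("Mon Apr 04 02:00:00 +0200", YEAR)
double_degraded = ems_time("Wed May 04 03:00:00 +0200", YEAR)

print("spares low -> single degraded report:", single_degraded - spares_low)
print("single degraded -> two disks broken report:", double_degraded - single_degraded)
```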

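Spare disks are provided and reconstruction starts

Because the RAID group contains other problematic disks, reconstruction cannot rebuild all of the data and starts marking the blocks it cannot recover as missing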

Fri May 06 10:05:51 +0200 [node_2: raidio_thread: raid_multierr_bad_block_1:error]: params: {'disk_rpm': '10000', 'vendor': 'NETAPP ', 'firmware_revision': 'NA02', 'shelf': '2', 'disk_info': 'Disk /aggr2_2/plex0/rg0/0a.02.23P1 Shelf 2 Bay 23 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WBN1AJT5NP001] UID [6000C500:BCA9B53B:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000]', 'volumeBno': '1348939177', 'site': 'Local', 'bay': '23', 'carrier': '', 'serialno': 'WBN1AJT5NP001', 'owner': '', 'model': 'X343_SSKBE1T8A10', 'disk_type': '4', 'blockNum': '81428969'}

Fri May 06 10:05:51 +0200 [node_2: raidio_thread: raid_multierr_bad_missingBlk_1:debug]: params: {'owner': '', 'rg': '/aggr2_2/plex0/rg0', 'blockNum': '81428969', 'vbn': '7381173545'}
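
The `params:` payload in these two events happens to be written like a Python dictionary literal, so it can be read without a custom parser. The sketch below is a convenience for pulling out the disk and block identifiers. `parse_params` is a hypothetical helper, and it assumes that no value inside the payload contains a `}` character, which holds for the samples here.

```python
import ast


def parse_params(message: str) -> dict[str, str]:
    """Return the key/value pairs that follow 'params:' in an EMS message."""
    _, found, rest = message.partition("params:")
    if not found:
        return {}
    # Assumption: the payload ends at the first closing brace.
    literal = rest[: rest.index("}") + 1].strip()
    return ast.literal_eval(literal)


missing_blk = (
    "params: {'owner': '', 'rg': '/aggr2_2/plex0/rg0', "
    "'blockNum': '81428969', 'vbn': '7381173545'}"
)
params = parse_params(missing_blk)
print(params["rg"], int(params["blockNum"]), int(params["vbn"]))
```

Collecting the `blockNum` and `vbn` values from every `raid_multierr_bad_missingBlk_1` event gives a list of the blocks that reconstruction could not rebuild.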

When a client accesses the corrupted data, a WAFL inconsistency alert is triggered

Sun May 15 18:14:30 +0200 [node_2: wafl_exempt01: wafl.raid.incons.userdata:error]: WAFL inconsistent: inconsistent user data block at VBN 3581364492 (vvbn:567776529 fbn:664341713 level:0) in public inode (fileid:96 snapid:0 file_type:15 disk_flags:0x8402 error:120 raid_set:1) in volume node_02_vol@vserver:6456a9ee-6e12-11e8-99f3-01b099c9ade9.

Sun May 15 18:14:30 +0200 [node_2: wafl_exempt01: wafl.incons.userdata.vol:alert]: WAFL inconsistent: volume vol_02_vol@vserver:6456a9ee-6e12-11e8-99f3-01b099c9ade9 has an inconsistent user data block. Note: Any new Snapshot copies might contain this inconsistency.
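
The `wafl.raid.incons.userdata` message embeds the identifiers needed to locate the affected block and file. The sketch below collects them into a dictionary, assuming the message keeps the shape shown above. `incons_fields` is a hypothetical helper, and the `vserver_id` key is only a label chosen here for the value that follows `@vserver:`.

```python
import re


def incons_fields(message: str) -> dict[str, str]:
    """Collect the identifiers embedded in a 'WAFL inconsistent' message."""
    fields: dict[str, str] = {}
    vbn = re.search(r"\bVBN (\d+)", message)
    if vbn:
        fields["vbn"] = vbn.group(1)
    # Parenthesised groups hold space-separated key:value pairs.
    for group in re.findall(r"\(([^)]*)\)", message):
        for pair in group.split():
            key, sep, value = pair.partition(":")
            if sep:
                fields[key] = value
    volume = re.search(r"in volume (\S+)@vserver:([0-9a-f-]+)", message)
    if volume:
        fields["volume"], fields["vserver_id"] = volume.groups()
    return fields


message = (
    "WAFL inconsistent: inconsistent user data block at VBN 3581364492 "
    "(vvbn:567776529 fbn:664341713 level:0) in public inode "
    "(fileid:96 snapid:0 file_type:15 disk_flags:0x8402 error:120 raid_set:1) "
    "in volume node_02_vol@vserver:6456a9ee-6e12-11e8-99f3-01b099c9ade9."
)
info = incons_fields(message)
print(info["volume"], info["fileid"], info["vbn"], info["fbn"])
```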