Duplicate aggregate is displayed after an ONTAP upgrade
Applies to
- FAS2750
- Automated nondisruptive upgrade (ANDU)
- Background disk firmware update (BDFU)
- Upgrade from ONTAP 9.7P15 to 9.10.1P17 via 9.8P21
Issue
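- During the ONTAP upgrade from 9.8P21 to 9.10.1P17, disk 0b.00.11 goes offline and is marked as missing.
- A disk firmware update caused the disk to go offline.
- Aggregate aggr01 is degraded and missing a disk.
node04 EMS log: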
[?] Thu Dec 12 21:17:46 +0900 [node04: cf_giveback: ha.giveback.sysCommit:info]: Subsystem qos_ll_sfo_giveback took 151 msecs to commit giveback of aggregate 'aggr01'.
[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.disk.assign.offline_ref:debug]: aggregate /aggr01/plex0/rg0/0b.00.5 assigned as an offline reference storage for /aggr01/plex0/rg0/0b.00.11.
[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.disk.assign.offline_ref:debug]: aggregate /aggr01/plex0/rg0/0a.01.3 assigned as an offline reference storage for /aggr01/plex0/rg0/0b.00.11.
[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.rg.degraded:notice]: : Raid group /aggr01/plex0/rg0 is degraded
[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.disk.offline:notice]: Marking Disk /aggr01/plex0/rg0/0b.00.11 Shelf 0 Bay 11 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WXXXXXXN] UID [5000C500:DE81263B:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000] offline.
[?] Thu Dec 12 21:17:46 +0900 [node04: bg_disk_fw_update_admin: bdfu.selected:info]: Disk 0b.00.11 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WXXXXXXN] selected for background disk firmware update.
[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.disk.online:notice]: Onlining Disk /aggr01/plex0/rg0/0b.00.11 Shelf 0 Bay 11 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WXXXXXXN] UID [5000C500:DE81263B:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
- After giveback, the missing disk is reconstructed onto spare disk 0b.00.23.
node03 EMS log:
[?] Thu Dec 12 21:17:47 +0900 [node03: config_thread: raid.rg.recons.missing:notice]: RAID group /aggr01/plex0/rg0 is missing 1 disk(s).
[?] Thu Dec 12 21:17:47 +0900 [node03: config_thread: raid.rg.recons.info:notice]: Spare disk 0b.00.23 will be used to reconstruct one missing disk in RAID group /aggr01/plex0/rg0.
[?] Thu Dec 12 21:17:47 +0900 [node03: config_thread: raid.rg.recons.start:notice]: Disk /aggr01/plex0/rg0/0b.00.23 Shelf 0 Bay 23 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WXXXXXXG] UID [5000C500:DE8204D7:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]: starting reconstruction, using disk 0b.00.23, disk block 5248.
[?] Thu Dec 12 21:17:47 +0900 [node03: config_thread: raid.vol.undestroy.info.missing:info]: params: {'disk_info': 'Disk /aggr01/plex0/rg0/0b.00.23 Shelf 0 Bay 23 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WXXXXXXG] UID [5000C500:DE8204D7:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]', 'shelf': '0', 'bay': '23', 'vendor': 'NETAPP ', 'model': 'X343_SSKBE1T8A10', 'firmware_revision': 'NA02', 'serialno': 'WXXXXXXG', 'disk_type': '4', 'disk_rpm': '10000', 'carrier': '', 'site': 'Local'}
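The two EMS excerpts above can be cross-checked by pulling out every event that names a given disk. The following is a minimal sketch, not a NetApp tool: it assumes each log line follows the layout seen in the excerpts (`[?] <timestamp> [<node>: <process>: <event>:<severity>]: <message>`), and the function names `parse_ems_line` and `events_for_disk` are hypothetical.

```python
import re

# Layout assumed from the excerpts above:
# "[?] <timestamp> [<node>: <process>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^\[\?\]\s+(?P<timestamp>.+?)\s+"
    r"\[(?P<node>[^:\]]+):\s+(?P<process>[^:\]]+):\s+"
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)


def parse_ems_line(line):
    """Return the fields of one EMS line as a dict, or None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def events_for_disk(lines, disk):
    """Return (node, event) pairs, in input order, whose message names the disk."""
    # Match the disk name as a whole token so "0b.00.1" does not match "0b.00.11".
    token = re.compile(r"(?<![\w.])" + re.escape(disk) + r"(?!\.?\w)")
    hits = []
    for line in lines:
        fields = parse_ems_line(line)
        if fields and token.search(fields["message"]):
            hits.append((fields["node"], fields["event"]))
    return hits


# Four lines copied from the node04 and node03 excerpts above.
SAMPLE = [
    "[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.disk.assign.offline_ref:debug]: aggregate /aggr01/plex0/rg0/0b.00.5 assigned as an offline reference storage for /aggr01/plex0/rg0/0b.00.11.",
    "[?] Thu Dec 12 21:17:46 +0900 [node04: config_thread: raid.rg.degraded:notice]: : Raid group /aggr01/plex0/rg0 is degraded",
    "[?] Thu Dec 12 21:17:46 +0900 [node04: bg_disk_fw_update_admin: bdfu.selected:info]: Disk 0b.00.11 [NETAPP X343_SSKBE1T8A10 NA02] S/N [WXXXXXXN] selected for background disk firmware update.",
    "[?] Thu Dec 12 21:17:47 +0900 [node03: config_thread: raid.rg.recons.info:notice]: Spare disk 0b.00.23 will be used to reconstruct one missing disk in RAID group /aggr01/plex0/rg0.",
]

if __name__ == "__main__":
    for node, event in events_for_disk(SAMPLE, "0b.00.11"):
        print(node, event)
```

Filtering on the disk name rather than on a fixed list of event names keeps the sketch independent of which EMS events a particular ONTAP release emits.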
- After another failed disk was replaced, the storage failover state of node04 changed to partial giveback.
::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node03         node04         true     Connected to node04
node04         node03         true     Connected to node03, Partial giveback
2 entries were displayed.
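When this output is captured as text, the partial giveback condition can be flagged automatically. The sketch below assumes the single-line row layout shown above (node, partner, `true` or `false`, then a free-text state). Real output may wrap long state descriptions onto continuation lines, which this sketch does not handle. The function names are hypothetical.

```python
def parse_failover_show(text):
    """Parse captured `storage failover show` output into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Data rows carry "true" or "false" in the third column; the header,
        # separator, and footer lines do not, so they are skipped.
        if len(parts) == 4 and parts[2] in ("true", "false"):
            rows.append({
                "node": parts[0],
                "partner": parts[1],
                "takeover_possible": parts[2] == "true",
                "state": parts[3].strip(),
            })
    return rows


def nodes_in_partial_giveback(rows):
    """Return the names of nodes whose state description reports a partial giveback."""
    return [r["node"] for r in rows if "partial giveback" in r["state"].lower()]


# Output copied from the article above.
SAMPLE = """\
Takeover
Node Partner Possible State Description
-------------- -------------- -------- -------------------------------------
node03 node04 true Connected to node04
node04 node03 true Connected to node03, Partial giveback
2 entries were displayed.
"""

if __name__ == "__main__":
    print(nodes_in_partial_giveback(parse_failover_show(SAMPLE)))
```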
- aggr01 is displayed on both HA nodes. On node04 it contains only the previously missing disk (0b.00.11), and all other disk positions are marked FAILED.
node04 sysconfig -r:
Aggregate aggr01 (failed, raid_dp, partial, fast zeroed) (block checksums)
  Plex /aggr01/plex0 (offline, failed, inactive)
    RAID group /aggr01/plex0/rg0 (partial, block checksums)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity FAILED N/A 1713523/ -
parity FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data 0b.00.11 0b 0 11 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368 (fast zeroed)
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
data FAILED N/A 1713523/ -
Raid group is missing 18 disks.
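The stale copy of the aggregate on node04 has a recognizable signature: one disk is present and every other position is FAILED. The sketch below tallies a captured `sysconfig -r` RAID group listing and compares the FAILED count with the reported missing count. It assumes the row layout shown above (role, then either `FAILED` or a device name). The function name and the `RAID_ROLES` set are hypothetical and cover only the roles that appear in this output.

```python
import re

# RAID disk roles that appear in the listing above (assumption: other roles
# may exist on other configurations and would need to be added here).
RAID_ROLES = {"dparity", "parity", "data"}


def summarize_raid_group(text):
    """Count FAILED positions and collect the devices still present in a RAID group listing."""
    present, failed, reported_missing = [], 0, None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in RAID_ROLES:
            if parts[1] == "FAILED":
                failed += 1
            else:
                present.append(parts[1])
            continue
        # Footer line, for example "Raid group is missing 18 disks."
        match = re.search(r"missing (\d+) disks?", line)
        if match:
            reported_missing = int(match.group(1))
    return {"present": present, "failed": failed, "reported_missing": reported_missing}


# Rows rebuilt in the same order as the node04 output above.
FAILED_DATA = "data FAILED N/A 1713523/ -"
SAMPLE = "\n".join(
    ["dparity FAILED N/A 1713523/ -", "parity FAILED N/A 1713523/ -"]
    + [FAILED_DATA] * 5
    + ["data 0b.00.11 0b 0 11 SA:B 0 SAS 10000 1713523/3509295616 1716957/3516328368 (fast zeroed)"]
    + [FAILED_DATA] * 11
    + ["Raid group is missing 18 disks."]
)

if __name__ == "__main__":
    summary = summarize_raid_group(SAMPLE)
    print(summary)
    print("consistent:", summary["failed"] == summary["reported_missing"])
```

A result with a single present device and a FAILED count equal to the reported missing count matches the duplicate aggregate described in this article.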