Nodes panic during ONTAP upgrade with "Disk shelf fault" error
Applies to
- ONTAP upgrade
- AFF A400
- NS224 disk shelf
Issue
- A disk shelf error is reported during the ONTAP upgrade:
[node_name-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Disk shelf fault.
- Both nodes panic with a fatal multi-disk error on their respective root aggregates, but reboot normally afterwards:
[node_name-01: send_boot_msg_thread: mgr.stack.string:notice]: Panic string: aggr node-01_root: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state NORMAL. 10 disks failed in the group. Disk e0d.01.1.6
[node_name-02: send_boot_msg_thread: mgr.stack.string:notice]: Panic string: aggr node-02_root: raid volfsm, fatal multi-disk error.. Raid type - raid_dp Group name plex0/rg0 state NORMAL. 10 disks failed in the group. Disk e0c.01.0.0
- There is no connectivity to one of the NS224 modules:
[node_name-02: mgwd: callhome.hm.alert.major:alert]: Call home for Health Monitor process nchm: NoPathToNSMB_Alert[...].
- An NSM100 firmware update is triggered during the ONTAP upgrade, but completes on only one module:
[node_name-01: dsa_sfu: sfu.firmwareDownrev:error]: Disk shelf firmware needs to be updated on 1 disk shelf.
[node_name-01: dsa_sfu: sfu.downloadStarted:info]: Update of disk shelf firmware started on 1 shelf.
[node_name-01: dsa_worker1: sfu.ctrllerElmntsPerShelf:info]: [storage download shelf]: 2 ES controller elements can be updated on 0x.shelf1.
[node_name-01: dsa_worker1: sfu.downloadingController:info]: [storage download shelf]: Downloading NSM100.0141.SFW on disk shelf controller module A on 0x.shelf1.
[node_name-01: dsa_sfu: sfu.rebootRequest:info]: Issuing a request to reboot disk shelf 0x.shelf1 module A.
- The NS224 shelf firmware does not match between the two NSM100 modules, as shown by the following:
[node_name-01: dsa_disc: ses.mismatch.fw.version:error]: The disk shelf modules on disk shelf 0x.1 are running two different firmware versions. Disk shelf module A is running 0141, and disk shelf module B is running 0131.
and
Shelf 1: NS224NSM100 Firmware rev. NSM100 A: 0141 NSM100 B: 0131
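
When reviewing a large EMS log export, the firmware mismatch symptom above can be picked out programmatically. The following is a minimal Python sketch, not a NetApp tool: the function name find_firmware_mismatches and the regular expression are illustrative assumptions, modeled only on the ses.mismatch.fw.version message text quoted in this article. Other ONTAP releases may word the message differently.

```python
import re

# Pattern modeled on the ses.mismatch.fw.version message quoted in this
# article (an assumption; the wording may differ in other ONTAP releases).
MISMATCH_RE = re.compile(
    r"disk shelf (?P<shelf>\S+) are running two different firmware versions\. "
    r"Disk shelf module (?P<mod_a>\w+) is running (?P<fw_a>\w+), "
    r"and disk shelf module (?P<mod_b>\w+) is running (?P<fw_b>\w+)\."
)


def find_firmware_mismatches(lines):
    """Return one dict per log line that reports mismatched module firmware.

    Each dict holds the shelf ID plus one entry per module,
    mapping the module letter to its firmware revision.
    """
    results = []
    for line in lines:
        # Cheap pre-filter on the EMS event name before running the regex.
        if "ses.mismatch.fw.version" not in line:
            continue
        match = MISMATCH_RE.search(line)
        if match:
            results.append({
                "shelf": match.group("shelf"),
                match.group("mod_a"): match.group("fw_a"),
                match.group("mod_b"): match.group("fw_b"),
            })
    return results
```

Passing the lines of a saved EMS log to find_firmware_mismatches returns an empty list when no mismatch event is present. Any entry in the result identifies a shelf whose two modules report different firmware revisions, which matches the symptom described here.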