Node reboots with "reboot (Internal reboot)" after "Mediator down" and "SyncMirror plex failed" alerts
Applies to
- IP MetroCluster
- ONTAP 9
- MetroCluster-compliant switches (switches shared between MCC back-end traffic and other traffic)
Issue
- The node reboots, triggering the following AutoSupport messages:
HA Group Notification (MEDIATOR DOWN, AUSO DISABLED) ALERT
HA Group Notification (SYNCMIRROR PLEX FAILED) ALERT
HA Group Notification (REBOOT (internal reboot)) NOTICE
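- The EMS log shows indications of mediator disconnection, transport errors, disk read reservation failures on remote drives, and takeover-disabled errors: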
Thu Mar 02 02:35:13 +1100 [node01: geom: geom.ontap.orphan.removing:notice]: Removing unit 0 type 5.
Thu Mar 02 02:35:13 +1100 [node01: geom: geom.ontap.orphan.removing:notice]: Removing unit 1 type 5.
Thu Mar 02 02:35:13 +1100 [node01: pha_remove000: mlm.array.lun.removed:notice]: Array LUN '0f.1' (3337633537306161) is no longer being presented to this node.
Thu Mar 02 12:17:43 +1100 [node01: disk_admin: disk.readReservationFailed:error]: Disk read reservation failed on 0m.i1.0L14 CDB 0x5e:01 - SCSI:no sense (0 0 0)
Thu Mar 02 12:17:43 +1100 [node01: disk_admin: disk.readReservationFailed:error]: Disk read reservation failed on 0m.i1.0L15 CDB 0x5e:01 - SCSI:no sense (0 0 0)
Thu Mar 02 12:17:42 +1100 [node01: cf_main: cf.fsm.backupMailboxError:error]: Failover monitor: partner mailbox error detected.
Thu Mar 02 12:17:42 +1100 [node01: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of node02 disabled (partner mailbox disks not accessible or invalid).
Thu Mar 02 12:17:42 +1100 [node01: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node01 by node02 disabled (unsynchronized log).
Thu Mar 02 12:17:43 +1100 [node01: svc_queue_thread: cf.ic.xferTimedOut:error]: HA interconnect: OFW transfer timed out.
Thu Mar 02 12:17:43 +1100 [node01: cf_firmware: cf.fm.partnerFwTransition:info]: params: {'progresscounter': '0', 'newstate': 'SF_UNKNOWN', 'prevstate': 'SF_UP'}
Thu Mar 02 12:17:42 +1100 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.transportErrorEMSOnly:error]: Disk device 0v.i1.1L42: Transport error during execution of command: HA status 0x9: cdb 0x28:00000008:0008.
Thu Mar 02 12:17:42 +1100 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.transportErrorEMSOnly:error]: Disk device 0v.i1.1L79: Transport error during execution of command: HA status 0x9: cdb 0x28:00000008:0008.
- The following event is reported in the EMS log as the cause of the reboot:
Thu Mar 02 12:17:42 +1100 [node01: fmmbx_instanceWorker: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::mia_mccip_local_write_partial_fail+568".
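To check a collected EMS log against this signature, the following is a minimal Python sketch, assuming the EMS messages have been exported as text with one event per line in the layout shown in the excerpts above. It splits each entry into timestamp, node, process, event name, severity, and message, then checks whether the `kern.shutdown.initiator` event names `mia_mccip_local_write_partial_fail`. The helper names (`parse_ems`, `reboot_initiator`, `matches_this_issue`) are hypothetical and are not part of any ONTAP or AutoSupport tool.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# One EMS entry in the text layout shown in this article (assumed layout):
#   <timestamp> [<node>: <process>: <event>:<severity>]: <message>
EMS_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): (?P<event>[^:\]]+):(?P<sev>\w+)\]: "
    r"(?P<msg>.*)$"
)
# Extracts the quoted module::function+offset from a kern.shutdown.initiator message.
INITIATOR_RE = re.compile(r'initiated by "([^"]+)"')
# Function name reported as the reboot initiator in this article.
SIGNATURE = "mia_mccip_local_write_partial_fail"


@dataclass
class EmsEvent:
    timestamp: str
    node: str
    process: str
    event: str
    severity: str
    message: str


def parse_ems(lines: Iterable[str]) -> List[EmsEvent]:
    """Parse EMS text lines; lines that do not match the layout are skipped."""
    events = []
    for line in lines:
        m = EMS_RE.match(line.strip())
        if m:
            events.append(EmsEvent(m["ts"], m["node"], m["proc"],
                                   m["event"], m["sev"], m["msg"]))
    return events


def reboot_initiator(events: List[EmsEvent]) -> Optional[str]:
    """Return the initiator named by the first parsable kern.shutdown.initiator event."""
    for e in events:
        if e.event == "kern.shutdown.initiator":
            m = INITIATOR_RE.search(e.message)
            if m:
                return m.group(1)
    return None


def matches_this_issue(events: List[EmsEvent]) -> bool:
    """True if the reboot initiator contains the function name seen in this article."""
    initiator = reboot_initiator(events)
    return initiator is not None and SIGNATURE in initiator
```

For example, passing the `kern.shutdown.initiator` line above through `parse_ems` and then `reboot_initiator` yields `maytag.ko::mia_mccip_local_write_partial_fail+568`. Matching on the initiator rather than on the earlier transport or reservation errors is deliberate: those errors can appear in other failure scenarios, while the initiator string is the event this article cites as the reboot cause.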