SAN hosts lose access to LUNs in conjunction with a physical NetApp disk failure
Applies to
- ONTAP 9
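- Data ONTAP 7-Mode
- All host operating system types
Issue
- NetApp LUNs become unresponsive or unavailable to hosts while a physical disk experiences intermittent problems, and the disk then eventually fails.
- Example messages that can lead up to the failure: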
Sun Oct 18 14:52:54 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: sas.device.quiesce:info]: Adapter 0a encountered a command timeout on disk device 0b.04.15. Quiescing the device.
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: sas.device.timeout:error]: Adapter 0a encountered a device timeout on Disk device 0b.04.15.
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'Level 0 timeout: TM LUN reset: 0b.04.15 L0 (0xffffff06bab54ab8,0x2f:03a42c00:0400,0/0)', 'adapterName': '0a'}
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: sas.adapter.debug:info]: params: {'debug_string': 'LUN RESET on device 0b.04.15 L0', 'adapterName': '0a'}
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: sas.device.resetting:warning]: Resetting Disk device 0b.04.15 from adapter 0a.
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_intrd_0: scsi.cmd.abortedByHost:error]: Disk device 0b.04.15: Command aborted by host adapter: HA status 0x4: cdb 0x2f:03a42c00:0400
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_intrd_0: scsi.cmd.abortedByHost:error]: Disk device 0b.04.15: Command aborted by host adapter: HA status 0x4: cdb 0x28:1fb5a638:0008
Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_intrd_0: scsi.cmd.abortedByHost:error]: Disk device 0b.04.15: Command aborted by host adapter: HA status 0x4: cdb 0x28:0a3d2840:0008
Sun Oct 18 16:45:14 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: sas.device.resetting:warning]: Resetting Disk device 0b.04.15 from adapter 0a.
Sun Oct 18 16:56:45 EDT [NTAP_CONTROLLER: disk_server_0: disk.IO.status:debug]: params: {'deviceName': '0b.04.20', 'returnCode': '5', 'pathRetryCount': '0', 'adapterStatus': '0x0', 'cdb': '0x28:3027ca68:0078', 'basicTimeout': '5', 'iASCQ': '0x7', 'iSenseKey': '0x1', 'sSenseCode': '', 'ETime': '238', 'iASC': '0x18', 'victimRetryCount': '0', 'sSenseKey': 'SCSI:recovered error', 'targetStatus': '0x2', 'disk_information': '[NETAPP X412_S15K7560A15 NA08] S/N [6SL78KZN0000N4130P5S]', 'retryCount': '0', 'pathsTried': '1', 'timeoutRetryCount': '0'}
Sun Oct 18 21:50:54 EDT [NTAP_CONTROLLER: raidio_thread: raid.disk.timeout.ios.flush.end:notice]: Timeout abort/flush terminated for Disk /aggr1/plex0/rg4/2a.04.15 Shelf 4 Bay 15 [NETAPP X412_S15K7560A15 NA08] S/N [6SL78KS60000N41310H4] with status 0
Sun Oct 18 21:52:19 EDT [NTAP_CONTROLLER: config_thread: raid.config.filesystem.disk.failed:error]: File system Disk /aggr1/plex0/rg4/0b.04.15 Shelf 4 Bay 15 [NETAPP X412_S15K7560A15 NA08] S/N [6SL78KS60000N41310H4] failed.
Sun Oct 18 21:52:19 EDT [NTAP_CONTROLLER: disk_server_0: disk.IO.status:debug]: params: {'deviceName': '0b.04.15', 'returnCode': '9', 'pathRetryCount': '0', 'adapterStatus': '0x0', 'cdb': '0x28:34461fe0:0008', 'basicTimeout': '5', 'iASCQ': '0x0', 'iSenseKey': '0x3', 'sSenseCode': 'Medium format corrupted', 'ETime': '161', 'iASC': '0x31', 'victimRetryCount': '0', 'sSenseKey': 'SCSI:medium error', 'targetStatus': '0x2', 'disk_information': '[NETAPP X412_S15K7560A15 NA08] S/N [6SL78KS60000N41310H4]', 'retryCount': '0', 'pathsTried': '1', 'timeoutRetryCount': '0'}
Sun Oct 18 21:52:19 EDT [NTAP_CONTROLLER: raid_disk_thread: raid.disk.unload.done:info]: Unload of Disk 0b.04.15 Shelf 4 Bay 15 [NETAPP X412_S15K7560A15 NA08] S/N [6SL78KS60000N41310H4] has completed successfully
Sun Oct 18 21:52:19 EDT [NTAP_CONTROLLER: config_thread: callhome.fdsk.fault:error]: Call home for FILESYSTEM DISK FAILED
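When reviewing a long log for this pattern, it can help to count events per disk so that one device with repeated timeouts and resets stands out. The following is a minimal sketch, not a NetApp tool. It assumes the line layout seen in the sample messages above (timestamp, then [node: process: event:severity]:, then the message text) and assumes that disk names follow the adapter.shelf.bay form such as 0b.04.15. The function name summarize_disk_events and both regular expressions are illustrative and may need adjusting for other log formats.

```python
import re
from collections import Counter, defaultdict

# Assumed layout, inferred from the sample messages only:
#   <timestamp> [<node>: <process>: <event.name>:<severity>]: <text>
EMS_LINE = re.compile(
    r"^(?P<timestamp>.+?) \[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<text>.*)$"
)

# Assumed disk naming, e.g. 0b.04.15 (adapter.shelf.bay) as in the sample.
DISK_NAME = re.compile(r"\b\d[a-z]\.\d{2}\.\d{1,2}\b")


def summarize_disk_events(lines):
    """Return {disk_name: Counter({event_name: count})} for parseable lines."""
    summary = defaultdict(Counter)
    for line in lines:
        match = EMS_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        # Count each event once per disk, even if the name repeats in the text.
        for disk in set(DISK_NAME.findall(match["text"])):
            summary[disk][match["event"]] += 1
    return summary


if __name__ == "__main__":
    sample = [
        "Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: "
        "sas.device.timeout:error]: Adapter 0a encountered a device timeout "
        "on Disk device 0b.04.15.",
        "Sun Oct 18 14:52:58 EDT [NTAP_CONTROLLER: pmcsas_timeout_0: "
        "sas.device.resetting:warning]: Resetting Disk device 0b.04.15 "
        "from adapter 0a.",
    ]
    for disk, events in summarize_disk_events(sample).items():
        print(disk, dict(events))
```

A disk that accumulates many sas.device.timeout, sas.device.resetting and scsi.cmd.abortedByHost entries before a raid.config.filesystem.disk.failed event matches the sequence shown in the example above. The same physical disk can appear under more than one path name (for example 0b.04.15 and 2a.04.15 in the sample), so compare shelf and bay numbers as well as the name.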