Incorrect HDD IO latency calculation causes disk failures
Applies to
- FAS2720
- ONTAP 9.7P10
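Issue
- Multiple disks fail within a short period of time.
- EMS reports that a disk's average IO latency has exceeded the expected threshold, so ONTAP recommends the disk for failure: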
Wed Nov 10 01:33:37 +0900 [node_name: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device 0b.00.6: Check Condition: CDB 0x8a:00000006aad4c800:00000200: Sense Data SCSI:aborted command - (0xb - 0x4b 0x6 0x0)(2031).
Wed Nov 10 01:33:37 +0900 [node_name: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device 0b.00.6: request successful after retry #1/#0: cdb 0x8a:00000006aad4c800:00000200 (2147).
Wed Nov 10 01:33:48 +0900 [node_name: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency in the current window with average latency of 15122 msecs and average utilization of 47 percent. Highest average IO latency: 0b.00.6: 15122 msecs; next highest IO latency: 0b.00.10: 11 msecs. Disk 0b.00.6 Shelf 0 Bay 6 [NETAPP X388_SEVNE16TA07 NA00] S/N [ZL25117M] UID [5000C500:CA04E0EF:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
Wed Nov 10 01:34:18 +0900 [node_name: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency in the current window with average latency of 15037 msecs and average utilization of 47 percent. Highest average IO latency: 0b.00.6: 15037 msecs; next highest IO latency: 0b.00.10: 11 msecs. Disk 0b.00.6 Shelf 0 Bay 6 [NETAPP X388_SEVNE16TA07 NA00] S/N [ZL25117M] UID [5000C500:CA04E0EF:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
Wed Nov 10 01:34:48 +0900 [node_name: disk_latency_monitor: shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency in the current window with average latency of 14792 msecs and average utilization of 50 percent. Highest average IO latency: 0b.00.6: 14792 msecs; next highest IO latency: 0b.00.10: 11 msecs. Disk 0b.00.6 Shelf 0 Bay 6 [NETAPP X388_SEVNE16TA07 NA00] S/N [ZL25117M] UID [5000C500:CA04E0EF:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000]
Wed Nov 10 01:34:48 +0900 [node_name: disk_latency_monitor: shm.threshold.highIOLatency:error]: Disk 0b.00.6 exceeds the average IO latency threshold and will be recommended for failure.
Wed Nov 10 01:34:48 +0900 [node_name: disk_latency_monitor: scsi.debug:debug]: shm_setup_for_failure disk 0b.00.6 (S/N ZL25117M) error 200000h
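When checking many EMS lines for this pattern, it can help to extract the reported latency of the flagged disk and compare it with the next highest disk. The following Python sketch does that for `shm.threshold.ioLatency` messages. The regular expression is inferred only from the sample lines above and is an assumption, not a documented ONTAP message format. The function name `parse_io_latency` and the field names are hypothetical.

```python
import re

# Pattern inferred from the sample shm.threshold.ioLatency lines above.
# This is an assumption; the message wording may differ in other ONTAP releases.
IO_LATENCY_RE = re.compile(
    r"shm\.threshold\.ioLatency:\w+\]: Disk (?P<disk>\S+) has exceeded the expected "
    r"IO latency in the current window with average latency of (?P<latency>\d+) msecs "
    r"and average utilization of (?P<util>\d+) percent\. "
    r"Highest average IO latency: \S+: \d+ msecs; "
    r"next highest IO latency: (?P<next_disk>\S+): (?P<next_latency>\d+) msecs\."
)


def parse_io_latency(line):
    """Return the latency fields of one EMS line, or None if it does not match."""
    match = IO_LATENCY_RE.search(line)
    if match is None:
        return None
    return {
        "disk": match.group("disk"),
        "latency_ms": int(match.group("latency")),
        "utilization_pct": int(match.group("util")),
        "next_disk": match.group("next_disk"),
        "next_latency_ms": int(match.group("next_latency")),
    }


# Sample taken from the first ioLatency message above (trailing disk details omitted).
SAMPLE_LINE = (
    "Wed Nov 10 01:33:48 +0900 [node_name: disk_latency_monitor: "
    "shm.threshold.ioLatency:debug]: Disk 0b.00.6 has exceeded the expected IO latency "
    "in the current window with average latency of 15122 msecs and average utilization "
    "of 47 percent. Highest average IO latency: 0b.00.6: 15122 msecs; "
    "next highest IO latency: 0b.00.10: 11 msecs."
)

if __name__ == "__main__":
    fields = parse_io_latency(SAMPLE_LINE)
    print(fields)
```

In the sample, the flagged disk reports an average latency of 15122 msecs while the next highest disk reports 11 msecs. A gap of that size on a disk with moderate utilization is the kind of outlier this article describes.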