I/O timeout on an NVMe drive triggers a node panic
Applies to
- H-series storage nodes
- SF-series storage nodes
- Element software 11.x and 12.0
Issue
A node panics after I/O timeouts occur on an NVMe drive.
Example (kern.log)
2020-05-30T15:01:29.908585Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993359] nvme nvme7: I/O 251 QID 5 timeout, aborting
2020-05-30T15:01:29.908599Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993368] nvme nvme7: I/O 808 QID 5 timeout, aborting
2020-05-30T15:01:29.908601Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993374] nvme nvme7: I/O 76 QID 8 timeout, aborting
2020-05-30T15:01:29.908602Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993377] nvme nvme7: I/O 79 QID 8 timeout, aborting
2020-05-30T15:01:29.908604Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993380] nvme nvme7: I/O 180 QID 8 timeout, aborting
2020-05-30T15:01:29.908608Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993384] nvme nvme7: I/O 181 QID 8 timeout, aborting
2020-05-30T15:01:29.908609Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993387] nvme nvme7: I/O 182 QID 8 timeout, aborting
2020-05-30T15:01:29.908610Z KUL01-SFCL01-H610S1-06 kernel: [3319602.993390] nvme nvme7: I/O 183 QID 8 timeout, aborting
2020-05-30T15:02:00.948585Z KUL01-SFCL01-H610S1-06 kernel: [3319634.032001] nvme nvme7: I/O 251 QID 5 timeout, reset controller
2020-05-30T15:02:04.698578Z KUL01-SFCL01-H610S1-06 kernel: [3319637.781819] nvme nvme7: I/O 19 QID 0 timeout, reset controller
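
To check whether a node shows the same pattern, the timeout messages above can be counted per device and per action. The following is a minimal Python sketch, not a supported tool: the function name `summarize_nvme_timeouts` and the regular expression are illustrative assumptions derived only from the message format shown in the excerpt above.

```python
import re
from collections import Counter

# Assumed pattern, based only on the excerpt above, e.g.:
#   nvme nvme7: I/O 251 QID 5 timeout, aborting
NVME_TIMEOUT = re.compile(
    r"nvme (?P<dev>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>.+)$"
)


def summarize_nvme_timeouts(lines):
    """Count NVMe timeout messages per (device, action) pair."""
    counts = Counter()
    for line in lines:
        match = NVME_TIMEOUT.search(line.strip())
        if match:
            counts[(match.group("dev"), match.group("action"))] += 1
    return counts


# Two lines copied from the excerpt above.
sample = [
    "2020-05-30T15:01:29.908585Z KUL01-SFCL01-H610S1-06 kernel: "
    "[3319602.993359] nvme nvme7: I/O 251 QID 5 timeout, aborting",
    "2020-05-30T15:02:00.948585Z KUL01-SFCL01-H610S1-06 kernel: "
    "[3319634.032001] nvme nvme7: I/O 251 QID 5 timeout, reset controller",
]

for (dev, action), n in sorted(summarize_nvme_timeouts(sample).items()):
    print(dev, action, n)
```

In the excerpt, a burst of "aborting" messages is followed by "reset controller" messages on the same device (nvme7). To scan a whole file, pass an open file object to the function in place of `sample`.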