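ESXi host shuts down with a PSOD and transient storage errors on the host side
Applies to
- ESXi hosts
- ONTAP 9
Issue
- The ESXi host shuts down and enters a Purple Screen of Death (PSOD).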
- The zdump records a large number of transient storage errors:
2023-08-24T11:19:52.549Z cpu31:22473151)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C2:T2:L74) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.
2023-08-24T11:19:52.549Z cpu56:22156400)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C2:T2:L21) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.
2023-08-24T11:19:52.549Z cpu74:22156402)StorageDevice: 7059: End path evaluation for device naa.600a09803831357734244e4c6dxxxxxx
2023-08-24T11:19:52.549Z cpu14:2099001)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0xa3 (0x45dabceec948, 0) to dev "naa.600a09803831357734244e4c6dxxxxxx" on path "vmhba64:C6:T2:L92" Failed:
2023-08-24T11:19:52.549Z cpu14:2099001)NMP: nmp_ThrottleLogForDevice:3875: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0x3. Act:NONE. cmdId.initiator=0x453a5741bbc8 CmdSN 0x0
2023-08-24T11:19:52.549Z cpu79:22473152)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C10:T2:L501) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3
2023-08-24T11:19:52.549Z cpu78:2098303)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0xa3 (0x45ba5d814648, 0) to dev "naa.600a09803831357734244e4c6dxxxxxx" on path "vmhba64:C1:T2:L455" Failed:
- Link errors are reported on the host side:
2023-08-24T12:48:14.126Z: [netCorrelator] 413700037us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic10 is down. Affected dvPort: 37129774/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 1 uplinks up. Failed criteria: 128
2023-08-24T12:48:14.126Z: [netCorrelator] 413700045us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic10 is down. Affected dvPort: 37139132/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 1 uplinks up. Failed criteria: 128
2023-08-24T12:48:14.257Z: [netCorrelator] 413830728us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic5 is down. Affected dvPort: 538d9049-db44-4779-9bc7-df06af095601/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 0 uplinks up. Failed criteria: 128
2023-08-23T22:39:53.759Z cpu2:2099091)WARNING: iscsi_vmk: iscsivmk_StopConnection:739: Sess [ISID: 00023d000017 TARGET: iqn.1992-08.com.netapp:sn.786b6fe056c311e98c4100a098xxxxxx:vs.23 TPGT: 41a TSIH: 0]
2023-08-23T22:39:53.759Z cpu2:2099091)WARNING: iscsi_vmk: iscsivmk_StopConnection:740: Conn [CID: 0 L: 10.111.254.47:43401 R: 10.111.254.171:3260]
- The vmhba64 adapter is used to connect to the external storage, and it is the adapter that reports the transient errors.
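- The storage connection through this adapter was lost abruptly, which led to the PSOD: memory address 0x0 was passed to the `memcpy()` function, causing an invalid memory access.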
- The issue appears to be caused by a race condition between two instances of `ScsiDeviceDataChangeCallback()` handling the `VMK_SCSI_DEVICE_EVENT_UA_INQUIRY_PARAMETERS_CHANGED` event.
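- After the storage connection is lost, all I/Os start being held in cache so that they can be flushed back to the storage array once it is online again. Storage recovery took too long and the cache filled up, which caused the host to crash completely.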
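
The log excerpts above can be triaged with a short script. The following is a minimal sketch, not a supported tool: the regular expressions are derived only from the sample lines in this article, and the names `decode_sense`, `summarize`, and `SAMPLE` are hypothetical. The decoding assumes the standard SCSI meanings of the reported sense data, where sense key 0x6 is UNIT ATTENTION and ASC/ASCQ 0x3F/0x03 is INQUIRY DATA HAS CHANGED, which matches the unit-attention event named in the race condition above.

```python
import re
from collections import Counter

# Standard SCSI meanings; only the values seen in the excerpts are listed.
SENSE_KEYS = {0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x3F, 0x03): "INQUIRY DATA HAS CHANGED"}

# Patterns inferred from the sample lines only (an assumption, not a spec).
SATP_RE = re.compile(
    r"Path \((?P<path>vmhba\d+:C\d+:T\d+:L\d+)\) command (?P<cmd>0x[0-9a-fA-F]+) "
    r"failed with transient error status .*?"
    r"sense data: (?P<key>0x[0-9a-fA-F]+) (?P<asc>0x[0-9a-fA-F]+) "
    r"(?P<ascq>0x[0-9a-fA-F]+)"
)
UPLINK_RE = re.compile(
    r"^(?P<ts>\S+?): \[netCorrelator\].*Uplink: (?P<nic>vmnic\d+) is down\..*?"
    r"(?P<up>\d+) uplinks up"
)


def decode_sense(key, asc, ascq):
    """Translate a sense key and ASC/ASCQ pair into readable text."""
    key_name = SENSE_KEYS.get(key, "unknown sense key")
    detail = ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ")
    return f"{key_name}: {detail}"


def summarize(lines):
    """Count transient errors per adapter and collect uplink-down events."""
    per_adapter = Counter()
    senses = Counter()
    uplinks = []
    for line in lines:
        m = SATP_RE.search(line)
        if m:
            adapter = m["path"].split(":")[0]  # e.g. "vmhba64"
            per_adapter[adapter] += 1
            sense = tuple(int(m[g], 16) for g in ("key", "asc", "ascq"))
            senses[decode_sense(*sense)] += 1
            continue
        m = UPLINK_RE.search(line)
        if m:
            uplinks.append((m["ts"], m["nic"], int(m["up"])))
    return per_adapter, senses, uplinks


# Lines copied from the excerpts in this article.
SAMPLE = [
    "2023-08-24T11:19:52.549Z cpu31:22473151)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C2:T2:L74) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.",
    "2023-08-24T11:19:52.549Z cpu56:22156400)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C2:T2:L21) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.",
    "2023-08-24T12:48:14.126Z: [netCorrelator] 413700037us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic10 is down. Affected dvPort: 37129774/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 1 uplinks up. Failed criteria: 128",
    "2023-08-24T12:48:14.257Z: [netCorrelator] 413830728us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic5 is down. Affected dvPort: 538d9049-db44-4779-9bc7-df06af095601/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 0 uplinks up. Failed criteria: 128",
]

if __name__ == "__main__":
    per_adapter, senses, uplinks = summarize(SAMPLE)
    print("Transient errors per adapter:", dict(per_adapter))
    print("Decoded sense data:", dict(senses))
    for ts, nic, up in uplinks:
        flag = " (no uplinks left)" if up == 0 else ""
        print(f"{ts} {nic} down, {up} uplinks up{flag}")
```

To use it on real data, replace `SAMPLE` with the lines of the extracted log files. An uplink event that reports 0 uplinks up marks the point at which the affected dvPort had no remaining network path.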