ESXi host shuts down with a PSOD and transient storage errors on the host side

Category:
ontap-9
Specialty:
san

Applies to

  • ESXi hosts
  • ONTAP 9

Issue

  • The ESXi host shuts down and enters a Purple Screen of Death (PSOD).
  • A large number of transient storage errors are logged in the zdump:

2023-08-24T11:19:52.549Z cpu31:22473151)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C2:T2:L74) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.
2023-08-24T11:19:52.549Z cpu56:22156400)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C2:T2:L21) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.
2023-08-24T11:19:52.549Z cpu74:22156402)StorageDevice: 7059: End path evaluation for device naa.600a09803831357734244e4c6dxxxxxx
2023-08-24T11:19:52.549Z cpu14:2099001)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0xa3 (0x45dabceec948, 0) to dev "naa.600a09803831357734244e4c6dxxxxxx" on path "vmhba64:C6:T2:L92" Failed:
2023-08-24T11:19:52.549Z cpu14:2099001)NMP: nmp_ThrottleLogForDevice:3875: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x3f 0x3. Act:NONE. cmdId.initiator=0x453a5741bbc8 CmdSN 0x0
2023-08-24T11:19:52.549Z cpu79:22473152)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: Path (vmhba64:C10:T2:L501) command 0xa3 failed with transient error status Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3
2023-08-24T11:19:52.549Z cpu78:2098303)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0xa3 (0x45ba5d814648, 0) to dev "naa.600a09803831357734244e4c6dxxxxxx" on path "vmhba64:C1:T2:L455" Failed:

  • Link errors appear on the host side:

2023-08-24T12:48:14.126Z: [netCorrelator] 413700037us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic10 is down. Affected dvPort: 37129774/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 1 uplinks up. Failed criteria: 128
2023-08-24T12:48:14.126Z: [netCorrelator] 413700045us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic10 is down. Affected dvPort: 37139132/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 1 uplinks up. Failed criteria: 128
2023-08-24T12:48:14.257Z: [netCorrelator] 413830728us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic5 is down. Affected dvPort: 538d9049-db44-4779-9bc7-df06af095601/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. 0 uplinks up. Failed criteria: 128

2023-08-23T22:39:53.759Z cpu2:2099091)WARNING: iscsi_vmk: iscsivmk_StopConnection:739: Sess [ISID: 00023d000017 TARGET: iqn.1992-08.com.netapp:sn.786b6fe056c311e98c4100a098xxxxxx:vs.23 TPGT: 41a TSIH: 0]
2023-08-23T22:39:53.759Z cpu2:2099091)WARNING: iscsi_vmk: iscsivmk_StopConnection:740: Conn [CID: 0 L: 10.111.254.47:43401 R: 10.111.254.171:3260]

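  • The vmhba64 adapter connects the host to external storage, and the transient errors are reported on this adapter.

  • Storage connectivity through this adapter was lost abruptly, which led to the PSOD: memory address 0x0 was passed to the `memcpy()` function, causing an invalid memory access.

  • The problem appears to be caused by a race condition between two instances of `ScsiDeviceDataChangeCallback()` handling the `VMK_SCSI_DEVICE_EVENT_UA_INQUIRY_PARAMETERS_CHANGED` event.

  • After storage connectivity is lost, all I/Os start to be held in cache so that they can be flushed back to the storage array once it is online again. Storage recovery took too long, the cache filled up, and the host crashed completely.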
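
The log excerpts above can be triaged with a short script. The following is a minimal sketch, not a NetApp or VMware tool: it assumes log lines have exactly the format shown in the excerpts, and the function names (`summarize_transient_errors`, `decode_sense`, `uplink_down_events`) are hypothetical. It counts transient SATP errors per path, decodes the sense data triple using the standard SCSI meaning of `0x6 0x3f 0x3` (UNIT ATTENTION, INQUIRY DATA HAS CHANGED), and lists the uplink-down events with the number of uplinks still up.

```python
import re
from collections import Counter

# Assumed line formats, taken from the excerpts in this article.
SATP_RE = re.compile(
    r"Path \((?P<path>vmhba\d+:C\d+:T\d+:L\d+)\) command (?P<cmd>0x[0-9a-fA-F]+) "
    r"failed with transient error status .*?"
    r"sense data: (?P<key>0x[0-9a-fA-F]+) (?P<asc>0x[0-9a-fA-F]+) (?P<ascq>0x[0-9a-fA-F]+)"
)
UPLINK_RE = re.compile(
    r"^(?P<ts>\S+?): \[netCorrelator\].*Uplink: (?P<nic>vmnic\d+) is down\."
    r".*? (?P<up>\d+) uplinks up"
)

# Only the codes seen in this article; extend from the SCSI standard as needed.
SENSE_KEYS = {0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x3F, 0x03): "INQUIRY DATA HAS CHANGED"}


def decode_sense(key, asc, ascq):
    """Turn a (sense key, ASC, ASCQ) triple into a readable label."""
    key_name = SENSE_KEYS.get(key, f"sense key {key:#x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ {asc:#04x}/{ascq:#04x}")
    return f"{key_name}: {detail}"


def summarize_transient_errors(lines):
    """Count transient SATP errors per path and per decoded sense triple."""
    per_path, per_sense = Counter(), Counter()
    for line in lines:
        m = SATP_RE.search(line)
        if not m:
            continue
        per_path[m.group("path")] += 1
        triple = tuple(int(m.group(k), 16) for k in ("key", "asc", "ascq"))
        per_sense[decode_sense(*triple)] += 1
    return per_path, per_sense


def uplink_down_events(lines):
    """Return (timestamp, vmnic, uplinks still up) for each uplink-down event."""
    events = []
    for line in lines:
        m = UPLINK_RE.match(line.strip())
        if m:
            events.append((m.group("ts"), m.group("nic"), int(m.group("up"))))
    return events


# Sample lines copied from the excerpts above.
SAMPLE_LINES = [
    "2023-08-24T11:19:52.549Z cpu31:22473151)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: "
    "Path (vmhba64:C2:T2:L74) command 0xa3 failed with transient error status "
    "Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.",
    "2023-08-24T11:19:52.549Z cpu56:22156400)VMW_SATP_ALUA: satp_alua_issueCommandOnPath:706: "
    "Path (vmhba64:C2:T2:L21) command 0xa3 failed with transient error status "
    "Transient storage condition, suggest retry. sense data: 0x6 0x3f 0x3.",
    "2023-08-24T12:48:14.126Z: [netCorrelator] 413700037us: [vob.net.dvport.uplink.transition.down] "
    "Uplink: vmnic10 is down. Affected dvPort: 37129774/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. "
    "1 uplinks up. Failed criteria: 128",
    "2023-08-24T12:48:14.257Z: [netCorrelator] 413830728us: [vob.net.dvport.uplink.transition.down] "
    "Uplink: vmnic5 is down. Affected dvPort: 538d9049-db44-4779-9bc7-df06af095601/50 21 f4 36 4c 7c 40 51-ec ee 57 d7 8d 0e 68 33. "
    "0 uplinks up. Failed criteria: 128",
]

if __name__ == "__main__":
    paths, senses = summarize_transient_errors(SAMPLE_LINES)
    print(paths.most_common())
    print(senses.most_common())
    print(uplink_down_events(SAMPLE_LINES))
```

An uplink-down event that reports `0 uplinks up` is the one worth correlating first, since it marks the moment the affected dvPort lost its last uplink.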

 

 

