
Host reboots intermittently due to a faulty server-side SFP

Category:
ontap-9
Specialty:
san

Applies to

  • ONTAP 9
  • RHEL
  • FC
  • Cisco

Issue

  • The RHEL host reboots intermittently, and the following events and errors are logged:

Nov 17 15:41:39 host multipathd: asm!.asm_ctl_vmb: add path (uevent)
Nov 17 15:41:39 host multipathd: asm/.asm_ctl_vmb: failed to get path uid
Nov 17 15:41:39 host multipathd: uevent trigger error
Nov 17 15:41:39 host multipathd: asm!.asm_ctl_vbg5: add path (uevent)
Nov 17 15:41:39 host multipathd: asm/.asm_ctl_vbg5: failed to get path uid
Nov 17 15:41:39 host multipathd: uevent trigger error
Nov 17 15:10:01 host systemd: Removed slice User Slice of root.
Nov 17 15:10:37 host systemd-udevd: worker [113970] /devices/virtual/block/dm-8 is taking a long time
Nov 17 15:10:37 host systemd-udevd: worker [113971] /devices/virtual/block/dm-61 is taking a long time
Nov 17 15:10:37 host systemd-udevd: worker [113972] /devices/virtual/block/dm-6 is taking a long time

  • Before the host reboots, transport-related errors are logged in /var/log/messages:

Nov 17 15:09:31 host kernel: sd 1:0:4:48: [sdlu] tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
Nov 17 15:09:31 host kernel: sd 1:0:4:49: [sdlx] tag#22 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=0s
Nov 17 15:09:31 host kernel: sd 1:0:4:49: [sdlx] tag#22 CDB: Test Unit Ready 00 00 00 00 00 00
Nov 17 15:09:36 host kernel: sd 1:0:2:7: rejecting I/O to offline device
Nov 17 15:09:36 host kernel: sd 1:0:2:7: [sdda] killing request
Nov 17 15:09:36 host kernel: sd 1:0:2:31: [sdeg] killing request
Nov 17 15:09:36 host kernel: sd 1:0:2:31: [sdeg] killing request
Nov 17 15:09:36 host kernel: sd 1:0:2:7: [sdda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=5s
Nov 17 15:09:36 host kernel: sd 1:0:2:7: [sdda] CDB: Write(16) 8a 00 00 00 00 00 8d 0f e3 87 00 00 00 20 00 00
Nov 17 15:09:36 host kernel: blk_update_request: 5 callbacks suppressed
Nov 17 15:09:36 host kernel: blk_update_request: I/O error, dev sdda, sector 2366628743
Nov 17 15:09:36 host kernel: sd 1:0:2:31: [sdeg] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=5s
Nov 17 15:09:36 host kernel: sd 1:0:2:31: [sdeg] CDB: Write(16) 8a 00 00 00 00 00 01 f3 81 74 00 00 00 16 00 00
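When many devices fail at once, it can help to group the FAILED results by SCSI host number and hostbyte value to see whether the failures cluster on one HBA port. The following is a minimal sketch, not a NetApp or Red Hat tool. The function name summarize_failures and the regular expression are illustrative assumptions based only on the line format shown above.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpts above, for example:
#   sd 1:0:2:7: [sdda] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=5s
FAILED_RE = re.compile(
    r"sd (?P<hctl>\d+:\d+:\d+:\d+): \[(?P<dev>\w+)\].*?"
    r"FAILED Result: hostbyte=(?P<hostbyte>\w+)"
)


def summarize_failures(lines):
    """Count FAILED results per (SCSI host number, hostbyte) pair."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            # The first field of host:channel:target:lun is the SCSI host number.
            scsi_host = match.group("hctl").split(":")[0]
            counts[(scsi_host, match.group("hostbyte"))] += 1
    return counts


# Sample lines copied from the log excerpt in this article.
SAMPLE = [
    "Nov 17 15:09:31 host kernel: sd 1:0:4:49: [sdlx] tag#22 FAILED Result: "
    "hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=0s",
    "Nov 17 15:09:36 host kernel: sd 1:0:2:7: rejecting I/O to offline device",
    "Nov 17 15:09:36 host kernel: sd 1:0:2:7: [sdda] FAILED Result: "
    "hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=5s",
    "Nov 17 15:09:36 host kernel: sd 1:0:2:31: [sdeg] FAILED Result: "
    "hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK cmd_age=5s",
]

if __name__ == "__main__":
    for (scsi_host, hostbyte), n in sorted(summarize_failures(SAMPLE).items()):
        print(f"scsi host {scsi_host}: {hostbyte} x{n}")
```

To run it against a real log, pass an open file object instead of SAMPLE, for example summarize_failures(open("/var/log/messages")).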

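  • These errors affect all available multipath paths, leaving the storage LUN with "remaining active paths: 0", which causes I/O to the target LUN to fail completely: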

Nov 17 15:09:36 host multipathd: sdah: mark as failed
Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 3
Nov 17 15:09:36 host multipathd: sdcj: mark as failed
Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 2
Nov 17 15:09:36 host multipathd: sdot: mark as failed
Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 1
Nov 17 15:09:36 host multipathd: sdux: mark as failed
Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 0
Nov 17 15:09:36 host multipathd: sdn: mark as failed
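The countdown above can be detected automatically by tracking the last "remaining active paths" value reported for each multipath map. This is a rough sketch under the assumption that multipathd messages follow the format in the excerpt. The helper name maps_with_no_paths is hypothetical.

```python
import re

# Assumed message format, taken from the excerpt above:
#   multipathd: <map name>: remaining active paths: <count>
REMAINING_RE = re.compile(
    r"multipathd: (?P<map>\S+): remaining active paths: (?P<count>\d+)"
)


def maps_with_no_paths(lines):
    """Return the multipath maps whose most recent reported path count is 0."""
    last_count = {}
    for line in lines:
        match = REMAINING_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so only the final state is kept.
            last_count[match.group("map")] = int(match.group("count"))
    return sorted(name for name, count in last_count.items() if count == 0)


# Sample lines copied from the log excerpt in this article.
SAMPLE = [
    "Nov 17 15:09:36 host multipathd: sdah: mark as failed",
    "Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 3",
    "Nov 17 15:09:36 host multipathd: sdcj: mark as failed",
    "Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 2",
    "Nov 17 15:09:36 host multipathd: sdot: mark as failed",
    "Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 1",
    "Nov 17 15:09:36 host multipathd: sdux: mark as failed",
    "Nov 17 15:09:36 host multipathd: xxx: remaining active paths: 0",
]

if __name__ == "__main__":
    for name in maps_with_no_paths(SAMPLE):
        print(f"{name}: all paths down")
```

Because only the last reported count per map is kept, a map that later regains paths is not flagged.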

 

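  • No performance issues were observed on the storage side.
  • No such error events are logged in the storage EMS during or before the time of the issue.
  • On the Cisco SAN switch, the interface connected to the host shows loss of signal during the unexpected events.
  • A review of the switch logs showed incrementing Rx counters on the host-connected port, which indicates that the connected end device needs further inspection.
  • On the switch side, the flogi database shows that the host-connected interface does not appear to be connected to the switch, which indicates a physical-layer problem on the path from the host to the switch.
  • Perform physical connectivity checks, including cable tests, patch panel tests, and the server-side SFP.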
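As a host-side complement to the physical checks above, the FC link state of each HBA port can be read from sysfs on the RHEL host. This is only a sketch, and the article itself does not prescribe it. It assumes the standard Linux FC transport attribute port_state under /sys/class/fc_host, and the function name fc_port_states is hypothetical. A port that is not "Online" points to the cable, patch panel, or SFP on that link.

```python
from pathlib import Path


def fc_port_states(root="/sys/class/fc_host"):
    """Map each FC host entry (for example host1) to its port_state string.

    Returns an empty dict if the directory does not exist, for example on a
    machine without FC HBAs.
    """
    states = {}
    base = Path(root)
    if not base.is_dir():
        return states
    for host_dir in sorted(base.iterdir()):
        state_file = host_dir / "port_state"
        if state_file.is_file():
            states[host_dir.name] = state_file.read_text().strip()
    return states


if __name__ == "__main__":
    for host, state in fc_port_states().items():
        note = "" if state == "Online" else "  <-- check cable, patch panel, SFP"
        print(f"{host}: {state}{note}")
```

The root parameter exists only so the function can be pointed at a test directory. On a real host the default path is used.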
     

 
