
Node shuts down and multiple disks report "scsi.cmd.pastTimeToLive:error"

Visibility:
Public
Category:
fas-systems
Specialty:
hw

Applies to

  • FAS 2820
  • ONTAP 9
  • Internal disk shelf

Issue

  • The node shuts down, and multiple disks report scsi.cmd.pastTimeToLive:error errors.

[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000046cd85e00:00000200.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8a:000000047237f760:00000008.
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.0: request failed after try #1: cdb 0x8f:000000046c3c7e00:00000400.
...
[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: scsi.cmd.pastTimeToLive:error]: Disk device 0b.00.8: request failed after try #1: cdb 0x88:000000047237ef90:00000008.
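
When many of these events arrive in the same second, it can help to count them per disk and decode the SCSI opcode. The following is a minimal Python sketch, not a NetApp tool. It assumes the `cdb` field is laid out as opcode:LBA:transfer length in hexadecimal, which the excerpt suggests but does not confirm. The function names `parse_ttl_errors` and `errors_per_disk` are hypothetical. The opcode names come from the SCSI Block Commands set.

```python
import re
from collections import Counter

# 16-byte SCSI block command opcodes that appear in the excerpt above.
OPCODES = {0x88: "READ(16)", 0x8A: "WRITE(16)", 0x8F: "VERIFY(16)"}

# Assumption: the cdb field is "opcode:LBA:transfer length", all in hex.
LINE_RE = re.compile(
    r"scsi\.cmd\.pastTimeToLive:error\]: Disk device (?P<disk>\S+): "
    r"request failed after try #(?P<attempt>\d+): "
    r"cdb 0x(?P<op>[0-9a-fA-F]{2}):(?P<lba>[0-9a-fA-F]+):(?P<length>[0-9a-fA-F]+)"
)


def parse_ttl_errors(lines):
    """Return one dict per scsi.cmd.pastTimeToLive event found in lines."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a pastTimeToLive event
        op = int(m.group("op"), 16)
        events.append({
            "disk": m.group("disk"),
            "attempt": int(m.group("attempt")),
            "opcode": OPCODES.get(op, hex(op)),
            "lba": int(m.group("lba"), 16),
            "blocks": int(m.group("length"), 16),
        })
    return events


def errors_per_disk(events):
    """Count events per disk device name."""
    return Counter(e["disk"] for e in events)


PREFIX = ("[?] Sat Dec 28 08:48:00 +0900 [node01: scsi_cmdblk_strthr_admin: "
          "scsi.cmd.pastTimeToLive:error]: Disk device ")
SAMPLE = [
    PREFIX + "0b.00.0: request failed after try #1: cdb 0x8a:000000046cd85e00:00000200.",
    PREFIX + "0b.00.0: request failed after try #1: cdb 0x8a:000000047237f760:00000008.",
    PREFIX + "0b.00.0: request failed after try #1: cdb 0x8f:000000046c3c7e00:00000400.",
    PREFIX + "0b.00.8: request failed after try #1: cdb 0x88:000000047237ef90:00000008.",
]

if __name__ == "__main__":
    events = parse_ttl_errors(SAMPLE)
    for disk, count in sorted(errors_per_disk(events).items()):
        print(disk, count)
```

Errors spread across several disks on the same path within the same second, as in this excerpt, point toward a shared component rather than a single failing drive.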

  • The partner node raises the alert HA Group Notification (CONTROLLER TAKEOVER COMPLETE AUTOMATIC - Communiction Error) ALERT.
    • The following EMS log is detected.

[?] Sat Dec 28 08:48:01 +0900 [node02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner

  • The disk shelf IOM port state shows NO SIGNAL

Timestamp: Sat Jan 4 08:33:20 JST 2025
Shelf name: 0c.shelf0
Channel: 0c
Module: A
Shelf id: 0
Shelf UUID: 50:0a:09:80:08:6f:fb:24
Shelf S/N: SHJSG2418000037
Term switch: N/A
Shelf state: ONLINE
Module state: OK

                        Partial Path  Link    Invalid  Running    Loss   Phy      CRC    Phy
Disk         Port       Timeout       Rate    DWord    Disparity  Dword  Reset    Error  Change
Id           State      Value (ms)    (Gb/s)  Count    Count      Count  Problem  Count  Count
-----------------------------------------------------------------------------------------------
[HST0/P0:0]  NO SIGNAL  7             NA      0        0          0      0        0      974
[HST1/P0:1]  NO SIGNAL  7             NA      1299     1298       0      0        0      974
[HST2/P0:2]  NO SIGNAL  7             NA      310      307        0      0        0      974
[HST3/P0:3]  NO SIGNAL  7             NA      85       81         0      0        0      974
[HST4/P1:0]  OK         7             12.0    0        0          0      0        0      3
[HST5/P1:1]  OK         7             12.0    0        0          0      0        0      3
[HST6/P1:2]  OK         7             12.0    0        0          0      0        0      3
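
The phy rows above can be screened automatically. The following Python sketch parses each row and lists phys that are either not in the OK state or show a high Phy Change Count. The column order is taken from the header in this output. The function names and the `change_threshold` default of 100 are hypothetical illustration values, not a NetApp-documented limit.

```python
import re

# One phy row: [id] state timeout rate, followed by six integer counters.
ROW_RE = re.compile(
    r"^\[(?P<phy>[^\]]+)\]\s+(?P<state>.+?)\s+(?P<timeout>\d+)\s+"
    r"(?P<rate>NA|[\d.]+)\s+"
    r"(?P<invalid_dword>\d+)\s+(?P<disparity>\d+)\s+(?P<loss_dword>\d+)\s+"
    r"(?P<reset_problem>\d+)\s+(?P<crc>\d+)\s+(?P<phy_change>\d+)\s*$"
)

COUNTERS = ("invalid_dword", "disparity", "loss_dword",
            "reset_problem", "crc", "phy_change")


def parse_phy_rows(lines):
    """Parse phy state rows; header and separator lines are skipped."""
    rows = []
    for line in lines:
        m = ROW_RE.match(line.strip())
        if m is None:
            continue
        row = m.groupdict()
        for name in COUNTERS:
            row[name] = int(row[name])
        rows.append(row)
    return rows


def suspect_phys(rows, change_threshold=100):
    """Return phy ids that are not OK or whose change count is high.

    change_threshold is an arbitrary illustration value.
    """
    return [
        r["phy"] for r in rows
        if r["state"] != "OK" or r["phy_change"] >= change_threshold
    ]


SAMPLE = [
    "[HST0/P0:0] NO SIGNAL 7 NA 0 0 0 0 0 974",
    "[HST1/P0:1] NO SIGNAL 7 NA 1299 1298 0 0 0 974",
    "[HST2/P0:2] NO SIGNAL 7 NA 310 307 0 0 0 974",
    "[HST3/P0:3] NO SIGNAL 7 NA 85 81 0 0 0 974",
    "[HST4/P1:0] OK 7 12.0 0 0 0 0 0 3",
    "[HST5/P1:1] OK 7 12.0 0 0 0 0 0 3",
    "[HST6/P1:2] OK 7 12.0 0 0 0 0 0 3",
]

if __name__ == "__main__":
    for phy in suspect_phys(parse_phy_rows(SAMPLE)):
        print(phy)
```

In this sample all four phys of port P0 are flagged while every listed phy of P1 is clean, so the loss of signal is confined to one host port.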

  • The node cannot read multiple drives, and the aggregate fails with a multi-disk error
    Mon Jun 02 10:17:22 +0700 [node-02: config_thread: raid.vol.failed:notice]: Aggregate aggr1_n2: Failed due to multi-disk error.
    Mon Jun 02 10:17:23 +0700 [node-02: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr aggr1_n2: raid volfsm, fatal multi-disk error..  Raid type - raid_dp Group name plex0/rg0 state DOUBLEDEGRADED. 1 disk failed in the group. Disk 0a.00.2P1 Shelf 0 Bay 2 [NETAPP   X336_TTCRE04TA07 NA04] S/N [Y3F0A2XXXXXX] UID [6000039C:E82AC314:500A0981:00000001:00000000:00000000:00000000:00000000:00000000:00000000] error: disk failed..

     
  • The node shuts down because of a multi-disk failure
    Mon Jun 02 10:17:23 +0700 [node-02: cf_main: cf.fsm.takeover.mdp:alert]: Failover monitor: takeover attempted after multi-disk failure on partner

     

 

