
CFBRIDGE-414: Both NSMs hit a watchdog reset, causing loss of access to shelf X

Category:
ontap-9
Specialty:
hw

Issue

  • scsi.cmd.checkCondition events are logged:

 [Node: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5a.11.3.5L0: Check Condition: CDB 0x28:2183d84b:0001: Sense Data SCSI:aborted command -  (0xb - 0x90 0x2 0xfc)(2520).
 [Node: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: Disk device e5b.11.0.12L0: Check Condition: CDB 0x28:9ae4029e:0001: Sense Data SCSI:aborted command -  (0xb - 0x90 0x2 0xfc)(2669).
 [Node: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5a.11.1.3L0: request successful after retry #1/#0: cdb 0x28:de3dfca4:0001 (3538).
 [Node: scsi_cmdblk_strthr_admin: scsi.cmd.retrySuccess:debug]: Disk device e5b.11.0.5L0: request successful after retry #1/#0: cdb 0x28:2183d84e:0001 (3371).
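When many of these events appear, it can help to pull out the affected device, the SCSI opcode, and the sense key. The following is a minimal sketch, not a NetApp tool: the regular expression is inferred only from the two checkCondition lines quoted above, and `parse_check_condition` is a hypothetical helper name. The opcode 0x28 (READ(10)) and sense key 0xB (ABORTED COMMAND) are standard SCSI values; the remaining bytes in the parentheses are left undecoded because the excerpt does not define them.

```python
import re

# Standard SCSI values; only the ones seen in the excerpt are listed.
SENSE_KEYS = {0x0B: "ABORTED COMMAND"}
OPCODES = {0x28: "READ(10)"}

# Assumption: this pattern is inferred from the quoted log lines only;
# other ONTAP releases may format the event differently.
CHECK_CONDITION = re.compile(
    r"scsi\.cmd\.checkCondition:\w+\]: Disk device (?P<device>\S+): "
    r"Check Condition: CDB (?P<opcode>0x[0-9a-fA-F]+):\S*: "
    r"Sense Data SCSI:.*?\((?P<key>0x[0-9a-fA-F]+) - "
)


def parse_check_condition(line):
    """Return device, opcode name, and sense key name, or None if no match."""
    m = CHECK_CONDITION.search(line)
    if m is None:
        return None
    opcode = int(m.group("opcode"), 16)
    key = int(m.group("key"), 16)
    return {
        "device": m.group("device"),
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
    }


if __name__ == "__main__":
    sample = (
        "[Node: scsi_cmdblk_strthr_admin: scsi.cmd.checkCondition:error]: "
        "Disk device e5a.11.3.5L0: Check Condition: CDB 0x28:2183d84b:0001: "
        "Sense Data SCSI:aborted command -  (0xb - 0x90 0x2 0xfc)(2520)."
    )
    print(parse_check_condition(sample))
```

For the sample line this reports a READ(10) to e5a.11.3.5L0 that was aborted, which matches the "aborted command" text in the event itself.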
 
  • The node rebooted for the following reason:

[Node: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /Node_SSD/plex0/rg2/e5b.11.2.1 Shelf 11 Bay 1 [NETAPP   X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXXX] UID [36313230:57B16662:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
[Node: config_thread: raid.config.filesystem.disk.missing:info]: File system Disk /Node_SSD/plex0/rg2/e5a.11.1.2 Shelf 11 Bay 2 [NETAPP   X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXXX] UID [36313230:57B16664:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] is missing.
[Node: config_thread: cf.multidisk.fatalProblem:error]: Node encountered a multidisk error or other fatal error while waiting to be taken over. aggr Node_SSD: raid volfsm, fatal multi-disk error..  Raid type - raid_dp Group name plex0/rg2 state NORMAL. 8 disks failed in the group. Disk e5b.11.2.0 Shelf 11 Bay 0 [NETAPP   X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXX] UID [36313230:57B16661:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] error: no valid path to disk. Disk e5b.11.2.1 Shelf 11 Bay 1 [NETAPP   X4020S173A15TNQF NA55] S/N [XXXXXXXXXXXXX] UID [36313230:57B16662:00253845:00000002:00000000:00000000:00000000:00000000:00000000:00000000] error: disk does not exist. Disk e5a.11.1.2 Shelf 11 Bay 2 [NETAPP   X4020S173A15TNQF NA55] S/N 
  • The shelf logs for both modules show "software watchdog detected fault":

--------------------------------------------------------------
Shelflog start time: Sun Mar  9 09:15:18 GMT 2025
Controller Id: XXXXXXXXXXX
Channel: 0x Shelf: 11 Module type: NSM100 Firmware rev: 0305
Shelf product id: NS224NSM100
Shelf Serial Number: XXXXXXXXXXXXX
Module A Serial Number: XXXXXXXXXXX
Log ID: XXXXXXXXXXXXXX
Timestamp: Thu Mar 20 21:54:52 GMT 2025
--------------------------------------------------------------
EVENT LOGS
Timestamp Thu Mar 20 21:54:51 2025
(183+12:51:48.557)
Thu Mar 20 21:54:47 2025 (  183+12:51:45.089); 02000228; M0; HAL; hal; 02; Failure: software watchdog detected fault.
Thu Mar 20 21:54:47 2025 (  183+12:51:45.089); 02000229; M0; HAL; hal; 02; Failure info: Client "bridgeWdgClient" triggered wdg. tNow:3b364bc9h, tLast:3b35fa79h, interval:4e20h, failed:0h.
Thu Mar 20 21:54:47 2025 (  183+12:51:45.089); 02000263; M0; HAL; hal; 04; HAL_ProductCrashAndCoreIt: prior system(pkill -6 bio) status:0 pid:0
Thu Mar 20 21:54:47 2025 (  183+12:51:45.089); 02000263; M0; HAL; hal; 04; HAL_ProductCrashAndCoreIt: post system(pkill -6 bio) status:0 pid:3102921
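The "Failure info" line carries the watchdog timing values. As a rough sketch, assuming the trailing `h` marks hexadecimal and that the watchdog fires when `tNow - tLast` exceeds `interval` (neither is stated in the log, and the time unit is unknown), the values can be decoded as follows. `decode_watchdog` is a hypothetical helper name, and counter wraparound is ignored.

```python
import re

# Assumption: field layout is taken from the single "Failure info" line
# in the shelf log excerpt; values with a trailing "h" are read as hex.
WDG_INFO = re.compile(
    r'Client "(?P<client>[^"]+)" triggered wdg\. '
    r"tNow:(?P<t_now>[0-9a-fA-F]+)h, tLast:(?P<t_last>[0-9a-fA-F]+)h, "
    r"interval:(?P<interval>[0-9a-fA-F]+)h"
)


def decode_watchdog(line):
    """Decode the watchdog counters; returns None if the line does not match."""
    m = WDG_INFO.search(line)
    if m is None:
        return None
    t_now = int(m.group("t_now"), 16)
    t_last = int(m.group("t_last"), 16)
    interval = int(m.group("interval"), 16)
    elapsed = t_now - t_last  # ticks since the client last checked in
    return {
        "client": m.group("client"),
        "elapsed": elapsed,
        "interval": interval,
        "overdue": elapsed > interval,
    }


if __name__ == "__main__":
    sample = (
        'Failure info: Client "bridgeWdgClient" triggered wdg. '
        "tNow:3b364bc9h, tLast:3b35fa79h, interval:4e20h, failed:0h."
    )
    print(decode_watchdog(sample))
    # {'client': 'bridgeWdgClient', 'elapsed': 20816, 'interval': 20000, 'overdue': True}
```

Under these assumptions, the client `bridgeWdgClient` last checked in 20816 ticks before the fault, against an allowed interval of 20000 ticks, which is consistent with the watchdog firing.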

 

