Node reboots with panic: Process on cpuXX hung (wafl_exempt0X)
Applies to
- ONTAP 9
- FAS or AFF
- FlexGroups
Issue
- The node might panic intermittently with one or more of the following panic strings:
Process mgwd unresponsive for 380 seconds (mgwd startup: "(4139)") in process nodewatchdog on release 9.10.1P1 (C)
Process on cpu6 hung (wafl_exempt00) for 5006 milliseconds! in SK process wafl_exempt00
PANIC: process on cpu31 hung (wafl_exempt03) for 5007 milliseconds! version: 9.11.1P2
- EMS logs report Nblade.CifsOperationTimedOut messages:
xx/xx/2023 19:48:39 node01 ERROR Nblade.CifsOperationTimedOut: Detected a timed out CIFS operation. SMB command for this operation: SMB2_COM_CREATE,
Number of times this command was suspended: 8002,
Number of times this command was restarted: 0,
Last CSM error during this operation: CSM_OK,
Remote blade UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx (node01),
Is QoS enabled: QoS_disabled,
Last nBlade error during this operation: SPINNP_ERR_NOMEM,
Client IP address: xxx.xxx.xxx.xxx,
Local IP address: xxx.xxx.xxx.xxx,
Target Vserver ID: 3,
Target disk's DSID: 1077,
Target Vserver Name: vs0
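
When reviewing collected logs for these symptoms, a small script can help pull out the relevant values. The following is a minimal sketch, not a NetApp tool: the function names `parse_hung_panic` and `parse_ems_fields` are hypothetical, and the patterns are inferred only from the sample panic strings and EMS message shown above, so other ONTAP releases may format these lines differently.

```python
import re

# Assumed pattern, derived from the sample "process hung" panic strings above.
HUNG_RE = re.compile(
    r"[Pp]rocess on cpu(?P<cpu>\d+)\s*hung "
    r"\((?P<proc>\w+)\) for (?P<ms>\d+) milliseconds"
)


def parse_hung_panic(line):
    """Return CPU number, process name, and hang time, or None if no match."""
    m = HUNG_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "process": m.group("proc"),
        "ms": int(m.group("ms")),
    }


def parse_ems_fields(message):
    """Split the comma-separated 'Key: value' pairs of an EMS message into a dict."""
    fields = {}
    for part in message.split(","):
        key, sep, value = part.strip().rpartition(": ")
        if not sep:
            continue  # fragment holds no "key: value" pair
        # The first fragment also carries the event header; keep only the
        # text after the last sentence break as the key.
        key = key.rpartition(". ")[2]
        fields[key] = value
    return fields


if __name__ == "__main__":
    panic = "Process on cpu6 hung (wafl_exempt00) for 5006 milliseconds! in SK process wafl_exempt00"
    print(parse_hung_panic(panic))

    ems = (
        "xx/xx/2023 19:48:39 node01 ERROR Nblade.CifsOperationTimedOut: "
        "Detected a timed out CIFS operation. "
        "SMB command for this operation: SMB2_COM_CREATE,\n"
        "Number of times this command was suspended: 8002,\n"
        "Last nBlade error during this operation: SPINNP_ERR_NOMEM"
    )
    fields = parse_ems_fields(ems)
    # Flag the combination seen in this article's sample message.
    if fields.get("Last nBlade error during this operation") == "SPINNP_ERR_NOMEM":
        print("Timed-out CIFS operation with SPINNP_ERR_NOMEM:", fields)
```

The parser keys each value by the label text that appears in the message, so a field such as `Target Vserver Name` can be looked up directly without hard-coding field positions.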