ONTAP node panics and fails to boot with VLDB or vifmgr error: "PANIC: Process vifmgr unresponsive for xxx seconds in process nodewatchdog on release 9.x"
Applies to
- ONTAP 9
Issue
- The node panics because vifmgr (or vldb) does not respond before the watchdog timeout expires, and on reboot it cannot recover the MDB:
Panic String: PANIC : Process vifmgr unresponsive for 629 seconds version: 9.1P12
or
Panic String: PANIC : Process vldb unresponsive for 160 seconds in process nodewatchdog on release 9.10.1P12
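- The root volume (vol0) has filled up: a rolling packet trace left running for 2 weeks caused large snapshot deltas and high snapshot space utilization on vol0: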
Mon Mar 30 08:28:37 CDT [nodename: rshd_0: kern.cli.cmd:debug]: Command-line input: The command is 'pktt'. The full command line is 'pktt start a0a-10 -d /etc/crash -m 9018 -b 8m -s 2g -r 12'.
- Before the panic, the console log shows vifmgr and VLDB exiting and failing to restart:
Apr 13 00:49:45 [nodename:spm.vldb.process.exit:EMERGENCY]: Volume Location Database(VLDB) subsystem with ID 34409 exited as a result of signal normal exit (1). The subsystem will attempt to restart.
Apr 13 00:49:47 [nodename:spm.vifmgr.process.exit:EMERGENCY]: Logical Interface Manager(VifMgr) with ID 34415 aborted as a result of signal normal exit (1). The subsystem will attempt to restart.
- When the node reboots, it cannot recover the MDB because vol0 is out of space:
Apr 13 02:54:46 [nodename:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE notifyd WARNING.
ln: /var/zoneinfo/zoneinfo: No space left on device
root: Unable to ln /mroot/etc/zoneinfo to /var/zoneinfo - error code(1)
/usr/bin/plxcoeff_log: cannot create /mroot/etc/log/plxcoeff/plxcoeff.log.tmp: No space left on device
stat: /mroot/etc/log/plxcoeff/plxcoeff.log.tmp: stat: No such file or directory
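To check whether saved console or EMS output matches the symptoms listed above, a small scanner can look for the four signatures: the watchdog panic string, the spm process-exit events, the failed MDB recovery call home, and "No space left on device". The following is a minimal Python sketch, assuming the output has been saved as plain text lines. The regular expressions are modeled only on the sample messages quoted in this article, and `scan`, `Findings`, and the `matches_article` heuristic are hypothetical helpers, not a NetApp tool.

```python
import re
from dataclasses import dataclass, field

# Patterns are modeled only on the sample messages in this article (assumption);
# other ONTAP releases may word these messages differently.
PANIC_RE = re.compile(
    r"PANIC\s*:\s*Process\s+(?P<proc>\w+)\s+unresponsive\s+for\s+(?P<secs>\d+)\s+seconds"
    r"(?:.*?(?:version:|release)\s*(?P<release>[\w.]+))?",
    re.IGNORECASE,
)
# EMS form: [nodename:spm.<subsystem>.process.exit:EMERGENCY]
EXIT_RE = re.compile(r"\[(?P<node>[^:\]]+):spm\.(?P<subsys>\w+)\.process\.exit:EMERGENCY\]")
MDB_RE = re.compile(r"callhome\.mdb\.recovery\.unsuccessful")
NOSPACE_RE = re.compile(r"No space left on device")


@dataclass
class Findings:
    panics: list = field(default_factory=list)             # (process, seconds, release or None)
    exited_subsystems: list = field(default_factory=list)  # e.g. "vldb", "vifmgr"
    mdb_recovery_failed: bool = False
    no_space_lines: int = 0

    def matches_article(self) -> bool:
        # Hypothetical heuristic: a watchdog panic on vifmgr or vldb plus
        # evidence that the root volume ran out of space.
        watchdog_panic = any(p[0].lower() in ("vifmgr", "vldb") for p in self.panics)
        return watchdog_panic and (self.mdb_recovery_failed or self.no_space_lines > 0)


def scan(lines):
    """Collect the symptom signatures from an iterable of log lines."""
    found = Findings()
    for line in lines:
        m = PANIC_RE.search(line)
        if m:
            found.panics.append((m.group("proc"), int(m.group("secs")), m.group("release")))
        m = EXIT_RE.search(line)
        if m:
            found.exited_subsystems.append(m.group("subsys"))
        if MDB_RE.search(line):
            found.mdb_recovery_failed = True
        if NOSPACE_RE.search(line):
            found.no_space_lines += 1
    return found


if __name__ == "__main__":
    # Sample lines copied from the console output quoted above.
    sample = [
        "Panic String: PANIC : Process vldb unresponsive for 160 seconds in process nodewatchdog on release 9.10.1P12",
        "Apr 13 00:49:45 [nodename:spm.vldb.process.exit:EMERGENCY]: Volume Location Database(VLDB) subsystem ...",
        "Apr 13 00:49:47 [nodename:spm.vifmgr.process.exit:EMERGENCY]: Logical Interface Manager(VifMgr) ...",
        "Apr 13 02:54:46 [nodename:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY ...",
        "ln: /var/zoneinfo/zoneinfo: No space left on device",
    ]
    result = scan(sample)
    print(result.panics)
    print(result.exited_subsystems)
    print(result.matches_article())
```

Searching each line independently keeps the sketch tolerant of interleaved boot messages; it only reports that the signatures are present and does not by itself confirm that vol0 is full, which still has to be checked on the node.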