Backup logs from aborted and/or resumed NDMP operations can fill an ONTAP node's root volume and may cause the node to fail
Applies to
- ONTAP 9
- Network Data Management Protocol (NDMP) operations, such as ndmpcopy
Issue
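- The used size of a single node's root volume increases rapidly. This can be observed by running the following command at regular intervals and comparing the results: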
cluster1::> volume show -vserver cluster1-01
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
cluster1-01 vol0 aggr0 online RW 442.4GB 407.6GB 7%
(Using the node name as the -vserver parameter returns that node's root volume.)
- The backup log at /mroot/etc/log/backup fills with messages similar to the following:
Tue Mar 27 00:11:36 EDT 2018 /svm1/vol1 Log_msg (Flush DIRNET for BKP ID=248, type=3 interrupted while waiting for min inflight. Error = Interrupted system call.
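The easiest way to access the backup log is through the Service Processor Infrastructure (SPI) interface, by clicking the logs link. For help using SPI, see the Knowledge Base article How to manually collect logs and copy files from a Clustered Data ONTAP storage system (under "Option 1").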
- The affected node may panic with messages similar to the following:
Example 1:
Process vldb unresponsive for 631 seconds in process nodewatchdog on release 9.2P1 (C)
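Note: This panic can be caused by many other issues. The panic string alone does not indicate the issue described here; always check the state of the node's root volume and the contents of the backup log.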
Example 2:
Apr 12 15:49:43 [node-02:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE coresegd WARNING.
Apr 12 15:51:58 [node-02:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE mcached WARNING.
Apr 12 15:54:07 [node-02:spm.vifmgr.process.exit:EMERGENCY]: Logical Interface Manager(VifMgr) with ID 9996 aborted as a result of signal normal exit (1). The subsystem will attempt to restart.
Apr 12 15:54:09 [node-02:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE vifmgr WARNING.
Apr 12 16:03:14 [node-02:callhome.mdb.recovery.unsuccessful:EMERGENCY]: Call home for MDB RECOVERY UNSUCCESSFUL FOR THE bcomd WARNING.
PANIC : Process vifmgr unresponsive for 630 seconds
version: 9.4P3: Thu Oct 11 18:25:55 EDT 2018
conf : x86_64.optimize
cpuid = 3
KDB: stack backtrace:
PANIC: Process vifmgr unresponsive for 630 seconds in process nodewatchdog on release 9.4P3 (C) on Wed Apr 12 16:04:13 KST 2023
Apr 12 16:21:11 [node-02:extCache.rw.replay.canceled:notice]: WAFL external cache replay canceled for aggregate node2_aggr0: Aggregate came online after timeout.
Apr 12 16:22:21 [node-02:mgmtgwd.rootvolrec.low.space:EMERGENCY]: The root volume on node "node-02" is dangerously low on space. Less than 10 MB of free space remaining.
Apr 12 16:22:21 [node-02:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED.
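- Backup log growth causes the root volume to run low on space and, in some cases, causes the root aggregate to go offline. For example: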
214G /mroot/etc/log/backup
96G /mroot/etc/log/backup.0
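To make the periodic volume show check described above easier to act on, the following Python sketch parses one data row of that output and estimates how fast used space is growing between two samples. It is a minimal sketch, assuming the single-line row layout shown above (Vserver, Volume, Aggregate, State, Type, Size, Available, Used%) and 1024-based units; parse_row, growth_gb_per_hour and the second sample are hypothetical, for illustration only. Used space is estimated as Size minus Available, which may not match ONTAP's own Used% column exactly.

```python
import re
from dataclasses import dataclass

# Assumption: ONTAP size suffixes are 1024-based (1 TB = 1024 GB, 1 GB = 1024 MB).
_UNITS = {"KB": 1 / (1024 * 1024), "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


@dataclass
class RootVolSample:
    vserver: str
    volume: str
    size_gb: float
    available_gb: float

    @property
    def used_gb(self) -> float:
        # Rough estimate only; ONTAP's Used% column may be computed differently.
        return self.size_gb - self.available_gb


def to_gb(text: str) -> float:
    """Convert a value such as '442.4GB' to gigabytes."""
    m = re.fullmatch(r"([\d.]+)(KB|MB|GB|TB)", text)
    if m is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def parse_row(row: str) -> RootVolSample:
    """Parse one data row of 'volume show' output (hypothetical helper)."""
    fields = row.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}: {row!r}")
    vserver, volume, _aggr, _state, _type, size, avail, _pct = fields
    return RootVolSample(vserver, volume, to_gb(size), to_gb(avail))


def growth_gb_per_hour(earlier: RootVolSample, later: RootVolSample, hours: float) -> float:
    """Average growth of used space between two samples taken 'hours' apart."""
    return (later.used_gb - earlier.used_gb) / hours


if __name__ == "__main__":
    first = parse_row("cluster1-01 vol0 aggr0 online RW 442.4GB 407.6GB 7%")
    # Hypothetical second sample, taken two hours after the first.
    second = parse_row("cluster1-01 vol0 aggr0 online RW 442.4GB 387.6GB 12%")
    print(f"used now: {second.used_gb:.1f} GB")
    print(f"growth: {growth_gb_per_hour(first, second, 2.0):.1f} GB/hour")
```

With the invented second sample, used space rises from about 34.8 GB to about 54.8 GB in two hours, so the sketch reports roughly 10 GB/hour. A sustained rate like that on a root volume, with no other explanation, is a cue to inspect the backup log.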
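One way to confirm the backup log symptom is to count the interrupted DIRNET flush messages per backup ID in a local copy of the backup log (for example, one downloaded through the SPI logs link). The Python sketch below assumes every such entry follows the format of the sample message above; the regular expression and count_interrupted_flushes are illustrative assumptions, not part of any NetApp tool.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed format, based on the sample entry:
# "... (Flush DIRNET for BKP ID=248, type=3 interrupted while waiting for min inflight. ..."
_INTERRUPTED = re.compile(
    r"Flush DIRNET for BKP ID=(?P<bkp_id>\d+), type=\d+ "
    r"interrupted while waiting for min inflight"
)


def count_interrupted_flushes(lines: Iterable[str]) -> Counter:
    """Return how many interrupted DIRNET-flush messages appear per backup ID."""
    counts: Counter = Counter()
    for line in lines:
        m = _INTERRUPTED.search(line)
        if m:
            counts[int(m.group("bkp_id"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_backup_msgs.py <local copy of the backup log> [...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            counts = count_interrupted_flushes(f)
        for bkp_id, n in counts.most_common(10):
            print(f"{path}: BKP ID {bkp_id}: {n} interrupted-flush messages")
```

A small number of backup IDs accounting for a very large share of the messages would be consistent with the aborted or resumed NDMP sessions described in this article.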
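Because the panic string alone is not conclusive (see the note under Example 1), it can help to extract the EMS event names from console or event log text and look specifically for the root-volume space events that appear in Example 2, mgmtgwd.rootvolrec.low.space and callhome.root.vol.recovery.reqd. The Python sketch below assumes the [node:event.name:severity] tag layout of those lines; parse_ems, root_volume_alerts and the event set are hypothetical names chosen for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed tag layout, e.g. "[node-02:mgmtgwd.rootvolrec.low.space:EMERGENCY]"
_EMS_TAG = re.compile(r"\[(?P<node>[^:\]]+):(?P<event>[^:\]]+):(?P<severity>[^:\]]+)\]")

# Root-volume space events taken from Example 2.
ROOT_VOL_EVENTS = {"mgmtgwd.rootvolrec.low.space", "callhome.root.vol.recovery.reqd"}


class EmsEvent(NamedTuple):
    node: str
    event: str
    severity: str


def parse_ems(line: str) -> Optional[EmsEvent]:
    """Extract node, event name, and severity from one EMS line, or None."""
    m = _EMS_TAG.search(line)
    if m is None:
        return None
    return EmsEvent(m["node"], m["event"], m["severity"])


def root_volume_alerts(lines: Iterable[str]) -> List[EmsEvent]:
    """Return only the events that report a root volume running out of space."""
    events = (parse_ems(line) for line in lines)
    return [e for e in events if e is not None and e.event in ROOT_VOL_EVENTS]
```

Lines without a bracketed tag, such as the KDB backtrace header, are skipped. Finding these events alongside a vldb or vifmgr watchdog panic points toward root volume exhaustion rather than an unrelated cause, but the backup log should still be checked.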