HA nodes panic due to a memory leak

Category:
fas-systems, ONTAP 9, SnapMirror, panic, 2009-284649
Specialty:
core

Applies to

  • ONTAP 9
  • SnapMirror

Issue

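  • The nodes of an HA pair panic repeatedly, alternating between the two partners
  • The panicking node is always the JPC node with the lowest node ID
  • The cluster has 12 nodes
  • The cluster is a SnapMirror destination with approximately 150 FlexGroup SnapMirror relationships on a 5-minute schedule
  • Panic summary: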

Node 1
------
No panic

Node 2
------
No panic

Node 3
------
7/28
PANIC: Process vldb unresponsive
7/31
PANIC: Process bcomd unresponsive
8/4
PANIC: Process vifmgr unresponsive
8/8
PANIC: Process vldb unresponsive
8/14
PANIC: Process mgwd unresponsive
8/23
PANIC: Process mgwd unresponsive

Node 4
------
7/26
PANIC: page fault (supervisor write data, page not present) on VA 0 cs:rip
7/29
PANIC: Process vifmgr unresponsive
8/2
PANIC: Process mgwd unresponsive
8/3
PANIC: page fault (supervisor write data, page not present) on VA 0 cs:rip
8/6
PANIC: Process vifmgr unresponsive
8/18
PANIC: Process mgwd unresponsive

Node 5
------
No panic

Node 6
------
No panic

Node 7
------
No panic

Node 8
------
No panic

Node 9
------
No panic

Node 10
------
No panic

Node 11
------
No panic

Node 12
------
No panic

  • EMS logs indicate a low-memory condition

Wed Jul 27 18:15:15 +0900 [Node3: repl_Handle_reg: wafl.memory.statusLowMemory:notice]: WAFL is running low on memory, with 1964MB remaining.
Wed Jul 27 18:20:25 +0900 [Node3: repl_Handle_reg: wafl.memory.statusLowMemory:notice]: WAFL is running low on memory, with 2001MB remaining.
Thu Jul 28 00:29:36 +0900 [Node3: wafl_exempt02: wafl.memory.statusVeryLowMemory:alert]: WAFL is running very low on memory, with 1314MB remaining.
Thu Jul 28 00:29:38 +0900 [Node3: wafl_exempt03: wafl.memory.statusLowMemory:notice]: WAFL is running low on memory, with 1370MB remaining.

Tue Jul 26 08:46:46 +0900 [Node4: UtilFsmThread: sk.panic:alert]: Panic String: page fault (supervisor write data, page not present) on VA 0 cs:rip 0x20:0xffffffff8b48579b rflags 0x10246 in SK process UtilFsmThread on release 9.8P11 (C)
Thu Jul 28 00:49:10 +0900 [Node3: nodewatchdog: sk.panic:alert]: Panic String: Process vldb unresponsive for 210 seconds in process nodewatchdog on release 9.8P11 (C)
Fri Jul 29 17:27:00 +0900 [Node4: nodewatchdog: sk.panic:alert]: Panic String: Process vifmgr unresponsive for 540 seconds in process nodewatchdog on release 9.8P11 (C)
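
As a rough triage aid for excerpts like the one above, the following Python sketch extracts the remaining-memory readings and the panic entries from EMS lines. The regular expressions are assumptions inferred only from the sample lines shown here, not from a documented EMS format, and the names `parse_ems`, `EMS_LINE`, `LOW_MEM`, and `WATCHDOG` are hypothetical.

```python
import re
from collections import defaultdict

# Sample lines copied from the EMS excerpt above.
EMS_SAMPLE = """\
Wed Jul 27 18:15:15 +0900 [Node3: repl_Handle_reg: wafl.memory.statusLowMemory:notice]: WAFL is running low on memory, with 1964MB remaining.
Wed Jul 27 18:20:25 +0900 [Node3: repl_Handle_reg: wafl.memory.statusLowMemory:notice]: WAFL is running low on memory, with 2001MB remaining.
Thu Jul 28 00:29:36 +0900 [Node3: wafl_exempt02: wafl.memory.statusVeryLowMemory:alert]: WAFL is running very low on memory, with 1314MB remaining.
Thu Jul 28 00:29:38 +0900 [Node3: wafl_exempt03: wafl.memory.statusLowMemory:notice]: WAFL is running low on memory, with 1370MB remaining.
Tue Jul 26 08:46:46 +0900 [Node4: UtilFsmThread: sk.panic:alert]: Panic String: page fault (supervisor write data, page not present) on VA 0 cs:rip 0x20:0xffffffff8b48579b rflags 0x10246 in SK process UtilFsmThread on release 9.8P11 (C)
Thu Jul 28 00:49:10 +0900 [Node3: nodewatchdog: sk.panic:alert]: Panic String: Process vldb unresponsive for 210 seconds in process nodewatchdog on release 9.8P11 (C)
Fri Jul 29 17:27:00 +0900 [Node4: nodewatchdog: sk.panic:alert]: Panic String: Process vifmgr unresponsive for 540 seconds in process nodewatchdog on release 9.8P11 (C)
"""

# Assumed layout: "<timestamp> [<node>: <source>: <event>:<severity>]: <message>"
EMS_LINE = re.compile(
    r"^(?P<ts>.+?) \[(?P<node>[^:]+): (?P<source>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<msg>.*)$"
)
LOW_MEM = re.compile(r"with (\d+)MB remaining")
WATCHDOG = re.compile(r"Process (\w+) unresponsive for (\d+) seconds")


def parse_ems(text):
    """Return ({node: [remaining MB, ...]}, [(timestamp, node, process or None), ...])."""
    low_memory = defaultdict(list)
    panics = []
    for line in text.splitlines():
        m = EMS_LINE.match(line.strip())
        if not m:
            continue  # ignore anything that does not fit the assumed layout
        if m["event"].startswith("wafl.memory."):
            reading = LOW_MEM.search(m["msg"])
            if reading:
                low_memory[m["node"]].append(int(reading.group(1)))
        elif m["event"] == "sk.panic":
            # Watchdog panics name the unresponsive process; other panics do not.
            watchdog = WATCHDOG.search(m["msg"])
            panics.append((m["ts"], m["node"], watchdog.group(1) if watchdog else None))
    return dict(low_memory), panics


if __name__ == "__main__":
    low_memory, panics = parse_ems(EMS_SAMPLE)
    for node, readings in low_memory.items():
        print(node, "lowest reading (MB):", min(readings))
    for ts, node, process in panics:
        print(ts, node, process or "non-watchdog panic")
```

Grouping the readings by node makes it easier to see whether free memory keeps shrinking on one node before its watchdog panic.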

  • The snapmirror-error-log shows that scheduled update transfers were aborted

Tue Jul 26 08:46:45 KST 2022 FlexGroupScheduledUpdate[Jul 26 08:45:00]:3459d89a-099a-11ed-8738-d039ea368836 Operation-Uuid=f3321704-42ca-4a54-bffd-e75280d1e486 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name1 DstPath:dst_svm:vol_name1 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted
Thu Jul 28 00:49:15 KST 2022 FlexGroupScheduledUpdate[Jul 28 00:30:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=a8b6dd3f-b5df-440c-949b-c59d8c3ec5bf Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Fri Jul 29 17:27:02 KST 2022 FlexGroupScheduledUpdate[Jul 29 16:35:00]:9dcab4ec-b3a2-11eb-a8c6-00a098b85311 Operation-Uuid=795b6454-6069-4e10-acca-2ceb019cbf80 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name6 DstPath:dst_svm:vol_name6 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Sun Jul 31 10:43:00 KST 2022 FlexGroupScheduledUpdate[Jul 31 09:45:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=1d4e4ba9-30b4-4532-a37a-77da00dc758e Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Tue Aug  2 02:44:47 KST 2022 FlexGroupScheduledUpdate[Aug  2 02:00:00]:1b5bb5ee-08bd-11ed-9016-00a098657588 Operation-Uuid=3d074e79-8975-4bb7-ab96-4ad07fd7bc61 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name8 DstPath:dst_svm:vol_name8 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Wed Aug  3 16:21:48 KST 2022 FlexGroupScheduledUpdate[Aug  3 16:20:00]:a578679f-0e4a-11ed-814e-00a098bed184 Operation-Uuid=0aa7ad2a-8cef-44b2-87cc-dab8c7813941 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name10 DstPath:dst_svm:vol_name10 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Thu Aug  4 10:48:54 KST 2022 FlexGroupScheduledUpdate[Aug  4 10:30:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=e7337352-8c0c-44c4-a2fb-00b7f407053c Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Sat Aug  6 08:43:24 KST 2022 FlexGroupScheduledUpdate[Aug  6 08:20:00]:1b5bb5ee-08bd-11ed-9016-00a098657588 Operation-Uuid=eee93c4d-f9d7-42a3-b4d7-1b6e7dac4e07 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name8 DstPath:dst_svm:vol_name8 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Mon Aug  8 04:13:24 KST 2022 FlexGroupScheduledUpdate[Aug  8 04:00:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=8f73e139-ec17-42f1-ae39-6faea650e5c4 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Sun Aug 14 17:43:33 KST 2022 FlexGroupScheduledUpdate[Aug 14 17:15:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=eadd1c18-37b3-4a25-829e-d0fe6ccfd0cd Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Thu Aug 18 22:13:38 KST 2022 FlexGroupScheduledUpdate[Aug 18 20:18:55]:1b5bb5ee-08bd-11ed-9016-00a098657588 Operation-Uuid=ecee02ba-63e0-456f-b4af-c463448e5954 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name8 DstPath:dst_svm:vol_name8 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted. 
Tue Aug 23 05:09:39 KST 2022 FlexGroupScheduledUpdate[Aug 23 04:35:00]:5511f07c-08ba-11ed-814e-00a098bed184 Operation-Uuid=71c51770-4023-42f0-87bf-e794dc987eb7 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name12 DstPath:dst_svm:vol_name12 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted.
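
To summarize entries like those above, the following Python sketch counts aborted transfers per destination path and measures how long after its scheduled start each abort was logged. The line layout is an assumption inferred from these samples only. The names `parse_aborts`, `LINE`, and `_ts` are hypothetical. The scheduled time in brackets carries no year, so the sketch assumes it falls in the same year as the log entry. The time-zone token (`KST`) is skipped rather than parsed, and month names are assumed to be in English.

```python
import re
from collections import Counter
from datetime import datetime

# Four sample lines copied from the snapmirror-error-log excerpt above.
SM_SAMPLE = """\
Tue Jul 26 08:46:45 KST 2022 FlexGroupScheduledUpdate[Jul 26 08:45:00]:3459d89a-099a-11ed-8738-d039ea368836 Operation-Uuid=f3321704-42ca-4a54-bffd-e75280d1e486 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name1 DstPath:dst_svm:vol_name1 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted
Thu Jul 28 00:49:15 KST 2022 FlexGroupScheduledUpdate[Jul 28 00:30:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=a8b6dd3f-b5df-440c-949b-c59d8c3ec5bf Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted.
Sun Jul 31 10:43:00 KST 2022 FlexGroupScheduledUpdate[Jul 31 09:45:00]:9f695594-b157-11eb-9a9a-00a098b85311 Operation-Uuid=1d4e4ba9-30b4-4532-a37a-77da00dc758e Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name3 DstPath:dst_svm:vol_name3 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted.
Tue Aug  2 02:44:47 KST 2022 FlexGroupScheduledUpdate[Aug  2 02:00:00]:1b5bb5ee-08bd-11ed-9016-00a098657588 Operation-Uuid=3d074e79-8975-4bb7-ab96-4ad07fd7bc61 Group=flexgroup Operation-Cookie=0 SrcPath:src_svm:vol_name8 DstPath:dst_svm:vol_name8 Prim=SM:cid=101,mid=1009, sm_wf_cg_group_op_wait_for_all_item_done_xfer_post_dowork:5101, Msg=Transfer aborted.
"""

# Assumed layout: "<dow> <mon> <day> <time> <tz> <year> <op>[<sched>]:<uuid> ... DstPath:<dst> ... Msg=<msg>"
LINE = re.compile(
    r"^\w{3} (?P<mon>\w{3}) +(?P<day>\d+) (?P<time>[\d:]{8}) \w+ (?P<year>\d{4}) "
    r"(?P<op>\w+)\[(?P<sched>\w{3} +\d+ [\d:]{8})\]:\S+ .*?"
    r"DstPath:(?P<dst>\S+) .*Msg=(?P<msg>.*)$"
)


def _ts(text, year):
    """Parse 'Mon D HH:MM:SS' plus a year; collapses the double space before one-digit days."""
    return datetime.strptime(" ".join(text.split()) + " " + year, "%b %d %H:%M:%S %Y")


def parse_aborts(text):
    """Return [(destination path, scheduled time, logged time, delay in seconds), ...]."""
    records = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m or not m["msg"].startswith("Transfer aborted"):
            continue
        logged = _ts(f'{m["mon"]} {m["day"]} {m["time"]}', m["year"])
        # Assumption: the schedule time is in the same year as the log entry.
        scheduled = _ts(m["sched"], m["year"])
        records.append((m["dst"], scheduled, logged, (logged - scheduled).total_seconds()))
    return records


if __name__ == "__main__":
    records = parse_aborts(SM_SAMPLE)
    print(Counter(dst for dst, *_ in records).most_common())
    for dst, scheduled, logged, delay in records:
        print(f"{dst}: aborted {delay:.0f}s after scheduled start {scheduled:%b %d %H:%M:%S}")
```

Comparing the logged abort times with the `sk.panic` timestamps in the EMS log is one way to check whether the aborts coincide with the node panics.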

