ONTAP cluster node reboots unexpectedly due to a memory leak in wexrpc_CSM_RPC
Applies to
- ONTAP 9
- Repeatedly running volume explore -format analytics
Issue
- An ONTAP cluster node reboots unexpectedly and displays the following message:
PANIC: Process mgwd unresponsive for 202 seconds (mgwd startup: "(2555)") in process nodewatchdog on release 9.11.1P8 (C) on Fri Oct 11 00:17:08 JST 2024
- VMSTAT-M in AutoSupport (ASUP) shows that wexrpc_CSM_RPC usage increases continuously:
=-=-=-=-=-= Sun Mar 05, 2023 00:09:04 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 7823976 488999K 488999K 488999K 0K 0K 15647950 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 31295201
=-=-=-=-=-= Sun Mar 12, 2023 00:19:24 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 15222161 951386K 951386K 951386K 0K 0K 30444320 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 60887263
=-=-=-=-=-= Sun Mar 19, 2023 00:09:26 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 21793443 1362091K 1362091K 1362091K 0K 0K 43586884 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 87171781
=-=-=-=-=-= Sun Mar 26, 2023 00:29:12 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 28400316 1775020K 1775020K 1775020K 0K 0K 56800630 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 113598643
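The growth across the weekly snapshots above can be quantified with a short script. The following is a minimal sketch, not a NetApp tool: the function names are hypothetical, and it assumes that the second numeric field on the wexrpc_CSM_RPC line (for example 488999K) is the memory in use, which the VMSTAT-M excerpt itself does not label.

```python
import re
from datetime import datetime

# Snapshots copied from the VMSTAT-M excerpt above.
SAMPLE = """\
=-=-=-=-=-= Sun Mar 05, 2023 00:09:04 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 7823976 488999K 488999K 488999K 0K 0K 15647950 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 31295201
=-=-=-=-=-= Sun Mar 12, 2023 00:19:24 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 15222161 951386K 951386K 951386K 0K 0K 30444320 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 60887263
=-=-=-=-=-= Sun Mar 19, 2023 00:09:26 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 21793443 1362091K 1362091K 1362091K 0K 0K 43586884 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 87171781
=-=-=-=-=-= Sun Mar 26, 2023 00:29:12 +0900 VMSTAT-M 2 lines
wexrpc_CSM_RPC 28400316 1775020K 1775020K 1775020K 0K 0K 56800630 64,128
D-wex bufs 3 1K 5K 0K 0K 1K 113598643
"""

# Matches the snapshot separator line and captures its timestamp.
HEADER = re.compile(
    r"^=-=-=-=-=-= (\w{3} \w{3} \d{2}, \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}) VMSTAT-M"
)


def parse_snapshots(text, tag="wexrpc_CSM_RPC"):
    """Return a list of (timestamp, memory_in_K) tuples for one malloc tag."""
    snapshots = []
    stamp = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%a %b %d, %Y %H:%M:%S %z")
            continue
        fields = line.split()
        if stamp is not None and fields and fields[0] == tag:
            # Assumption: fields[2] (e.g. "488999K") is the memory in use.
            snapshots.append((stamp, int(fields[2].rstrip("K"))))
    return snapshots


def grows_every_snapshot(snapshots):
    """True if memory use is strictly larger in each later snapshot."""
    values = [mem for _, mem in snapshots]
    return all(a < b for a, b in zip(values, values[1:]))


def growth_per_day(snapshots):
    """Average growth in K per day between the first and last snapshot."""
    (t0, m0), (t1, m1) = snapshots[0], snapshots[-1]
    days = (t1 - t0).total_seconds() / 86400
    return (m1 - m0) / days


if __name__ == "__main__":
    snaps = parse_snapshots(SAMPLE)
    print("strictly increasing:", grows_every_snapshot(snaps))
    print("average growth (K/day): %.0f" % growth_per_day(snaps))
```

A tag whose usage rises in every snapshot and never falls back, as wexrpc_CSM_RPC does here, is consistent with a leak rather than a transient load spike.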
- wexrpc_CSM_RPC entries also appear hourly in LEAK-DATA.GZ in ASUP:
- bsd memory - Sun Aug 13 10:50:00 JST 2023
11743 54288128 0xffffffff83573ae7 [common_kmod.ko::ck_refill_zone+71] common_kmod malloc
496 126976 0xffffffff806fdd0c [kernel::umtxq_alloc+28] umtx
748 191488 0xffffffff805bdfb2 [kernel::fuse_ipc_init+370] fuse_msgbuf
222 113664 0xffffffff8067e5ff [kernel::fget_unlocked+591] kdtrace
51638136 3304840704 0xffffffff836d1cf0 [common_kmod.ko::wex_1_common+48] wexrpc_CSM_RPC
748 191488 0xffffffff805bdf6c [kernel::fuse_ipc_init+300] fuse_msgbuf
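To see which allocator dominates a LEAK-DATA snapshot like the one above, the entries can be ranked by size. This is a minimal sketch with hypothetical names; it assumes the columns are allocation count, total bytes, caller address, caller symbol in brackets, and malloc tag, which the excerpt does not label explicitly.

```python
import re
from collections import namedtuple

# Entries copied from the LEAK-DATA.GZ excerpt above.
SAMPLE = """\
- bsd memory - Sun Aug 13 10:50:00 JST 2023
11743 54288128 0xffffffff83573ae7 [common_kmod.ko::ck_refill_zone+71] common_kmod malloc
496 126976 0xffffffff806fdd0c [kernel::umtxq_alloc+28] umtx
748 191488 0xffffffff805bdfb2 [kernel::fuse_ipc_init+370] fuse_msgbuf
222 113664 0xffffffff8067e5ff [kernel::fget_unlocked+591] kdtrace
51638136 3304840704 0xffffffff836d1cf0 [common_kmod.ko::wex_1_common+48] wexrpc_CSM_RPC
748 191488 0xffffffff805bdf6c [kernel::fuse_ipc_init+300] fuse_msgbuf
"""

LeakEntry = namedtuple("LeakEntry", "count bytes address symbol tag")

# Assumed layout: <count> <bytes> <address> [<symbol>] <tag>
ENTRY = re.compile(r"^(\d+)\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+\[([^\]]+)\]\s+(.+)$")


def parse_leak_data(text):
    """Parse entry lines; header lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            count, size, address, symbol, tag = match.groups()
            entries.append(LeakEntry(int(count), int(size), address, symbol, tag))
    return entries


def largest_consumers(entries, limit=3):
    """Return the entries holding the most bytes, largest first."""
    return sorted(entries, key=lambda e: e.bytes, reverse=True)[:limit]


if __name__ == "__main__":
    for entry in largest_consumers(parse_leak_data(SAMPLE)):
        print(entry.tag, entry.bytes, "bytes in", entry.count, "allocations")
```

Under these column assumptions, the wexrpc_CSM_RPC entry in the sample holds 3304840704 bytes across 51638136 allocations, which is 64 bytes per allocation and far more than any other entry.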