HA interconnect: Connection for 'cfo_rv' failed due to high HostOS CPU utilization
Applies to
- FAS25xx
- Cluster peering encryption
- ONTAP 9.6 or later
Issue
- The HA link flaps several times a day, and takeover is disabled with the reason "unsynchronized log":
Tue May 17 01:51:41 0900 [Nodename: raidio_thread: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state MIRROR_ONLINE is aborted because of reason Abort Pending.
Tue May 17 01:51:41 0900 [Nodename: nvram_sync: nvmm.mirror.offlined:debug]: params: {'mirror': 'HA Partner Mirror Offlined'}
Tue May 17 01:51:41 0900 [Nodename: rendezvous_proc: cf.rv.notConnected:alert]: HA interconnect: Connection for 'cfo_rv' failed.
Tue May 17 01:51:41 0900 [Nodename: nic_mgr: cf.nm.nicViError:info]: HA interconnect: NIC 0 has an error on RAID VI (virtual interface #9): SEND_DESC_ERROR 12 2.
Tue May 17 01:51:41 0900 [Nodename: nic_mgr: cf.nm.nicReset:notice]: HA interconnect: Initiating soft reset on card 0 due to rendezvous reset.
Tue May 17 01:51:41 0900 [Nodename: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Nodename by ae0000-vpnas1y disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
Tue May 17 01:51:44 0900 [Nodename: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of Nodename by ae0000-vpnas1y disabled (NVRAM size mismatch).
Tue May 17 01:51:50 0900 [Nodename: cf_main: cf.fsm.takeoverOfPartnerDisabled:error]: Failover monitor: takeover of ae0000-vpnas1y disabled (unsynchronized log).
- HostOS CPU utilization above 50% is observed, and the times coincide closely with the link flaps:
Cluster: Cluster_name (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
Node: node_name (yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)
Time Range: 2022-05-16 16:00:01.000 00:00 - 22:04:01.473 00:00 GMT
time interval                        process       process
                                     instance      pct_cpu
                                                   (%)
------------------------------------ ------------- -------
2022-05-16 16:45:01 - 16:50:02 00:00 CSM BTLS, 282 55.44
2022-05-16 16:50:02 - 16:55:02 00:00 CSM BTLS, 282 52.50
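
The EMS messages and the CPU report use different time zones, so it can help to check the overlap explicitly. The following is a minimal sketch, not a NetApp tool. It assumes that the "0900" in the EMS lines is a UTC+09:00 offset whose sign was lost, that the EMS events are from 2022 (the year shown in the CPU report), and that the CPU intervals are in GMT as the report header states. The function names are hypothetical, and parsing the day and month names assumes an English locale.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "0900" in the EMS lines means UTC+09:00 (the sign is missing in the source).
EMS_OFFSET = timezone(timedelta(hours=9))


def ems_time_to_utc(stamp: str, year: int) -> datetime:
    """Convert an EMS stamp such as 'Tue May 17 01:51:41' to an aware UTC datetime.

    EMS lines carry no year, so the caller supplies it (assumed: 2022).
    """
    local = datetime.strptime(f"{year} {stamp}", "%Y %a %b %d %H:%M:%S")
    return local.replace(tzinfo=EMS_OFFSET).astimezone(timezone.utc)


def cpu_interval(day: str, start: str, end: str):
    """Build a (begin, end) pair of aware UTC datetimes for one CPU report row."""
    fmt = "%Y-%m-%d %H:%M:%S"
    begin = datetime.strptime(f"{day} {start}", fmt).replace(tzinfo=timezone.utc)
    finish = datetime.strptime(f"{day} {end}", fmt).replace(tzinfo=timezone.utc)
    return begin, finish


def matching_samples(event: datetime, samples):
    """Return the CPU rows whose interval contains the event time."""
    return [(b, e, pct) for b, e, pct in samples if b <= event < e]


# Rows copied from the CPU report above: (begin, end, pct_cpu).
cpu_samples = [
    (*cpu_interval("2022-05-16", "16:45:01", "16:50:02"), 55.44),
    (*cpu_interval("2022-05-16", "16:50:02", "16:55:02"), 52.50),
]

# Timestamp of the cf.rv.notConnected event in the EMS log above.
flap_utc = ems_time_to_utc("Tue May 17 01:51:41", 2022)
matches = matching_samples(flap_utc, cpu_samples)

for begin, end, pct in matches:
    print(f"flap at {flap_utc:%Y-%m-%d %H:%M:%S} UTC falls in "
          f"{begin:%H:%M:%S}-{end:%H:%M:%S} UTC, CSM BTLS at {pct:.2f}%")
```

Under these assumptions, the 01:51:41 event converts to 16:51:41 GMT on 2022-05-16, which falls inside the 16:50:02 - 16:55:02 interval where CSM BTLS is at 52.50%.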