A700s reboots unexpectedly on BMC 1.81 or later

Category:
aff-series, A700s, HA interconnect, 2008524003, ic1a, BURT 1403180
Specialty:
hw

Applies to

  • A700s
  • BMC 1.81 or later

Issue

  • A node reboots unexpectedly:

[node_name_1: wafl_exempt08: wafl.vol.snap_create.done:info]: params: {'vol': 'vm_tfs_01', 'app': '', 'volident': '@vserver:34386606-fd18-11e6-aab3-00a098ae1e68', 'run_time': '504638', 'owner': '', 'type': 'Volume'}
[node_name_1: ifconfig: netif.linkUp:info]: Ethernet e0M: Link up.

  • The Service Processor resets the node, and the partner node takes over:

[node_name_2: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(node_name_1), system_down because reset_via_sp.
[node_name_2: cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(node_name_1), system_down because l2_watchdog_reset.

[node_name_2: swi1: mri_ha: nvmm.mirror.aborting:debug]: mirror of sysid 1, partner_type HA Partner and mirror state MIRROR_ONLINE is aborted because of reason Abort Pending.
[node_name_2: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic1a link is down.
[node_name_2: cf_fastTimeout: cf.ic.heartBeatFailed:error]: HA interconnect: Heartbeat failed.
[node_name_2: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name_2 by node_name_1 disabled (unsynchronized log).
[node_name_2: rastrace_dump: rastrace.dump.saved:debug]: A RAS trace dump for module IC instance 0 was stored in /etc/log/rastrace/IC_0_20201027_17:15:50:245981.dmp.
[node_name_2: ctrl_hb_port_ic1a: ctrl.rdma.heartBeat:info]: HA interconnect: Missed heartbeat to 192.0.1.4.
[node_name_2: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name_2 by node_name_1 disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
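
When triaging a saved EMS log for this pattern, it can help to pull out the hw_assist takeover reasons automatically. The following is a minimal Python sketch, not a NetApp tool. It assumes the log lines follow the `[node: source: event:severity]: message` layout seen in the excerpts above; the function names `parse_ems_line` and `hw_assist_reasons` are hypothetical.

```python
import re

# Assumed line layout, based only on the excerpts in this article:
#   [node: source: event.name:severity]: message
# The source field may itself contain ": " (for example "swi1: mri_ha").
EMS_LINE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<source>[^\]]+?): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)"
)

# Matches the partner name and the reason in a hw_assist takeover alert.
HW_ASSIST = re.compile(
    r"alert from partner\((?P<partner>[^)]+)\), "
    r"system_down because (?P<reason>\w+)"
)


def parse_ems_line(line):
    """Return the fields of one EMS line as a dict, or None if it does not match."""
    m = EMS_LINE.search(line)
    return m.groupdict() if m else None


def hw_assist_reasons(lines):
    """Return (receiving_node, partner, reason) for each hw_assist takeover alert."""
    results = []
    for line in lines:
        rec = parse_ems_line(line)
        if rec is None or rec["event"] != "cf.hwassist.takeoverTrapRecv":
            continue
        m = HW_ASSIST.search(rec["message"])
        if m:
            results.append((rec["node"], m.group("partner"), m.group("reason")))
    return results
```

Calling `hw_assist_reasons` on the two hw_assist lines above would report `reset_via_sp` and `l2_watchdog_reset` as the reasons received by node_name_2 from node_name_1.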

  • The affected node reboots after the watchdog reset and might operate normally after giveback
  • The BMC SEL log shows NMI and watchdog events:

  420 | 03/03/2023 | 17:51:10 | CriticalInt | Software NMI | Asserted
  421 | 03/03/2023 | 17:51:10 | Watchdog2 | Timer interrupt | Asserted
  422 | 03/03/2023 | 17:51:12 | Watchdog2 | Hard reset | Asserted
  423 | 03/03/2023 | 17:51:12 | SysReset | State Asserted | Asserted
  424 | 03/03/2023 | 18:20:22 | Platform Security #0x00 | Transition to Off Line | Asserted
  425 | 03/03/2023 | 18:42:32 | SysBoot #0xFF | State Asserted | Asserted
  426 | 03/03/2023 | 18:43:03 | Platform Security #0x00 | Transition to Off Line | Asserted
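
The NMI-then-watchdog sequence in the SEL excerpt above can be searched for in a saved SEL dump. The sketch below is illustrative only and makes several assumptions: records are pipe-separated with six fields as shown, dates are in MM/DD/YYYY order (the excerpt does not disambiguate this), and a 10-second window between the Software NMI and the Watchdog2 hard reset is an arbitrary, hypothetical threshold. The function names are also hypothetical.

```python
from datetime import datetime, timedelta


def parse_sel_line(line):
    """Parse one 'id | date | time | sensor | event | state' SEL record."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 6:
        return None
    rec_id, date, time, sensor, event, state = parts
    try:
        # Assumption: MM/DD/YYYY date order.
        when = datetime.strptime(f"{date} {time}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {"id": rec_id, "time": when, "sensor": sensor,
            "event": event, "state": state}


def find_nmi_watchdog_resets(lines, window=timedelta(seconds=10)):
    """Return (nmi_id, reset_id) pairs where a Software NMI is followed
    by a Watchdog2 hard reset within the given window."""
    records = [r for r in map(parse_sel_line, lines) if r]
    hits = []
    for i, rec in enumerate(records):
        if rec["sensor"] != "CriticalInt" or rec["event"] != "Software NMI":
            continue
        for later in records[i + 1:]:
            if later["time"] - rec["time"] > window:
                break  # too late to belong to this NMI
            if later["sensor"] == "Watchdog2" and later["event"] == "Hard reset":
                hits.append((rec["id"], later["id"]))
                break
    return hits
```

Run against the seven records above, this pairs record 420 (Software NMI) with record 422 (Watchdog2 hard reset), which occur two seconds apart.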

 
