AFF A250 or FAS500f system shuts down after the SP heartbeat stops, with KCS errors on boot
Applies to
- AFF A250
- FAS500f
Issue
- The node shuts down because the SP heartbeat (HBT) has stopped:
Sat Aug 19 03:46:24 -0400 [cluster-01: spmgrd: sp.heartbeat.stopped:debug]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 600 seconds.
Sat Aug 19 03:46:24 -0400 [cluster-01: spmgrd: callhome.sp.hbt.missed:debug]: Call home for SP HBT MISSED
Sat Aug 19 03:56:44 -0400 [cluster-01: spmgrd: callhome.sp.hbt.stopped:debug]: Call home for SP HBT STOPPED
Sat Aug 19 03:59:08 -0400 [cluster-01: env_mgr: sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 10 minutes.
Sat Aug 19 04:09:08 -0400 [cluster-01: env_mgr: monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the BMC)
- The partner node takes over after detecting that the affected node is rebooting:
Sat Aug 19 04:09:33 -0400 [cluster-02: cf_main: cf.fsm.takeover.on.reboot:debug]: Failover monitor: One node initiated automatic takeover after detecting that its partner node is rebooting.
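- When connected to the node's console, the node is at the LOADER prompt. After switching to the SP, the following messages are printed repeatedly: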
sh: can't create /sys/module/watchdog_hw/parameters/current_wdt_device: nonexistent directory
sh: can't create /sys/module/watchdog_hw/parameters/current_wdt_device: nonexistent directory
KCS cmd(NETFN 0x6, CMD 0x1) failed, ret -2
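- Power cycling the node does not change the behavior.
- The node still fails to boot, and the BMC remains unresponsive.
- Running boot_ontap from the LOADER prompt produces the following output during boot: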
KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2
KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2
KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2
KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2
Could not patch the required SMBIOS 1 field 1 with the FRU data.
KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2
KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2
Copyright(c) 2021 American Megatrends, Inc.
Copyright(c) 2021 American Megatrends, Inc.
ERROR: Class:0; Subclass:20000; Operation: 1002
Boot Loader version 6.5.8
Copyright (C) 2000-2003 Broadcom Corporation.
Portions Copyright (C) 2002-2023 NetApp, Inc. All Rights Reserved.
KCS cmd(NETFN 0x6, CMD 0x1) failed, ret -2
Resetting BMC from backup FW...
Waiting 30 seconds for BMC to reboot...
KCS cmd(NETFN 0x6, CMD 0x1) failed, ret -2
Copyright(c) 2021 American Megatrends, Inc.
ERROR: Class:0; Subclass:20000; Operation: 1002
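When reviewing a saved console capture, it can help to tally the KCS failures by command. The sketch below is illustrative only and is not part of any NetApp tooling. The function names (`summarize_kcs_failures`, `describe`) are hypothetical. The command names in the lookup table come from the standard IPMI command assignments (NetFn 0x06 / Cmd 0x01 is Get Device ID; NetFn 0x0a / Cmd 0x10 is Get FRU Inventory Area Info), not from this article. The meaning of `ret -2` is not documented here, so the sketch only reports the value.

```python
import re
from collections import Counter

# Standard IPMI (NetFn, Cmd) pairs seen in the console output above.
# These names are taken from the IPMI specification, not from this article.
IPMI_COMMANDS = {
    (0x06, 0x01): "App / Get Device ID",
    (0x0A, 0x10): "Storage / Get FRU Inventory Area Info",
}

# Matches lines such as: KCS cmd(NETFN 0x6, CMD 0x1) failed, ret -2
KCS_FAILURE = re.compile(
    r"KCS cmd\(NETFN (0x[0-9a-fA-F]+), CMD (0x[0-9a-fA-F]+)\) failed, ret (-?\d+)"
)


def summarize_kcs_failures(console_text):
    """Count KCS failures per (netfn, cmd, ret) tuple in a console capture."""
    counts = Counter()
    for match in KCS_FAILURE.finditer(console_text):
        netfn = int(match.group(1), 16)
        cmd = int(match.group(2), 16)
        ret = int(match.group(3))
        counts[(netfn, cmd, ret)] += 1
    return counts


def describe(netfn, cmd):
    """Return the IPMI command name if it is in the small table above."""
    return IPMI_COMMANDS.get((netfn, cmd), "not in this lookup table")


if __name__ == "__main__":
    # A few lines copied from the console output in this article.
    sample = (
        "KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2\n"
        "Could not patch the required SMBIOS 1 field 1 with the FRU data.\n"
        "KCS cmd(NETFN 0xa, CMD 0x10) failed, ret -2\n"
        "KCS cmd(NETFN 0x6, CMD 0x1) failed, ret -2\n"
    )
    for (netfn, cmd, ret), n in sorted(summarize_kcs_failures(sample).items()):
        print(
            f"NetFn 0x{netfn:02x} Cmd 0x{cmd:02x} ret {ret}: "
            f"{n} failure(s) [{describe(netfn, cmd)}]"
        )
```

In the output above, failures of both the basic Get Device ID query and the FRU reads match the description of a BMC that is not responding over the KCS interface. The "Could not patch the required SMBIOS 1 field 1 with the FRU data" message appears among the failed FRU reads.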