
Automatic controller takeover due to a power issue

Category:
ontap-9
Specialty:
hw

Applies to

  • ONTAP 9
  • AFF systems
  • FAS systems

Issue

  • An automatic takeover occurred on the partner node, and the following messages appear in EMS-LOG-FILE:
callhome.sfo.takeover:alert]: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC
cf_takeover: callhome.reboot.takeover:notice]: Call home for PARTNER REBOOT (CONTROLLER TAKEOVER)
cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed

splog_main: mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a power glitch.
splog_main: callhome.reboot.glitch:notice]: Call home for REBOOT (power glitch)
.
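
The combination of takeover events and power-glitch events above can be checked for mechanically. The following is a minimal sketch, not a NetApp tool: it assumes the EMS lines follow the "<event>:<severity>]: <message>" shape seen in the excerpts, and the function names and the grouping of event names into two sets are hypothetical choices made for illustration.

```python
import re

# Matches "<event.name>:<severity>]: <message>" as seen in the excerpts above.
EVENT_RE = re.compile(r"([\w.]+):(\w+)\]:\s*(.*)")

# Event names are copied from the excerpts; splitting them into these
# two sets is an assumption made for this sketch.
TAKEOVER_EVENTS = {"callhome.sfo.takeover", "cf.fm.takeoverComplete"}
POWER_EVENTS = {"mgr.boot.reason_abnormal", "callhome.reboot.glitch"}


def parse_ems_line(line):
    """Return (event, severity, message), or None if no EMS event is found."""
    match = EVENT_RE.search(line)
    return match.groups() if match else None


def takeover_with_power_glitch(lines):
    """True if the lines contain both a takeover event and a power-glitch event."""
    events = {parsed[0] for parsed in map(parse_ems_line, lines) if parsed}
    return bool(events & TAKEOVER_EVENTS) and bool(events & POWER_EVENTS)


ems_sample = [
    "callhome.sfo.takeover:alert]: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC",
    "cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed",
    "splog_main: mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a power glitch.",
]
print(takeover_with_power_glitch(ems_sample))
```

In practice the takeover events and the power-glitch events are logged by different nodes, so the lines passed in would need to combine the EMS logs of both HA partners.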

  • The disk shelves also report a power fault, as seen in EMS-LOG-FILE:

cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(cluster1-01), system_down because power_loss.
dsa_worker5: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 0 on channel 7a power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
dsa_worker4: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 10 on channel 9d power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
.

Sat Jan 11 00:30:40 +0100 [snes1p208_01: dsa_worker0: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED.
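
To see at a glance which shelves and power supplies are affected, the ses.status.psWarning messages can be grouped by shelf. This is a rough sketch that assumes the message layout matches the two excerpts above exactly; the regular expression and the helper names are hypothetical and may need adjusting for other shelf models or message variants.

```python
import re
from collections import defaultdict

# Pattern derived from the two psWarning excerpts above (an assumption,
# not a documented message format).
PS_WARNING_RE = re.compile(
    r"(?P<model>\S+) \(S/N (?P<serial>[^)]+)\) shelf (?P<shelf>\d+) "
    r"on channel (?P<channel>\w+) power warning for Power supply (?P<psu>\d+): "
    r"(?P<status>[^;]+); (?P<detail>[^.]+)\."
)


def parse_ps_warning(line):
    """Return the fields of a shelf power warning as a dict, or None."""
    match = PS_WARNING_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["shelf"] = int(fields["shelf"])
    fields["psu"] = int(fields["psu"])
    return fields


def psus_by_shelf(lines):
    """Map each shelf ID to the set of power supplies that raised a warning."""
    affected = defaultdict(set)
    for line in lines:
        warning = parse_ps_warning(line)
        if warning:
            affected[warning["shelf"]].add(warning["psu"])
    return dict(affected)


shelf_sample = [
    "dsa_worker5: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 0 on channel 7a power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.",
    "dsa_worker4: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 10 on channel 9d power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.",
]
print(psus_by_shelf(shelf_sample))
```

If the same power supply number is reported on several shelves, as in the excerpt, that may point to a shared power feed rather than to individual PSU faults, but this inference is not confirmed by the logs alone.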

  • The AutoSupport (ASUP) HA Group Notifications also show the following:

HA Group Notification (CHASSIS POWER SUPPLY DEGRADED: PSU3) ERROR.

HA Group Notification (CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU3.) ERROR.

HA Group Notification (SHELF POWER INTERRUPTED) ERROR.

HA Group Notification (SHELF_FAULT) ERROR.

  • SP-LATEST-SYSTEM-EVENT-LOG shows the following:

Record 589: Fri Aug 06 19:01:37.000000 2021 [SP.emergency]: System input power lost
Record 590: Thu Jan 01 00:00:49.400961 1970 [IPMI.notice]: 7204 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | SP Power Reset
Record 591: Thu Jan 01 00:00:49.450536 1970 [IPMI.notice]: 7304 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
.

Record 407: Fri Feb 28 03:34:17.489482 2020 [Agent.notice]: 127.880: 3 : AC Power Loss Signal PSU1 de-asserted
Record 408: Fri Feb 28 03:34:17.489664 2020 [Agent.notice]: 128.100: 4 : AC Power Loss Signal PSU2 de-asserted
Record 409: Fri Feb 28 03:34:17.557708 2020 [Agent.notice]: 196.145: 4 : AC Power Loss Signal PSU2 asserted
Record 410: Fri Feb 28 03:34:17.570049 2020 [Agent.notice]: 208.526: 3 : AC Power Loss Signal PSU1 asserted
Record 411: Fri Feb 28 03:34:17.635848 2020 [Agent.notice]: 274.301: 14 : Attention LED (at Midplane) asserted
Record 412: Fri Feb 28 03:34:23.431854 2020 [Agent.notice]: 070.290: 14 : Attention LED (at Midplane) de-asserted
Record 413: Fri Feb 28 03:34:27.516634 2020 [SP.warning]: AC_OK Low Detected
Record 419: Fri Feb 28 03:39:47.942198 2020 [SP.critical]: Filer Reboots
.
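
The "Record N:" entries above can be parsed to find the last reported state of each PSU's AC power loss signal and to spot entries stamped 1970, which suggests the SP clock had not yet been set when they were written. The sketch below assumes the record layout shown in the excerpts; the function names are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts: "Record N: <timestamp> [<source>]: <message>"
RECORD_RE = re.compile(
    r"Record (\d+): (\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d+ \d{4}) "
    r"\[([\w.]+)\]: (.*)"
)
AC_LOSS_RE = re.compile(r"AC Power Loss Signal (PSU\d+) (de-asserted|asserted)")


def parse_record(line):
    """Parse one SP event log record into a dict, or return None."""
    match = RECORD_RE.match(line.strip())
    if match is None:
        return None
    when = datetime.strptime(match.group(2), "%a %b %d %H:%M:%S.%f %Y")
    return {
        "id": int(match.group(1)),
        "time": when,
        "source": match.group(3),
        "message": match.group(4),
    }


def last_ac_loss_state(lines):
    """Return the most recent AC power loss signal state seen for each PSU."""
    state = {}
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue
        match = AC_LOSS_RE.search(record["message"])
        if match:
            state[match.group(1)] = match.group(2)
    return state


sel_sample = [
    "Record 407: Fri Feb 28 03:34:17.489482 2020 [Agent.notice]: 127.880: 3 : AC Power Loss Signal PSU1 de-asserted",
    "Record 408: Fri Feb 28 03:34:17.489664 2020 [Agent.notice]: 128.100: 4 : AC Power Loss Signal PSU2 de-asserted",
    "Record 409: Fri Feb 28 03:34:17.557708 2020 [Agent.notice]: 196.145: 4 : AC Power Loss Signal PSU2 asserted",
    "Record 410: Fri Feb 28 03:34:17.570049 2020 [Agent.notice]: 208.526: 3 : AC Power Loss Signal PSU1 asserted",
    "Record 590: Thu Jan 01 00:00:49.400961 1970 [IPMI.notice]: 7204 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | SP Power Reset",
]
print(last_ac_loss_state(sel_sample))
print([r["id"] for r in map(parse_record, sel_sample) if r and r["time"].year == 1970])
```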

  • If both nodes lose power at the same time, no takeover occurs. Both nodes simply reboot.

[BMC.notice]: Eventd: Got an AC_OK Failed Interrupt ...
[IPMI.notice]: 01d8 | 02 | EVT: 0300ffff | Power_Good | Assertion Event, "State Deasserted"
[IPMI.notice]: 01d9 | 02 | EVT: 0300ffff | Power_Proc_OK | Assertion Event, "State Deasserted"
[IPMI.notice]: 01da | 02 | EVT: 6f01ffff | PSU1_Present | Assertion Event, "Absent"
[IPMI.notice]: 01db | 02 | EVT: 6f01ffff | PSU2_Present | Assertion Event, "Absent"
[IPMI.notice]: 01dc | 02 | EVT: 0301ffff | AC_Power_Fail | Assertion Event, "State Asserted"
[IPMI.notice]: 01dd | 02 | EVT: 0301ffff | LAN_MGMT_0_Rst | Assertion Event, "State Asserted"
[IPMI.notice]: 01de | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
[IPMI.notice]: 01df | 02 | EVT: 0300ffff | AC_Power_Fail | Assertion Event, "State Deasserted"
[IPMI.notice]: 01e0 | 02 | EVT: 0300ffff | LAN_MGMT_0_Rst | Assertion Event, "State Deasserted"
[BMC.warning]: AC_OK Low Detected
[IPMI.notice]: 01e1 | 02 | EVT: 015000ad | P3V3 | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 3.027
[IPMI.notice]: 01e2 | 02 | EVT: 015200a5 | P3V3 | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 2.887
[IPMI.notice]: 01e3 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
[IPMI.notice]: 01e4 | 02 | EVT: 015003af | P12V_STBY | Assertion Event, "Lower Non-critical going low " | Reading: 0.186 | Threshold: 10.850
[IPMI.notice]: 01e5 | 02 | EVT: 015203aa | P12V_STBY | Assertion Event, "Lower Critical going low " | Reading: 0.186 | Threshold: 10.540
[IPMI.notice]: 01e6 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
[SysFW.notice]: Destage is started
[IPMI.notice]: 01e7 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
[IPMI.notice]: 01e8 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
[IPMI.notice]: 01e9 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
[SysFW.notice]: Time completing destage: 16 seconds
[IPMI.notice]: 01ea | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
[IPMI.notice]: 01ea | c0 | OEM: ffff7000ff00 | ManufId: 150300 | BMC Power Reset
[IPMI.notice]: 01eb | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
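
The threshold events in this excerpt carry a sensor reading and the threshold it crossed, so the voltage rails that collapsed can be listed directly. A minimal sketch, assuming the pipe-separated layout shown above; the pattern and function names are hypothetical.

```python
import re

# Matches only events that include both "Reading:" and "Threshold:" fields.
THRESHOLD_RE = re.compile(
    r'\|\s*(?P<sensor>\w+)\s*\|\s*Assertion Event, "(?P<event>[^"]*)"'
    r"\s*\|\s*Reading:\s*(?P<reading>[\d.]+)"
    r"\s*\|\s*Threshold:\s*(?P<threshold>[\d.]+)"
)


def parse_threshold_event(line):
    """Return (sensor, event, reading, threshold), or None for other lines."""
    match = THRESHOLD_RE.search(line)
    if match is None:
        return None
    return (
        match.group("sensor"),
        match.group("event").strip(),
        float(match.group("reading")),
        float(match.group("threshold")),
    )


def undervoltage_sensors(lines):
    """Map each sensor to its lowest reading that fell below a threshold."""
    lowest = {}
    for line in lines:
        parsed = parse_threshold_event(line)
        if parsed is None:
            continue
        sensor, _event, reading, threshold = parsed
        if reading < threshold:
            lowest[sensor] = min(reading, lowest.get(sensor, reading))
    return lowest


bmc_sample = [
    '[IPMI.notice]: 01d8 | 02 | EVT: 0300ffff | Power_Good | Assertion Event, "State Deasserted"',
    '[IPMI.notice]: 01e1 | 02 | EVT: 015000ad | P3V3 | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 3.027',
    '[IPMI.notice]: 01e2 | 02 | EVT: 015200a5 | P3V3 | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 2.887',
    '[IPMI.notice]: 01e4 | 02 | EVT: 015003af | P12V_STBY | Assertion Event, "Lower Non-critical going low " | Reading: 0.186 | Threshold: 10.850',
]
print(undervoltage_sensors(bmc_sample))
```

Readings at or near zero on both the 3.3 V and 12 V standby rails, as in the excerpt, are consistent with a complete loss of input power rather than a single failing regulator.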

Alternatively, SP-LATEST-SYSTEM-EVENT-LOG may show the following:
 2cc | 11/26/2025 | 09:43:49 | Power Unit #0x60 | Power off/down | Asserted
 2cd | 11/26/2025 | 09:44:07 | OEM record c0 | 000000 | 000105000000
 2ce | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
 2cf | 11/26/2025 | 09:46:18 | System Event #0xff | Timestamp Clock Sync | Asserted
 2d0 | 11/26/2025 | 09:46:18 | Battery #0x4a | State Deasserted
 2d1 | 11/26/2025 | 09:46:18 | Battery #0x4b | State Asserted
 2d2 | 11/26/2025 | 09:46:18 | Battery #0x4c | State Asserted
 2d3 | 11/26/2025 | 09:46:18 | Battery #0x4d | State Deasserted
 2d4 | 11/26/2025 | 09:46:18 | Other FRU #0x50 | 
 2d5 | 11/26/2025 | 09:46:18 | Other FRU #0x50 | 
 2d6 | 11/26/2025 | 09:46:18 | Other FRU #0x50 | 
 2d7 | 11/26/2025 | 09:46:18 | Other FRU #0x50 | 
 2d8 | 11/26/2025 | 09:46:25 | Power Supply #0x20 | Presence detected | Asserted
 2d9 | 11/26/2025 | 09:46:25 | Power Supply #0x25 | Presence detected | Asserted
 2da | 11/26/2025 | 09:46:25 | Power Supply #0x72 | Presence detected | Asserted
 2db | 11/26/2025 | 09:46:25 | Power Supply #0x73 | Presence detected | Asserted
 2dc | 11/26/2025 | 09:46:26 | OEM record c0 | 000000 | 000105000000
 2dd | 11/26/2025 | 09:46:34 | Battery #0x4f | State Deasserted
 2de | 11/26/2025 | 09:46:35 | OEM record df | FPGA pull BMC whole reset
 2df | 11/26/2025 | 09:46:35 | OEM record df | Pilot FPGA AC cycle
