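Controller takeover completed automatically due to a power issue
Applies to
- ONTAP 9
- AFF systems
- FAS systems
Issue
- The partner node experienced an automatic takeover, and the following events appear in the
EMS-LOG-FILE: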
callhome.sfo.takeover:alert]: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC
cf_takeover: callhome.reboot.takeover:notice]: Call home for PARTNER REBOOT (CONTROLLER TAKEOVER)
cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed
splog_main: mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a power glitch..
splog_main: callhome.reboot.glitch:notice]: Call home for REBOOT (power glitch)
- The disk shelves also reported power faults, as shown in the
EMS-LOG-FILE:
cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(cluster1-01), system_down because power_loss..
dsa_worker5: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 0 on channel 7a power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
dsa_worker4: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 10 on channel 9d power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
Sat Jan 11 00:30:40 +0100 [snes1p208_01: dsa_worker0: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED.
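The EMS excerpts above can be screened for the same event names with a short script. The following is a minimal sketch, not a NetApp tool: it assumes the EMS log has been exported as plain text with one event per line, and the event list, the function name `find_power_events`, and the regular expression are illustrative choices built only from the messages quoted above.

```python
import re

# Event names copied from the EMS excerpts above.
# Assumption: this set is illustrative, not an exhaustive list of power events.
POWER_RELATED_EVENTS = {
    "callhome.sfo.takeover",
    "callhome.reboot.takeover",
    "cf.fm.takeoverComplete",
    "mgr.boot.reason_abnormal",
    "callhome.reboot.glitch",
    "cf.hwassist.takeoverTrapRecv",
    "ses.status.psWarning",
    "callhome.shlf.power.intr",
}

# Matches "<dotted.event.name>:<severity>]: <message>" anywhere in a line.
EVENT_RE = re.compile(
    r"(?P<event>[A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)+)"
    r":(?P<severity>[A-Za-z]+)\]:\s*(?P<message>.*)"
)


def find_power_events(lines):
    """Return (event, severity, message) for each line naming a listed event."""
    hits = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group("event") in POWER_RELATED_EVENTS:
            hits.append(
                (
                    match.group("event"),
                    match.group("severity"),
                    match.group("message").strip(),
                )
            )
    return hits
```

A hit on both a takeover event and a shelf power event in the same time window is consistent with the symptoms described here, but the script only filters lines; it does not establish the cause.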
- The ASUP HA Group Notifications also show the following:
HA Group Notification (CHASSIS POWER SUPPLY DEGRADED: PSU3) ERROR.
HA Group Notification (CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU3.) ERROR.
HA Group Notification (SHELF POWER INTERRUPTED) ERROR.
HA Group Notification (SHELF_FAULT) ERROR.
- Checking the
SP-LATEST-SYSTEM-EVENT-LOG shows the following:
Record 589: Fri Aug 06 19:01:37.000000 2021 [SP.emergency]: System input power lost.
Record 590: Thu Jan 01 00:00:49.400961 1970 [IPMI.notice]: 7204 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | SP Power Reset
Record 591: Thu Jan 01 00:00:49.450536 1970 [IPMI.notice]: 7304 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
Record 407: Fri Feb 28 03:34:17.489482 2020 [Agent.notice]: 127.880: 3 : AC Power Loss Signal PSU1 de-asserted.
Record 408: Fri Feb 28 03:34:17.489664 2020 [Agent.notice]: 128.100: 4 : AC Power Loss Signal PSU2 de-asserted
Record 409: Fri Feb 28 03:34:17.557708 2020 [Agent.notice]: 196.145: 4 : AC Power Loss Signal PSU2 asserted
Record 410: Fri Feb 28 03:34:17.570049 2020 [Agent.notice]: 208.526: 3 : AC Power Loss Signal PSU1 asserted
Record 411: Fri Feb 28 03:34:17.635848 2020 [Agent.notice]: 274.301: 14 : Attention LED (at Midplane) asserted
Record 412: Fri Feb 28 03:34:23.431854 2020 [Agent.notice]: 070.290: 14 : Attention LED (at Midplane) de-asserted
Record 413: Fri Feb 28 03:34:27.516634 2020 [SP.warning]: AC_OK Low Detected
Record 419: Fri Feb 28 03:39:47.942198 2020 [SP.critical]: Filer Reboots
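The "AC Power Loss Signal" records in the excerpt above can be pulled out programmatically to see which power supplies reported a transition. This is a hedged sketch: the record layout is inferred only from the lines quoted above, and the names `ac_loss_transitions` and `psus_with_loss_asserted` are hypothetical helpers, not part of any SP or ONTAP interface.

```python
import re

# Layout inferred from the excerpt: "Record N: <timestamp> [Source.level]: <message>"
RECORD_RE = re.compile(
    r"Record (?P<num>\d+): (?P<ts>.+?) \[(?P<src>[\w.]+)\]: (?P<msg>.*)"
)
# "de-asserted" is listed first so it is not mistaken for "asserted".
AC_LOSS_RE = re.compile(
    r"AC Power Loss Signal (?P<psu>PSU\d+) (?P<state>de-asserted|asserted)"
)


def ac_loss_transitions(lines):
    """Return (record number, timestamp text, PSU, state) per AC-loss record."""
    transitions = []
    for line in lines:
        record = RECORD_RE.search(line)
        if not record:
            continue
        signal = AC_LOSS_RE.search(record.group("msg"))
        if signal:
            transitions.append(
                (
                    int(record.group("num")),
                    record.group("ts"),
                    signal.group("psu"),
                    signal.group("state"),
                )
            )
    return transitions


def psus_with_loss_asserted(transitions):
    """Return the set of PSUs that logged an 'asserted' AC power loss signal."""
    return {psu for _, _, psu, state in transitions if state == "asserted"}
```

Timestamps are kept as text because some records in the excerpt carry a 1970 date, which suggests the SP clock had not yet been set when they were written.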
- If both nodes lose power at the same time, no takeover occurs. The nodes reboot directly instead.
[BMC.notice]: Eventd: Got an AC_OK Failed Interrupt ...
[IPMI.notice]: 01d8 | 02 | EVT: 0300ffff | Power_Good | Assertion Event, "State Deasserted"
[IPMI.notice]: 01d9 | 02 | EVT: 0300ffff | Power_Proc_OK | Assertion Event, "State Deasserted"
[IPMI.notice]: 01da | 02 | EVT: 6f01ffff | PSU1_Present | Assertion Event, "Absent"
[IPMI.notice]: 01db | 02 | EVT: 6f01ffff | PSU2_Present | Assertion Event, "Absent"
[IPMI.notice]: 01dc | 02 | EVT: 0301ffff | AC_Power_Fail | Assertion Event, "State Asserted"
[IPMI.notice]: 01dd | 02 | EVT: 0301ffff | LAN_MGMT_0_Rst | Assertion Event, "State Asserted"
[IPMI.notice]: 01de | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
[IPMI.notice]: 01df | 02 | EVT: 0300ffff | AC_Power_Fail | Assertion Event, "State Deasserted"
[IPMI.notice]: 01e0 | 02 | EVT: 0300ffff | LAN_MGMT_0_Rst | Assertion Event, "State Deasserted"
[BMC.warning]: AC_OK Low Detected
[IPMI.notice]: 01e1 | 02 | EVT: 015000ad | P3V3 | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 3.027
[IPMI.notice]: 01e2 | 02 | EVT: 015200a5 | P3V3 | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 2.887
[IPMI.notice]: 01e3 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
[IPMI.notice]: 01e4 | 02 | EVT: 015003af | P12V_STBY | Assertion Event, "Lower Non-critical going low " | Reading: 0.186 | Threshold: 10.850
[IPMI.notice]: 01e5 | 02 | EVT: 015203aa | P12V_STBY | Assertion Event, "Lower Critical going low " | Reading: 0.186 | Threshold: 10.540
[IPMI.notice]: 01e6 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
[SysFW.notice]: Destage is started
[IPMI.notice]: 01e7 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
[IPMI.notice]: 01e8 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
[IPMI.notice]: 01e9 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
[SysFW.notice]: Time completing destage: 16 seconds
[IPMI.notice]: 01ea | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
[IPMI.notice]: 01ea | c0 | OEM: ffff7000ff00 | ManufId: 150300 | BMC Power Reset
[IPMI.notice]: 01eb | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
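The voltage-rail entries in the BMC excerpt above (P3V3, P12V_STBY) carry a reading and a threshold, which makes them easy to extract and compare. The sketch below assumes the pipe-delimited layout shown in those lines; `THRESHOLD_RE` and `threshold_events` are illustrative names, and other platforms or firmware versions may format these events differently.

```python
import re

# Matches: | <sensor> | Assertion Event, "<state>" | Reading: <x> | Threshold: <y>
THRESHOLD_RE = re.compile(
    r'\|\s*(?P<sensor>\w+)\s*\|\s*Assertion Event,\s*"(?P<state>[^"]*)"\s*'
    r'\|\s*Reading:\s*(?P<reading>-?\d+(?:\.\d+)?)\s*'
    r'\|\s*Threshold:\s*(?P<threshold>-?\d+(?:\.\d+)?)'
)


def threshold_events(lines):
    """Return one dict per IPMI event that reports a reading and a threshold."""
    events = []
    for line in lines:
        match = THRESHOLD_RE.search(line)
        if not match:
            continue
        reading = float(match.group("reading"))
        threshold = float(match.group("threshold"))
        events.append(
            {
                "sensor": match.group("sensor"),
                "state": match.group("state").strip(),
                "reading": reading,
                "threshold": threshold,
                # True when the rail is below the logged lower threshold.
                "below_threshold": reading < threshold,
            }
        )
    return events
```

In the excerpt above, both rails read at or near 0 V against thresholds of roughly 3 V and 10.5 V, which is what one would expect when input power is removed.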
Or, in SP-LATEST-SYSTEM-EVENT-LOG:
2cc | 11/26/2025 | 09:43:49 | Power Unit #0x60 | Power off/down | Asserted
2cd | 11/26/2025 | 09:44:07 | OEM record c0 | 000000 | 000105000000
2ce | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
2cf | 11/26/2025 | 09:46:18 | System Event #0xff | Timestamp Clock Sync | Asserted
2d0 | 11/26/2025 | 09:46:18 | Battery #0x4a | State Deasserted
2d1 | 11/26/2025 | 09:46:18 | Battery #0x4b | State Asserted
2d2 | 11/26/2025 | 09:46:18 | Battery #0x4c | State Asserted
2d3 | 11/26/2025 | 09:46:18 | Battery #0x4d | State Deasserted
2d4 | 11/26/2025 | 09:46:18 | Other FRU #0x50 |
2d5 | 11/26/2025 | 09:46:18 | Other FRU #0x50 |
2d6 | 11/26/2025 | 09:46:18 | Other FRU #0x50 |
2d7 | 11/26/2025 | 09:46:18 | Other FRU #0x50 |
2d8 | 11/26/2025 | 09:46:25 | Power Supply #0x20 | Presence detected | Asserted
2d9 | 11/26/2025 | 09:46:25 | Power Supply #0x25 | Presence detected | Asserted
2da | 11/26/2025 | 09:46:25 | Power Supply #0x72 | Presence detected | Asserted
2db | 11/26/2025 | 09:46:25 | Power Supply #0x73 | Presence detected | Asserted
2dc | 11/26/2025 | 09:46:26 | OEM record c0 | 000000 | 000105000000
2dd | 11/26/2025 | 09:46:34 | Battery #0x4f | State Deasserted
2de | 11/26/2025 | 09:46:35 | OEM record df | FPGA pull BMC whole reset
2df | 11/26/2025 | 09:46:35 | OEM record df | Pilot FPGA AC cycle
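For the pipe-delimited event log format shown above, the gap between the "Power off/down" record and the first later power supply "Presence detected" record can be computed directly. This is a rough sketch under stated assumptions: the field order (ID, date, time, sensor, details) and the MM/DD/YYYY date format are inferred from the excerpt, and `parse_sel_line` and `power_off_to_psu_presence` are hypothetical helper names.

```python
from datetime import datetime


def parse_sel_line(line):
    """Split one 'id | date | time | sensor | ...' line; return None if it does not fit."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 4:
        return None
    try:
        when = datetime.strptime(f"{fields[1]} {fields[2]}", "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {
        "id": fields[0],
        "time": when,
        "sensor": fields[3],
        # Remaining non-empty fields, e.g. ["Power off/down", "Asserted"].
        "details": [field for field in fields[4:] if field],
    }


def power_off_to_psu_presence(lines):
    """Seconds from the first 'Power off/down' to the next PSU 'Presence detected'."""
    off_time = None
    for line in lines:
        record = parse_sel_line(line)
        if record is None:
            continue
        if off_time is None:
            if "Power off/down" in record["details"]:
                off_time = record["time"]
        elif (
            record["sensor"].startswith("Power Supply")
            and "Presence detected" in record["details"]
        ):
            return (record["time"] - off_time).total_seconds()
    return None
```

Applied to the excerpt above, the two records are stamped 09:43:49 and 09:46:25. Treating that interval as the approximate outage duration is an assumption, since the log also contains a 01/01/2000 record showing the clock was resynchronized in between.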