Controller automatic takeover completed due to a power issue
Applies to
- ONTAP 9
- AFF 系统
- FAS 系统
Issue
- An automatic takeover of the partner node occurred. The following can be seen in the
EMS-LOG-FILE:
callhome.sfo.takeover:alert]: Call home for CONTROLLER TAKEOVER COMPLETE AUTOMATIC
cf_takeover: callhome.reboot.takeover:notice]: Call home for PARTNER REBOOT (CONTROLLER TAKEOVER)
cf_takeover: cf.fm.takeoverComplete:notice]: Failover monitor: takeover completed
splog_main: mgr.boot.reason_abnormal:EMERGENCY]: System rebooted due to a power glitch.
splog_main: callhome.reboot.glitch:notice]: Call home for REBOOT (power glitch)
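You may want to check many EMS logs for this signature. A minimal Python sketch can look for the takeover messages together with the power-glitch reboot messages. The message names come from the excerpt above. The regular expression and the helper names (`ems_events`, `takeover_after_power_glitch`) are illustrative assumptions, not a NetApp tool or API.

```python
import re

# EMS message names copied from the excerpt above (assumed to be the relevant set).
TAKEOVER_EVENTS = {
    "callhome.sfo.takeover",
    "callhome.reboot.takeover",
    "cf.fm.takeoverComplete",
}
POWER_REBOOT_EVENTS = {
    "mgr.boot.reason_abnormal",
    "callhome.reboot.glitch",
}

# Matches "<message.name>:<severity>]:" as it appears in each EMS line.
EVENT_RE = re.compile(r"([A-Za-z0-9_.]+):(\w+)\]:")


def ems_events(log_text: str) -> list[tuple[str, str]]:
    """Return (message_name, severity) pairs found in EMS log text."""
    return EVENT_RE.findall(log_text)


def takeover_after_power_glitch(log_text: str) -> bool:
    """True if the text contains both a takeover message and a power-glitch reboot message."""
    names = {name for name, _severity in ems_events(log_text)}
    return bool(names & TAKEOVER_EVENTS) and bool(names & POWER_REBOOT_EVENTS)
```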
- The shelves also reported power supply faults, as seen in the
EMS-LOG-FILE:
cf_hwassist: cf.hwassist.takeoverTrapRecv:notice]: hw_assist: Received takeover hw_assist alert from partner(cluster1-01), system_down because power_loss.
dsa_worker5: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 0 on channel 7a power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
dsa_worker4: ses.status.psWarning:error]: DS224-12 (S/N SHF#############) shelf 10 on channel 9d power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
Sat Jan 11 00:30:40 +0100 [snes1p208_01: dsa_worker0: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
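The shelf warnings follow a regular layout, so the affected shelf, channel and power supply can be extracted programmatically. This is a sketch that assumes the message text stays exactly as shown above. `shelf_psu_warnings` and `ShelfPsuWarning` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Pattern assumes the exact wording of the ses.status.psWarning lines above.
PS_WARNING_RE = re.compile(
    r"ses\.status\.psWarning:\w+\]: (?P<model>\S+) \(S/N (?P<serial>[^)]+)\) "
    r"shelf (?P<shelf>\d+) on channel (?P<channel>\w+) power warning for "
    r"Power supply (?P<psu>\d+): (?P<status>[^.]+)\."
)


@dataclass
class ShelfPsuWarning:
    model: str
    serial: str
    shelf: int
    channel: str
    psu: int
    status: str


def shelf_psu_warnings(log_text: str) -> list[ShelfPsuWarning]:
    """Extract one ShelfPsuWarning per ses.status.psWarning message in the text."""
    return [
        ShelfPsuWarning(
            model=m["model"],
            serial=m["serial"],
            shelf=int(m["shelf"]),
            channel=m["channel"],
            psu=int(m["psu"]),
            status=m["status"],  # text up to the first period, e.g. "...; DC undervoltage"
        )
        for m in PS_WARNING_RE.finditer(log_text)
    ]
```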
- The ASUP HA Group Notifications also show the following:
HA Group Notification (CHASSIS POWER SUPPLY DEGRADED: PSU3) ERROR
HA Group Notification (CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU3.) ERROR
HA Group Notification (SHELF POWER INTERRUPTED) ERROR
HA Group Notification (SHELF_FAULT) ERROR
- Checking the
SP-LATEST-SYSTEM-EVENT-LOG shows the following:
Record 589: Fri Aug 06 19:01:37.000000 2021 [SP.emergency]: System input power lost
Record 590: Thu Jan 01 00:00:49.400961 1970 [IPMI.notice]: 7204 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | SP Power Reset
Record 591: Thu Jan 01 00:00:49.450536 1970 [IPMI.notice]: 7304 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
Record 407: Fri Feb 28 03:34:17.489482 2020 [Agent.notice]: 127.880: 3 : AC Power Loss Signal PSU1 de-asserted
Record 408: Fri Feb 28 03:34:17.489664 2020 [Agent.notice]: 128.100: 4 : AC Power Loss Signal PSU2 de-asserted
Record 409: Fri Feb 28 03:34:17.557708 2020 [Agent.notice]: 196.145: 4 : AC Power Loss Signal PSU2 asserted
Record 410: Fri Feb 28 03:34:17.570049 2020 [Agent.notice]: 208.526: 3 : AC Power Loss Signal PSU1 asserted
Record 411: Fri Feb 28 03:34:17.635848 2020 [Agent.notice]: 274.301: 14 : Attention LED (at Midplane) asserted
Record 412: Fri Feb 28 03:34:23.431854 2020 [Agent.notice]: 070.290: 14 : Attention LED (at Midplane) de-asserted
Record 413: Fri Feb 28 03:34:27.516634 2020 [SP.warning]: AC_OK Low Detected
Record 419: Fri Feb 28 03:39:47.942198 2020 [SP.critical]: Filer Reboots
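A similar sketch can split the SP system event log records into fields and keep only the power-related ones. The marker strings are copied from the records above and are assumptions about which entries matter. Other firmware versions may word them differently. All names here are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple


class SpRecord(NamedTuple):
    number: int
    timestamp: datetime
    source: str
    severity: str
    message: str


# "Record <n>: <timestamp> [<source>.<severity>]: <message>"
RECORD_RE = re.compile(r"Record (\d+): (.+?) \[(\w+)\.(\w+)\]: (.*)")
# Substrings taken from the SP log excerpt above (assumed; extend as needed).
POWER_MARKERS = ("input power lost", "AC Power Loss Signal", "AC_OK Low")


def parse_sp_log(text: str) -> list[SpRecord]:
    """Parse SP event log lines; lines that do not match the record layout are skipped."""
    records = []
    for line in text.splitlines():
        m = RECORD_RE.match(line.strip())
        if not m:
            continue
        num, ts, source, severity, message = m.groups()
        # Assumes English day/month names, e.g. "Fri Aug 06 19:01:37.000000 2021".
        stamp = datetime.strptime(ts, "%a %b %d %H:%M:%S.%f %Y")
        records.append(SpRecord(int(num), stamp, source, severity, message))
    return records


def power_records(records: list[SpRecord]) -> list[SpRecord]:
    """Keep only records whose message mentions one of the power markers."""
    return [r for r in records if any(k in r.message for k in POWER_MARKERS)]
```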
- When both nodes lose power at the same time, no takeover occurs. Instead, the nodes reboot directly:
[BMC.notice]: Eventd: Got an AC_OK Failed Interrupt ...
[IPMI.notice]: 01d8 | 02 | EVT: 0300ffff | Power_Good | Assertion Event, "State Deasserted"
[IPMI.notice]: 01d9 | 02 | EVT: 0300ffff | Power_Proc_OK | Assertion Event, "State Deasserted"
[IPMI.notice]: 01da | 02 | EVT: 6f01ffff | PSU1_Present | Assertion Event, "Absent"
[IPMI.notice]: 01db | 02 | EVT: 6f01ffff | PSU2_Present | Assertion Event, "Absent"
[IPMI.notice]: 01dc | 02 | EVT: 0301ffff | AC_Power_Fail | Assertion Event, "State Asserted"
[IPMI.notice]: 01dd | 02 | EVT: 0301ffff | LAN_MGMT_0_Rst | Assertion Event, "State Asserted"
[IPMI.notice]: 01de | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
[IPMI.notice]: 01df | 02 | EVT: 0300ffff | AC_Power_Fail | Assertion Event, "State Deasserted"
[IPMI.notice]: 01e0 | 02 | EVT: 0300ffff | LAN_MGMT_0_Rst | Assertion Event, "State Deasserted"
[BMC.warning]: AC_OK Low Detected
[IPMI.notice]: 01e1 | 02 | EVT: 015000ad | P3V3 | Assertion Event, "Lower Non-critical going low " | Reading: 0.000 | Threshold: 3.027
[IPMI.notice]: 01e2 | 02 | EVT: 015200a5 | P3V3 | Assertion Event, "Lower Critical going low " | Reading: 0.000 | Threshold: 2.887
[IPMI.notice]: 01e3 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
[IPMI.notice]: 01e4 | 02 | EVT: 015003af | P12V_STBY | Assertion Event, "Lower Non-critical going low " | Reading: 0.186 | Threshold: 10.850
[IPMI.notice]: 01e5 | 02 | EVT: 015203aa | P12V_STBY | Assertion Event, "Lower Critical going low " | Reading: 0.186 | Threshold: 10.540
[IPMI.notice]: 01e6 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
[SysFW.notice]: Destage is started
[IPMI.notice]: 01e7 | 02 | EVT: 6fc203ff | System_FW_Status | Assertion Event, "Memory Initialization done"
[IPMI.notice]: 01e8 | 02 | EVT: 6fc21fff | System_FW_Status | Assertion Event, "System Firmware restarting"
[IPMI.notice]: 01e9 | 02 | EVT: 6fc201ff | System_FW_Status | Assertion Event, "Memory initialization in progress"
[SysFW.notice]: Time completing destage: 16 seconds
[IPMI.notice]: 01ea | 02 | EVT: 6fc220ff | System_FW_Status | Assertion Event, "Bootloader is running"
[IPMI.notice]: 01ea | c0 | OEM: ffff7000ff00 | ManufId: 150300 | BMC Power Reset
[IPMI.notice]: 01eb | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
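The two scenarios in this article can be summarized in a small decision sketch. A power loss on one node leads to an automatic takeover by its partner. A simultaneous loss on both nodes leads to a direct reboot with no takeover. The helpers below also check a BMC log for the power-loss entries shown above and read the reported destage time. This is a simplified model of this article, not ONTAP's actual takeover logic, and all names are hypothetical.

```python
import re
from typing import Optional

# Entries from the BMC log above that indicate loss of input power (assumed set).
BMC_POWER_LOSS_MARKERS = (
    "Got an AC_OK Failed Interrupt",
    'AC_Power_Fail | Assertion Event, "State Asserted"',
    "AC_OK Low Detected",
)
DESTAGE_RE = re.compile(r"Time completing destage: (\d+) seconds")


def bmc_shows_power_loss(log_text: str) -> bool:
    """True if any power-loss marker appears in the BMC log text."""
    return any(marker in log_text for marker in BMC_POWER_LOSS_MARKERS)


def destage_seconds(log_text: str) -> Optional[int]:
    """Return the reported destage duration in seconds, or None if absent."""
    match = DESTAGE_RE.search(log_text)
    return int(match.group(1)) if match else None


def expected_outcome(local_lost_power: bool, partner_lost_power: bool) -> str:
    """Simplified model of the two scenarios in this article (not ONTAP's real logic)."""
    if local_lost_power and partner_lost_power:
        return "no takeover: both nodes reboot directly"
    if local_lost_power or partner_lost_power:
        return "automatic takeover by the node that kept power"
    return "no power event"
```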