
Both nodes in an HA pair reboot due to power loss

Category:
aff-series
Specialty:
hw

Applies to

  • FAS systems
  • AFF systems

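Issue

  • Both nodes in the HA pair reboot at the same time.
  • EMS log example (repeated on both nodes at the same time), showing DC undervoltage and AC failure on both PSUs: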

[node_name: dsa_worker3: ses.status.psWarning:error]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power warning for Power supply 1: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom left.
[node_name: dsa_worker4: ses.status.psError:alert]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power error for Power supply 1: critical status; AC Fail. This module is on the rear of the shelf at the bottom left.
[node_name: dsa_worker4: callhome.shlf.power.intr:error]: Call home for SHELF POWER INTERRUPTED
[node_name: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0b. Check fans, power supplies, disks, and temperature sensors.
[node_name: power_low_monitor: monitor.chassisPower.degraded:alert]: Chassis power is degraded: Power Supply Status Critical: PSU1.
[node_name: power_low_monitor: callhome.chassis.power:error]: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1.
[node_name: monitor: monitor.globalStatus.critical:EMERGENCY]: Power Supply Status Critical: PSU1. Disk shelf fault.
[node_name: dsa_worker2: ses.status.psInfo:info]: DS224-12 (S/N 9872957495809) shelf 0 on channel 0b power supply information for Power supply 1: normal status.
[node_name: dsa_worker0: ses.status.psWarning:error]: DS224-12 (S/N 012345678910) shelf 0 on channel 0b power warning for Power supply 2: warning status; DC undervoltage. This module is on the rear of the shelf at the bottom right.
[node_name: dsa_worker2: callhome.shlf.ps.fault:error]: Call home for SHELF POWER SUPPLY WARNING
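
When many EMS lines like those above have to be reviewed, a short script can summarize which shelf power supplies reported a non-normal status. The following is a minimal sketch, not a NetApp tool: the function name `psu_faults` and both regular expressions are assumptions inferred only from the sample lines in this article, so they may need adjusting for other EMS message variants.

```python
import re
from collections import defaultdict

# Assumed line layout, inferred from the samples above:
# [node: process: event.name:severity]: message
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:]+): (?P<process>[^:]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]: (?P<message>.*)$"
)
# Matches e.g. "Power supply 1: warning status; DC undervoltage."
PSU_STATUS = re.compile(
    r"Power supply (?P<psu>\d+): (?P<status>\w+) status(?:; (?P<detail>[^.]+))?"
)


def psu_faults(lines):
    """Map PSU number -> set of fault details from non-normal ses.status.ps* events."""
    faults = defaultdict(set)
    for line in lines:
        m = EMS_LINE.match(line.strip())
        if not m or not m["event"].startswith("ses.status.ps"):
            continue  # not a shelf power-supply status event
        s = PSU_STATUS.search(m["message"])
        if s and s["status"] != "normal":
            # Fall back to the status word when no detail follows the semicolon.
            faults[int(s["psu"])].add(s["detail"] or s["status"])
    return dict(faults)
```

If the result contains every PSU of the shelf, the pattern matches the symptom described here: all supplies report undervoltage or AC failure at once, rather than a single supply failing.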

  • BMC/SP events report a loss of input power (repeated on both nodes at the same time):

Record 2435: Mon Dec 05 22:33:43.000000 2022 [BMC.emergency]: System input power lost
Record 2436: Sun Jan 01 00:00:22.310000 2017 [IPMI.notice]: 05f2 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | BMC Power Reset
Record 2437: Sun Jan 01 00:00:22.330000 2017 [IPMI.notice]: 05f3 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)

Record 1596: Sat Sep 11 08:03:16 2021 [SP.emergency]: System input power lost
Record 1597: Thu Jan  1 00:00:32 1970 [IPMI.notice]: ce01 | c0 | OEM: ffff7000ff00 | ManufId: 150300 | SP Power Reset
Record 1598: Thu Jan  1 00:00:32 1970 [IPMI.notice]: cf01 | c0 | OEM: fcff70560000 | ManufId: 150300 | POS Register: Power on Reset(Normal Power Cycle)
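
In both excerpts above, the "System input power lost" record is followed by a "Power Reset" record whose timestamp is earlier than the one before it (2017 or 1970), which is consistent with the BMC/SP clock starting from a default date after the power cycle. The sketch below looks for that sequence in a list of event lines. It is an illustration only: the names `parse_record` and `power_loss_resets`, the record regular expression, and the two timestamp formats are assumptions based on these samples.

```python
import re
from datetime import datetime

# Assumed layout: "Record <id>: <timestamp> [<source>.<level>]: <message>"
RECORD = re.compile(
    r"^Record (?P<id>\d+): (?P<ts>.+?) "
    r"\[(?P<source>\w+)\.(?P<level>\w+)\]: (?P<message>.*)$"
)
# BMC samples carry microseconds, SP samples do not.
TS_FORMATS = ("%a %b %d %H:%M:%S.%f %Y", "%a %b %d %H:%M:%S %Y")


def parse_record(line):
    """Return (id, timestamp, source, level, message), or None if the line does not parse."""
    m = RECORD.match(line.strip())
    if not m:
        return None
    ts_text = " ".join(m["ts"].split())  # collapse "Jan  1" to "Jan 1"
    for fmt in TS_FORMATS:
        try:
            ts = datetime.strptime(ts_text, fmt)
            break
        except ValueError:
            continue
    else:
        return None  # no known timestamp format matched
    return int(m["id"]), ts, m["source"], m["level"], m["message"]


def power_loss_resets(lines):
    """List (loss_id, reset_id) pairs: a power-loss record directly followed
    by a 'Power Reset' record whose timestamp moved backwards."""
    records = [r for r in map(parse_record, lines) if r]
    hits = []
    for prev, cur in zip(records, records[1:]):
        if (prev[4] == "System input power lost"
                and "Power Reset" in cur[4]
                and cur[1] < prev[1]):
            hits.append((prev[0], cur[0]))
    return hits
```

Running this against the event logs of both nodes and comparing the timestamps of the power-loss records is one way to confirm that the loss occurred on both nodes at the same moment.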

  • Example BMC/SP system log entries reporting power issues (repeated on both nodes at the same time):

BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x32 dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x34 dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC hsam[1426]: FRU /chassis-1 LED on
BMC hsam[1426]: FRU /chassis-1/controller-b/cna-3 LED on
BMC hsam[1426]: HSAM OS(bmc):cmd(set) FLD(cna-4):fault(Overcurrent Protection Fault)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x5b dir:3) match (15) ALERT
BMC hsam[1426]: FRU /chassis-1 LED on
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC hsam[1426]: FRU /chassis-1/controller-b/cna-4 LED on
BMC hsam[1426]: HSAM OS(bmc):cmd(set) FLD(cna-1):fault(Overcurrent Protection Fault)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x5d dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: EventFilter: event on sensor(#0x5e dir:3) match (15) ALERT
BMC IPMIMain[1142]: [1142 : 1167 INFO]PEF.c: Power Action:needed(0) action(0); Alert Action: needed(1) action(17)
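
The BMC system log above interleaves PEF sensor alerts with HSAM fault entries, which makes it hard to see at a glance how many distinct sensors and components are involved. A rough way to tally them is sketched below. The function name `summarize_bmc_log` and the patterns are assumptions derived from the lines shown here, and the script does not interpret what each sensor number means.

```python
import re
from collections import Counter

# "EventFilter: event on sensor(#0x32 dir:3) match (15) ALERT"
SENSOR_ALERT = re.compile(
    r"EventFilter: event on sensor\(#(0x[0-9a-fA-F]+) dir:\d+\) match \(\d+\) ALERT"
)
# "HSAM OS(bmc):cmd(set) FLD(cna-4):fault(Overcurrent Protection Fault)"
HSAM_FAULT = re.compile(r"HSAM OS\(bmc\):cmd\(set\) FLD\(([^)]+)\):fault\(([^)]+)\)")


def summarize_bmc_log(lines):
    """Count PEF alerts per sensor ID and HSAM faults per (component, fault) pair."""
    sensors = Counter()
    faults = Counter()
    for line in lines:
        alert = SENSOR_ALERT.search(line)
        if alert:
            sensors[alert.group(1)] += 1
            continue
        fault = HSAM_FAULT.search(line)
        if fault:
            faults[(fault.group(1), fault.group(2))] += 1
    return sensors, faults
```

Comparing the two counters from each node's log helps show whether both controllers raised alerts on the same sensors during the same window.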

  • The issue persists after reseating or replacing the PSUs and/or the controllers.

