AFF or FAS 80x0 node taken over with Power Good de-asserted SP events
Applies to
- AFF and FAS 8000 series
- AFF8080
- AFF8060
- AFF8040
- AFF8020
- FAS8080
- FAS8060
- FAS8040
- FAS8020
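Issue
- An AFF or FAS 8000 series node is automatically taken over by its partner. The taken-over node may fail to reboot, or may fail to reach the waiting-for-giveback state
- NetApp Active IQ reports an automatic failover: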
HA Group Notification (CONTROLLER TAKEOVER COMPLETE AUTOMATIC) ALERT
- SP logs collected from the down node show an NMI accompanied by Power Good de-asserted events in the event log:
Record 2539: Sun Feb 28 15:43:20.796297 2021 [Agent.notice]: 561.059: 29 : Non-maskable Interrupt from PCH to CPU asserted
Record 2540: Sun Feb 28 15:43:20.796468 2021 [Agent.notice]: 561.059: 49 : PCH Platform Reset asserted
Record 2541: Sun Feb 28 15:43:20.796617 2021 [Agent.notice]: 561.060: 0 : CPU Power Good from PCH via voltage translation de-asserted
Record 2542: Sun Feb 28 15:43:20.796752 2021 [Agent.notice]: 561.060: 11 : Controller Attention LED asserted
Record 2543: Sun Feb 28 15:43:20.796890 2021 [Agent.notice]: 561.060: 14 : Attention LED (at Midplane) asserted
Record 2544: Sun Feb 28 15:43:20.797031 2021 [Agent.notice]: 561.090: 2 : CPU domain power good (from seq. CPLD) de-asserted
Record 2545: Sun Feb 28 15:43:20.797164 2021 [Agent.notice]: 561.090: 63 : BIOS Complete from PCH de-asserted
Record 2546: Sun Feb 28 15:43:20.797303 2021 [Agent.notice]: 561.548: 59 : NVRAM 12V Power Good de-asserted
Record 2547: Sun Feb 28 15:43:22.906594 2021 [Agent.notice]: 671.323: 30 : Non-maskable Interrupt to PCH asserted
Record 2548: Sun Feb 28 15:43:23.002877 2021 [Agent.notice]: 767.320: 30 : Non-maskable Interrupt to PCH de-asserted
Record 2549: Sun Feb 28 15:43:23.026810 2021 [IPMI Event.critical]: NMI
Record 2550: Sun Feb 28 15:43:23.954074 2021 [IPMI Event.critical]: L2 watchdog timeout hard reset
Record 2551: Sun Feb 28 15:43:23.983703 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
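
The signature above can be checked for in a saved copy of the SP event log. The following is a minimal sketch only, not a NetApp tool: the record layout in `RECORD_RE` is inferred from the excerpt above, and the names `KEYWORDS`, `SAMPLE`, and `find_power_events` are hypothetical. The keyword list is an assumption drawn from the messages shown here and may need adjusting for other logs.

```python
import re

# Record layout assumed from the excerpt above:
#   Record <n>: <timestamp> [<source>.<severity>]: <message>
RECORD_RE = re.compile(
    r"^Record (?P<num>\d+): (?P<timestamp>.+?) \[(?P<source>[^\]]+)\]: (?P<message>.*)$"
)

# Hypothetical keyword list, taken only from the messages in the excerpt.
KEYWORDS = ("power good", "non-maskable interrupt", "nmi", "watchdog")

# A few records copied from the excerpt, used as sample input.
SAMPLE = """\
Record 2540: Sun Feb 28 15:43:20.796468 2021 [Agent.notice]: 561.059: 49 : PCH Platform Reset asserted
Record 2541: Sun Feb 28 15:43:20.796617 2021 [Agent.notice]: 561.060: 0 : CPU Power Good from PCH via voltage translation de-asserted
Record 2549: Sun Feb 28 15:43:23.026810 2021 [IPMI Event.critical]: NMI
Record 2551: Sun Feb 28 15:43:23.983703 2021 [Trap Event.critical]: hwassist l2_watchdog_reset (29)
"""


def find_power_events(log_text):
    """Return (record_number, source, message) for records matching KEYWORDS."""
    hits = []
    for line in log_text.splitlines():
        match = RECORD_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not event records
        message = match.group("message")
        lowered = message.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            hits.append((int(match.group("num")), match.group("source"), message))
    return hits


if __name__ == "__main__":
    for num, source, message in find_power_events(SAMPLE):
        print(num, source, message)
```

Matching is a case-insensitive substring test, so it also catches variants such as "CPU domain power good". A match only narrows down which records to read; it does not by itself confirm the cause of the takeover.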