AFF A250、FAS500f上的控制器意外重新启动、并自动执行配对节点接管和交还
适用场景
- FAS500f
- AFF A250
- BMC FW 15.3 或更低版本
问题描述
- 节点意外重新启动,并自动从配对节点接管和交还。
- 重新启动节点时未显示可疑的 EMS 消息。示例:
Sun Jan 02 01:25:45 +0200 [node_name-01: config_thread: raid.rg.scrub.summary.lw:notice]: Scrub found 0 RAID write signature inconsistencies in /aggregate/plex0/rg0.
Sun Jan 02 01:43:35 +0200 [node_name-01: kernel: netif.linkUp:info]: Ethernet lo0: Link up.
- BMC 事件与 BMC 重新启动。示例:
35d | 01/01/2022 | 10:39:55 | System Event #0xff | Timestamp Clock Sync | Asserted
35e | 01/01/2000 | 00:00:20 | System Event | Timestamp Clock Sync | Asserted
35f | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
360 | 01/01/2022 | 23:42:54 | System Event #0xff | Timestamp Clock Sync | Asserted
361 | 01/01/2022 | 23:42:54 | System Event | Timestamp Clock Sync | Asserted
362 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
363 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
364 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
365 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
366 | 01/01/2022 | 23:43:10 | Power Supply #0x20 | Presence detected | Asserted
367 | 01/01/2022 | 23:43:10 | Power Supply #0x25 | Presence detected | Asserted
368 | 01/01/2022 | 23:43:14 | Battery #0x4f | State Deasserted
369 | 01/01/2022 | 23:45:00 | System Event #0xff | Timestamp Clock Sync | Asserted
- 配对节点接管消息。示例:
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not responding
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 seconds
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name-02 by netapp03-06 disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).
Sun Jan 02 01:41:49 +0200 [node_name-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
Sun Jan 02 01:41:49 +0200 [node_name-02: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started