AFF A250、FAS500f上的控制器意外重新启动、并自动执行配对节点接管和交还
适用场景
- FAS500f
- AFF A250
- BMC FW 15.3 或更低版本
问题描述
- 节点意外重新启动,并自动从配对节点接管和交还。
- 重新启动节点时未显示可疑的 EMS 消息。示例:
Sun Jan 02 01:25:45 +0200 [node_name-01: config_thread: raid.rg.scrub.summary.lw:notice]: Scrub found 0 RAID write signature inconsistencies in /aggregate/plex0/rg0.Sun Jan 02 01:43:35 +0200 [node_name-01: kernel: netif.linkUp:info]: Ethernet lo0: Link up.- BMC 事件与 BMC 重新启动。示例:
35d | 01/01/2022 | 10:39:55 | System Event #0xff | Timestamp Clock Sync | Asserted
35e | 01/01/2000 | 00:00:20 | System Event | Timestamp Clock Sync | Asserted
35f | 01/01/2000 | 00:00:20 | System Event #0xff | Timestamp Clock Sync | Asserted
360 | 01/01/2022 | 23:42:54 | System Event #0xff | Timestamp Clock Sync | Asserted
361 | 01/01/2022 | 23:42:54 | System Event | Timestamp Clock Sync | Asserted
362 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
363 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
364 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
365 | 01/01/2022 | 23:43:10 | Other FRU #0x50 |
366 | 01/01/2022 | 23:43:10 | Power Supply #0x20 | Presence detected | Asserted
367 | 01/01/2022 | 23:43:10 | Power Supply #0x25 | Presence detected | Asserted
368 | 01/01/2022 | 23:43:14 | Battery #0x4f | State Deasserted
369 | 01/01/2022 | 23:45:00 | System Event #0xff | Timestamp Clock Sync | Asserted
- 配对节点接管消息。示例:
Sun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.partnerNotResponding:notice]: Failover monitor: partner not respondingSun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverCountdown:info]: Failover monitor: takeover scheduled in 10 secondsSun Jan 02 01:41:39 +0200 [node_name-02: cf_main: cf.fsm.takeoverByPartnerDisabled:error]: Failover monitor: takeover of node_name-02 by netapp03-06 disabled (HA interconnect error. Verify that the partner node is running and that the HA interconnect cabling is correct, if applicable. For further assistance, contact technical support).Sun Jan 02 01:41:49 +0200 [node_name-02: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.Sun Jan 02 01:41:49 +0200 [node_name-02: cf_takeover: cf.fm.takeoverStarted:notice]: Failover monitor: takeover started