HA interconnect link down on AFF-A300
Applies to
AFF-A300
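Issue
- After the motherboard is replaced on the failed node, the HA interconnect remains offline.
- The system sees repeated link flapping and eventually goes down.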
Output of system ha interconnect status show:
Node A: Logical Link status is Down
Node B: Logical Link status is Down
NODE-A
slot 0: Interconnect HBA: Generic OFED Provider
Port Name: ic0a
GID: fe80:0000:0000:0000:0000:0000:0000:0104
Base LID: 0x104
Active MTU: 8192
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 0
SW Data Rate: PCIe Gen 1 x 0
Logical Link: Down <<<<<<
Port State: Enabled
NODE-B
slot 0: Interconnect HBA: Generic OFED Provider
Port Name: ic0a
GID: fe80:0000:0000:0000:0000:0000:0000:0105
Base LID: 0x105
Active MTU: 8192
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 8
SW Data Rate: PCIe Gen 3 x 0
Logical Link: Down <<<<<
Port State: Enabled
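The per-node check that a reader does by eye on the output above can be sketched in Python. This is only an illustrative sketch: the field labels are copied from the sample output, the function names parse_status and find_problems are hypothetical, and treating any negotiated rate that differs from "Max HW Data Rate" as a problem is an assumption, not documented ONTAP behavior.

```python
import re

# Excerpt of the output shown above, including the "<<<<" highlight markers.
SAMPLE = """\
NODE-A
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 0
SW Data Rate: PCIe Gen 1 x 0
Logical Link: Down <<<<<<
Port State: Enabled
NODE-B
slot 0: NTB Interconnect (PLX87b0)
Max HW Data Rate: PCIe Gen 3 x 8
HW Data Rate: PCIe Gen 1 x 8
SW Data Rate: PCIe Gen 3 x 0
Logical Link: Down <<<<<
Port State: Enabled
"""

# Matches the three data-rate lines and captures (label, generation, lane count).
RATE = re.compile(r"^(Max HW|HW|SW) Data Rate: PCIe Gen (\d+) x (\d+)$")


def parse_status(text):
    """Return {node_name: {field: value}} for the fields this sketch cares about."""
    nodes, current = {}, None
    for raw in text.splitlines():
        # Drop surrounding whitespace and any trailing "<<<<" highlight marker.
        line = re.sub(r"\s*<+$", "", raw.strip())
        if re.fullmatch(r"NODE-\w+", line):
            current = nodes.setdefault(line, {})
            continue
        if current is None:
            continue
        m = RATE.match(line)
        if m:
            current[m.group(1)] = (int(m.group(2)), int(m.group(3)))
        elif line.startswith("Logical Link:"):
            current["Logical Link"] = line.split(":", 1)[1].strip()
    return nodes


def find_problems(nodes):
    """Assumption: a rate that differs from the maximum, or a Down link, is a problem."""
    report = {}
    for name, fields in nodes.items():
        problems = []
        max_rate = fields.get("Max HW")
        for key in ("HW", "SW"):
            rate = fields.get(key)
            if max_rate and rate and rate != max_rate:
                problems.append(
                    f"{key} Data Rate Gen {rate[0]} x {rate[1]} differs from "
                    f"max Gen {max_rate[0]} x {max_rate[1]}"
                )
        if fields.get("Logical Link") == "Down":
            problems.append("Logical Link is Down")
        if problems:
            report[name] = problems
    return report


if __name__ == "__main__":
    for node, problems in find_problems(parse_status(SAMPLE)).items():
        for problem in problems:
            print(f"{node}: {problem}")
```

Run against the sample, both nodes are flagged: neither negotiates the PCIe Gen 3 x 8 maximum, and the logical link is Down on each.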
EMS logs:
[?] Tue Sep 09 14:24:42 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is down.
[?] Tue Sep 09 14:25:55 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is up.
OR
[?] Mon Sep 15 19:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5438 minutes: links down
[?] Mon Sep 15 20:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5498 minutes: links down
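To tell the two EMS patterns above apart in a longer log, a small Python sketch can count link transitions and track the longest reported outage per node. The line layout is inferred only from the four sample messages above, and the function name summarize is hypothetical; real EMS output may contain variants this pattern does not cover.

```python
import re

# The four EMS messages shown above.
EMS_SAMPLE = """\
[?] Tue Sep 09 14:24:42 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is down.
[?] Tue Sep 09 14:25:55 +0200 [NODE-A: gop_eq_thread: ic.linkStatusChange:info]: HA interconnect: Port ic0a link is up.
[?] Mon Sep 15 19:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5438 minutes: links down
[?] Mon Sep 15 20:00:00 +0200 [NODE-A: statd: ic.HAInterconnectDown:error]: HA interconnect: Interconnect down for 5498 minutes: links down
"""

# Captures node, event name, severity and message from "[node: source: event:severity]: message".
EVENT = re.compile(
    r"\[(?P<node>[^:\]]+): [^:]+: (?P<event>ic\.\w+):(?P<sev>\w+)\]: (?P<msg>.*)$"
)
DOWN_MINUTES = re.compile(r"down for (\d+) minutes")


def summarize(log_text):
    """Return per-node counts of link down/up events and the longest reported outage."""
    summary = {}
    for line in log_text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue  # not an interconnect event in the layout assumed here
        stats = summary.setdefault(
            m["node"], {"link_down": 0, "link_up": 0, "max_down_minutes": 0}
        )
        msg = m["msg"].rstrip(".")
        if m["event"] == "ic.linkStatusChange":
            if msg.endswith("link is down"):
                stats["link_down"] += 1
            elif msg.endswith("link is up"):
                stats["link_up"] += 1
        elif m["event"] == "ic.HAInterconnectDown":
            minutes = DOWN_MINUTES.search(msg)
            if minutes:
                stats["max_down_minutes"] = max(
                    stats["max_down_minutes"], int(minutes.group(1))
                )
    return summary


if __name__ == "__main__":
    for node, stats in summarize(EMS_SAMPLE).items():
        print(node, stats)
```

Many down/up pairs point to the flapping pattern, while a steadily growing "down for N minutes" value points to a link that never came back.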
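- Performed a hard power cycle of the HA pair by removing the controllers from the chassis
- The HA pair recovered temporarily, then flapped and failed again
- Reseated the motherboard on Node A with the partner node inserted; no change
- Replaced the motherboard on Node A with the partner node inserted; no change
- Reseated the motherboard on Node B with the partner node inserted in the chassis; no change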