Cluster switch N9K C9336 reboot causes loss of cluster communication
Applies to
- FAS/AFF systems
- Cisco N9K-C9336C-FX2 cluster switches
- NX-OS version 10.2.5
Issue
- Both cluster ports on every node went down at nearly the same time, causing loss of cluster communication:
Sat Nov 01 00:30:34 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:23 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:34 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:23 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:35 [Node-03: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:24 [Node-03: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
Sat Nov 01 00:30:35 [Node-04: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.
Sat Nov 01 00:31:24 [Node-04: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.
- Every node in the cluster went out of CLAM quorum:
Sat Nov 01 00:32:30 [Node-01: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
Sat Nov 01 00:32:31 [Node-02: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
Sat Nov 01 00:32:22 [Node-03: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
Sat Nov 01 00:32:31 [Node-04: kltp: clam.node.ooq:EMERGENCY]: Node (name=Node-01, ID=1000) is out of "CLAM quorum" (reason=node in minority).
- The cluster RDB went out of sync, resulting in loss of quorum.
- The switch logs show that both cluster switches rebooted, and the cluster ports connected to them became inactive:
Cluster-switch1:
Sat Nov 1 04:39:01 2025: Card Uptime Record
----------------------------------------------
Uptime: 83, 0 days 0 hour(s) 1 minute(s) 23 second(s)
Reset Reason: Unknown (0)
Cluster-switch-2:
Sat Nov 1 04:38:33 2025: Card Uptime Record
----------------------------------------------
Uptime: 83, 0 days 0 hour(s) 1 minute(s) 23 second(s)
Reset Reason: Unknown (0)
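When reviewing a larger EMS export, the netif.linkDown events listed above can be grouped per node to confirm that every node lost all of its cluster ports rather than only one. The following is a minimal Python sketch, not a NetApp tool. It assumes the events have been saved as plain text in the format shown in this article and that e3a and e3b are the cluster ports, as in this example; the function name `nodes_with_all_ports_down` is hypothetical.

```python
import re
from collections import defaultdict

# Matches one netif.linkDown event in the format shown in this article, e.g.
# "[Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down"
# Assumption: the EMS text format matches the excerpts above.
LINK_DOWN = re.compile(
    r"\[(?P<node>[^:\]]+): [^\]]*netif\.linkDown[^\]]*\]: Ethernet (?P<port>\S+?): Link down"
)


def nodes_with_all_ports_down(lines, cluster_ports):
    """Return the set of nodes that logged linkDown for every port in cluster_ports.

    finditer() is used so that several events flattened onto one line are still found.
    """
    down = defaultdict(set)  # node name -> ports reported down
    for line in lines:
        for match in LINK_DOWN.finditer(line):
            down[match.group("node")].add(match.group("port"))
    wanted = set(cluster_ports)
    return {node for node, ports in down.items() if wanted <= ports}


if __name__ == "__main__":
    sample = [
        "Sat Nov 01 00:30:34 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.",
        "Sat Nov 01 00:31:23 [Node-01: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3a: Link down, check cable.",
        "Sat Nov 01 00:30:34 [Node-02: mgmt_port_link_status_poll: netif.linkDown:info]: Ethernet e3b: Link down, check cable.",
    ]
    # Node-02 has only e3b in this truncated sample, so only Node-01 is reported.
    print(nodes_with_all_ports_down(sample, ["e3a", "e3b"]))
```

If the returned set contains every node in the cluster, the outage matches the pattern described here, where both cluster switches went down together rather than a single port or cable failing.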