Giveback fails after motherboard replacement on FAS62xx/FAS80xx because no disks are attached
Applies to
- FAS62xx
- FAS80xx
- AFF8080
- Motherboard replacement
- NVRAM replacement
- Non-partitioned drives
Issue
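- Giveback cannot be performed because no root volume is found. This happens because the HA interconnect ports are down on the takeover node: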
WARNING: there do not appear to be any disks attached to the system. No root volume found. Rebooting... (press ctrl-c during boot to break reboot loop)
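- When the interconnect links are down on the takeover node, the NVRAM card on the takeover node may be in a hung state.
- In a controller-IOXM (CI) configuration, the physical ports show as down on both ends (a loopback test shows that both interconnect links on the card are down).
- After takeover, you may see the following EMS messages on the takeover node: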
Wed Dec 06 12:37:27 GMT [n2: ib_nap_tx_2: connectx.shoutTimeout:debug]: Node advertisement send timed out on Port ib0b.
Wed Dec 06 12:37:29 GMT [n2: ib_nap_tx_1: connectx.shoutTimeout:debug]: Node advertisement send timed out on Port ib0a.
Wed Dec 06 12:37:37 GMT [n2: cfdisk_config: cf.diskinventory.sendFailed:debug]: params: {'errorCode': '1', 'reason': 'HA Interconnect down'}
Wed Dec 06 12:37:40 GMT [n2: ib_nap_tx_2: connectx.shout.portDisabled:critical]: Node advertisement send timed out on Port ib0b. ConnectX registers have been dumped to the /etc/ConnectX_regdump file.
Wed Dec 06 12:37:40 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: InfiniBand port ib0b: Link down.
Wed Dec 06 12:37:41 GMT [n2: ib_nap_tx_2: ems.engine.suppressed:debug]: Event 'rdma.rdr.opFailed' suppressed 5 times in last 29618503 seconds.
Wed Dec 06 12:37:41 GMT [n2: ib_nap_tx_2: rdma.rdr.opFailed:debug]: RDR operation get_entity_property failed on error 7005.
Wed Dec 06 12:37:42 GMT [n2: ib_nap_tx_1: connectx.shout.portDisabled:critical]: Node advertisement send timed out on Port ib0a. ConnectX registers have been dumped to the /etc/ConnectX_regdump file.
Wed Dec 06 12:37:42 GMT [n2: mlx4_intr_handler: mlx4.link.statusChange:info]: InfiniBand port ib0a: Link down.
Wed Dec 06 12:37:44 GMT [n2: ib_mad2_wq: ems.engine.suppressed:debug]: Event 'ic.rdma.qpDisconnected' suppressed 4 times in last 29618502 seconds.
Wed Dec 06 12:37:44 GMT [n2: ib_mad2_wq: ic.rdma.qpDisconnected:debug]: kstat is disconnected.
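If you have a saved copy of the takeover node's EMS log, a small script can check it for this failure signature. The sketch below is a hypothetical helper, not a NetApp tool: the regular expressions are copied from the messages quoted above, and the port names ib0a and ib0b are assumed from this example; other ONTAP releases or platforms may word or name them differently.

```python
import re

# Hypothetical helper, not a NetApp tool. Patterns are copied from the EMS
# messages quoted in this article; other ONTAP releases may word them differently.
PORT_DISABLED = re.compile(r"connectx\.shout\.portDisabled:critical\]: .*?Port (ib\w+)\.")
LINK_DOWN = re.compile(r"InfiniBand port (ib\w+): Link down")
INVENTORY_FAILED = re.compile(r"cf\.diskinventory\.sendFailed.*HA Interconnect down")


def interconnect_signature(ems_text, ports=("ib0a", "ib0b")):
    """Summarize HA interconnect failure messages found in EMS log text."""
    disabled = sorted(set(PORT_DISABLED.findall(ems_text)))
    down = sorted(set(LINK_DOWN.findall(ems_text)))
    return {
        "disabled_ports": disabled,
        "link_down_ports": down,
        "disk_inventory_failed": bool(INVENTORY_FAILED.search(ems_text)),
        # True only if every expected interconnect port reported "Link down".
        "all_ports_down": all(p in down for p in ports),
    }
```

A result with both ports disabled and down, together with the failed disk inventory, matches the symptom described here. It does not by itself confirm that the NVRAM card is hung.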
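- When you attempt giveback, the takeover node does not show the partner node as waiting for giveback.
Example:
7-Mode (the partner node is in takeover, but Waiting for Giveback is not displayed):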
n2(takeover)> cf status
n1 has taken over n2.
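Clustered Data ONTAP:
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
n2             n1             false    In takeover
n1             n2             -        Unknown   <-- should be "Waiting for giveback"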
- When you check the interconnect, note that it is down:
7-Mode:
n2*> ic status
Link 0: down
Link 1: down
IC RDMA connection : down
Clustered Data ONTAP:
cluster::*> storage failover interconnect show-link local
Node Port Number Link State
------------------------------------------------------------------------------
n2
0 down
1 down
2 entries were displayed.
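When checking several systems, captured output from either command can be evaluated the same way. This is a minimal sketch with a hypothetical function name; it assumes the exact line formats shown above ("Link 0: down" in 7-Mode, and a port number followed by its state in the clustered show-link output) and ignores the "IC RDMA connection" line.

```python
import re

# Hypothetical helper: decides whether every HA interconnect link in captured
# CLI output is down. Formats are taken from the examples in this article.
SEVEN_MODE_LINK = re.compile(r"^\s*Link\s+(\d+):\s*(up|down)\s*$", re.IGNORECASE)
CLUSTER_LINK = re.compile(r"^\s*(\d+)\s+(up|down)\s*$", re.IGNORECASE)


def all_links_down(cli_output):
    """Return True if at least one link was parsed and all parsed links are down."""
    states = {}
    for line in cli_output.splitlines():
        # Try the 7-Mode "Link N: state" form first, then the clustered "N state" row.
        m = SEVEN_MODE_LINK.match(line) or CLUSTER_LINK.match(line)
        if m:
            states[int(m.group(1))] = m.group(2).lower()
    return bool(states) and all(s == "down" for s in states.values())
```

Returning False when no link lines are parsed avoids reporting "all down" for output in an unexpected format.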
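- Physically, if the controller is in a controller-IOXM (CI) configuration, the HA interconnect links show no link lights. If you loop back the HA interconnect ports while the impaired node is waiting for giveback (cable port 0 to port 1 on the same controller), the link lights come on at the impaired controller but not at the takeover node.
- Attempting to bring the interconnect ports up manually fails with the following error:
7-Mode: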
n2(takeover)*> ic link on 0
Error: Failed to perform requested operation on port 0 due to an internal error.
The port has been disabled. To re-enable the port, reboot the system.
Clustered Data ONTAP:
cluster::*> interconnect link on -node n2 -link 0
(system ha interconnect link on)
Error: command failed: Failed to perform requested operation on link 0 due to
an internal error. The port has been disabled. To re-enable the port,
reboot the system.
- If the above error appears on the takeover node, the NVRAM card may have entered a hung state.
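Both CLI variants return the same error wording, although the clustered form wraps across lines. The sketch below is a hypothetical helper that extracts the affected link numbers and whether the message asks for a reboot. The matched phrases are copied from the error text above and are an assumption for other ONTAP versions.

```python
from __future__ import annotations

import re

# Hypothetical helper. Matches the error returned by `ic link on` (7-Mode) and
# `system ha interconnect link on` (clustered) as quoted in this article.
LINK_ON_FAILED = re.compile(
    r"Failed to perform requested operation on (?:port|link) (\d+)\s+due to\s+an internal error"
)


def parse_link_on_error(cli_output: str) -> tuple[list[int], bool]:
    """Return (failed link numbers, whether the output says a reboot is required)."""
    links = [int(n) for n in LINK_ON_FAILED.findall(cli_output)]
    # Collapse whitespace so the wrapped clustered message matches too.
    flat = " ".join(cli_output.split())
    reboot_required = "To re-enable the port, reboot the system" in flat
    return links, reboot_required
```

A non-empty link list seen on the takeover node corresponds to the condition above in which the NVRAM card may be hung.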