Switch health shows "unknown" after an ONTAP upgrade
Applies to
- ONTAP 9
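- NVIDIA switches
Issue
- After an ONTAP upgrade, the switch health monitoring subsystem reports a health status of "unknown" and remains stuck in the start_discovery initialization state
Example output: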
::> system health subsystem show -subsystem switch-health -instance
Subsystem: Switch-Health
Health: unknown
InitializationState: start_discovery
Number of Outstanding Alerts: 0
Number of Suppressed Alerts: 0
Node: <Node-Name>
SubsystemRefreshInterval: 5m
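The symptom above can be checked programmatically once the command output has been captured as text. The following is a minimal sketch, not a NetApp tool: the function names `parse_instance_output` and `is_stuck_in_discovery` are hypothetical, and the sample text is copied from the output shown above. It assumes the output keeps the "Key: Value" layout.

```python
# Hypothetical helper: parse captured "-instance" output and flag the symptom.
SAMPLE = """\
::> system health subsystem show -subsystem switch-health -instance
Subsystem: Switch-Health
Health: unknown
InitializationState: start_discovery
Number of Outstanding Alerts: 0
Number of Suppressed Alerts: 0
Node: <Node-Name>
SubsystemRefreshInterval: 5m
"""


def parse_instance_output(text):
    """Collect 'Key: Value' lines into a dict; lines without a key are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # The prompt line starts with '::>' and therefore has an empty key.
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_stuck_in_discovery(fields):
    """True when health is 'unknown' and initialization is 'start_discovery'."""
    return (
        fields.get("Health", "").lower() == "unknown"
        and fields.get("InitializationState", "").lower() == "start_discovery"
    )


fields = parse_instance_output(SAMPLE)
print(is_stuck_in_discovery(fields))
```

A single check only shows the current state. To confirm the subsystem is truly stuck, the same check would need to be repeated across several refresh intervals (5m in the output above).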
- However, the switches are monitored successfully over SNMP
::> system switch ethernet show
Switch                      Type               Address          Model
--------------------------- ------------------ ---------------- ---------------
NETAPP-SW1 (9X:XX:XX:XX:XX:XX)
                            storage-network    10.xx.xx.xx      MSN2100-CB2FC
     Serial Number: MT2302TXXXXX
      Is Monitored: true
            Reason: None
  Software Version: Cumulus Linux version 5.11.0 running on Mellanox
                    Technologies Ltd. MSN2100
    Version Source: SNMP
NETAPP-SW2 (8X:XX:XX:XX:XX:XX)
                            storage-network    10.xx.xx.xx      MSN2100-CB2FC
     Serial Number: MT2308TXXXXX
      Is Monitored: true
            Reason: None
  Software Version: Cumulus Linux version 5.11.0 running on Mellanox
                    Technologies Ltd. MSN2100
    Version Source: SNMP
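To confirm from captured output that every switch is monitored over SNMP, the per-switch fields can be extracted as in the sketch below. The names `parse_switches` and `monitored_via_snmp` are hypothetical. The parser assumes each switch entry begins with a name followed by a MAC address in parentheses, as in the listing above, and the sample is an abbreviated copy of that listing.

```python
import re

# Abbreviated copy of the listing above (masked values kept as-is).
SAMPLE = """\
NETAPP-SW1 (9X:XX:XX:XX:XX:XX) storage-network 10.xx.xx.xx MSN2100-CB2FC
Serial Number: MT2302TXXXXX
Is Monitored: true
Reason: None
Version Source: SNMP
NETAPP-SW2 (8X:XX:XX:XX:XX:XX) storage-network 10.xx.xx.xx MSN2100-CB2FC
Serial Number: MT2308TXXXXX
Is Monitored: true
Reason: None
Version Source: SNMP
"""

# A switch entry starts with "<name> (<MAC>)"; masked MACs such as 9X:XX:... also match.
SWITCH_LINE = re.compile(r"\s*(\S+)\s+\(([0-9A-Za-z]{2}(?::[0-9A-Za-z]{2}){5})\)")


def parse_switches(text):
    """Return {switch_name: {field: value}} from captured listing text."""
    switches = {}
    current = None
    for line in text.splitlines():
        match = SWITCH_LINE.match(line)
        if match:
            current = {"MAC": match.group(2)}
            switches[match.group(1)] = current
            continue
        key, sep, value = line.partition(":")
        # Attach 'Key: Value' lines to the most recent switch entry.
        if current is not None and sep and key.strip():
            current[key.strip()] = value.strip()
    return switches


def monitored_via_snmp(switches):
    """Names of switches reported as monitored with an SNMP version source."""
    return [
        name
        for name, fields in switches.items()
        if fields.get("Is Monitored") == "true"
        and fields.get("Version Source") == "SNMP"
    ]


switches = parse_switches(SAMPLE)
print(monitored_via_snmp(switches))
```

If both switches appear in the result while the subsystem health is still "unknown", that matches the situation described here: SNMP polling works, but the health subsystem has not finished discovery.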
- Deleting and re-adding the switch does not resolve the issue
::*> system switch ethernet delete -device <device_name>
::*> system switch ethernet create -device "<DEVICE_NAME> (<MAC_ADDRESS>)" -address <IP_ADDRESS> -snmp-version <SNMP_VERSION> -community-or-username <COMMUNITY_OR_USERNAME> -model <MODEL> -type <TYPE> -is-monitoring-enabled-admin <TRUE_OR_FALSE>
- Restarting "cshmd" does not help either; the subsystem remains stuck in the start_discovery state