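Unified Manager 9.14 and later unable to add or monitor cluster with cloud agent connection error
Applies to
- Active IQ Unified Manager (AIQUM) 9.14 and later
- All OS platforms
- ONTAP 9.14 and later
Issue
- The Event Management System (EMS) reports the following error on the cluster: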
[cluster1-01: mgwd: mhost.ca.connect.failure:error]: Cluster agent connection of the client: UnifiedManager_<UUID> is not healthy. Attempting to reconnect. Error: AMQP transport failed for connection "UnifiedManager_<UUID>". Reason: Error during websocket handshake: Unable to connect.
- Active IQ Unified Manager fails to add the cluster, with one of the following failure reasons:
Unable to fetch the HTTPS certificate from <IP address>. Enter a valid Host name or Port.
Unable to add cluster data source. This can occur if the clocks on the systems are not synchronized and the Active IQ Unified Manager HTTPS certificate start date is later than the date on the cluster, or if the cluster has reached the maximum number of EMS notification destinations.
um datasource add (UM CLI):
ERROR: Server returned HTTP status 500.HTTP error message :{"timestamp":"2024-05-30T17:09:32.879+00:00","status":500,"error":"Internal Server Error","path":"/acquisition-api/server/datasource"}
server.log may show any of the following:
ERROR [org.springframework.boot.web.servlet.support.ErrorPageFilter] (default task-20) Forwarding to error page from request [/server/datasource] due to exception [Fail to add a new DS, message: java.lang.RuntimeException: java.lang.RuntimeException: Failed to establish connection for cloud agent instance "UnifiedManager_XXXXXX_XXXXXX_XXXXXXX_XXXXXXX". Reason: AMQP transport failed for connection "UnifiedManager_XXXXXX_XXXXXX_XXXXXXX_XXXXXXX". Reason: Error during websocket handshake: conn fail: 61.]: com.onaro.sanscreen.acquisition.sessions.AcquisitionUnitException [928-255-224]
ERROR [common-pool-100] c.o.s.a.f.d.BaseDataSource (DataSourceErrorException.java:246) - XXX-XXXX-XXXX [Internal error] - Failed in conversion ([Device name General Device]: Failed in conversion)
/jboss/server_acq.log may show any of the following:
ERROR [default task-9] c.n.u.RestUtil (RestUtil.java:411) - Job Failed: Failed to establish connection for cloud agent instance "UnifiedManager_<UUID>". Reason: AMQP transport failed for connection "UnifiedManager_<UUID>". Reason: Error during websocket handshake: DNS NXDOMAIN.
ERROR [default task-9] c.n.u.CloudAgentConnectionUtil (CloudAgentConnectionUtil.java:223) - Failed to establish connection for cloud agent instance "UnifiedManager_<UUID>". Reason: AMQP transport failed for connection "UnifiedManager_<UUID>". Reason: Error during websocket handshake: DNS NXDOMAIN.
ERROR [default task-9] c.n.u.CloudAgentConnectionUtil (CloudAgentConnectionUtil.java:254) - Establishing connection with cluster XXX.XXX.XXX.XXX failed
ERROR [default task-XXXX] c.n.u.CloudAgentConnectionUtil (CloudAgentConnectionUtil.java:XXX) - Failed to establish connection for cloud agent instance "UnifiedManager_XXXXXX_XXXXXX_XXXXXXX_XXXXXXX". Reason: Interrupted
ERR: dc_manifest: handle_amqp_create:src/tables/dc_manifest.cc:1679 Failed to connect to host. Error: Failed to establish connection for cloud agent instance "UnifiedManager_ID". Reason: Timeout: Operation "connection setup" took longer than 10 seconds to complete.
ERROR [QpidJMS Connection Executor: ID:1fe52e62-2be5-437e-9a28-16824e0f1491:1] c.n.s.m.JmsBase (JmsBase.java:236) - Linked Exception: org.apache.qpid.jms.provider.exceptions.ProviderConnectionRemotelyClosedException: Connection closed by external action [condition = amqp:connection:forced]
ERROR [QpidJMS Connection Executor: ID:1fe52e62-2be5-437e-9a28-16824e0f1491:1] c.n.s.m.JmsBase (JmsBase.java:238) - Disconnected from Broker, retrying to connect
INFO [Thread-3244] c.n.s.m.JmsBase (JmsBase.java:103) - Trying to connect to Broker
WARN [Thread-3245] c.n.s.m.JmsBase (JmsBase.java:219) - Failed to connect to Broker, will retry shortly
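The log entries above differ mainly in the reason reported after "Reason:" (for example "Unable to connect", "conn fail: 61", "DNS NXDOMAIN", "Timeout", or "Interrupted"). The following is a minimal Python sketch for tallying those reasons from saved log lines. The function names `classify` and `summarize` are hypothetical helpers introduced here for illustration; they are not part of Unified Manager or ONTAP, and the patterns are based only on the sample messages shown in this article.

```python
import re
from collections import Counter

# Reason text that follows "Error during websocket handshake:" in the samples
# above; it ends at the first period or closing bracket.
HANDSHAKE_RE = re.compile(r"Error during websocket handshake: ([^.\]]+)")
# Non-handshake reasons seen in the samples above.
OTHER_RE = re.compile(r"Reason: (Timeout|Interrupted)\b")


def classify(line):
    """Return a short label for a cloud agent failure line, or None."""
    match = HANDSHAKE_RE.search(line)
    if match:
        return "handshake: " + match.group(1).strip()
    match = OTHER_RE.search(line)
    if match:
        return match.group(1).lower()
    return None


def summarize(lines):
    """Count how often each failure label occurs in an iterable of lines."""
    return Counter(label for label in map(classify, lines) if label)
```

Passing the lines of a saved copy of server_acq.log to `summarize` gives a quick view of which failure reason dominates, which can help narrow down whether the problem is name resolution, connectivity, or a timeout.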
server.properties shows enable.cloudagent=true
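To confirm the enable.cloudagent setting without opening the file by hand, a small script can read it from the text of server.properties. This is a simplified sketch: the helper names `read_property` and `cloud_agent_enabled` are hypothetical, and the parser assumes plain `key=value` lines only (no line continuations or `:` separators). The location of server.properties is not given in this article, so the caller must supply the file contents.

```python
def read_property(text, key):
    """Return the last value assigned to `key` in properties-style text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ("#" or "!").
        if not line or line.startswith(("#", "!")):
            continue
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()  # a later assignment overrides an earlier one
    return value


def cloud_agent_enabled(text):
    """True when enable.cloudagent is set to "true" (case-insensitive)."""
    return (read_property(text, "enable.cloudagent") or "").lower() == "true"
```

For example, `cloud_agent_enabled(open(path).read())` returns True when the file contains the line `enable.cloudagent=true`, where `path` is wherever server.properties resides on the Unified Manager host.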