Acquisition fails for all clusters in AIQUM because the maximum connection limit is exhausted
Applies to
- ActiveIQ Unified Manager (AIQUM) 9.6+
- All operating system platforms
- ONTAP 9.x
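Issue
- Acquisition fails intermittently for all clusters added to AIQUM.
- AIQUM raises Cluster Monitoring Failed and Cluster Not Reachable alerts. However, acquisition starts working again on its own after some time, or after it is triggered manually.
- All prerequisites are in place for AIQUM, such as antivirus (AV) exclusions and sufficient CPU, memory, and disk space.
- The SSL certificates of both AIQUM and the ONTAP clusters are valid.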
- AIQUM au.log:
ERROR [common-pool-2064] c.o.s.a.d.n.NetAppOCIEArchivePerformancePackage (NetAppOCIEArchivePerformancePackage.java:381) - Failed to get archive file names from zapi. java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.waitForConnect(Native Method) ~[?:?]
...
Wrapped by: com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException: Failed to connect to <cluster IP/Hostname>
at com.onaro.sanscreen.acquisition.datasource.netapp_ocie.transport.zapi.ZAPIConnection.createDefaultNaServer(ZAPIConnection.java:803) ~[au-datasource-netappfoundation.jar:9.13.0-2023.09.J299]
...
ERROR [common-pool-2064] c.o.s.a.f.d.BaseDataSource (DataSourceErrorException.java:246) - <cluster_IP/Hostname> [Error connecting] - Failed to connect to <cluster IP/Hostname> (connect timed out)
ERROR [common-pool-8838] c.o.s.a.f.d.BaseDataSource (DataSourceErrorException.java:244) - Cluster-mgmt [Error connecting] - Failed to connect to Cluster-mgmt (Read timed out) java.net.SocketTimeoutException: Read timed out
ERROR [foundation-poll-1] c.n.u.RestUtil (RestUtil.java:227) - Establishing connection with datasource failed java.net.SocketTimeoutException: Read timed out
- AIQUM ocumserver.log shows:
ERROR [oncommand] [reconciliation-0] [c.n.d.c.ClusterStatusListener] Socket connection error for cluster: <cluster IP/Hostname>
java.net.ConnectException: Connection timed out: connect
ERROR [oncommand] [reconciliation-0] [c.n.d.c.ClusterStatusListener] Cluster : <cluster IP/Hostname> is not reachable. Generating cluster not reachable event.
WARN [oncommand] [reconciliation-0] [c.n.d.c.ClusterStatusListener] Acquisition Failed for cluster : Cluster-mgmt message : Read timed out
- apache_error.log shows that the HTTP connection limit has been reached for connections from AIQUM:
[mpm_event:warn] [pid 7215:tid 34401862144] A keepalive connection from ipspace ID -1, remote address <AIQUM IP/Hostname> is being suspended between requests while the 80-connection limit has been reached. (80 active, 8 waiting) Clients should limit the number of concurrent keepalive connections to avoid large performance penalties and/or failures.
[mpm_event:notice] [pid 7215:tid 34402611200] Holding a connection from ipspace ID -1, remote address <AIQUM_IP/Hostname> while 54 others are held and 80 are active
[mpm_event:notice] [pid 7215:tid 34402611200] Holding a connection from ipspace ID -1, remote address <AIQUM_IP/Hostname> while 55 others are held and 80 are active
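To check whether a collected apache_error.log shows the same saturation pattern, the mpm_event counters quoted above can be pulled out with a short script. This is a minimal sketch, not a NetApp tool: the helper name summarize_mpm_event is hypothetical, and the two regular expressions assume the exact message wording shown in this article. Other versions may phrase these messages differently.

```python
import re

# Hypothetical helper (not a NetApp tool). The two patterns match only the
# mpm_event message wording quoted in this article; other versions may differ.
LIMIT_RE = re.compile(
    r"the (\d+)-connection limit has been reached\. \((\d+) active, (\d+) waiting\)"
)
HELD_RE = re.compile(r"while (\d+) others are held and (\d+) are active")

# Sample entries copied from the log excerpt above.
SAMPLE_LINES = [
    "[mpm_event:warn] [pid 7215:tid 34401862144] A keepalive connection from ipspace ID -1, "
    "remote address <AIQUM IP/Hostname> is being suspended between requests while the "
    "80-connection limit has been reached. (80 active, 8 waiting)",
    "[mpm_event:notice] [pid 7215:tid 34402611200] Holding a connection from ipspace ID -1, "
    "remote address <AIQUM_IP/Hostname> while 54 others are held and 80 are active",
    "[mpm_event:notice] [pid 7215:tid 34402611200] Holding a connection from ipspace ID -1, "
    "remote address <AIQUM_IP/Hostname> while 55 others are held and 80 are active",
]


def summarize_mpm_event(lines):
    """Return the connection limit and the peak active/waiting/held counts seen."""
    summary = {"limit": None, "peak_active": 0, "peak_waiting": 0, "peak_held": 0}
    for line in lines:
        if "mpm_event" not in line:
            continue  # ignore unrelated apache_error.log entries
        m = LIMIT_RE.search(line)
        if m:
            summary["limit"] = int(m.group(1))
            summary["peak_active"] = max(summary["peak_active"], int(m.group(2)))
            summary["peak_waiting"] = max(summary["peak_waiting"], int(m.group(3)))
            continue
        m = HELD_RE.search(line)
        if m:
            summary["peak_held"] = max(summary["peak_held"], int(m.group(1)))
            summary["peak_active"] = max(summary["peak_active"], int(m.group(2)))
    return summary


if __name__ == "__main__":
    s = summarize_mpm_event(SAMPLE_LINES)
    print(s)
    if s["limit"] is not None and s["peak_active"] >= s["limit"]:
        print("Active connections reached the connection limit")
```

Because the function accepts any iterable of lines, it can also be pointed at an open log file, for example summarize_mpm_event(open("apache_error.log", errors="replace")). The file path is an assumption; use the location of the log you collected.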
- apache_access.log shows AIQUM API requests returning status 408 (Request Timeout), 404 (Not Found), and 500 (Internal Server Error):
<AIQUM IP/Hostname> pii_encrypt/3haVFUKxlfQdtYhedGIaWKrSBVCn+5sImuFntsUoOAk=/pii_encrypt - - [Date/Time] "-"408 - 38 - 0 - - -
<AIQUM IP/Hostname> pii_encrypt/SDcefmsALVePKOMbzl/puA6V6v/HBFUwpN2g9sF8sNo=/pii_encrypt - - [Date/Time] "GET /spi/Node1/etc/log/autosupport/timestamp.0.files/ HTTP/1.1"500 - 72496937 - 0 - - - - -
<AIQUM IP/Hostname> pii_encrypt/SDcefmsALVePKOMbzl/puA6V6v/HBFUwpN2g9sF8sNo=/pii_encrypt - - [Date/Time] "GET /spi/Node2/etc/log/autosupport/timestamp.0.files/ HTTP/1.1"404 - 72064359 - 0 + - - - -
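To gauge how often AIQUM requests end in these statuses, the access log entries can be tallied by status code. This is a minimal sketch under stated assumptions: the helper name count_watched_statuses is hypothetical, and the regular expression assumes the layout of the sample entries above, where the three-digit status follows the quoted request field with or without a separating space. It is not a general Apache log parser.

```python
import re
from collections import Counter

# Assumption: entries follow the layout of the samples above, where the
# 3-digit status code follows the quoted request field (with or without
# a separating space). This is a sketch, not a full log parser.
STATUS_RE = re.compile(r'"[^"]*"\s*(\d{3})\b')

# Status codes called out in this article: 404, 408 and 500.
WATCHED_STATUSES = {404, 408, 500}

# Sample entries based on the excerpt above (pii_encrypt tokens shortened).
SAMPLE_LINES = [
    '<AIQUM IP/Hostname> pii_encrypt/x=/pii_encrypt - - [Date/Time] "-"408 - 38 - 0 - - -',
    '<AIQUM IP/Hostname> pii_encrypt/y=/pii_encrypt - - [Date/Time] '
    '"GET /spi/Node1/etc/log/autosupport/timestamp.0.files/ HTTP/1.1"500 - 72496937 - 0 - - - - -',
    '<AIQUM IP/Hostname> pii_encrypt/y=/pii_encrypt - - [Date/Time] '
    '"GET /spi/Node2/etc/log/autosupport/timestamp.0.files/ HTTP/1.1"404 - 72064359 - 0 + - - - -',
]


def count_watched_statuses(lines):
    """Count requests whose status code is one of WATCHED_STATUSES."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m and int(m.group(1)) in WATCHED_STATUSES:
            counts[int(m.group(1))] += 1
    return counts


if __name__ == "__main__":
    for status, n in sorted(count_watched_statuses(SAMPLE_LINES).items()):
        print(status, n)
```

Only the three statuses named in this article are counted. Successful requests, such as those returning 200, are skipped, so a rising count relative to total traffic is the signal to look for.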