Cloud Insights performance poll fails due to an internal error in the ONTAP data collector
Applies to
- Cloud Insights (CI)
- ONTAP 9
- ONTAP System Manager
- NetApp ONTAP Data Management Software data collector (with advanced metrics enabled)
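Issue
- Performance data cannot be obtained because the performance poll fails, and the data collector landing page in CI or OnCommand System Manager shows one of the following messages: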
Unable to poll performance ... error = Performance Recent Status
Internal error:
com.onaro.sanscreen.acquisition.framework.datasource.DataSourceErrorException: General Error
or
Data ONTAP API fail: System busy: 7 requests on table "perf_object_get_instances" have been pending for 1678674 seconds. The last completed call took 0 seconds.
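- After reviewing the storageperformance sample log of the data collector for the affected ONTAP cluster (located in acq folder > storageperformance_datacollectorname > one of the timestamp folders > log_sample.log), error messages like the following may appear: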
Examples:
2021-03-12 17:19:33,895 ERROR [com.onaro.sanscreen.acquisition.datasource.netapp_ontap.NetAppOntapPerformancePackage] datalake collect and report (Poll Count: 1207, Is Macro Poll: false) : [storageperformance] data-collector-name: 1 apis failed: [storageperformance] data-collector-name: perf-object-get-instances(Object : workload) failed: Trying to perform arithmetic between two counters with different cardinality. Counter "read_io_type" has 1 elements, but the other counter "read_io_type" has 10 elements. (1 times)
2021-03-12 17:21:54,206 ERROR [com.onaro.sanscreen.acquisition.datasource.netapp_ontap.builder.ZapiIterBase] Aborting all performance api calls due to: perf-object-instance-list-info-iter(Object : lif) failed: System busy: 7 requests on table "perf_object_instance_list_info" have been pending for 2922550 seconds. The last completed call took 0 seconds.
2022-03-19 01:13:22,377 ERROR [com.onaro.sanscreen.acquisition.datasource.netapp_ontap.NetAppOntapPerformancePackage] datalake collect and report (Poll Count: 10124, Is Macro Poll: false) : [storageperformance] data-collector-name: 15 apis failed: [storageperformance] data-collector-name: perf-object-get-instances(Object : workload) failed: RPC: Remote system error [from mgwd on node "node_name" (VSID: -1) to cm at 127.0.0.1] (1 times)
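- In addition, if you run the statistics lif show command against the cluster SVM from the command-line interface (accessed through the node management LIF of any node in the cluster), a similar error may appear, as shown below.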
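Note: This error message should match the highlighted portion of the performance sample log from CI, but the specific performance-object ZAPI call may differ between the two error messages: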
cluster1::> statistics lif show -vserver cluster1
Error: command failed: System busy: 7 requests on table "perf_object_get_instances" have been pending for 1147109 seconds. The last completed call took 0 seconds.
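To check quickly whether a sample log contains the "System busy" signature shown above, the lines can be scanned with a small script. The following is a minimal Python sketch, not a NetApp tool: the names BusyError, find_busy_errors, and pending_days are hypothetical, and the pattern is inferred only from the example messages in this article, so other ONTAP or CI versions may word the message differently.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class BusyError(NamedTuple):
    """One parsed 'System busy' message (hypothetical helper type)."""
    requests: int
    table: str
    pending_seconds: int


# Pattern inferred from the example messages in this article (an assumption).
BUSY_RE = re.compile(
    r'System busy: (\d+) requests on table "([^"]+)" '
    r'have been pending for (\d+) seconds'
)


def find_busy_errors(lines: Iterable[str]) -> Iterator[BusyError]:
    """Yield one BusyError for every line that contains the signature."""
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            yield BusyError(
                requests=int(match.group(1)),
                table=match.group(2),
                pending_seconds=int(match.group(3)),
            )


def pending_days(error: BusyError) -> float:
    """Convert the pending time from seconds to days."""
    return error.pending_seconds / 86400
```

A possible usage, assuming the log has been copied locally as log_sample.log: open the file, pass the file object to find_busy_errors, and print each table name together with pending_days. A pending time of many days suggests the requests are stuck on the cluster side rather than merely slow.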