EC rebalance job fails with "Job needs retry"
Applies to
NetApp StorageGRID 11.7
Issue
When an EC rebalance job is run, the job fails without displaying an error message.
In bycast.log on the EC leader (see "How to find the EC leader in StorageGRID"), you see the following:
ba9de970532e9 ECJM: 16605481304316448032(rebalance 20): saving state. status: JOBSTATUS_PAUSED
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1592539400 ECJM ^RDY 2023-09-27T08:34:06.557857| WARNING 0067 a5bba9de970532e9 ECJM: Caught exception 'ENFORCE failed: !"Job needs retry."' when running job 16605481304316448032: Site Rebalance - Group ID 20.
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1592539400 ECJM ^RDY 2023-09-27T08:34:06.557933| ERROR 1083 a5bba9de970532e9 PROC: Exception: /build/src/modules/ErasureCoding/EC_JobManager_Module/SiteRebalanceJob.cc(410): Throw in function CXD_AtomContainer erasurecoding::SiteRebalanceJob::doRebalance()#012Dynamic exception type: boost::wrapexcept<std::runtime_error>#012std::exception::what: ENFORCE failed: !"Job needs retry."#012
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1568886980 ECJM #DON 2023-09-27T08:34:06.557935| NOTICE 0959 0fc4f7218174f516 ECJM: Received job completion message.
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1568886980 ECJM #DON 2023-09-27T08:34:06.557949| NOTICE 0965 0fc4f7218174f516 ECJM: Job 16605481304316448032 completed with result GERR.
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1568886980 ECJM #DON 2023-09-27T08:34:06.557956| NOTICE 1229 0fc4f7218174f516 ATOM: #DON:CONT:
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1568886980 ECJM #DON 2023-09-27T08:34:06.557965| NOTICE 1205 0fc4f7218174f516 ATOM: RSLT:FC32:'GERR'
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1568886980 ECJM #DON 2023-09-27T08:34:06.557972| NOTICE 1205 0fc4f7218174f516 ATOM: JBID:UI64:16605481304316448032
Sep 27 08:34:06 FBRPSSTO004 ADE: |21977923 1568886980 ECJM #DON 2023-09-27T08:34:06.557976| NOTICE 1233 0fc4f7218174f516 ATOM: END
Sep 27 08:34:06 FBRPSSTO004 ADE: |12320679 2877050429 S3RQ %CEA 2023-09-27T08:34:06.587048| NOTICE 1418 9e04d1c712a4942c S3RQ: EVENT_PROCESS_CREATE - con
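To check whether a given bycast.log contains the same signature, the ECJM warning shown above can be matched programmatically. The following is a minimal sketch, not a NetApp tool: the function name `find_retry_failures`, the `RetryFailure` record, and the regular expression are assumptions derived only from the layout of the sample lines in this article, and the log path is passed in by the caller rather than assumed.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple


class RetryFailure(NamedTuple):
    """One 'Job needs retry' exception reported by ECJM (hypothetical record type)."""
    timestamp: str
    job_id: str
    description: str


# Pattern inferred from the sample WARNING line in this article; other
# StorageGRID releases may format the message differently.
_PATTERN = re.compile(
    r"ECJM \S+ (?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)\|"
    r".*?ECJM: Caught exception 'ENFORCE failed: !\"Job needs retry\.\"' "
    r"when running job (?P<job>\d+): (?P<desc>.*?)\.?\s*$"
)


def find_retry_failures(lines: Iterable[str]) -> Iterator[RetryFailure]:
    """Yield one record for every log line that matches the signature."""
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            yield RetryFailure(match["ts"], match["job"], match["desc"])


if __name__ == "__main__":
    # Usage: python find_retry.py <path to a copy of bycast.log>
    if len(sys.argv) != 2:
        sys.exit("usage: find_retry.py <bycast.log>")
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in find_retry_failures(log):
            print(f"{hit.timestamp} job {hit.job_id}: {hit.description}")
```

Run against a copy of the log from the EC leader, the script prints one line per failed attempt, so repeated retries of the same job ID are easy to spot.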