StorageGRID EC rebalance not progressing / stuck in Terminating state
Applies to
StorageGRID 11.6.0.13
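Issue
After nodes were placed in maintenance mode for an upgrade while a rebalance was running and the E-Series controller firmware was being updated, the EC rebalance job stalls and makes no further progress.
Attempting to cancel the EC rebalance job leaves it in the Terminating state: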
===================================================================
Job ID : <job-ID>
Site : <sitename>
State : Terminating...
Total Moves : 1712
Completed Moves : 1555
Canceled Moves : 22
Failures (retryable) : 0
Failures (non-retryable) : 0
Percentage : 91
Start Time : 2023-12-07 05:26:01 UTC
Retry Rebalance : No
===================================================================
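To read this status at a glance, the following Python sketch parses the key/value lines and computes how many moves are neither completed nor canceled. The helpers `parse_job_status` and `outstanding_moves` are hypothetical names used for illustration only and are not part of StorageGRID; the sample text is copied from the output above.

```python
def parse_job_status(text):
    """Parse 'Key : Value' lines from the job status output into a dict."""
    status = {}
    for line in text.splitlines():
        # Skip separator lines and anything without a key/value delimiter.
        if " : " not in line:
            continue
        key, value = line.split(" : ", 1)
        status[key.strip()] = value.strip()
    return status


def outstanding_moves(status):
    """Return the number of moves that are neither completed nor canceled."""
    return (
        int(status["Total Moves"])
        - int(status["Completed Moves"])
        - int(status["Canceled Moves"])
    )


# Sample taken from the status output shown above.
SAMPLE = """\
State : Terminating...
Total Moves : 1712
Completed Moves : 1555
Canceled Moves : 22
Start Time : 2023-12-07 05:26:01 UTC
"""

if __name__ == "__main__":
    job = parse_job_status(SAMPLE)
    print(job["State"], outstanding_moves(job))
```

For the output above this yields 135 moves that remain outstanding while the job sits in the Terminating state.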
The bycast log shows that an abort was attempted but never succeeded:
Feb 20 14:33:03 <node-name> ADE: |<node-id> 0484709853 ECJM #ABT 2024-02-20T14:33:03.027448| WARNING 0242 2e9d1becf0c43622 ECJM: Handling job abort for job 12085808753338013043
Feb 20 14:33:03 <node-name> ADE: |<node-id> 0484709853 ECJM #ABT 2024-02-20T14:33:03.036515| WARNING 0258 2e9d1becf0c43622 ECJM: Sending abort message to running job 12085808753338013043
Feb 20 14:33:03 <node-name> ADE: |<node-id> 0484709853 ECJM #ABT 2024-02-20T14:33:03.042362| INFO 0549 2e9d1becf0c43622 ECJM: Informing process 484739606, message: #ABT
Feb 20 14:33:03 <node-name> ADE: |<node-id> 0484739606 ECJM #DON 2024-02-20T14:33:03.043585| WARNING 0017 ECJM: Unexpected message from ECJM:{484709853@21228096}: [#ABT:[JBID(UI64):12085808753338013043]]
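When checking a longer bycast log for the same pattern, a small script can help. The sketch below is only an illustration: it assumes the message wording matches the excerpt above ("Handling job abort", "Unexpected message", `#ABT`), and `summarize_aborts` is a hypothetical helper, not a StorageGRID tool. It counts, per job ID, how many abort requests were handled and how many were answered with an "Unexpected message" warning.

```python
import re
from collections import Counter

# Job IDs appear either as "job <id>" or as "JBID(UI64):<id>" in the excerpt.
JOB_ID = re.compile(r"(?:job |JBID\(UI64\):)(\d+)")


def summarize_aborts(lines):
    """Count abort requests and 'Unexpected message' replies per job ID."""
    requested = Counter()
    unexpected = Counter()
    for line in lines:
        if "ECJM" not in line:
            continue
        match = JOB_ID.search(line)
        if not match:
            continue
        job = match.group(1)
        if "Handling job abort" in line:
            requested[job] += 1
        elif "Unexpected message" in line and "#ABT" in line:
            unexpected[job] += 1
    return requested, unexpected


# Two of the log lines from the excerpt above.
SAMPLE_LOG = [
    "Feb 20 14:33:03 <node-name> ADE: |<node-id> 0484709853 ECJM #ABT "
    "2024-02-20T14:33:03.027448| WARNING 0242 2e9d1becf0c43622 ECJM: "
    "Handling job abort for job 12085808753338013043",
    "Feb 20 14:33:03 <node-name> ADE: |<node-id> 0484739606 ECJM #DON "
    "2024-02-20T14:33:03.043585| WARNING 0017 ECJM: Unexpected message from "
    "ECJM:{484709853@21228096}: [#ABT:[JBID(UI64):12085808753338013043]]",
]

if __name__ == "__main__":
    requested, unexpected = summarize_aborts(SAMPLE_LOG)
    for job, count in requested.items():
        print(job, "abort requests:", count, "unexpected replies:", unexpected[job])
```

A job whose abort requests are each followed by an "Unexpected message" warning matches the behavior described here, although reading that warning as the reason the abort never completes is an interpretation of the excerpt, not something the log states.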