
Cassandra repair progress slow alert and frequent cassandra-reaper service restarts on StorageGRID 11.4

Applies to

  • StorageGRID 11.4 (earlier than 11.4.0.3)
  • New StorageGRID deployments
  • StorageGRID environments upgraded from 11.3 (hotfixes earlier than 11.3.0.11)

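Issue

  • After a new StorageGRID 11.4 deployment, or after an upgrade to 11.4 from a release earlier than 11.3.0.11 (for example 11.3.0.10 or any other 11.3 release), users might receive the following alert in the StorageGRID GUI: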

 

[Image: alert.PNG, the Cassandra repair progress slow alert]
 
 
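  • The Cassandra repair progress slow alert can have many causes, including unavailable services and communication problems. To confirm that the issue matches this article, check for the following additional signatures:
  1. The Cassandra repair progress slow alert has been active for more than 2 days, and the effective repair percentage is 0%.
  2. The cassandra-reaper service, which is responsible for Cassandra repair operations, restarts frequently on various Storage Nodes. This can be confirmed in the /var/local/log/servermanager.log file on the Storage Nodes: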

 

| cassandra-reaper      | restart initiated
| cassandra-reaper      | cassandra-reaper ended
| reaper           | starting reaper
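
As a rough way to quantify the restarts, the sketch below counts restart entries for the cassandra-reaper service. It assumes only that each restart produces a line containing the `| cassandra-reaper | restart initiated` fields shown in the sample above; any other columns in servermanager.log are ignored. The function name `count_reaper_restarts` is illustrative and not part of StorageGRID.

```python
import re
from typing import Iterable

# Assumption: the pattern is derived only from the sample lines shown in this
# article; the real log may carry extra columns (timestamps, etc.), which are
# tolerated because the pattern is searched anywhere in the line.
RESTART_PATTERN = re.compile(r"\|\s*cassandra-reaper\s*\|\s*restart initiated")


def count_reaper_restarts(lines: Iterable[str]) -> int:
    """Return the number of lines that record a cassandra-reaper restart."""
    return sum(1 for line in lines if RESTART_PATTERN.search(line))
```

The function accepts any iterable of lines, so an open file object for /var/local/log/servermanager.log can be passed directly. How many restarts count as "frequent" is not defined in this article and remains a judgment call.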

 

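  3. The Cassandra reaper log, either /var/local/log/cassandra-reaper.log on the node or reaper.log in a lumberjack collection, contains exceptions reporting that consistency level QUORUM or EACH_QUORUM could not be achieved: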

WARN [storagegrid:615635d0-342b-11eb-b6cc-4bacd6a2d5fe:615c9e91-342b-11eb-b6cc-4bacd6a2d5fe] 2020-12-08 18:57:38,140 i.c.s.SegmentRunner - Failed to connect to a coordinator node for segment 615c9e91-342b-11eb-b6cc-4bacd6a2d5fe 

com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency EACH_QUORUM (2 required but only 0 alive)
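
To scan a reaper log for this signature, a minimal sketch along the following lines could be used. It assumes the exception text always follows the wording of the sample above; the name `find_quorum_failures` is hypothetical and the pattern has only been derived from that single sample line.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: message wording is taken from the sample exception in this article.
UNAVAILABLE = re.compile(
    r"Not enough replicas available for query at consistency "
    r"(?P<level>\w+) \((?P<required>\d+) required but only (?P<alive>\d+) alive\)"
)


def find_quorum_failures(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (consistency level, replicas required, replicas alive) per match."""
    failures = []
    for line in lines:
        match = UNAVAILABLE.search(line)
        # Keep only the two consistency levels this article refers to.
        if match and match.group("level") in ("QUORUM", "EACH_QUORUM"):
            failures.append(
                (match.group("level"),
                 int(match.group("required")),
                 int(match.group("alive")))
            )
    return failures
```

A non-empty result, especially one where the number of live replicas is 0, is consistent with the signature described in this step.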

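  4. The Cassandra reaper repair list, found in reaper_commands.txt in the lumberjack collection of a Storage Node or obtained by running spreaper --reaper-host=localhost --reaper-port=9403 status-cluster storagegrid in an SSH session to a Storage Node, shows that repairs for some or all keyspaces contain the following message for the last event: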

   "creation_time": "2020-11-24T23:05:08Z", 
   "current_time": "2020-12-08T18:59:39Z", 
   "datacenters": [], 
   "duration": "7 days 0 hours 2 minutes 13 seconds", 
   "end_time": "2020-12-01T23:07:22Z", 
   "estimated_time_of_arrival": null, 
   "id": "7f8d00b0-2ea9-11eb-b76b-d7a5b22a5393", 
   "incremental_repair": false, 
   "intensity": 1.000, 
   "keyspace_name": "storagegrid", 
   "last_event": "Postponed a segment because no coordinator was reachable"
   "nodes": [], 
   "owner": "auto-scheduling", 
   "pause_time": null, 
   "repair_parallelism": "PARALLEL", 
   "repair_thread_count": 4, 
   "repair_unit_id": "dc8dbfa0-17c7-11eb-b890-676ddd59fc8a", 
   "segments_repaired": 0, 
   "start_time": "2020-11-24T23:05:08Z", 
   "state": "ABORTED", 

   "creation_time": "2020-11-17T20:50:58Z", 
   "current_time": "2020-12-08T18:59:40Z", 
   "datacenters": [], 
   "duration": "7 days 0 hours 0 minutes 32 seconds", 
   "end_time": "2020-11-24T20:51:31Z", 
   "estimated_time_of_arrival": null, 
   "id": "9882a450-2916-11eb-8180-07cae1e33f50", 
   "incremental_repair": false, 
   "intensity": 1.000, 
   "keyspace_name": "reaper_db", 
   "last_event": "Postponed a segment because no coordinator was reachable"
   "nodes": [], 
   "owner": "auto-scheduling", 
   "pause_time": null, 
   "repair_parallelism": "PARALLEL", 
   "repair_thread_count": 4, 
   "repair_unit_id": "dc818aa0-17c7-11eb-b890-676ddd59fc8a", 
   "segments_repaired": 0, 
   "start_time": "2020-11-17T20:50:59Z", 
   "state": "ABORTED", 

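The repair list can be long, so the sketch below pulls out the keyspaces whose runs show this signature: the last event shown above combined with zero repaired segments. It deliberately uses a line-based scan instead of a JSON parser, because the excerpt above is not guaranteed to be complete, valid JSON. It assumes one `"key": value` pair per line, as in the excerpt, and treats a repeated key as the start of the next repair run. The names `split_runs` and `stalled_keyspaces` are illustrative.

```python
import re
from typing import Dict, List

# Assumption: one "key": value pair per line, as in the excerpt in this article.
FIELD = re.compile(r'^\s*"(?P<key>\w+)":\s*(?P<value>.*?),?\s*$')
STALLED_EVENT = "Postponed a segment because no coordinator was reachable"


def split_runs(text: str) -> List[Dict[str, str]]:
    """Group key/value lines into repair runs; a repeated key starts a new run."""
    runs: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key = match.group("key")
        value = match.group("value").strip().strip('"')
        if key in current:
            runs.append(current)
            current = {}
        current[key] = value
    if current:
        runs.append(current)
    return runs


def stalled_keyspaces(text: str) -> List[str]:
    """Return keyspaces of runs with the stalled last event and 0 segments repaired."""
    return [
        run.get("keyspace_name", "?")
        for run in split_runs(text)
        if run.get("last_event") == STALLED_EVENT
        and run.get("segments_repaired") == "0"
    ]
```

Applied to the two runs in the excerpt, this would report the `storagegrid` and `reaper_db` keyspaces. Output with nested objects or a different layout would need a proper parser instead.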
 
