
StorageGRID expansion fails with error: "Starting Cassandra. Error: Failed to start. Retrying"

Category:
storagegrid-webscale
Specialty:
sgrid

Applies to

  • StorageGRID Webscale 11.1
  • StorageGRID Webscale 11.0

Issue

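When expanding a group of nodes, the non-Storage Nodes show as "Complete", but the Grid Management Interface (GMI) shows the following statuses for the Storage Nodes: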


'Waiting for Cassandra nodes to join the cluster'
'Starting Cassandra. Error: Failed to start. Retrying'
'Waiting to Start Services'


The Cassandra log file on the node, /var/local/log/cassandra/system.log, shows the following error:

ERROR [main] 2018-08-31 10:40:14,250 CassandraDaemon.java (line 678) Exception encountered during startup
java.lang.RuntimeException: A node with address localhost-grid/<IP_Address> already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
   at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:559) ~[cassandra-all-3.0.15.162564.jar:3.0.15.162564]
   at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:889) ~[cassandra-all-3.0.15.162564.jar:3.0.15.162564]
   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:666) ~[cassandra-all-3.0.15.162564.jar:3.0.15.162564]
   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:614) ~[cassandra-all-3.0.15.162564.jar:3.0.15.162564]
   at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:354) [cassandra-all-3.0.15.162564.jar:3.0.15.162564]
   at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:582) [cassandra-all-3.0.15.162564.jar:3.0.15.162564]
   at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:665) [cassandra-all-3.0.15.162564.jar:3.0.15.162564]
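
To confirm that a node is failing for this specific reason before applying the solution, the log can be searched for the collision message. The following is a minimal Python sketch, assuming Python 3 is available wherever the log is read; the function names are hypothetical, and the regular expression simply matches the exception text shown above.

```python
import re
from pathlib import Path
from typing import List

# Matches the startup exception text shown above; the captured group is the
# address string as Cassandra prints it (e.g. "localhost-grid/<IP_Address>").
COLLISION_RE = re.compile(r"A node with address (\S+) already exists, cancelling join")

# Log location taken from this article.
DEFAULT_LOG = "/var/local/log/cassandra/system.log"


def find_endpoint_collisions(log_text: str) -> List[str]:
    """Hypothetical helper: return the address from every endpoint-collision error."""
    return COLLISION_RE.findall(log_text)


def check_node_log(path: str = DEFAULT_LOG) -> List[str]:
    """Hypothetical helper: read a Cassandra system.log and return colliding addresses."""
    text = Path(path).read_text(errors="replace")
    return find_endpoint_collisions(text)
```

A non-empty result from `check_node_log()` on a Storage Node reporting "Failed to start. Retrying" suggests that the solution below applies to that node.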

Cause

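  • When a group of nodes is expanded, the Storage Nodes must join the cluster as a group at the Cassandra service level.
  • If one Storage Node encounters a Cassandra error, the other Storage Nodes wait for that node to catch up. As a result, the entire expansion pauses until the error on that node is resolved.
  • In contrast, non-Storage Nodes (such as Admin Nodes or Gateway Nodes) have no such dependency; their expansion completes without waiting for other nodes.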

Solution

Perform the following steps on the Storage Node that shows the error 'Starting Cassandra. Error: Failed to start. Retrying':

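  1. Connect to the node using SSH and switch to root by running: su -
  2. Back up the existing Cassandra environment file, /etc/cassandra/cassandra-env.sh, by running: cp /etc/cassandra/cassandra-env.sh /etc/cassandra/cassandra-env.sh_bk
  3. Add the following two lines to the end of the Cassandra environment file, replacing <SG_Node_IP> with the IP address of this node: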
    1. JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<SG_Node_IP>"
    2. JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"
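  4. Start the Cassandra service: service cassandra start
  5. Connect using SSH to the node that shows Waiting for Cassandra nodes to join the cluster:
    1. Confirm that its Cassandra service started successfully: service cassandra status
    2. Check the Cassandra-level status of the nodes in the cluster: nodetool status
    3. If the node that originally showed the error now shows "UN" (Up/Normal), continue with the next step on that node.
  6. Remove or comment out the two lines added in step 3.
  7. Resume the expansion that halted on the Storage Node with the error: touch /tmp/unhalt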
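
The file edits in steps 3 and 6 and the check in step 5 can be sketched as plain text transformations. This is a minimal Python sketch under stated assumptions: all helper names are hypothetical, reading and writing /etc/cassandra/cassandra-env.sh (and the step 2 backup) are left to the operator, and `node_states` assumes the standard Apache Cassandra `nodetool status` layout, in which each node line starts with a two-letter Status/State code such as UN followed by the node address.

```python
import re
from typing import Dict, List

# Hypothetical helpers (not part of StorageGRID or Cassandra) that mirror
# steps 3, 5, and 6 of the procedure as plain text transformations.


def replace_option_lines(node_ip: str) -> List[str]:
    """The two JVM_OPTS lines from step 3; node_ip stands in for <SG_Node_IP>."""
    return [
        f'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address={node_ip}"',
        'JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"',
    ]


def add_replace_options(env_text: str, node_ip: str) -> str:
    """Step 3: append the two lines to cassandra-env.sh content.

    A line that is already present verbatim is not appended a second time.
    """
    lines = env_text.splitlines()
    for extra in replace_option_lines(node_ip):
        if extra not in lines:
            lines.append(extra)
    return "\n".join(lines) + "\n"


def comment_out_replace_options(env_text: str) -> str:
    """Step 6: comment out every uncommented line that sets either replace option."""
    out = []
    for line in env_text.splitlines():
        stripped = line.lstrip()
        sets_option = (
            "-Dcassandra.replace_address=" in stripped
            or "-Dcassandra.allow_unsafe_replace=" in stripped
        )
        if sets_option and not stripped.startswith("#"):
            line = "# " + line
        out.append(line)
    return "\n".join(out) + "\n"


# Node lines in `nodetool status` start with Status (U/D) plus State (N/L/J/M).
_STATE_CODE = re.compile(r"[UD][NLJM]")


def node_states(nodetool_output: str) -> Dict[str, str]:
    """Step 5: map each node address in `nodetool status` output to its code."""
    states: Dict[str, str] = {}
    for line in nodetool_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and _STATE_CODE.fullmatch(fields[0]):
            states[fields[1]] = fields[0]
    return states
```

For example, passing the output of `nodetool status` to `node_states` and checking that the entry for the erroring node's address is "UN" corresponds to step 5.3; commenting the lines out rather than deleting them keeps a record of the change alongside the step 2 backup.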

After a short time, the overall expansion resumes in the GMI.

 


NetApp provides no representations or warranties regarding the accuracy or reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.