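Site expansion stalls during Cassandra rebuild
Applies to
Issue
The site expansion stalls because the expansion procedure selected a suboptimal site as the source for the Cassandra rebuild.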

 
Cassandra system.log shows:
 
  INFO [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,752 RangeStreamer.java (line 127) Rebuild: range (7543867250329265734,7544102375703298946] exists on /<IP_3H> for keyspace accounts
  INFO [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,753 RangeStreamer.java (line 127) Rebuild: range (7543867250329265734,7544102375703298946] exists on /<IP_3I> for keyspace accounts
  WARN [RMI TCP Connection(22710)-127.0.0.1] 2024-03-07 05:30:08,753 StorageService.java (line 1503) Parameter error while rebuilding node
 java.lang.IllegalStateException: Unable to find sufficient sources for streaming range (-6636921090683170249,-6636701783084431689] in keyspace accounts
 at org.apache.cassandra.dht.RangeStreamer.handleSourceNotFound(RangeStreamer.java:306)
 at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:285)
 at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:129)
 at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1429)
 at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1343)
 at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
 at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
 at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
 at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
 at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
 at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
 at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
 at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
 at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
 at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
 at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
 at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
 at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
 at sun.rmi.transport.Transport$1.run(Transport.java:200)
 at sun.rmi.transport.Transport$1.run(Transport.java:197)
 at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
 at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
 at java.security.AccessController.doPrivileged(Native Method)
 at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
  INFO [RMI TCP Connection(22716)-127.0.0.1] 2024-03-07 05:30:20,425 StorageService.java (line 1402) starting rebuild for (All keyspaces), (All tokens), RESET_NO_SNAPSHOT,  included DCs: group20
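The two log signatures above (the "Unable to find sufficient sources" exception and the "starting rebuild ... included DCs" line) can be searched for mechanically. The following is a minimal sketch, assuming the log text has already been read into a string; the function names are hypothetical and the patterns are derived only from the lines shown in this article, so other Cassandra versions may word these messages differently.

```python
import re

# Pattern taken from the IllegalStateException message shown in this article.
FAILURE_RE = re.compile(
    r"Unable to find sufficient sources for streaming range "
    r"\((-?\d+),(-?\d+)\] in keyspace (\w+)"
)

# Pattern taken from the "starting rebuild" INFO line shown in this article.
REBUILD_START_RE = re.compile(r"starting rebuild for .*included DCs: (\S+)")


def find_rebuild_source_failures(log_text):
    """Return (start_token, end_token, keyspace) for each failed range."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3))
        for m in FAILURE_RE.finditer(log_text)
    ]


def find_rebuild_source_dcs(log_text):
    """Return the data center named in each 'starting rebuild' line."""
    return [m.group(1) for m in REBUILD_START_RE.finditer(log_text)]
```

Running both helpers on the excerpt above would report the failed token range in keyspace accounts and show that the rebuild was started with group20 as the included data center.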
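Running nodetool status from the expansion site (site 4 / group40) shows the nodes in site 2 (group20) as "DS" (Down/Stopped), even though the nodes in site 2 (group20) are up and running and reachable from the other sites.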
Datacenter: group10
 ===================
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving/Stopped
 --  Address       Load       Tokens       Owns (effective)  Host ID   Rack
 UN  <IP_1A>    1.77 TiB   256          50.9%             <UUID_1A>   unknown
 UN  <IP_1B>    1.52 TiB   256          49.0%             <UUID_1B>   unknown
 UN  <IP_1C>    1.48 TiB   256          49.2%             <UUID_1C>   unknown
 UN  <IP_1D>    1.68 TiB   256          49.5%             <UUID_1D>   unknown
 UN  <IP_1E>    1.77 TiB   256          51.4%             <UUID_1E>   unknown
 UN  <IP_1F>    1.56 TiB   256          49.9%             <UUID_1F>   unknown
Datacenter: group20
 ===================
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving/Stopped
 --  Address       Load       Tokens       Owns (effective)  Host ID   Rack
 DS  <IP_2A>    3.06 TiB   256          100.0%            <UUID_2A>   unknown
 DS  <IP_2B>    3.13 TiB   256          100.0%            <UUID_2B>   unknown
 DS  <IP_2C>    2.99 TiB   256          100.0%            <UUID_2C>   unknown
Datacenter: group30
 ===================
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving/Stopped
 --  Address       Load       Tokens       Owns (effective)  Host ID   Rack
 UN  <IP_3A>    1.18 TiB   256          33.4%             <UUID_3A>   unknown
 UN  <IP_3B>    1.03 TiB   256          33.6%             <UUID_3B>   unknown
 UN  <IP_3C>    1.1 TiB    256          34.8%             <UUID_3C>   unknown
 UN  <IP_3D>    1.03 TiB   256          31.4%             <UUID_3D>   unknown
 UN  <IP_3E>    1.02 TiB   256          34.1%             <UUID_3E>   unknown
 UN  <IP_3F>   964.94 GiB  256          32.1%             <UUID_3F>   unknown
 UN  <IP_3G>    1.02 TiB   256          34.7%             <UUID_3G>   unknown
 UN  <IP_3H>   969.99 GiB  256          31.6%             <UUID_3H>   unknown
 UN  <IP_3I>    1.1 TiB    256          34.3%             <UUID_3I>   unknown
Datacenter: group40
 ===================
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving/Stopped
 --  Address       Load       Tokens       Owns (effective)  Host ID   Rack
 UN  <IP_4A>    10.35 GiB  256          37.1%             <UUID_4A>   unknown
 UN  <IP_4B>    7.62 GiB   256          35.7%             <UUID_4B>   unknown
 UN  <IP_4C>    4.83 GiB   256          39.9%             <UUID_4C>   unknown
 UN  <IP_4D>    11.75 GiB  256          40.1%             <UUID_4D>   unknown
 UN  <IP_4E>    10.69 GiB  256          38.8%             <UUID_4E>   unknown
 UN  <IP_4F>    6.12 GiB   256          35.2%             <UUID_4F>   unknown
 UN  <IP_4G>     8.06 GiB   256          37.0%             <UUID_4G>   unknown
 UN  <IP_4H>     14.86 GiB  256          36.0%             <UUID_4H>   unknown
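To spot this condition quickly, the nodetool status output can be scanned for nodes whose state is anything other than UN, grouped by data center. The sketch below assumes the output has been captured as text in the layout shown above; the function name is hypothetical, and the parser relies only on the "Datacenter:" headers and the two-letter status/state code at the start of each node row.

```python
def nodes_not_up_normal(status_text):
    """Map each data center to its (state, address) pairs that are not UN."""
    result = {}
    datacenter = None
    for raw_line in status_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Datacenter:"):
            datacenter = line.split(":", 1)[1].strip()
            continue
        parts = line.split()
        if datacenter is None or len(parts) < 2:
            continue
        code = parts[0]
        # Node rows start with a status letter (U/D) and a state letter (N/L/J/M/S).
        is_node_row = len(code) == 2 and code[0] in "UD" and code[1] in "NLJMS"
        if is_node_row and code != "UN":
            result.setdefault(datacenter, []).append((code, parts[1]))
    return result
```

Applied to the output above as seen from group40, this would list the three group20 nodes with state DS and nothing for the other data centers. Comparing the result with the same command run from another site helps confirm that the nodes are only seen as down from the expansion site.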