Note (June 2020): this article is old and not really revelant anymore. If you use a modern version of cassandra, look at
-Dcassandra.replace_address_first_boot option !
I want to share some tips about my experimentations with Cassandra (version 2.0.x).
I found some documentations on datastax website about replacing a dead node, but it is not suitable for our needs, because in case of hardware crash, we will set up a new node with exactly the same IP (replace “in place”). Update : the documentation in now up to date on datastax !
If you try to start the new node with the same IP, cassandra doesn’t start with :
java.lang.RuntimeException: A node with address /10.20.10.2 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
So, we need to use the “cassandra.replace_address” directive (which is not really documented ? :() See this commit and this bug report, available since 1.2.11/2.0.0, it’s an easier solution and it works.
+ - New replace_address to supplant the (now removed) replace_token and + replace_node workflows to replace a dead node in place. Works like the + old options, but takes the IP address of the node to be replaced.
It’s a JVM directive, so we can add it at the end of /etc/cassandra/cassandra-env.sh (debian package), for example:
Of course, 10.20.10.2 = ip of your dead/new node.
Now, start cassandra, and in logs you will see :
INFO [main] 2014-03-10 14:58:17,804 StorageService.java (line 941) JOINING: schema complete, ready to bootstrap INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: waiting for pending range calculation INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: calculation complete, ready to bootstrap INFO [main] 2014-03-10 14:58:17,805 StorageService.java (line 941) JOINING: Replacing a node with token(s): [...] [...] INFO [main] 2014-03-10 14:58:17,844 StorageService.java (line 941) JOINING: Starting to bootstrap... INFO [main] 2014-03-10 14:58:18,551 StreamResultFuture.java (line 82) [Stream #effef960-6efe-11e3-9a75-3f94ec5476e9] Executing streaming plan for Bootstrap
Node is in boostraping mode and will retrieve data from cluster. This may take lots of time.
If the node is a seed node, a warning will indicate that the node did not auto bootstrap. This is normal, you need to run a nodetool repair on the node.
On the new node :
# nodetools netstats Mode: JOINING Bootstrap effef960-6efe-11e3-9a75-3f94ec5476e9 /10.20.10.1 Receiving 102 files, 17467071157 bytes total [...]
After some time, you will see some informations on logs !
On the new node :
INFO [STREAM-IN-/10.20.10.1] 2014-03-10 15:15:40,363 StreamResultFuture.java (line 215) [Stream #effef960-6efe-11e3-9a75-3f94ec5476e9] All sessions completed INFO [main] 2014-03-10 15:15:40,366 StorageService.java (line 970) Bootstrap completed! for the tokens [...] [...] INFO [main] 2014-03-10 15:15:40,412 StorageService.java (line 1371) Node /10.20.10.2 state jump to normal WARN [main] 2014-03-10 15:15:40,413 StorageService.java (line 1378) Not updating token metadata for /10.20.30.51 because I am replacing it INFO [main] 2014-03-10 15:15:40,419 StorageService.java (line 821) Startup completed! Now serving reads.
And on other nodes :
INFO [GossipStage:1] 2014-03-10 15:15:40,625 StorageService.java (line 1371) Node /10.20.10.2 state jump to normal
Et voilà, dead node has been replaced !
Don’t forget to REMOVE modifications on cassandra-env.sh after the complete bootstrap !