
Good afternoon.

In a production environment we use Cassandra 2.0.7. Initially one node was enough (cass-05, local IP address 192.168.0.5). We now need a second node (cass-06, local IP address 192.168.0.6), which runs on its own separate server. The Cassandra settings on cass-06 are completely analogous to those on cass-05. We use the NetworkTopologyStrategy replication strategy. Each node is configured in its own rack and data center, each holding one copy of the data (rack1, DC1: 1 for cass-05 and rack2, DC2: 1 for cass-06).
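
For reference, a minimal sketch of what such a setup looks like. The keyspace name my_keyspace is a placeholder, and the cassandra-topology.properties entries assume PropertyFileSnitch is in use:

CREATE KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};

# cassandra-topology.properties (identical on both nodes)
192.168.0.5=DC1:rack1
192.168.0.6=DC2:rack2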

1 TB of disk space is available for Cassandra on each server. The cass-05 server holds 600 GB of real data.

On the cass-06 server we run 'nodetool rebuild':

# ./nodetool -h 192.168.0.6 rebuild -- DC1

Cassandra on cass-06 begins to create a large number of temporary files for the tables; in theory these should be removed afterwards, but for some reason they are not. Within 9-12 hours the entire 1 TB of disk space is occupied by these temporary files, which makes the node malfunction. After restarting Cassandra on the cass-06 node, only 150 GB of disk space is occupied.
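
For diagnosis while the rebuild runs, the streaming progress can be watched with a standard nodetool command:

# ./nodetool -h 192.168.0.6 netstats

The temporary SSTables have "tmp" in their file names, so their total size can be tracked with, for example (the path assumes the default data directory /var/lib/cassandra/data, adjust to your data_file_directories setting):

# du -ch /var/lib/cassandra/data/*/*/*-tmp-* | tail -1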

While 'nodetool rebuild' is running, the cass-06 node serves reads/writes just like cass-05.

Thanks for any help.

  • Why are you running Cassandra with two nodes, each in a separate DC? This completely violates established advice regarding Cassandra deployments. 3 nodes per DC should be considered a minimum. – rs_atl Jun 12 '14 at 16:48
  • @rs_atl, thank you very much for your comment! We'll look into this. – DmitryKanunnikoff Jun 13 '14 at 17:00
