Building The Perfect Cassandra Test Environment
A month back, one of our clients asked us to set up 15 individual single-node Cassandra instances, each living in 64MB of RAM, all residing on the same machine. My first response was “Why!?”
Qualities Of An Ideal Cassandra Test Framework
So what are the qualities of an ideal Cassandra test framework?
Light-weight and available — A good test framework will take up as few resources as possible and be accessible right when you want it.
Parity with Production — The test environment should perfectly simulate the production environment. This is a no-brainer. After all, what good does it do you to pass a test only to wonder whether an error lurks in the differences between the test and production environments?
Stateless — Between test runs, there’s no reason to keep any information around. So why not just throw it all away?
Isolated — Most often there will be several developers on a team, and there’s a good chance they’ll be testing things at the same time. It’s important to keep each developer quarantined from the others.
Fault Resistant — Remember, we’re a little concerned here that Cassandra is going to be a resource hog or otherwise just not work. Being “fault resistant” means striking the right balance so that Cassandra takes up as few resources as possible without actually failing.
Implementing The Ideal Cassandra Test Framework
The first thing to do is to set up the test environment on a per-developer basis. This means changing a few paths. In cassandra.yaml, change:
data_file_directories:
- /home/jberryman/cassandra/data
commitlog_directory: /home/jberryman/cassandra/commitlog
saved_caches_directory: /home/jberryman/saved_caches
And then in log4j-server.properties change
log4j.appender.R.File=/home/jberryman/cassandra/system.log
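Since each of the 15 developers needs their own copy of these paths, it can help to script the substitution rather than edit by hand. Below is a minimal sketch of one way to do it, assuming PyYAML is installed and that each developer keeps a private copy of the config; the file locations are illustrative, not part of the original setup.

# Minimal sketch: stamp per-developer paths into a private copy of
# cassandra.yaml. Assumes PyYAML; file locations are illustrative.
import os
import yaml

base = os.path.expanduser("~/cassandra")  # e.g. /home/jberryman/cassandra

with open("conf/cassandra.yaml") as f:
    conf = yaml.safe_load(f)

conf["data_file_directories"] = [os.path.join(base, "data")]
conf["commitlog_directory"] = os.path.join(base, "commitlog")
conf["saved_caches_directory"] = os.path.join(base, "saved_caches")

with open("conf/cassandra.yaml", "w") as f:
    yaml.safe_dump(conf, f, default_flow_style=False)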
Next, it’s a good idea to create a wrapper around whatever client you’re using. This has several benefits. For one thing, a wrapper provides a guard against the client changing out from under you. This is especially important right now since so many clients are scrambling to be CQL3 compliant. The wrapper is also a great place to put safeguards against horking up your production data when you think you’re running a test. Perhaps the easiest way to safeguard against this is to issue the DESCRIBE CLUSTER statement and make sure that the cluster name is “TestCluster”. (If your CQL client doesn’t honor this statement, you can just create a keyspace called “Yes_ThisIsIndeedATestCluster” and test for its existence.) Once the wrapper is complete, it can be used with functional parity on both the test and production clusters.
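To make the safeguard concrete, here is a minimal sketch of such a wrapper, assuming the DataStax Python driver (cassandra-driver); the class name and guard logic are illustrative. Since not every client honors DESCRIBE CLUSTER, it reads the cluster name from the system.local table, which any CQL client can query.

# Minimal sketch of a client wrapper with a test-cluster guard.
# Assumes the DataStax Python driver; names are illustrative.
from cassandra.cluster import Cluster

class SafeTestClient:
    def __init__(self, hosts, expected_name="TestCluster"):
        self.cluster = Cluster(hosts)
        self.session = self.cluster.connect()
        # Guard: refuse to run unless we are pointed at the test cluster.
        row = self.session.execute(
            "SELECT cluster_name FROM system.local").one()
        if row.cluster_name != expected_name:
            raise RuntimeError(
                "Refusing to run tests against cluster %r" % row.cluster_name)

    def execute(self, statement, parameters=None):
        # Every test query funnels through here, behind the guard above.
        return self.session.execute(statement, parameters)

Tests would then construct SafeTestClient(["127.0.0.1"]) once and issue everything through its execute method, so the guard runs before any statement can touch a real cluster.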
The simplest way to make Cassandra lightweight is simply to declare it so! In cassandra-env.sh, change:
MAX_HEAP_SIZE="64M"
HEAP_NEWSIZE="12M"
However, just because you have now declared Cassandra to be lightweight doesn’t mean that it will JustWork™. Given this little heap space to move in, Cassandra will happily toss you an OutOfMemory error on its first SSTable flush, compaction, or garbage collection. To guard against this, we have a bit of work to do!
The first thing to do is to reduce the number of threads, especially for reading and writing. In cassandra.yaml there are several changes to make:
rpc_server_type: hsha
Here, hsha stands for “half synchronous, half asynchronous.” This makes sure that all Thrift clients are handled asynchronously, using a small number of threads that does not grow with the number of Thrift clients.
concurrent_reads: 2
concurrent_writes: 2
rpc_min_threads: 1
rpc_max_threads: 1
As stated, the first two lines limit the number of reads and writes that can happen at the same time; 2 is the minimum number allowed here. The second two lines limit how many threads are available for serving requests. Everything to this point serves to make sure that reads and writes cannot overpower Cassandra during flushes and compactions. Next up:
concurrent_compactors: 1
If you are using SSDs, this will limit the number of compactors to 1. If you’re using spinning magnets, then you’re already limited to a single concurrent compactor.
Next, we need to do everything we can to make sure that compaction is not hindered. One setting here:
compaction_throughput_mb_per_sec: 0
This disables compaction throttling completely, giving compaction free rein over other competing priorities.
Next we turn all the knobs on memory usage as low as possible:
in_memory_compaction_limit_in_mb: 1
This is the minimum limit for allowing compaction to take place in memory. With such a low setting, much of compaction will take place in a two-pass method that is I/O intensive. But I/O is not the thing we’re worried about!
key_cache_size_in_mb: 0
At the expense of read times, we can do away with key caches. But this may not even be necessary, because we can do even better:
reduce_cache_sizes_at: 0
reduce_cache_capacity_to: 0
The first line says, “As soon as you’ve used up this much memory, reduce cache capacity.” And since this is set to 0, cache capacity is reduced just about as soon as Cassandra starts being used. The second line then dictates that the caches should effectively not be used at all.
Finally, on a test cluster we’re not worried about data durability, so there are plenty of safeguards that we can simply do away with. For one, before starting the test cluster, go ahead and remove everything in the data and commitlog directories. Next, in cassandra.yaml set hinted_handoff_enabled: false. When creating a test keyspace, go ahead and set durable_writes = false so that the commit log is never even populated. Finally, when creating test tables, consider setting read_repair_chance = 0 and bloom_filter_fp_chance = 1. These keyspace and table modifications may be unnecessary, though, since I was able to get pretty good performance without them.
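For illustration, here is a minimal sketch of creating such a disposable keyspace and table, again assuming the DataStax Python driver; the keyspace and table names are made up for the example.

# Minimal sketch: a throwaway keyspace and table with durability
# safeguards turned off. Names are illustrative.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

# durable_writes = false means the commit log is never even populated.
session.execute("""
    CREATE KEYSPACE test_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    AND durable_writes = false
""")

# read_repair_chance = 0 and bloom_filter_fp_chance = 1 shed two more
# background costs that a throwaway test table does not need.
session.execute("""
    CREATE TABLE test_ks.widgets (
        id text PRIMARY KEY,
        payload text
    ) WITH read_repair_chance = 0
      AND bloom_filter_fp_chance = 1
""")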
Testing The Test Framework
Now that all of our changes are in place, let’s fire up Cassandra and see how she performs!
$ rm -fr /home/jberryman/cassandra && bin/cassandra -f
So far so good. “Starting listening for CQL clients on localhost/127.0.0.1:9042” means that we’re alive and ready to service requests. Now it’s time to slam Cassandra:
$ bin/cassandra-stress
total,interval_op_rate,interval_key_rate,latency/95th/99th,elapsed_time
33287,3328,3328,8.0,54.6,277.0,10
85059,5177,5177,7.5,33.3,276.7,20
133153,4809,4809,7.4,34.0,274.8,30
183111,4995,4995,6.9,31.6,165.1,40
233177,5006,5006,6.8,32.0,123.5,51
288998,5582,5582,6.7,26.7,123.5,61
341481,5248,5248,6.7,26.3,129.7,71
391594,5011,5011,6.7,26.7,129.7,81
441645,5005,5005,6.5,29.0,122.5,92
494198,5255,5255,6.3,28.3,122.9,102
539406,4520,4520,6.4,24.4,122.9,112
591272,5186,5186,6.4,26.8,122.9,122
641202,4993,4993,6.6,27.9,122.9,132
696041,5483,5483,6.6,28.2,122.9,143
747078,5103,5103,6.5,26.1,274.4,153
797125,5004,5004,6.4,25.3,274.4,163
839887,4276,4276,6.1,23.9,273.6,173
880678,4079,4079,6.0,22.9,273.6,184
928384,4770,4770,5.8,21.7,273.6,194
979878,5149,5149,5.7,20.2,273.6,204
1000000,2012,2012,5.5,19.4,273.6,208
END
Wow… so not only does it not die, it’s actually pretty darn performant! Looking back at the logs, I see a couple of warnings:
WARN 17:15:57,030 Heap is 0.5260566963447822 full. You may need to reduce
memtable and/or cache sizes. Cassandra is now reducing cache sizes to free
up memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you
don't want Cassandra to do this automatically
Ah… this has to do with the reduce_cache_sizes_at and reduce_cache_capacity_to bit from earlier. After this warning hits, we know that the caches have been tossed out. I wonder how the lack of caches will affect read performance. Let’s see!
$ bin/cassandra-stress --operation READ
total,interval_op_rate,interval_key_rate,latency/95th/99th,elapsed_time
34948,3494,3494,8.4,39.9,147.0,10
95108,6016,6016,7.9,19.3,145.2,20
155830,6072,6072,7.8,15.4,144.7,30
213037,5720,5720,7.8,14.6,72.5,40
274021,6098,6098,7.8,13.7,56.8,51
335575,6155,6155,7.7,12.6,56.6,61
396074,6049,6049,7.7,12.6,56.6,71
455660,5958,5958,7.7,12.7,45.8,81
516840,6118,6118,7.7,12.3,45.8,91
576045,5920,5920,7.7,12.3,45.6,102
635237,5919,5919,7.7,12.7,45.6,112
688830,5359,5359,7.7,13.5,45.6,122
740047,5121,5121,7.7,15.1,45.8,132
796249,5620,5620,7.8,14.8,42.4,143
853788,5753,5753,7.9,14.1,37.1,153
906821,5303,5303,7.9,15.1,37.1,163
963981,5716,5716,7.9,14.1,37.1,173
1000000,3601,3601,7.9,13.3,37.1,180
END
Hooray, it works! And it’s still quite performant! I was concerned that the lack of caches would kill Cassandra’s read performance, but it seems to be just fine. Looking back at the log file again, there are several more warnings, each looking about like this:
WARN 17:16:25,082 Heap is 0.7914885099943694 full. You may need to reduce
memtable and/or cache sizes. Cassandra will now flush up to the two largest
memtables to free up memory. Adjust flush_largest_memtables_at threshold in
cassandra.yaml if you don't want Cassandra to do this automatically
WARN 17:16:25,083 Flushing CFS(Keyspace='Keyspace1', ColumnFamily='Standard1')
to relieve memory pressure
Despite the fact that we’re regularly having these emergency memtable flushes, Cassandra never died!
Popping open jconsole, we can make a couple more observations. The first is that while the unaltered Cassandra process takes up roughly 8GB of memory, this test Cassandra never goes over 64MB. Second, we also see that the number of threads on the unaltered Cassandra hovers around 120-130, while the test Cassandra remains somewhere between 40 and 50.
Conclusion
So you see, my client’s request was actually quite reasonable, and quite a good idea! Now they have a test framework that is able to support 15 developers on a single machine, each with their own isolated test environment. This is a good example of how consultants sometimes learn from the companies they’re consulting.