CloverETL's Blog

May 15, 2009

Light speed sorting with FastSort

Filed under: Using CloverETL — Tags: , — bigpavel @ 2:40 pm

Recently I’ve been struggling to squeeze a little speed increase out of current CloverETL’s sorting component – the ExtSort. Benchmarks show that the performance of ExtSort is very good, yet we again wanted to push things a few steps forward and make them even better. Finally, after a little research and tweaking we came up with a compromise. The development split into two paths – the original ExtSort remained and a new component – FastSort – was introduced in CloverETL 2.7. Let’s have a small peek at what it is about!

FastSort is based on modified merging algorithm and can usually produce double or sometimes even 2.5 times better performance results than ExtSort. No matter how good this sounds it surprisingly doesn’t make ExtSort obsolete or generally inferior sort component! Let’s see why it is and how you will benefit from learning to use the right sort component for the right purpose.

Let’s go back to the beginning. ExtSort is based on merge sort algorithm using fixed number of tapes (temporary disk files). Records in ExtSort are read from input, sorted into groups – chunks – and added onto the end one of the tapes balancing their lengths. The chunks are then merged together and sent out. The number of tapes and chunk size play important role in performance tweaking. ExtSort can work on any size of input data (provided that there is enough disk space for temporary files) and has reasonably low demands on system memory (almost constant relatively to input data set size).

On the other hand, FastSort does not put chunks onto tapes – it creates a new file (i.e. tape) for each sorted chunk instead – we call them “sorted runs” here. Along with parallel processing of multiple chunks at once, larger memory utilization, etc. great speed achievements are possible. But there is a cost for everything and this fast sort approach is no exception. Since FastSort has to keep many open files it slowly increases its resource demands proportionally to the size of the data set. That means there is a theoretical cap which is a trade-off between run size (which needs to fit into memory for sorting) and overhead with keeping open runs (around 10KB each). However, on most production systems, hitting this cap is far beyond practical use. Let’s see a small example:

Let’s have a billion (10^9) data records of average size of 200 bytes, i.e. around 200 GB of data – quite a large set. Ideal run size, which is computed automatically, is around 500 000 records = 100 megs for a single run.. Under ideal conditions there are 3 sort buffers which makes 300 megs of memory for sorting. This setup produces roughly 2000 temporary files, i.e. another 20 megs for keeping track of them. That adds to a total of 320 MB of memory plus some system overhead for sorting a 200 GB file – this surely is acceptable, especially when we take FastSort’s tweaking possibilities into account. There are always ways of sacrificing a little performance to decrease resource requirements – e.g. shrinking run size, limiting buffers and open files, etc. – there’s a lot of parameters to fiddle with if you wish to.

As you can see, FastSort is a bit more greedy that ExtSort but given the resources, it gets the job done significantly faster. In cases where speed is crucial and enough system resources can be dedicated for sorting, FastSort is a great choice. For moderate and resource critical applications ExtSort is less demanding and provides steady performance at very little cost.

For further information please referr to CloverETL’s wiki page http://wiki.cloveretl.org/doku.php?id=components:transformers#fastsort or Documentation page.

Blog at WordPress.com.