CloverETL's Blog

December 9, 2009

ParallelReader Versus Competitors Finish

Filed under: Using CloverETL — Petr Uher @ 3:09 pm

As promised, here is a comprehensive comparison of three ETL tools: CloverETL, Talend and Pentaho.

Short summary of my previous posts: For testing I used two transformations based on the TPCH test, with input data generated by the dbgen utility. The transformations ran on my laptop under Windows Vista Home Premium. For details, see part 1 and part 2.

New testing:
To make the comparison complete, I tested all tools both as “desktop” and as “enterprise” ETL tools. The “desktop” tests ran on a laptop with a small amount of data. The “enterprise” tests ran on a server-class machine with a large amount of data stored both in flat files and in a database. The transformations executed on the server-class machine were the same as the ones I ran on the laptop; only the size of the input data changed:

  • lineitem.tbl – 59,986,052 records, 7.24 GB
  • customers.tbl – 1,500,000 records, 233 MB
  • orders.tbl – 15,000,000 records, 1.62 GB

The results of flat file reading:

[Figure: TPCH-Q1]

[Figure: TPCH-Q3]

The new database reading results, all previously published results, detailed information about the hardware used, and a summary are available in this final document.

I have also described the main features of each tool and my experience working with them. That part of the document expresses my own opinions, so it may be biased, since I work mostly with CloverETL. If you disagree with anything, please say so in the comments. I will be glad to discuss it with you.

2 Comments »

  1. How many parallel threads were used for Pentaho and CloverETL benchmarking? Is there any difference in performance using a higher number of threads on these 2 platforms?

    Comment by Eddie — December 19, 2009 @ 4:15 am

    • On the laptop I set the level of parallelism to 2 in both CloverETL and Pentaho. On the server-class machine I set it to 4. These configurations produced the best results.
      The speed of the transformation is strongly limited by hard disk performance (CPU utilization never reached 100%), and the hard disks used are not very fast. So I expect that with RAID, performance would increase with a higher level of parallelism.

      Comment by Petr Uher — December 19, 2009 @ 10:40 am

