As promised, here is a comprehensive comparison of three ETL tools: CloverETL, Talend and Pentaho.
A short summary of my previous posts: for testing I used two transformations based on the TPC-H benchmark, with input data generated by the dbgen utility. The transformations were run on my laptop running Windows Vista Home Premium. For details, see part 1 and part 2.
New testing:
To make the comparison complete, I tested all tools both as “desktop” and as “enterprise” ETL tools. In the “desktop” setup, the tools ran on a laptop with a small amount of data. In the “enterprise” setup, they ran on a server-class machine with a large amount of data stored both in flat files and in a database. The transformations run on the server-class machine were the same as on the desktop; only the size of the input data changed:
- lineitem.tbl – 59,986,052 records, 7.24 GB
- customers.tbl – 1,500,000 records, 233 MB
- orders.tbl – 15,000,000 records, 1.62 GB
The results of flat file reading:
TPCH-Q1
TPCH-Q3
The new results for database reading, all previously published results, details of the hardware used, and a summary are available in this final document.
I also described the main features of each tool and my experience of working with them. That part of the document expresses my opinions, so it may be biased, since I work mostly with CloverETL. If you disagree with anything, please say so in the comments. I will be happy to discuss it with you.











