Tag Archives: ETL tool

Usability Improvements in CloverETL 3.1

One of the most noticeable sets of changes in CloverETL version 3.1 is the interface improvements, which substantially improve Clover’s usability and understandability. These improvements save both new and experienced users valuable time when creating or manipulating their data transformation graphs, and they further cement CloverETL’s place as one of the easiest-to-use ETL tools on the market.

The biggest improvement was the addition of drag-and-drop functionality to a number of different aspects of Clover. You can drag files to the graph, files to components, files to metadata, and metadata to edges, saving innumerable clicks through menus.

We have also made it easier to link your metadata and edges while creating the edges. If you right-click on the Edge tool in your palette, it will give you a list of all the metadata you have created in the current graph. Once you select one of them, every edge you create with the Edge tool will automatically have that metadata assigned to it.

Not only is it easier to link metadata and edges, we’ve also made it easier to create and manipulate the edges themselves. Edges can now be created simply by dragging from one component’s out port to another’s in port. If you find you want to change where the edge is connected, that too is now one-click. Simply click and drag an edge’s endpoints to any other port.

The last shortcut that version 3.1 added to CloverETL is an easier way to set the description on a component. Before, the description field was buried in the component’s properties, but now it has been moved to the header of the properties window. This improvement makes it substantially easier to clarify the purpose of your components, making your graph easier to read overall.

Working with CloverETL as a New User

I would like to share my experience with CloverETL as an outsider. I study at the Georgia Institute of Technology in Atlanta, GA, majoring in Computer Science. While searching for interesting internships in Europe I found out about Javlin a.s., a company based in Prague. As I wanted to get hands-on experience with programming and software development, I was immediately interested in this company. It sounded like an appealing opportunity where I could gain a lot of helpful knowledge and work experience.

In January I arrived in Prague and started working with CloverETL as soon as my internship with Javlin began. When I arrived, I had no clue what exactly CloverETL ‘did’, much less what data warehousing and extract-transform-load tools were. After the first week or so, I had figured out what ETL tools are and, especially, what CloverETL is. I played around with CloverETL and was intrigued by what it can do. I don’t have any need for it now as a student, but I can see how incredibly powerful it can be for certain people or companies. I really like how simple it is to just connect components in a GUI: you can read a client XLS sheet, compare it with your Outlook address book (exported as a CSV file), remove duplicate records, collect all data for each client into one record, and then write a new XLS sheet or CSV file for Outlook. I also like that the engine itself is open source, so anyone can download and use it, but in my opinion the Designer is much quicker and easier to use.
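
To give a feel for what such an address-book graph does, here is a minimal sketch of the same deduplicate-and-merge logic in plain Python. It is not CloverETL code. It assumes the records have already been loaded as dictionaries (for example with `csv.DictReader` for the Outlook export; reading XLS would need a third-party library) and that clients are matched on a hypothetical `email` column. The function names are made up for this illustration.

```python
import csv
import io


def merge_contacts(client_rows, outlook_rows, key="email"):
    """Collapse both sources into one record per client.

    Records are matched on a normalized (trimmed, lower-cased) key.
    For every field, the first non-empty value seen wins.
    """
    merged = {}
    for row in list(client_rows) + list(outlook_rows):
        k = (row.get(key) or "").strip().lower()
        if not k:
            continue  # skip records without a usable key
        record = merged.setdefault(k, {})
        for field, value in row.items():
            value = (value or "").strip()
            if value and not record.get(field):
                record[field] = value
    return list(merged.values())


def to_csv(records):
    """Serialize the merged records as CSV text with a header row."""
    fields = sorted({f for r in records for f in r})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

In a graph, each of these steps would be a separate component connected by edges; the sketch only shows that the underlying logic is a keyed merge followed by a write.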

Parallel Data Processing Comparison – CloverETL vs. Talend vs. Pentaho (Part 3)

As promised, here is a comprehensive comparison of three ETL tools: CloverETL, Talend and Pentaho.

A short summary of my previous posts: for testing I used two transformations based on the TPC-H benchmark, with input data generated by the dbgen utility. The transformations were run on my laptop with Windows Vista Home Premium. For detailed information see part 1 and part 2.
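
For readers who have not seen the benchmark, TPC-H Q1 is essentially a filtered group-by aggregation over the lineitem table. The following is a minimal single-threaded sketch in plain Python, not the graph used in any of the tested tools. It assumes the standard pipe-delimited dbgen column order for lineitem and the commonly used 90-day cutoff parameter; the function name is made up for this illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: the usual Q1 parameter, ship date <= 1998-12-01 minus 90 days.
CUTOFF = date(1998, 12, 1) - timedelta(days=90)


def tpch_q1(lines, cutoff=CUTOFF):
    """Aggregate lineitem rows by (returnflag, linestatus).

    Assumed column positions in the pipe-delimited dbgen output:
    4 quantity, 5 extended price, 6 discount, 7 tax,
    8 return flag, 9 line status, 10 ship date.
    """
    groups = defaultdict(lambda: defaultdict(float))
    for line in lines:
        f = line.rstrip("\n").split("|")
        if len(f) < 11:
            continue  # skip blank or malformed lines
        if date.fromisoformat(f[10]) > cutoff:
            continue  # row is filtered out by the ship-date predicate
        qty, price, disc, tax = (float(f[i]) for i in (4, 5, 6, 7))
        g = groups[(f[8], f[9])]
        g["sum_qty"] += qty
        g["sum_base_price"] += price
        g["sum_disc_price"] += price * (1 - disc)
        g["sum_charge"] += price * (1 - disc) * (1 + tax)
        g["sum_discount"] += disc
        g["count"] += 1

    result = {}
    for key in sorted(groups):  # Q1 orders by return flag, then line status
        g = groups[key]
        n = g["count"]
        result[key] = {
            "sum_qty": g["sum_qty"],
            "sum_base_price": g["sum_base_price"],
            "sum_disc_price": g["sum_disc_price"],
            "sum_charge": g["sum_charge"],
            "avg_qty": g["sum_qty"] / n,
            "avg_price": g["sum_base_price"] / n,
            "avg_disc": g["sum_discount"] / n,
            "count": int(n),
        }
    return result
```

A call such as `tpch_q1(open("lineitem.tbl"))` would stream the file line by line. The ETL tools in this comparison express the same filter and aggregation as components in a graph.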

New testing:
To make the comparison as complete as possible, all tools were tested both as “desktop” and as “enterprise” ETL tools. The “desktop” tests ran on a laptop computer with a small amount of data. The “enterprise” tests ran on a server-class machine with a large amount of data stored both in flat files and in a database. The transformations executed on the server-class machine were the same as the ones executed on the desktop; only the size of the input data changed:

  • lineitem.tbl – 59,986,052 records, 7.24 GB
  • customers.tbl – 1,500,000 records, 233 MB
  • orders.tbl – 15,000,000 records, 1.62 GB

The results of flat file reading:

[Chart: TPCH-Q1]

[Chart: TPCH-Q3]

The new results of database reading, all previously published results, detailed information about used hardware and a summary are available in this final document.

I also describe the main features of all the tools and my experience working with them. That part of the document expresses my opinions, so it may be biased since I work mostly with CloverETL. If you disagree with anything, please say so in the comments. I will be pleased to discuss it with you.

CloverETL version 2.7 released

OpenSys today released a new version of the CloverETL Engine, 2.7.

According to the ChangeLog, there have been more than 300 changes, some of them new functionality, some fixes for reported problems.
Together with the new engine, CloverETL Designer (previously CloverETL GUI) version 2.2 is also available.

Details about the changes in both the CloverETL Engine and Designer can be found on CloverETL’s main site.

CloverETL – more than just extract, transform & load

Can ETL be used for something useful aside from extract, transform & load?

I have been playing with the idea that common applications which include data entry and data processing can be split into a simple web-based data entry application (optionally also handling data presentation) and business logic implemented as an ETL process. With CloverETL, the transformation piece can also be called as a web application, thanks to Clover’s “launch services”, which seamlessly convert any transformation into a web service where data is passed in and sent out over HTTP (POST request).
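
From the data entry side, calling such a transformation could look roughly like the sketch below. This is only an illustration of the “POST data in, read the result back” pattern in Python. The URL is a made-up placeholder, not a real launch service address, and the payload format (CSV text) is an assumption; the actual address, parameters and formats depend on how the service is configured.

```python
import urllib.request

# Hypothetical placeholder; a real launch service URL will look different.
LAUNCH_URL = "http://localhost:8080/example-launch-service"


def build_launch_request(csv_payload, url=LAUNCH_URL):
    """Prepare an HTTP POST request that carries the input data as CSV text."""
    return urllib.request.Request(
        url,
        data=csv_payload.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


def run_transformation(csv_payload, url=LAUNCH_URL, timeout=30):
    """Send the data to the transformation and return its output as text."""
    request = build_launch_request(csv_payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The data entry form then only has to collect the values, call something like `run_transformation(...)`, and display what comes back; all the business logic stays in the transformation.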

The whole idea is based on the fact that data processing is easily (visually) designed and debugged using an ETL tool. Unfortunately, data entry is usually out of scope for most ETL tools (are there any which support it?), so web-based “glue” with data entry forms is the easiest way.

The other option is to use the modern curse of IT: Excel. Excel is a popular data entry and data presentation tool, so why not exploit it?

I wholeheartedly believe that with this approach, development time can be cut in half compared to the standard way of coding it in PHP or as a J2EE application. Further modifications can also be made more quickly. Visual programming is here, finally!

PS: I am going to bid for a project using this approach. So if I convince the customer, in two months I should know what this idea is really worth…