Tag Archives: CloverETL Designer

CloverETL Visions for 2012: Evolution and Revolution in Data Integration

Part one – celebrating 10 years

In 2012, CloverETL will celebrate its 10th anniversary as an open source project. It all started back in 2002. On October 3rd, 2002, version 0.1 was first announced on the Freshmeat (now Freecode) portal. That day, CloverETL’s official life began.

I don’t want to look into Clover’s history too much, though. I do, however, want to take this time to make a few comments about the principles on which CloverETL was established and how these principles continue to determine its future.

Principle number 1: Elegant and robust architecture guarantees a stable foundation

CloverETL started more as a framework on which other projects could be based, rather than as an end-user product with a “sexy” GUI. As a matter of fact, the real GUI was built in 2005, almost three years after the release of the first CloverETL engine, which is now present in every tool of the CloverETL family – the Designer, the Server and also CloverETL Profiler.

Even though we are now on version 3.2, there has, so far, only been one change which significantly broke backward compatibility: when we switched from Java 1.4 to Java 1.5 and changed some key interface definitions.

This particular principle is what gives a certain peace of mind to the projects and software products embedding or otherwise deploying Clover, as they know there won’t be any sudden surprises with future versions. It also proves that the original architecture was robust and flexible enough at the outset to support all the later additions and improvements.

Principle number 2: Less is better

CloverETL is based on the idea of cooperating components, each specialized in one function only. However, each component is flexible enough to support the various “outer” conditions in which it works.

For example, our UniversalDataReader is meant for parsing text data. The data can come in variations like fixed-length, delimited, or combined; can be read locally or from remote locations; and can be available in plain form or compressed. All these variations are supported, which means that subtle changes, like data becoming available through a different protocol or perhaps being suddenly compressed, require only slight reconfiguration of the reader. Contrast this with other players, whose hundreds of different components require architectural changes to the transformation (replacing one component with another) when a small shift in the input data happens (e.g. when moving from a DEV to a PROD environment), and you’ll notice the difference.
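To make the idea concrete, here is a minimal sketch in Python of a single reader whose behavior is changed by configuration only. It is an illustration of the principle, not CloverETL's API: the function names and the `layout` dictionary are made up for this example, and remote protocols are left out.

```python
import gzip
import io


def open_source(raw, compressed):
    """Return a text stream over the input bytes, unpacking gzip if asked to."""
    data = gzip.decompress(raw) if compressed else raw
    return io.StringIO(data.decode("utf-8"))


def parse_line(line, layout):
    """Split one line into fields according to the configured layout."""
    if layout["type"] == "delimited":
        return line.split(layout["delimiter"])
    if layout["type"] == "fixed":
        fields, pos = [], 0
        for width in layout["widths"]:
            fields.append(line[pos:pos + width].strip())
            pos += width
        return fields
    raise ValueError("unknown layout type: %r" % layout["type"])


def read_records(raw, layout, compressed=False):
    """One reader for all variations; only the configuration differs."""
    with open_source(raw, compressed) as stream:
        return [parse_line(line.rstrip("\n"), layout)
                for line in stream if line.strip()]


delimited = {"type": "delimited", "delimiter": ";"}
fixed = {"type": "fixed", "widths": [3, 5]}

plain = b"1;Alice\n2;Bob\n"
records_plain = read_records(plain, delimited)
# Same data, suddenly compressed: only the configuration flag changes.
records_gzip = read_records(gzip.compress(plain), delimited, compressed=True)
records_fixed = read_records(b"1  Alice\n2  Bob  \n", fixed)
```

The calling code stays the same in all three cases; the switch from plain to compressed input, or from delimited to fixed-length data, is a change of parameters rather than a change of component.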

It also means that a programmer or analyst designing data transformations in Clover does not need to carry a dictionary of components; a short list covers all possible scenarios.

Principle number 3: Agility is sexy, but long term planning is wise

CloverETL is used in many applications by many customers. Some of them are large, global corporations that embed Clover in their products. Through our OEM program, we work with many customers with a very agile approach to the development of their applications. Some of them have release cycles as short as two weeks, in which they must not only develop and debug, but also release new features. Clover’s development team tries to keep up with this sprint, but we still take our time to plan, architect, and develop new, fundamental features to extend CloverETL’s capabilities and help our customers do their jobs faster and more simply.

The reason we insist on thinking through every new feature request, beyond simple tweaks, is that sometimes a relatively small and quick change may break compatibility somewhere or prevent future extensions. Whenever our development team touches the core (engine), we make sure the change is properly evaluated from several points of view, including:

  • Backward compatibility – At least at the transformation graph level.
  • Performance – A slowdown of just a few percent on big data can mean extra kilowatts of power drawn by data-crunching servers.
  • Future extensibility – We hate deprecating APIs or components just because we might not be able to continue enhancing and improving them.

This principle is further supported by the fact that CloverETL continues to be developed by the same, stable development team year in and year out. Many team members have been around since 2005, when the commercial life of Clover began.

Part two – What will appear on the menu in 2012

In short, there will be evolution and, in certain areas, some revolution. We are always sorting out the dilemma of whether to break from the “past” and come up with something completely new and revolutionary – at least in our minds – or continue to improve the old-faithful engine architecture laid out years ago.

As we weren’t able to choose one or the other, we decided to continue improving what works well (and should continue to, even in the future) and overhaul some things that have had occasional hiccups with modern data structures and formats brought to us by the cloud.

Evolution

Expanding CloverETL OEM program

As CloverETL attracts new OEM customers, we continue improving our OEM program by making it simpler to embed, modify, white-label, or otherwise enhance our technology stack. This includes better documentation, example projects, and extended training.
We are also investing in our support team, which has always strived to provide timely and accurate answers to all support requests submitted through various channels, from e-mail to the technology forum and hotline.

Our support staff consists of experienced consultants and programmers who have real-life experience with our technology; they aren’t just people a few manual pages ahead of a user seeking an answer.

GUI – continuous improvement of the user experience

We will continue our effort to make the Designer more and more user-friendly. Our motto is: CloverETL is built by professionals for professionals and, truly, professional DI experts or Java programmers usually give us high marks. Nonetheless, we want to make our technology accessible to the broadest possible audience seeking solutions to certain data needs.

Enhancing CloverETL Cluster – our BigData recipe

These days, BigData is usually mentioned together with Hadoop as the solution. As much as we like Hadoop for various reasons, we have our own recipe for processing BigData, and we think it’s better suited for classical data integration/ETL tasks. It is based on a split/transform/merge idea, where big input data are partitioned and then processed in parallel on multiple nodes of the CloverETL Cluster. The advantage of this, as opposed to Hadoop, is that the transformation may be developed and debugged locally, then easily deployed onto a CloverETL Cluster for fast execution. Even when a transformation is executed in a cluster environment, all the debugging and monitoring options of our Designer are available. It is also worth mentioning that deploying CloverETL Cluster is much easier than setting up a Hadoop cluster.
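The split/transform/merge idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: local threads stand in for cluster nodes, the function names are invented for the example, and the "transformation" is a simple per-customer sum. It is not how CloverETL Cluster is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, parts):
    """Partition the input round-robin into `parts` chunks."""
    return [records[i::parts] for i in range(parts)]


def transform(chunk):
    """Work done independently on each partition: total amount per customer."""
    totals = {}
    for customer, amount in chunk:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def merge(partials):
    """Combine the partial results from all partitions into one result."""
    merged = {}
    for partial in partials:
        for customer, amount in partial.items():
            merged[customer] = merged.get(customer, 0) + amount
    return merged


def run(records, workers=3):
    chunks = split(records, workers)
    # Threads play the role of cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(transform, chunks))
    return merge(partials)


orders = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = run(orders)
```

Note that `transform` is an ordinary function that can be run and debugged on a single chunk on one machine; running it in parallel changes nothing about its logic, which is the point of the approach described above.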

Our big enhancement of CloverETL Cluster in 2012 will be the merging of our technology with Hadoop – more precisely, the HDFS filesystem – which should combine the best of both worlds. HDFS provides some cool features, namely robustness and high performance, and we want to utilize its automated data partitioning to make it easier to grow (or shrink) the storage of data depending on actual needs.

Revolution

Rich data structures – trees, unstructured data, etc.

It must come with age, but I can’t help admiring those who devised COBOL and its copybooks. In those times, every byte of storage counted and CPUs were slow, yet programmers were still able to process rich data structures. Then relational databases came and brought the idea of tables and normal forms. Well, today, we are back to rich structures, but this time, we’ve stopped counting bytes or CPU cycles (which has a huge impact on the power consumption of servers, but that’s a different story). That is why XML, JSON, and other rich structures are becoming the norm today.

In order to support these structures and formats as first-class passengers, we decided to overhaul our metadata and record storage model and allow direct support of tree structures, multi-valued fields, and even loosely typed data organized in maps/properties collections.

This in itself constitutes a big adventure, as every single piece of our technology platform will be affected, and thus will have to be adapted. The effort will be huge, and the necessary regression testing of the whole platform will be endless. Despite this, the prize is enticing: almost any type of data (and the cloud will be a bonanza for this) will be 1:1 representable by Clover. That will include XML, JSON, POJO, and complex properties – and, in the future, who knows what else!
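As a rough illustration of what "1:1 representable" means, the Python sketch below maps a JSON document onto a record that has a nested (tree) field, a multi-valued field, and a loosely typed map, and then back again without loss. The `Customer` and `Address` types and their field names are hypothetical; they do not describe the actual CloverETL metadata model.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Customer:
    name: str
    emails: list = field(default_factory=list)      # multi-valued field
    address: Optional[Address] = None               # nested (tree) structure
    properties: dict = field(default_factory=dict)  # loosely typed map


def customer_from_json(text):
    """Build a rich record straight from a JSON document, without flattening."""
    doc = json.loads(text)
    return Customer(
        name=doc["name"],
        emails=list(doc.get("emails", [])),
        address=Address(**doc["address"]) if "address" in doc else None,
        properties=dict(doc.get("properties", {})),
    )


source = """
{"name": "Anna",
 "emails": ["anna@example.com", "a.novak@example.org"],
 "address": {"city": "Prague", "country": "CZ"},
 "properties": {"vip": true, "score": 7}}
"""
customer = customer_from_json(source)
# The record converts back to the same document: nothing was lost or reshaped.
round_trip = asdict(customer)
```

In a strictly flat, table-like record model, the two e-mail addresses and the nested address would have to be split into separate records or tables and joined back later.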

—–

We have always claimed that CloverETL is future-proof. Therefore, in 2012, we will be improving our foundations so they withstand the next 10 years.

If what I’ve talked about above is of interest to you, then please stay tuned. We will be publishing more details on our new functionality as we implement it.

For now, I wish everyone a very successful 2012!

A Look Back: CloverETL and Data Integration in 2011

As 2011 comes to a close, we’d like to take the time to reflect on what this year has brought CloverETL, its users, and our customers.

Since CloverETL is, after all, a data integration platform, the world of integration is at our core. We’re constantly striving to challenge ourselves in new ways and improve how we approach data integration. This year was no different.

Enhancing Our Core – Two Upgrades of CloverETL

In the past six months, we released two upgraded versions of CloverETL. CloverETL 3.1, published in June, brought significant changes to the platform in several areas. With a deeper focus on connectivity and enhanced support of various data formats, CloverETL 3.1 helped users better process data with complex structures, emails, and Lotus documents, to name a few. The latest version of CloverETL, version 3.2, offered further enhancements to the user experience, as well as improved processing of large data records.

Data Integration Meets Data Quality – CloverETL Profiler

This year was also a year for new products. With Clover, we’ve moved forward with an evolved sense of the data world. Because data integration, data quality, and other data disciplines are becoming more and more intertwined, we developed the CloverETL Profiler, a data profiling application. Released in beta back in October, the Profiler helps users make informed decisions on how to improve the quality of transformed data, which is particularly useful as a precursor to larger data integration projects. CloverETL also integrates more easily with the AddressDoctor solution to improve the quality of geographical information.

Strengthening CloverETL Presence in the US Market

In June, Javlin, the developer of CloverETL, opened its new office in the Washington, D.C. area, which became the headquarters of Javlin Inc., our US presence. Javlin Inc., with dedicated sales and customer service teams, brings Clover to a whole new market of possibilities.

Last but not least, we are pleased to see that our OEM data integration offer will have a number of important implementations in the upcoming year. (But more on that later. Stay tuned.)

As we leave 2011, we can say that this past year was a whirlwind of hard work, exciting releases, and interesting customers and stories. We’re looking forward to another great year with CloverETL. Cheers to the New Year.

Usability Improvements in CloverETL 3.1

One of the most noticeable sets of changes in CloverETL version 3.1 is the interface improvements, which substantially improve Clover’s usability and understandability. These improvements save both new and old users valuable time when creating or manipulating their data transformation graphs and further cement CloverETL’s place as one of the easiest-to-use ETL tools on the market.

The biggest improvement was the addition of drag-and-drop functionality to a number of different aspects of Clover. You can drag files to the graph, files to components, files to metadata, and metadata to edges, saving innumerable clicks through menus.

We have also made it easier to link your metadata and edges while creating the edges. If you right-click on the Edge tool in your palette, it will give you a list of all the metadata you have created in the current graph. If you select one of them, then whenever you create an edge with the Edge tool, that metadata will automatically be assigned to the edge.

Not only is it easier to link metadata and edges, we’ve also made it easier to create and manipulate the edges themselves. Edges can now be created simply by dragging from one component’s out port to another’s in port. If you find you want to change where the edge is connected, that too is now one-click. Simply click and drag an edge’s endpoints to any other port.

The last shortcut that version 3.1 added to CloverETL is an easier way to set the description on a component. Before, the description field was buried in the component’s properties, but now it has been moved to the header of the properties window. This improvement makes it substantially easier to clarify the purpose of your components, making your graph easier to read overall.

Working with CloverETL as a New User

I would like to share my experience with CloverETL as an outsider. I study at the Georgia Institute of Technology in Atlanta, GA, with a major in Computer Science. During my search for interesting internships in Europe, I found out about Javlin a.s., a company based in Prague. As I wanted to get hands-on experience with programming and software development, I was immediately interested in this company. It sounded like an appealing opportunity where I could gain a lot of helpful knowledge and work experience.

In January I arrived in Prague and started working with CloverETL as soon as I began my internship with Javlin. When I arrived, I had no clue exactly what CloverETL ‘did’, much less what data warehousing and extract-transform-load tools were. After the first week or so, I had figured out what ETL tools are and, especially, what CloverETL is. I played around with CloverETL and was intrigued by what it can do. I don’t have any need for it now as a student, but I can see how incredibly powerful it can be for certain people or companies. I really like how simple it is to just connect components in a GUI: you can read a client XLS sheet, compare it with your Outlook address book (exported as a CSV file), remove duplicate records, collect all data for each client into one record, and then create a new XLS sheet or CSV file for Outlook. I also like that the engine itself is open source, so anyone can download and use it – but in my opinion, the Designer is much quicker and easier to use.
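For readers who have never seen such a transformation, here is roughly what the address-book example does, written as a small Python sketch rather than as a CloverETL graph. Several things are assumptions made for the illustration: both inputs are CSV text (the standard library cannot read XLS), the columns are `name`, `email` and `phone`, and clients are matched by lower-cased e-mail address.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_clients(*sources):
    """Collect all data for each client into one record, dropping duplicates."""
    merged = {}
    for rows in sources:
        for row in rows:
            key = row["email"].strip().lower()
            record = merged.setdefault(key, {})
            for column, value in row.items():
                # Keep the first non-empty value seen for each column.
                if value and not record.get(column):
                    record[column] = value
    return list(merged.values())


def to_csv(records, columns):
    """Write the merged records back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


client_sheet = "name,email,phone\nAnna Novak,anna@example.com,\nBob Lee,bob@example.com,555-0101\n"
outlook_export = "name,email,phone\nAnna Novak,ANNA@example.com,555-0199\n"

clients = merge_clients(read_rows(client_sheet), read_rows(outlook_export))
output = to_csv(clients, ["name", "email", "phone"])
```

In the Designer, each of these functions would be a component on the canvas, connected by edges instead of function calls.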

Designer-Server Integration: HTTPS made easy

In CloverETL Designer 2.8.0, connecting to CloverETL Server over the HTTPS protocol is supported. However, the client requires some configuration, including import of the client’s certificate to the server. Starting with CloverETL Designer 2.8.1, the situation is much simpler: HTTPS can be used without any additional client configuration.

The usage scenario is similar to using a web browser – if the Designer detects an unknown server certificate, it asks the user if the certificate should be accepted & imported. A server certificate can be imported either permanently or temporarily for one Designer session.

Connecting to CloverETL Server over HTTPS

In the above screenshot you can see an example of connecting to the CloverETL Server over HTTPS. The Designer detected an unknown certificate and asks the user whether the certificate should be accepted. The user can, of course, examine the certificate’s content prior to accepting or refusing it.

This simple HTTPS connection works when the application server running CloverETL Server does not require a certificate from its clients. When it does require client certificates, the Designer must be configured as before.
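The accept-and-import behavior described above (trust an unknown server certificate either permanently or only for the current session) can be sketched as follows. This is an illustration of the general pattern only, not the Designer's actual implementation: the `CertificateTrust` class, its method names, and the three decision strings are invented for this example, and no real TLS connection is made.

```python
import hashlib


class CertificateTrust:
    """Remembers which server certificates the user has accepted."""

    def __init__(self, permanent=None):
        # A real tool would persist this set on disk (assumption of this sketch).
        self.permanent = set(permanent or [])
        # Session-only acceptances are forgotten when the session ends.
        self.session = set()

    @staticmethod
    def fingerprint(cert_bytes):
        """Identify a certificate by the SHA-256 digest of its bytes."""
        return hashlib.sha256(cert_bytes).hexdigest()

    def is_trusted(self, cert_bytes):
        fp = self.fingerprint(cert_bytes)
        return fp in self.permanent or fp in self.session

    def check(self, cert_bytes, ask_user):
        """Return True if the connection may proceed.

        `ask_user(fingerprint)` is called only for unknown certificates and
        must return 'permanent', 'session' or 'reject'.
        """
        if self.is_trusted(cert_bytes):
            return True
        fp = self.fingerprint(cert_bytes)
        decision = ask_user(fp)
        if decision == "permanent":
            self.permanent.add(fp)
        elif decision == "session":
            self.session.add(fp)
        else:
            return False
        return True

    def end_session(self):
        self.session.clear()


trust = CertificateTrust()
server_cert = b"placeholder bytes standing in for a server certificate"

accepted = trust.check(server_cert, ask_user=lambda fp: "session")
trusted_during_session = trust.is_trusted(server_cert)
trust.end_session()
trusted_after_session = trust.is_trusted(server_cert)
```

A certificate accepted with `'session'` has to be confirmed again after a restart, while one accepted with `'permanent'` does not, which mirrors the two import options offered to the user.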

Designer-Server Integration Testing

CloverETL’s development team is preparing an amazing new feature: integration of CloverETL Designer with CloverETL Server. This feature makes working with Clover much more comfortable.

I was asked to take part in testing it, and I decided to share my impressions.

The main feature of the integration is that you can work with a CloverETL graph located on CloverETL Server in the same way as if it were located on your desktop machine. So no more copying of files from desktop to server and no more out-of-date files: all items are located only on the server, yet they are accessible and editable in Eclipse with CloverETL on your desktop, and transformation graphs are editable in graphical form.

All graphs are run on the server machine, but you don’t lose any of the advantages useful for developing and debugging: you can view debug data on an edge, view data on a reader without running the graph, see tracking information in the tracking view, etc. In addition, all graph runs are tracked on the server, so you can see all execution logs in the Executions History tab of the server administration interface.

After initial doubts, I have realized that it works, and now I’m fascinated with it :-) . You can expect it, along with many other improvements, in version 2.8 of CloverETL Designer. So forget Informatica, forget DataStage, use CloverETL :-) .

Hidden Features: Note Properties

Do you want your CloverETL graphs to have nice, descriptive, colorful notes? Just resize your note and put the components into the area of the note, as you can see in the picture.

Graph with note

And now try to move the note. All components placed inside the note will move together with the note :-) .
The color and size of the note’s font can be set in the Properties view (it is usually shown in the same part of the Eclipse window as the Console). In the same place you can also set the background color of a note.

Properties view