When Two Become One: CloverETL OEM Embedded White Labeling

In our previous post, we reviewed the three approaches to CloverETL OEM. Here, I’ll discuss the more technical aspects of CloverETL OEM Embedded White Labeling.

OEM White Labeling – What it Takes

When partners need white labeling, they are mostly thinking about seamless integration of the ETL piece with their application. They have to manage things like version control of their application (making sure it works with Clover), the look and feel of the GUI, and a good degree of automation. The Clover team works with the partner to architect this and can do so quickly.

Partners needing this option receive CloverETL Integration documentation to guide them through the process of achieving stable, effective white labeling. Both the CloverETL Designer and CloverETL Server can also be white labeled as part of a company strategy.

Which steps does white labeling consist of in Designer and Server?

CloverETL Designer

CloverETL Designer branding is based on showing the product name, splash screen, welcome screen, etcetera inside the Eclipse environment. Applications embedding or extending CloverETL Designer can use the same branding elements to visually customize the product. Features available via branding:

  • Product naming and description: These are visible in multiple places inside Eclipse (however, not all occurrences can be changed.)
  • Default configuration: Eclipse configuration options can be changed according to users’ needs, e.g. whether a splash screen should be shown at start-up.
  • Splash screen: The initial image shown at start-up and the progress bar can be customized.
  • Welcome screen: An introduction screen shown to users on the application’s first run; it can contain useful links to documentation, examples etc.

CloverETL Server

CloverETL Server is mentioned in multiple messages, web GUI images, directory names, etc. Following steps described in the integration documentation (available to OEM partners), you can get rid of all CloverETL occurrences. White labeling Server thus involves:

  • Replacing images, logos plus related work on graphics
  • Editing a couple of properties stored in server configuration files
  • Tuning the resulting application

CloverETL OEM Advantage: ETL As You Like It

Shakespeare didn’t know the first thing about OEM, or Original Equipment Manufacturers, but he did have quite a grasp on relationships. That being said, we’re here to put what Will preached back into the data world, channeling his penchant for the complexity of true relationships, with our OEM Program. Introducing, “ETL As You Like It: A True OEM Partnership.”

What is a CloverETL OEM Partnership?

Well, it really is a relationship, isn’t it? When CloverETL aligns with a company, we bring together not just our unique products, but also our tested dev teams, solutions, and ideas. It’s actually like a marriage—with joys and delights, compromise and adjustment. We’ve been active in OEM Partnerships for a while now and think it’s a great way for Clover to flourish in all sorts of challenges.

Today, CloverETL is at the core of many data service platforms as a vital data integration piece. As an OEM to many customers’ larger offerings – be it with IBM’s MDM offer, Good Data’s BI platform, or Mulesoft’s ESB service – Clover, with its flexibility, lends itself to match any OEM business strategy. It’s just a matter of deciding which OEM approach works best. CloverETL can work on a “partner basis” (side by side as a toolset), be embedded as a data integration piece, or even be embedded and white labeled.

Let’s take a look at what OEM programs are available for Clover today.

Partner Approach

Some companies use CloverETL alongside their offer to provide an integrated commercial package to their end-user clients. These clients can then have access to the Clover community through our forum to gain knowledge about the best use practice of CloverETL. Often, companies are replacing just the ETL bit because it is either hand-coded or just too cumbersome.

The data services companies that choose this option tend to need a lower volume of licenses. Another reason might also be to simply add a stronger ETL product to an application or service without the need to fully embed Clover. In a sense, the partner approach works like a “power couple” to run compelling applications that provide value to data clients.

Embedded OEM

Many businesses prefer the embedded OEM approach for CloverETL because they only have to manage one offering for their development teams. To embed means that as they make changes and improvements to their application, Clover and its team can adapt with a company’s growth in an evolving market. We offer this option in higher volume scenarios on a per unit basis, or even on an enterprise (one time fee) basis as the case allows.

For example, there are literally hundreds of Clover users for IBM’s Initiate MDM offer; the company, a partner since 2007, even produced a CloverETL user guide to help their clients learn and use Clover more effectively.

You can check out this IBM WorkBench document at: http://publib.boulder.ibm.com/infocenter/initiate/v9r5/topic/com.ibm.initiatepdfs.doc/topics/i46wecug.pdf

Embedded OEM – White Labeling

Lately, we are seeing a trend of customers asking for a white labeled ETL/data integration piece for their service offer. This is part of an interesting strategy where service providers simply want to present their offer with a focus on the application or the power of the company brand. IT and data professionals know that the ETL is there, cranking through and organizing data, but a white label strategy puts the ETL behind the scenes for a complete, branded look: Voila! The service just works, it seems. White labeling has the same licensing strategy as the embedded-only OEM approach.

In all, David Pavlis, Javlin’s President, put it best: “What’s great about Clover is that we want to work with you, and you can rely on us. We can adapt alongside your offering or become part of it, fully integrated. What we do know is that, in every case, we can support a lifetime partnership–adapting, improving, and aligning to where our customers and their clients are going.”

And there you have it, as you like us.

Interested in the technical nuances of OEM? Our OEM blog series will continue, detailing each approach reviewed today, so check back soon.

CloverETL Visions for 2012: Evolution and Revolution in Data Integration

Part one – celebrating 10 years

In 2012, CloverETL will celebrate its 10th anniversary as an open source project. It all started back in 2002. On October 3rd, 2002, version 0.1 was first announced on the Freshmeat (now Freecode) portal. That day, CloverETL’s official life began.

I don’t want to look into Clover’s history too much, though. I do, however, want to take this time to make a few comments about the principles on which CloverETL was established and how these principles continue to determine its future.

Principle number 1: Elegant and robust architecture guarantees a stable foundation

CloverETL started more as a framework on which other projects could be based than as an end-user product with a “sexy” GUI. As a matter of fact, the real GUI was built in 2005, almost three years after the release of the first CloverETL engine, which is now present in every tool of the CloverETL family: the Designer, the Server, and CloverETL Profiler.

Even though we are now on version 3.2, there has, so far, only been one change which significantly broke backward compatibility: when we switched from Java 1.4 to Java 1.5 and changed some key interface definitions.

This particular principle is what gives a certain peace of mind to the projects and software products embedding or otherwise deploying Clover, as they know there won’t be any sudden surprises with future versions. It also proves that the original architecture was robust and flexible enough at the outset to support all the later additions and improvements.

Principle number 2: Less is better

CloverETL is based on the idea of cooperating components, each specialized in a single function. However, each component is flexible enough to support the various “outer” conditions in which it works.

For example, our UniversalDataReader is meant for parsing text data. The data can come in variations like fixed-length, delimited, or combined; can be read locally or from remote locations; and can be available in plain form or compressed. All these variations are supported, which means that subtle changes, like data becoming available through a different protocol or perhaps being suddenly compressed, require only slight reconfiguration of our DataReader. Contrast this with other players, whose hundreds of different components require architectural changes to a transformation (replacing one component with another) whenever there is a small shift in the input data (e.g. when moving from a DEV to a PROD environment), and you’ll notice the difference.
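To make the "one flexible component" idea concrete, here is a minimal plain-Java sketch of a single entry point that reads text whether or not it arrives gzip-compressed. It is only an illustration of the principle: the class and method names (FlexibleTextSource, open, readAll, gzip) are made up for this post and have nothing to do with how UniversalDataReader is actually implemented.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: hypothetical helper, not the UniversalDataReader implementation.
class FlexibleTextSource {

    // Returns a stream of plain bytes, unwrapping gzip when the input is compressed.
    static InputStream open(InputStream raw) {
        try {
            BufferedInputStream in = new BufferedInputStream(raw);
            in.mark(2);
            int first = in.read();
            int second = in.read();
            in.reset();
            // Every gzip stream starts with the two magic bytes 0x1f 0x8b.
            boolean gzip = first == 0x1f && second == 0x8b;
            return gzip ? new GZIPInputStream(in) : in;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole input as UTF-8 text, compressed or not.
    static String readAll(InputStream raw) {
        try (InputStream in = open(raw)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper used only to produce compressed sample data.
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String sample = "id;name\n1;clover\n";
        byte[] plain = sample.getBytes(StandardCharsets.UTF_8);
        // The caller uses the same call for both inputs.
        System.out.print(readAll(new ByteArrayInputStream(plain)));
        System.out.print(readAll(new ByteArrayInputStream(gzip(sample))));
    }
}
```

The calling code does not change when the input suddenly becomes compressed, which is the kind of "slight reconfiguration instead of redesign" described above.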

It also means that a programmer or analyst designing data transformations in Clover does not need to carry a dictionary of components; a short list covers all possible scenarios.

Principle number 3: Agility is sexy, but long term planning is wise

CloverETL is used in many applications by many customers. Some of them are large, global corporations that embed Clover in their products. Through our OEM program, we work with many customers with a very agile approach to the development of their applications. Some of them have release cycles as short as two weeks, in which they must not only develop and debug, but also release new features. Clover’s development team tries to keep up with this sprint, but we still take our time to plan, architect, and develop new, fundamental features to extend CloverETL’s capabilities and help our customers do their jobs faster and more simply.

The reason we insist on thinking through every new feature request, beyond simple tweaks, is that sometimes a relatively small and quick change may break compatibility somewhere or prevent future extensions. Whenever our development team touches the core (engine), we make sure the change is properly evaluated from several points of view, including:

  • Backward compatibility – at least at transformation graph level.
  • Performance – Slowdown of just a few percent on big data can mean extra kW of energy consumed by data crunching servers.
  • Future extensibility – We hate deprecating APIs or components just because we might not be able to continue enhancing and improving them.

This principle is further supported by the fact that CloverETL continues to be developed by the same, stable development team year in and out. Many team members have been around since 2005, when the commercial life of Clover began.

Part two – What will appear on the menu in 2012

In short, there will be evolution and, in certain areas, some revolution. We are always sorting out the dilemma of whether to break from the “past” and come up with something completely new and revolutionary – at least in our minds – or continue to improve the old-faithful engine architecture laid out years ago.

As we weren’t able to choose one or the other, we decided to continue improving what works well (and should continue to, even in future) and overhaul some things that have had occasional hiccups with modern data structures and formats brought to us by the CLOUD.

Evolution

Expanding CloverETL OEM program

As CloverETL attracts new OEM customers, we continue improving our OEM program by making it simpler to embed, modify, white-label, or otherwise enhance our technology stack. This includes better documentation, example projects, and extended training.
We are also investing in our support team, which has always strived to provide timely and accurate answers to all support requests submitted through various channels, from e-mail to the technology forum and hotline.

Our support staff is comprised of experienced consultants and programmers who have real-life experience with our technology—they aren’t just people a few manual pages ahead of a user seeking an answer.

GUI – continuous improvement of the user experience

We will continue our effort to make the Designer more and more user-friendly. Our motto is: CloverETL is built by professionals for professionals and, truly, professional DI experts or Java programmers usually give us high marks. Nonetheless, we want to make our technology accessible to the broadest possible audience seeking solutions to certain data needs.

Enhancing CloverETL Cluster – our BigData recipe

These days, BigData is usually mentioned together with Hadoop as the solution. As much as we like Hadoop for various reasons, we have our own recipe for processing BigData, and we think it’s better suited for classical data integration/ETL tasks. It is based on a split/transform/merge idea, where big input data are partitioned and then processed in parallel on multiple nodes of the CloverETL Cluster. The advantage of this, as opposed to Hadoop, is that the transformation may be developed and debugged locally, then easily deployed onto a CloverETL Cluster for fast execution. Even if executed in a cluster environment, all the debugging and monitoring options of our Designer are available. It is also worth mentioning that deploying CloverETL Cluster is much easier than setting up a Hadoop cluster.
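The split/transform/merge idea can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions: threads stand in for cluster nodes, round-robin dealing stands in for whatever partitioning the Cluster really applies, and the class name SplitTransformMerge is invented for this illustration. It is not the CloverETL Cluster API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative only: threads play the role of cluster nodes.
class SplitTransformMerge {

    // Split: deal the records round-robin into the requested number of partitions.
    static <T> List<List<T>> split(List<T> records, int partitions) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            parts.get(i % partitions).add(records.get(i));
        }
        return parts;
    }

    // Transform each partition in parallel, then merge the partial results
    // in partition order.
    static <T, R> List<R> run(List<T> records, int partitions, Function<T, R> transform) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<List<R>>> pending = new ArrayList<>();
            for (List<T> part : split(records, partitions)) {
                pending.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>();
                    for (T record : part) {
                        out.add(transform.apply(record));
                    }
                    return out;
                }));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> result : pending) {
                merged.addAll(result.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Six records, two "nodes", one simple transformation.
        System.out.println(run(List.of(1, 2, 3, 4, 5, 6), 2, x -> x * 10));
    }
}
```

Note that the transformation itself (the lambda) is the same whether it runs on one partition or many, which mirrors the point above about developing and debugging locally before deploying to the cluster.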

Our big enhancement of CloverETL Cluster in 2012 will be the merging of our technology with Hadoop, or more precisely with the HDFS filesystem, which should combine the best of both worlds. HDFS provides some cool features, namely robustness and high performance, and we want to utilize its automated data partitioning to make it easier to grow (or shrink) the storage of data depending on actual needs.

Revolution

Rich data structures – trees, unstructured data, etc.

It has to come with age, but I can’t resist and must admire those who devised Cobol and CopyBook. In those times, every byte of storage counted and CPUs were slow, yet programmers were still able to process rich data structures. Then relational databases came and brought the idea of tables and normal forms. Well, today, we are back to rich structures, but this time, we’ve stopped counting bytes or CPU cycles (which has a huge impact on power consumption of servers, but that’s a different story.) That is why XML, JSON, or other rich structures are becoming the norm today.

In order to support these structures and formats as first class passengers, we decided to overhaul our metadata and record storage model and allow direct support of tree structures, multi-values of fields, and even loosely typed data organized in maps/properties collections.

This in itself constitutes a big adventure, as every single piece of our technology platform will be affected and will have to be adapted. The effort will be huge, and the necessary regression testing of the whole platform will seem endless. Despite this, the prize is enticing: almost any type of data (and the cloud will be a bonanza for this) will be 1:1 representable by Clover. That will include XML, JSON, POJO, and complex properties and, in the future, who knows what else!

—–

We have always claimed that CloverETL is future-proof. Therefore, in 2012, we will be improving our foundations so they withstand the next 10 years.

If what I’ve talked about above is of interest to you, then please stay tuned. We will be publishing more details on our new functionality as we implement it.

For now, I wish everyone a very successful 2012!

A Look Back: CloverETL and Data Integration in 2011

As 2011 comes to a close, we’d like to take the time to reflect on what this year has brought CloverETL, its users, and our customers.

Since CloverETL is, after all, a data integration platform, the world of integration is at our core. We’re constantly striving to challenge ourselves in new ways and improve how we approach data integration. This year was no different.

Enhancing Our Core – Two Upgrades of CloverETL

In the past six months, we released two upgraded versions of CloverETL. CloverETL 3.1, published in June, brought significant changes to the platform in several areas. With a deeper focus on connectivity and enhanced support of various data formats, CloverETL 3.1 helped users better process data with complex structure, emails, and Lotus documents, to name a few. The latest version of CloverETL, version 3.2, offered further enhancements to the user experience, as well as improved the processing of large data records.

Data Integration Meets Data Quality – CloverETL Profiler

This year was also a year for new products. With Clover, we’ve moved forward with an evolved sense of the data world. Because data integration, data quality, and other data disciplines are becoming more and more intertwined, we developed the CloverETL Profiler, a data profiling application. Released in beta back in October, the Profiler helps users make informed decisions on how to improve the quality of transformed data, which is particularly useful as a precursor to larger data integration projects. CloverETL also integrates more easily with the AddressDoctor solution to improve the quality of geographical information.

Strengthening CloverETL Presence in the US Market

In June, Javlin, the developer of CloverETL, opened up its new office in the Washington D.C. area, which became the headquarters of Javlin Inc., our US presence. Javlin Inc., with both a dedicated sales and customer service force, brings Clover to a whole new market of possibilities.

Last but not least, we are pleased to see that our OEM data integration offer will have a number of important implementations in the upcoming year. (But more on that later. Stay tuned.)

As we leave 2011, we can say that this past year was a whirlwind of hard work, exciting releases, and interesting customers and stories. We’re looking forward to another great year with CloverETL. Cheers to the New Year.

Lotus Notes Integration – export or exchange data easily

Lotus Domino (formerly Lotus Notes) is an IBM server product that provides enterprise-grade e-mail, collaboration capabilities, and a custom application platform. The platform is used by many large corporations, and most of them also employ other enterprise systems that need to exchange data with the Lotus platform. This is where CloverETL can now be helpful.

Since version 3.1, CloverETL has been able to connect to Lotus Domino servers and read and write their data. CloverETL’s Lotus Notes components help you connect the data-record world of CloverETL with the key-value concepts of Lotus Notes databases. This connectivity makes it much easier to exchange Lotus data with Excel, relational databases, or even various web services. CloverETL Designer enables even less technical users to set up connectivity to Lotus Domino and create simple data extraction or data ingest apps.

A Lotus Notes (Domino) database stores data in documents. A document contains key-value pairs where the key is always a string and the value can be of various formats and types (number, string, image, OLE object, RTF, …). The data in Lotus Notes is therefore structured differently than in the classical data-record world of CloverETL. Typically, there are no guarantees about which keys, and therefore which values, will be present in a document.

To overcome this gap, the concept of Lotus Notes Views is used to our advantage. A view in Lotus Notes provides tabular structure for the data in Lotus Notes database. Therefore, to read the data it is necessary to first define a view through which the data will be exported. This task needs to be done by the manager of the Lotus Notes database.
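The gap between key-value documents and fixed-column records is easy to picture with a small plain-Java sketch. Here, ordinary maps stand in for Lotus Notes documents and a list of column names stands in for a view; no Domino API is involved, and the names (ViewProjection, project, the sample keys) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: maps stand in for documents, a column list for a view.
class ViewProjection {

    // Projects every document onto the same fixed list of columns.
    // A key that is missing from a document yields null in that column;
    // keys that are not part of the "view" are simply not exported.
    static List<List<Object>> project(List<Map<String, Object>> documents, List<String> columns) {
        List<List<Object>> rows = new ArrayList<>();
        for (Map<String, Object> document : documents) {
            List<Object> row = new ArrayList<>();
            for (String column : columns) {
                row.add(document.get(column));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("Name", "Alice");
        first.put("Age", 30);

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("Name", "Bob");
        second.put("Photo", "bob.png"); // no "Age" key in this document

        for (List<Object> row : project(List.of(first, second), List.of("Name", "Age"))) {
            System.out.println(row);
        }
    }
}
```

Each row now has the same shape regardless of which keys the underlying document happened to contain, which is what a record-oriented reader needs.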

Finally, in the Lotus Domino server, remote access needs to be enabled. This is typically done by issuing the following commands in the server console:

load http
load diiop

Connecting to Lotus Notes from CloverETL

The first task on the side of CloverETL is to define a Lotus Notes connection. The connection specifies the address of the Lotus Domino server and user credentials that will be used to log in. The user’s name should be in the format that can be found in the Person document in the Domino Directory. The database can be either specified by its file name (typically a *.nsf file) or by the Replica ID value.

Finally, a path to the Lotus Notes connection drivers (NCSO.jar) needs to be specified. These drivers are not part of CloverETL and can usually be found in your Lotus Domino server installation, e.g. C:\Program Files\Lotus\Domino\data\domino\java\NCSO.jar

Creating metadata

After the connection is prepared, the next step is to create metadata that will be used for reading data from Lotus Notes.

First, choose the connection that you’ve created in the previous steps and also, at the bottom of the dialog, specify the view which will provide the input data. On the following page of the wizard you will see new metadata extracted from the view. Based on the preview of the data you can then manually fine-tune the types of the fields, as initially all field types will be set to string.

Reading from Lotus Notes

LotusReader is the name of the CloverETL component which reads data from Lotus Notes views. Thanks to the preparation steps above, configuring this component is fairly simple. Place the component into your graph, open its configuration, set the Domino connection created in the previous steps, and select the view from which the data will be extracted. Finally, connect the reader to the rest of your graph with an edge and assign the metadata extracted in the steps above.

Writing to Lotus Notes

Complementary to the reader component, there is also a component to import data into Lotus Notes. Basic configuration of the LotusWriter component is quite simple too. In the following paragraphs, we will pay closer attention to the different ways the data can be written to Lotus Notes.

The default mode for writing data is insert. In this mode, LotusWriter will create a new document in the Lotus Notes database for every incoming record. After inserting the document, it will also launch form calculations. This is a procedure defined in the Lotus Notes database by the author of the database that can, for example, fill in missing fields, fix the formatting of specific string fields, check various constraints, and so on.

The computation can mark a record as invalid in the case of errors during the computation. You can choose to further process these records (e.g. dump them to a log file) by connecting any writer component to the only output port of LotusWriter. You also have the option not to save the invalid records, by enabling the Skip invalid documents option. By default, even documents marked as invalid will be stored into the Lotus Notes database.

The other write mode is update. In this mode, the LotusWriter tries to update existing documents in the database instead of inserting new ones. The component will look up the documents by searching the view specified in the component configuration (a required parameter for update mode). Each view marks certain columns as sorted. These columns will be used to look up the documents to update.

There are two more advanced and, on certain occasions, quite important concepts in writing to a Lotus Notes database with CloverETL. The first one is lazy update replication; the other is custom field mapping.

In update mode, after the document fields are updated, a save operation is started. This operation may also launch the optional replication of the document. However, if there were no actual changes in the contents of the document, the replication would be unnecessary. Therefore, the writer component uses a lazy update mechanism: it will only launch the save operation on the document if some of the fields’ values actually changed. This can save a significant amount of time for large data manipulations.
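The lazy update idea boils down to "compare first, save only on a real change". Below is a minimal plain-Java sketch of that logic. It is an assumption-laden illustration, not LotusWriter's code: a map stands in for a Lotus Notes document, a counter stands in for the save (and possible replication), and the class name LazyDocumentUpdater is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: a map stands in for a document, a counter for the save call.
class LazyDocumentUpdater {

    private int saves = 0;

    // Copies the incoming values into the document and "saves" it only
    // when at least one field value actually changed.
    boolean update(Map<String, Object> document, Map<String, Object> incoming) {
        boolean changed = false;
        for (Map.Entry<String, Object> field : incoming.entrySet()) {
            String key = field.getKey();
            boolean isNewField = !document.containsKey(key);
            if (isNewField || !Objects.equals(document.get(key), field.getValue())) {
                document.put(key, field.getValue());
                changed = true;
            }
        }
        if (changed) {
            saves++; // stand-in for the real save, which may trigger replication
        }
        return changed;
    }

    int saves() {
        return saves;
    }

    public static void main(String[] args) {
        Map<String, Object> document = new HashMap<>();
        document.put("Status", "open");

        Map<String, Object> sameValues = new HashMap<>();
        sameValues.put("Status", "open");

        Map<String, Object> newValues = new HashMap<>();
        newValues.put("Status", "closed");

        LazyDocumentUpdater updater = new LazyDocumentUpdater();
        System.out.println(updater.update(document, sameValues)); // nothing changed, no save
        System.out.println(updater.update(document, newValues));  // value changed, one save
        System.out.println(updater.saves());
    }
}
```

When most incoming records match what is already stored, almost all of the expensive save calls are skipped, which is where the time savings come from.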

Finally, the last feature that will be described here is custom field mapping. By default, the LotusWriter component will create documents with values based on the names of the input metadata fields. However, it might be preferable to create documents with the help of a Form. Forms often specify the fields that documents should contain and are a common way to create new documents in Lotus Notes.

The mapping dialog helps you map fields of the input metadata to the fields of the selected Form. This lets you write a single field multiple times, skip fields, and, most importantly, choose proper names for the target document fields.
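What such a mapping does can be pictured with a short plain-Java sketch. The representation is an assumption made for this illustration only (a map from each input field name to a list of target field names); it is not the format LotusWriter stores internally, and the class and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical representation of a field mapping.
class FieldMapping {

    // mapping: input field name -> target document field names.
    // An input field that is absent from the mapping is skipped;
    // an input field with several targets is written several times.
    static Map<String, Object> apply(Map<String, Object> record, Map<String, List<String>> mapping) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> rule : mapping.entrySet()) {
            Object value = record.get(rule.getKey());
            for (String target : rule.getValue()) {
                document.put(target, value);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("first_name", "Ann");
        record.put("internal_id", 7); // not mapped, so it is skipped

        Map<String, List<String>> mapping = new LinkedHashMap<>();
        mapping.put("first_name", List.of("FirstName", "DisplayName"));

        System.out.println(apply(record, mapping));
    }
}
```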

Exporting Data Transformation Projects to CloverETL Server

CloverETL Designer in its full or trial version provides integration with CloverETL Server. The CloverETL Server serves as an ETL runtime environment and brings such enterprise features as automation, workflows, monitoring, user management and many others. The integration allows users to design and maintain data integration Server projects locally with their Designers. However, sometimes you may find yourself in a situation when you need to export and deploy a project originally developed locally on your computer to the Server. A quick how-to is described below.

1. Select File > Export

2. Select Export to CloverETL Server sandbox

3. Select desired projects
4. If you want to export a project to our demo server, you can select it from the combo box; otherwise, type in the URL of your CloverETL Server. Enter a username and password (clover/clover for the demo server).
5. Click the Reload button to load the available sandboxes and select the desired one (playground1 or playground2 for the demo server; the other demo sandboxes are read-only).
6. Click Finish

7. Check the exported project in the CloverETL Server under Sandboxes.

Warning: Graphs are copied to the Server together with their parameters (e.g. file paths). These parameters need to be adjusted.

Handling Errors in Heterogeneous Input Data

ComplexDataReader is a powerful new component in CloverETL meant for reading elaborate heterogeneous data. However, not all data can be read easily, even if you spend a lot of time configuring the component. Sometimes you need to think ahead: what if you come across unknown metadata you have not handled? Normally, the graph crashes.

This post will examine a way of preventing that or, more specifically, how to handle errors in input data.

Example Input Data

Input Data

What We Will Do

We can instantly distinguish three kinds of metadata on the input: product, product_range and service. ComplexDataReader is the best component to parse these using three states of a state machine. As you can see, there is one line that does not fit into the data. The magic trick of this example lies in preparing one extra state – the error state. The state will be responsible for “catching” all incorrect data which would cause the component to fail. In order to be able to decide which data are “bad,” or, more precisely, when to switch to the error state, you have to write a custom Selector class in Java. The idea behind the code is very simple and will be explained below:

“Prep Work”

First, we need to prepare metadata for all three states of the state machine plus one extra. The extra metadata will represent error lines on the input we need to “throw away.”

Second, do not forget to connect the component to its succeeding components and assign metadata to output edges.

Third, set the “File URL” property to point the component to the input file.

Here are the three aforementioned metadata:

Metadata: Product

Metadata: Service

Metadata: Product Range

And one extra metadata for error lines:

Metadata for Error Lines

Designing the State Machine

We are going to create four states:

Note: There are no transition edges to be seen in the graph. It is because the Selector itself will decide when to change between states.

Start configuring the component via the “Transform” property. Create four states corresponding to the metadata and set “Initial state” to “Let selector decide”:

Switch to state “$0 product” and define its output mapping. In this state, we will send all fields to the output. Thus, drag state $0 to the “Value” column in the right-hand pane. You will produce the “$0.*” directive. In the “Transition table”, switch “Target state” to “Let selector decide”:

Repeat the same procedure for all remaining states (including the error state). Always send everything to the output port and “Let selector decide” about the target state:

Writing Custom Selector

We are now going to prepare a Java class that will do the magic of this example – switch between states “$0 product”, “$1 service”, “$2 product_range” and the “$3 error” state in case there are errors on reading. This particular prefix Selector will assume there is another record on the following line(s) and will try to read it. If there really is a new record, we can recover from the error line and carry on reading.

You can prepare the Java class in any editor of your choice. After writing it, just remember to place it into the “trans” folder of your project. On that condition, CloverETL will automatically compile the class for you.

The Selector class will look like this:

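// Custom selector: falls back to the error state when the default prefix selector cannot decide.
public class CustomPrefixInputMetadataSelector1 extends com.opensys.cloveretl.component.complexdatareader.PrefixInputMetadataSelector {

	// Number of the error state ("$3 error").
	private static final int DEFAULT = 3;

	@Override
	public int select(int prevState) {
		// Ask the default prefix selector for the next state.
		int result = super.select(prevState);
		// The default selector could not decide: switch to the error state.
		if (result == org.jetel.component.RecordTransform.ALL) {
			return DEFAULT;
		}
		return result;
	}
}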

A few comments concerning the code:

  • int result = super.select(prevState);
    First, we try to call the default selector and store the number of the next state into result.
  • if(result == org.jetel.component.RecordTransform.ALL)
    And if the default selector cannot decide…
  • return DEFAULT;
    We return the default state number – number 3. This is the error state.
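If you want to play with the fallback logic outside of CloverETL, the same idea can be imitated in plain Java. The sketch below is a simplified stand-in, not the real selector: it decides per line instead of per state, it does not use any CloverETL classes, and the prefixes ("P;", "S;", "R;") and the UNDECIDED constant are assumptions made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain-Java imitation of prefix selection with an error fallback.
class PrefixSelectorSketch {

    static final int UNDECIDED = -1;  // stand-in for "the default selector cannot decide"
    static final int ERROR_STATE = 3; // same number as the "$3 error" state

    private final Map<String, Integer> prefixToState = new LinkedHashMap<>();

    PrefixSelectorSketch() {
        // Hypothetical prefixes; the real ones depend on your input data.
        prefixToState.put("P;", 0); // $0 product
        prefixToState.put("S;", 1); // $1 service
        prefixToState.put("R;", 2); // $2 product_range
    }

    // Plays the role of the default prefix selector.
    int selectByPrefix(String line) {
        for (Map.Entry<String, Integer> entry : prefixToState.entrySet()) {
            if (line.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return UNDECIDED;
    }

    // Plays the role of the overridden select(): fall back to the error state.
    int select(String line) {
        int result = selectByPrefix(line);
        return result == UNDECIDED ? ERROR_STATE : result;
    }

    public static void main(String[] args) {
        PrefixSelectorSketch selector = new PrefixSelectorSketch();
        String[] lines = {"P;1;Widget", "S;7;Installation", "this line fits nowhere", "R;1;10"};
        for (String line : lines) {
            System.out.println(selector.select(line) + " <- " + line);
        }
    }
}
```

The important property is that an unrecognized line never stops the loop: it is routed to the error state and reading continues with the next line.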

Now that you are done with the code, switch to the “Selector” tab in “State transitions”. In “Selector URL”, browse for your custom Selector. Notice that after you specify its location, the “Selector properties” area changes:

Conclusions & Pitfalls

In this article, we have presented a way of handling flaws in the input data. We were able to address the situation in which the selector looks at the following metadata and cannot decide which state comes next.

However, there are numerous cases in which you simply cannot prevent reading errors. For instance, if the selector recognizes the following metadata but then fails to parse it, we cannot react and the graph fails. Imagine a file whose field types suddenly change (e.g. from integer to date: the selector starts parsing an integer and crashes). Another known case we cannot handle is a varying number of fields in one record. If new fields appear or their number decreases, the graph execution fails. The only exception is fields added at the end of a record. These can be handled with the help of the lenient data policy.

Download a complete CloverETL project – error handling in ComplexDataReader

Usability Improvements in CloverETL 3.1

One of the most noticeable sets of changes in CloverETL version 3.1 is the interface improvements, which substantially improve Clover’s usability and understandability. These improvements save both new and old users valuable time when creating or manipulating their data transformation graphs and further cement CloverETL’s place as one of the easiest-to-use ETL tools on the market.

The biggest improvement was the addition of drag-and-drop functionality to a number of different aspects of Clover. You can drag files to the graph, files to components, files to metadata, and metadata to edges, saving innumerable clicks through menus.

We have also made it easier to link your metadata and edges while creating the edges. If you right-click on the Edge tool in your palette, it will give you a list of every metadata you have created on the current graph. If you select one of the metadata, whenever you create an edge with the edge tool, it will automatically assign that metadata to the edge.

Not only is it easier to link metadata and edges, we’ve also made it easier to create and manipulate the edges themselves. Edges can now be created simply by dragging from one component’s out port to another’s in port. If you find you want to change where the edge is connected, that too is now one-click. Simply click and drag an edge’s endpoints to any other port.

The last shortcut that version 3.1 added to CloverETL is an easier way to set the description on a component. Before, the description field was buried in the component’s properties, but now it has been moved to the header of the properties window. This improvement makes it substantially easier to clarify the purpose of your components, making your graph easier to read overall.

Launch Services – ETL Transformation as a WebService

Transformations on CloverETL Server can be run by users in a very simple way, using just a web browser and the correct link. When a user clicks the link, a transformation is triggered on the Server and the requested data are generated. The data can be returned in several different formats, such as an Excel spreadsheet, CSV file, XML, or HTML. You can even place a form on a web page to supply parameters for the transformation.

In this blog post, I will demonstrate how Launch Services can be used from the user’s point of view. The next part will show how Launch Services can be configured.

Each installation of CloverETL Server contains a sandbox named default that holds many Launch Services examples, so you can immediately experiment with all the examples described below. If you do not have a licensed version of CloverETL Server, you can use the online server demo running at http://www.cloveretl.com/examples/server-demo

Example 1 – Glossary

Our online demo server hosts a database of economic terms that is updated every day. You can put a web form on a web page where the user enters an unknown term and presses the execute button. A request is sent to the server, where a CloverETL transformation is run; matching and similar terms are then returned and displayed as an HTML page.

Note: This example uses an HTTP POST request. However, Launch Services can also handle GET requests, so you can place a normal URL anywhere that links to an explanation of a specific term. The format of such a link is: <server-url>/launch/<launch-service-name>?<parameter-name>=<parameter-value>, e.g.: http://server-demo-ec2.cloveretl.com/clover/launch/glossary?term=trend
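If you generate such links from your own application, a small helper can assemble them and take care of URL-encoding the parameter values. The following is only a sketch of the link format described above; the class and method names are made up for illustration and are not part of any CloverETL library.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of CloverETL.
final class LaunchUrlBuilder {

    // Builds <server-url>/launch/<launch-service-name>?<name>=<value>&...
    static String build(String serverUrl, String serviceName, Map<String, String> params) {
        StringBuilder url = new StringBuilder(serverUrl);
        if (!serverUrl.endsWith("/")) {
            url.append('/');
        }
        url.append("launch/").append(serviceName);
        if (!params.isEmpty()) {
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> entry : params.entrySet()) {
                query.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
            }
            url.append(query);
        }
        return url.toString();
    }

    // Form-style encoding: spaces become '+', reserved characters are escaped.
    private static String encode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("term", "trend");
        System.out.println(build("http://server-demo-ec2.cloveretl.com/clover", "glossary", params));
    }
}
```

A LinkedHashMap keeps the parameters in the order you added them, which makes the generated links predictable.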

Example 2 – Mountains

There is a database of the highest mountains on Earth together with their heights. The user enters an elevation above sea level and hits the Enter key. An Excel sheet is then displayed listing all mountains with at least the given elevation.

Note: It is necessary to provide credentials to trigger Launch Services (using HTTP basic authentication). A transformation may depend on the logged-in user or on a group the user belongs to. It is thus possible to configure the service so that it runs under one URL but the format of the returned data depends on the specific user (e.g. most users get results in an Excel sheet, but one specific user receives them as a CSV file).
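For readers calling a Launch Service from code rather than from a browser, here is a minimal sketch of how HTTP basic authentication credentials are attached to a request using only the standard Java library (Java 11 or newer). The class name, the service URL, its elevation parameter, and the clover/clover credentials are placeholders chosen for this illustration; the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper; the class and method names are made up for this sketch.
final class LaunchServiceAuth {

    // Value of the HTTP "Authorization" header for basic authentication:
    // the word "Basic" followed by base64("user:password").
    static String basicAuthHeader(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    // Builds (but does not send) a GET request carrying the credentials.
    static HttpRequest authenticatedGet(String url, String user, String password) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical service URL and parameter; adjust to your own Launch Service.
        HttpRequest request = authenticatedGet(
                "http://localhost:8080/clover/launch/mountains?elevation=8000", "clover", "clover");
        System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
    }
}
```

The built request can then be passed to java.net.http.HttpClient.send(...) together with a body handler of your choice.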

Example 3 – Mountains Upload

A Launch Service can also accept a file containing data as a parameter. These data can be inserted into the server database or used in a transformation run. As a variation on the previous example, it is possible to submit a file with a list of the highest mountains that is then used for the selection.

Example 4 – Everest Ascents

Though input fields are of text type, Launch Services can parse them, according to how they are configured, into other data types such as integer, real, date, boolean, etc. They can have specific formats and locale settings, even different ones for each user. If a user enters data in the wrong format, they are alerted, so the CloverETL transformation receives only correct, parsed data.
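To make the idea of format- and locale-dependent parsing more concrete, here is a small plain-Java sketch. It is not how Launch Services are implemented or configured; it only illustrates, with standard JDK classes, what parsing a text field into a typed value with a given format and locale involves, and how wrongly formatted input is rejected. All names and sample values are invented for the example.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Plain-Java illustration only; not the Launch Services implementation.
final class TypedInputParser {

    // Text -> integer; NumberFormatException is an IllegalArgumentException.
    static int parseInteger(String text) {
        return Integer.parseInt(text.trim());
    }

    // Text -> date, using an explicit pattern and locale.
    static LocalDate parseDate(String text, String pattern, Locale locale) {
        try {
            return LocalDate.parse(text.trim(), DateTimeFormatter.ofPattern(pattern, locale));
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Not a date in format " + pattern + ": " + text, e);
        }
    }

    // Text -> real number, honoring the locale's decimal separator.
    static double parseReal(String text, Locale locale) {
        String trimmed = text.trim();
        ParsePosition position = new ParsePosition(0);
        Number number = NumberFormat.getNumberInstance(locale).parse(trimmed, position);
        // Reject input that is not a number or has trailing garbage.
        if (number == null || position.getIndex() != trimmed.length()) {
            throw new IllegalArgumentException("Not a number: " + text);
        }
        return number.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(parseInteger("3"));
        System.out.println(parseDate("08.05.1978", "dd.MM.yyyy", Locale.GERMANY));
        System.out.println(parseReal("8848,86", Locale.GERMANY));
    }
}
```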

In our example, we have a list of mountain climbers who reached the top of Mount Everest without using an oxygen mask. The Launch Service enables you to select climbers by date of the first ascent, gender, number of successful ascents, and nationality.

Example 5 – Scripting

Up to now, we have been accessing Launch Services from a web browser. However, they can just as easily be used from other systems. A given URL can be accessed by any application that supports the HTTP protocol, such as wget on Unix. You could write a script that downloads the list of all women who climbed Mount Everest and sends it to a given email address:

wget --user clover --password clover -q -O - "http://localhost:8080/clover/launch/ascents-everest?sex=F" | tail -n +2 | cut -f 1 | mail i@somwhe.re -s "Ladies on Everest"
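If you would rather post-process the response inside a Java program, the `tail -n +2 | cut -f 1` part of the pipeline (skip the header line, keep the first tab-separated column) can be sketched as follows. The class name and the sample data in main are invented for illustration; the real columns depend on how the Launch Service is configured.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative helper; mirrors "tail -n +2 | cut -f 1" on tab-separated text.
final class FirstColumn {

    static List<String> extract(String tabSeparatedText) {
        return tabSeparatedText.lines()
                .skip(1)                                // tail -n +2: drop the header line
                .map(line -> line.split("\t", -1)[0])   // cut -f 1: keep the first field
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample response, not real service output.
        String response = "name\tnationality\nClimber A\tX\nClimber B\tY\n";
        extract(response).forEach(System.out::println);
    }
}
```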

Example 6 – Client Library

The last way to call Launch Services is via the Java library cloverETL-server-client.zip. This package comes in handy if you need to integrate a Launch Service with your Java application or if you need to trigger a Launch Service from the command line with more parameters than wget allows for.

If you need to run a Launch Service called ascents-everest on the server http://localhost:8080/clover as the user clover and save all results to the folder ascents-everest-result, you can use this command:
java -jar cloverETL-server-client.jar ascents-everest -u http://localhost:8080/clover -l clover -p clover -o ascents-everest-result

Read part two: Launch Services – Configuration

Tips for Integrating CloverETL to Third Party Applications

I am sure you often find yourself in a situation where you have data “in your hands” and would like to transform them into a proper form, clean them up, or load them into a data destination (e.g. a database), without reinventing the wheel again and again. That is the time to start looking for a suitable technology that spares you from developing what has already been developed, tuned, and made stable. Most ETL tools satisfy these requirements, and CloverETL is no exception. In addition, CloverETL Engine offers a few clever, somewhat unconventional ways to process your data.

  1. A transformation graph does not have to be statically defined in a standard XML file with a .grf extension. If you need to generate a graph based on dynamic user settings, nothing is easier than creating your transformation dynamically in your Java application (details are described at http://wiki.cloveretl.com/doku.php?id=embedding_clover). Once you have a graph instance, no matter whether it was loaded from a .grf file or created manually, you can easily reuse it and run the graph repeatedly without rebuilding it each time. This ability is very useful for transformations that run only briefly and need to be triggered several times per second (for instance, as a reaction to a user’s click in a large multi-user application).
  2. Another interesting extension of your development toolbox is creating your own shell stream filter, a simple application that transforms standard input (stdin) to standard output (stdout), like sort or cat, but built entirely on CloverETL. You can prepare a whole set of specialized command-line applications to use in your complex shell scripts. It is enough for your graph to contain just one UniversalDataReader and just one UniversalDataWriter, each with its fileURL parameter set to the dash character ‘-’, which stands for the standard input (stdin) or standard output (stdout) stream. I have prepared a simple example that filters out undesirable lines from standard input. The usage is pretty straightforward:

    cat employees.txt | clover -D:metadata="employees.fmt" -D:filter='$salary>10000' filter.grf > beFired.txt
  3. In the first point, I touched on the possibility of building transformation graphs directly from your Java code. With this approach, you will soon run into a natural limitation of most ETL tools: the narrow set of data types that the public set of components can process. Typically, you have a java.io.InputStream with the data ready for further processing. With the classical Clover instruments, we would have to store the data in a temporary file first and prepare a graph that reads that file. Handling output data raises a similar issue: a temporary file would have to be used again. The question is obvious: how can we avoid needless temporary files? CloverETL engine has a ready-made solution for this purpose called Dictionary. It is a sort of shared memory that can be accessed concurrently by all components the whole time a graph is running. The dictionary is typically used for passing a data object into the graph as input and retrieving another data object from the graph as output. We can look at Dictionary as a classical Map<key, value>. The key is a string identifier of a value, and the value can be of a whole set of different dictionary types: integer, boolean, decimal, … and, last but not least, ReadableChannel and WritableChannel. The last two dictionary entry types can be used for passing data to and retrieving data from a graph. The following piece of code shows how the dictionary can be populated:

    //prepare channel with the data for ETL processing
    InputStream inputData = getInputDataStream();

    //prepare channel where the resulting data will be formatted
    OutputStream outputData = prepareOutputDataStream();

    //create graph instance based on grf file and initialize it
    TransformationGraph graph = TransformationGraphXMLReaderWriter.read(File);
    EngineInitializer.initGraph(graph);

    //initialize graph dictionary
    //our input channel will be registered under "inputStream" key
    graph.getDictionary().setValue("inputStream", "ReadableChannel", inputData);

    //our output channel will be registered under "outputStream" key
    graph.getDictionary().setValue("outputStream", "WritableChannel", outputData);

    //execute graph – output data will be pushed to output stream during graph run
    runGraph.executeGraph(graph);

    Now you are probably asking yourself how the graph knows that the input data are ready in the dictionary under the “inputStream” key and, conversely, how it knows where to write the resulting output data. The answer is simple: the fileURL attribute of UniversalDataReader/Writer has a specialized syntax for dictionary entries. The Reader can have fileURL set to “dict:inputStream”. For the Writer, we need to set the fileURL attribute to “dict:outputStream”. That is all: the CloverETL engine takes care of data transmission between your data streams and the CloverETL graph automatically. Data prepared in the input stream are parsed by a dedicated data reader and passed as Clover data records to the next components down the graph for further processing. Data arriving at the data writer are formatted into the channel you registered in the dictionary under the “outputStream” key.

    As already mentioned, Dictionary can handle various data types. Besides the data streams described above, it can store all basic Clover data types: string, integer, long, number (Java equivalent double), decimal (Java equivalent BigDecimal), byte, and boolean. So Dictionary can be used for passing input values to a graph and also for inter-component communication: the first component writes some intermediate result into Dictionary and the second component picks the value up for further processing.

    Probably the most advanced way to exploit Dictionary is to define your own proprietary dictionary data types. Like components, connections, CTL functions, and so on, dictionary entry types are fully pluggable, so you can easily introduce your own type that corresponds to your needs. For example, you can extend CTL with your own function that accesses such a value from Clover and converts it to a CloverETL data record, the basic data element processed by the CloverETL engine. It is certainly also possible to create a new set of components that understand your specific data format. During a component run, your custom data format can be retrieved from Dictionary, transformed into a standard CloverETL data record, and passed to an output port for subsequent processing. We have used this approach successfully in several projects where the data format was totally incompatible with CloverETL records.
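To illustrate the Dictionary idea from the third point without relying on any CloverETL classes, here is a plain-Java analogy: an ordinary Map plays the role of the dictionary, two NIO channels are registered under the “inputStream” and “outputStream” keys, and a toy “graph” reads from one, transforms the data, and writes to the other, with no temporary file involved. Everything in this sketch (class name, the upper-casing transformation, the lineCount entry) is invented for illustration and is not the CloverETL API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Plain-Java analogy of the Dictionary concept; not the CloverETL API.
final class DictionarySketch {

    // Stand-in for a graph run: reader -> transformation -> writer.
    static void runToyGraph(Map<String, Object> dictionary) {
        ReadableByteChannel in = (ReadableByteChannel) dictionary.get("inputStream");
        WritableByteChannel out = (WritableByteChannel) dictionary.get("outputStream");
        try {
            // "Reader": pull all data from the input channel.
            String text = new String(Channels.newInputStream(in).readAllBytes(), StandardCharsets.UTF_8);
            // "Transformation": upper-case the data.
            String result = text.toUpperCase(Locale.ROOT);
            // Inter-component communication: publish an intermediate value.
            dictionary.put("lineCount", text.lines().count());
            // "Writer": push the result into the output channel.
            ByteBuffer buffer = ByteBuffer.wrap(result.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                out.write(buffer);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Wires in-memory streams into the "dictionary" and runs the toy graph.
    static String process(String input) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Map<String, Object> dictionary = new HashMap<>();
        dictionary.put("inputStream", Channels.newChannel(
                new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))));
        dictionary.put("outputStream", Channels.newChannel(sink));
        runToyGraph(dictionary);
        return sink.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(process("alice\nbob\n"));
    }
}
```

The point of the analogy is that the caller owns both streams: whatever produced the input and whatever consumes the output never touch the file system.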

I hope this somewhat technical insight into the CloverETL engine inspires you to use it in situations where it seemed inappropriate until now.