Data Profiling with CloverETL Profiler beta

The process of data integration, data migration, consolidation and other data manipulation projects consists of a variety of steps and tasks. Javlin supports many critical tasks within these projects with a versatile ETL tool that provides technical solutions to transform data and connect different systems and data sources with various data formats.

However, there are additional aspects in order to achieve success with data processing projects. One important project phase to ensure future success is the analysis phase. At Javlin, we are now on the verge of introducing new tool that will boost productivity in the early stages of a project. This product is our new data profiling tool – CloverETL Profiler.

CloverETL Profiler helps give you insight into your data. By creating a profile of your data using various metrics, you can get the big picture of the current condition of your data. This information has a large impact on the next stages of your project.

Real World Use – Data Warehousing Project

Imagine you are project manager with a complex task at hand – building a data warehouse. You have couple of data sources – database tables, CSV files, log files you want to analyze and probably some others. After you define the basic requirements for your data warehouse and its structure, you need to check the condition of your source data.

As you use the CloverETL Profiler, you will gain:

  • insight into the quality of your data indicating whether you need the perform a data cleansing phase in your project:
    • detect what data are missing (high percentage of missing or invalid values) looking for a different source
    • you may include data cleansing tools in the project budget and reserve tasks in your project plan
  • information about the structure of data in all your sources:
    • detect the need for ETL tools on specific data sources to transform your input data into a common format

Instead of writing your own tools and testing the data in complex way, you are just a few clicks away from creating detailed profile of your data.

Product Vision

The usage of the CloverETL Profiler tool for the project analysis phase is just the beginning. As part of our vision for the future, we are on the path to create a fully integrated product in CloverETL Designer and CloverETL Server family.

Its main purposes will be to aid:

  • CloverETL Designer: to assist the development of ETL transformation by allowing the creation of profiles of data in any stage of the transformation and checking the results during the debugging process
  • CloverETL Server: to constantly monitor the data condition on the production server  and invoke actions based on the specific condition detected in the profile of your data

Beta Program

Right now, we are excited to announce the beginning of our beta program for CloverETL Profiler. We invite you, people from the data integration community, to join the testing phase; it’s a great opportunity to not only have a look behind the scenes, but to also be a part of the development process by contributing your insights. For more information, please visit www.cloveretl.com/products/profiler.

Exporting Data Transformation Projects to CloverETL Server

CloverETL Designer in its full or trial version provides integration with CloverETL Server. The CloverETL Server serves as an ETL runtime environment and brings such enterprise features as automation, workflows, monitoring, user management and many others. The integration allows users to design and maintain data integration Server projects locally with their Designers. However, sometimes you may find yourself in a situation when you need to export and deploy a project originally developed locally on your computer to the Server. A quick how-to is described below.

1. Select File > Export

2. Select Export to CloverETL Server sandbox

3. Select desired projects
4. In case you want to export a project to our demo server you can select it from a combo box, or type in URL of your CloverETL Server. Enter a username and password (clover/clover for the demo server).
5. Click the Reload button to load available sandboxes and select a desired one (playground1 or playground2 for the demo server. Other demo sandboxes are readonly).
6. Click Finish

7. Check the exported project in the CloverETL Server under Sandboxes.

Warning: Graphs, including their parameters are copied to the Server (i.e. file paths.) These parameters needs to be adjusted.

Handling Errors in Heterogeneous Input Data

ComplexDataReader is a powerful new component in CloverETL meant for reading elaborate heterogeneous data. However, all data cannot be read easily even if you spend a lot of time configuring the component. Sometimes you need to think in advance: What if you come across unknown metadata you have not handled? Normally, the graph crashes.

This post will examine a way of preventing that or, more specifically, how to handle errors in input data.

Example Input Data

Input Data

What We Will Do

We can instantly distinguish three kinds of metadata on the input: product, product_range and service. ComplexDataReader is the best component to parse these using three states of a state machine. As you can see, there is one line that does not fit into the data. The magic trick of this example lies in preparing one extra state – the error state. The state will be responsible for “catching” all incorrect data which would cause the component to fail. In order to be able to decide which data are “bad,” or, more precisely, when to switch to the error state, you have to write a custom Selector class in Java. The idea behind the code is very simple and will be explained below:

“Prep Work”

First, we need to prepare metadata for all three states of the state machine plus one extra. The extra metadata will represent error lines on the input we need to “throw away.”

Second, do not forget to connect the component to its succeeding components and assign metadata to output edges.

Third, set the “File URL” property to point the component to the input file.

Here are the three aforementioned metadata:

Metadata: Product

Metadata: Service

Metadata: Product Range

And one extra metadata for error lines:

Metadata for Error Lines

Designing  State Machine

We are going to create four states:

Note: There are no transition edges to be seen in the graph. It is because the Selector itself will decide when to change between states.

Start configuring the component via the “Transform” property. Create four states corresponding to the metadata and set “Initial state” to “Let selector decide”:

Switch to state “$0 product” and define its output mapping. In this state, we will send all fields to the output. Thus, drag state $0 to the “Value” column in the right-hand pane. You will produce the “$0.*” directive. In the “Transition table”, switch “Target state” to “Let selector decide”:

Repeat the same procedure for all remaining states (including the error state). Always send everything to the output port and “Let selector decide” about the target state:

Writing Custom Selector

We are now going to prepare a Java class that will do the magic of this example – switch between states “$0 product”, “$1 service”, “$2 product_range” and the “$3 error” state in case there are errors on reading. This particular prefix Selector will assume there is another record on the following line(s) and will try to read it. If there really is a new record, we can recover from the error line and carry on reading.

You can prepare the Java class in any editor of your choice. After writing it, just remember to place it into the “trans” folder of your project. On that condition, CloverETL will automatically compile the class for you.

The Selector class will look like this:

public class CustomPrefixInputMetadataSelector1 extends com.opensys.cloveretl.component.complexdatareader.PrefixInputMetadataSelector {

	private static final int DEFAULT = 3;

	@Override
	public int select(int prevState) {
		int result = super.select(prevState);
		if(result == org.jetel.component.RecordTransform.ALL) {
			return DEFAULT;
		}
		return result;
	}
}

A few comments concerning the code:

  • int result = super.select(prevState);
    First, we try to call the default selector and store the number of the next state into result.
  • if(result == org.jetel.component.RecordTransform.ALL)
    And if the default selector cannot decide…
  • return DEFAULT;
    We return the default state number – number 3. This is the error state.

Now that you are done with the code, switch to the “Selector” tab in “State transitions”. In “Selector URL”, browse for your custom Selector. Notice that after you specify its location, the “Selector properties” area changes:

Conclusions & Pitfalls

In this article, we have presented a way of handling flaws in the input data. We have been capable of addressing a situation when the selector looks on the following metadata and cannot decide which state goes next.

However, there are numerous cases when you just cannot prevent reading errors from occurring. For instance, even if the selector recognizes the following metadata but then fails on parsing them, we cannot react and the graph fails. You can imagine that as a file whose field types suddenly change, (e.g. from integer to date – the selector starts parsing an integer and crashes). Another known case we cannot handle is changeable number of fields in one record. If new fields occur or their number decreases, the graph execution fails. The only exception to this are fields added at the end of a record. These can be handled with the help of lenient data policy.

Download a complete CloverETL project – error handling in ComplexDataReader

Usability Improvements in CloverETL 3.1

One of the most noticeable set of changes in CloverETL version 3.1 is the interface improvements, substantially improving Clover’s usability and understandability. These improvements save both new and old users valuable time when creating or manipulating their data transformation graphs and further cement CloverETL’s place as one of the most easy to use ETL tools on the market.

The biggest improvement was the addition of drag-and-drop functionality to a number of different aspects of Clover. You can drag files to the graph, files to components, files to metadata, and metadata to edges, saving innumerable clicks through menus.

We have also made it easier to link your metadata and edges while creating the edges. If you right-click on the Edge tool in your palette, it will give you a list of every metadata you have created on the current graph. If you select one of the metadata, whenever you create an edge with the edge tool, it will automatically assign that metadata to the edge.

Not only is it easier to link metadata and edges, we’ve also made it easier to create and manipulate the edges themselves. Edges can now be created simply by dragging from one component’s out port to another’s in port. If you find you want to change where the edge is connected, that too is now one-click. Simply click and drag an edge’s endpoints to any other port.

The last shortcut that version 3.1 added to CloverETL is an easier way to set the description on a component. Before, the description field was buried in the component’s properties, but now it has been moved to the header of the properties window. This improvement makes it substantially easier to clarify the purpose of your components, making your graph easier to read overall.

Address Cleansing and Transliteration with CloverETL and AddressDoctor

Data quality usually goes hand in hand with data integration. The new version CloverETL 3.1 has enriched its data cleansing capabilities through integration with AddressDoctor. AddressDoctor contains address and geo data for more than 240 countries all over the globe. Along with correcting and fixing mail addresses, AddressDoctor can also be used for transliteration of non-Latin writing systems into Latin characters or enriching addresses with latitude and longitude information.

CloverETL integrates AddressDoctor software through a dedicated AddressDoctor component. In all cases you need to have java library AddressDoctor5.jar, native libraries (they all need to be on the class-path when running a graph) and country databases. You also need unlock codes for your databases.

AddressDoctor component has 4 required parameters:

  • Configuration is driven by configXml or configFileparameter.  The simplest configuration file can look as follows:
           <?xml version="1.0" encoding="UTF-8"?>
           <SetConfig>
             <General MaxAddressObjectCount="10"/>
             <UnlockCode><Your unlock code here></UnlockCode>
             <DataBase CountryISO3="ALL" Path="/home/user/AD/db"/>
           </SetConfig>
  • Parameters can be set by parameterXml or parameterFileattribute. The simplest parameter file can look as follows:
           <?xml version="1.0" encoding="UTF-8"?>
           <Parameters>
             <Process Mode="PARSE"/>
             <Input/>
             <Result/>
           </Parameters>
  • Input mapping defines mapping between Clover input fields and AddressDoctor address properties.
  • Output mapping defines mapping between AddressDoctor output address properties and your output.

Transliteration example

Imagine that you have addresses all over the world saved in their original languages. For some languages, you would not even recognize them– to say nothing of their correct reading. Let’s try with following addresses:

Tomáš Novák;Vohradského 5;Česká Lípa;Czech republic
Hans-Peter Feiertag;Metro-Straße 1;Düsseldorf 4;Deutschland
Michał Dąbrowski;Marszałkowska 142;00-132 Warszawa;Polska
Борис Николаевич Ельцин; Казанская пл., 23;Санкт-Петербург;RUSSIAN FEDERATION
John Smith;100 Main Street;New York NY 10023;USB

To transliterate above addresses to Ascii, we need to set following parameters:

  • Mode to PARSE to transliterate without processing the address, Optimization level to STANDARD as it produces the best results. Both in Process tab of Parameters attribute:
  • Encoding to UTF-16 in Input tab of Parametersattribute:
  • Preferred language to ENGLISH and Preferred script to ASCII_SIMPLIFIED in Result tab of Parameters attribute:

Now we need to feed the Addressdoctor with input mapping. Input metadata:

Corresponds with Contact name, Street complete, Locality complete and Country name address properties:

–>

Similar mapping needs to be provided for output properties:

–>

We can also connect the Error port for invalid (unrecognised) addresses:

Now we can run our graph getting following results:

Tomas Novak;Vohradskeho;Ceska Lipa;CZECH REPUBLIC
Hans-Peter Feiertag;Metro-Strase;Dusseldorf 4;GERMANY
Michal Dabrowski;Marszalkowska;Warsaw;POLAND
Boris Nikolaevic Elcin;Kazanskaa Pl.;Sankt-Peterburg;RUSSIAN FEDERATION

But one address is missing. Look at the error port:

N1 means Validation Error: No validation performed because country was not recognized

The output can be even improved by setting ASCII_EXTENDED as Preferred script in Result tab of Parameters attribute:

Tomash Novak;Vohradskeho;Cheska Lipa;CZECH REPUBLIC
Hans-Peter Feiertag;Metro-Strasse;Duesseldorf 4;GERMANY
Michal Dabrowski;Marszalkowska;Warsaw;POLAND
Boris Nikolaevich Elcin;Kazanskaya Pl.;Sankt-Peterburg;RUSSIAN FEDERATION

Now the transcription corresponds with pronunciation.

Enrichment example

If you have proper database, you can enrich addresses with data you don’t know, e.g. ZIP code, geocoding, certification status.

Let’s consider following addresses:

To validate an address according to the Canada Post SERP rules we would need certified database for Canada. We need to set in Configuration as well as in Parameters:

and

After AddressDoctor transformation we get:

Count ResultNumber ElementItem Country Province Locality PostalCode Street Number SERPStatus SERPCategory ResultProcessStatus ResultModeUsed ResultPreferredScript ResultPreferredLanguage ResultDataMailabilityScore ResultDataElementResultStatus ResultDataElementInputStatus ResultDataElementRelevance
1 1 1 CANADA ON TORONTO M4P 3J5 YONGE STREET 2384 ESE1 C C3 CERTIFIED ASCII_SIMPLIFIED DATABASE 3 80F0F0F0F000004000E0 00606060600000200060 10101010100000100010
1 1 1 CANADA SK LA RONGE FINLAYSON ST 108 ESE1 N I1 CERTIFIED ASCII_SIMPLIFIED DATABASE 0 000000000000000000E0 00601010100000000060 00000000000000000000
1 1 1 CANADA SK PRINCE ALBERT S6V 0C7 15TH ST E 801 ESE1 C C4 CERTIFIED ASCII_SIMPLIFIED DATABASE 4 80F0F0F0F00000F000E0 00606060600000600060 10101010100000100010

Note the fields:

  • PostalCode – for recognized addresses we get valid postal code
  • SERPCategory :
    • C: Corrected
    • N: Incorrect
  • ResultProcessStatus and ResultDataMailabilityScore:
    • C3 (3) : Corrected – but some elements could not be checked (should be fine)
    • I1 (0) : Data could not be corrected and is pretty unlikely to be delivered (futile)
    • C4 (4) : Corrected – all (postally relevant) elements have been checked (almost certain)

Download the transformation graph with data

Processing Heterogeneous Data with ComplexDataReader

ComplexDataReader – Example How-to

ComplexDataReader is a new component for reading heterogeneous data (data which contains multiple types of records that can also depend on each other) without the need of hard coding. Instead, the component is driven by a state machine which can be set-up using the GUI.

The following example will present some of the capabilities of ComplexDataReader, as well as guide you through the design of a simple automaton, which is used for processing a text file containing two types of shipments grouped into batches. Each batch starts with a batch header; the number of items in a batch is variable and it is part of the header.

Input Data

Input Data for ComplexDataReader

What We Want to Achieve

For every parcel and every letter, send to the output the address and the charge to the output, also add the batch ID, customer ID, and the date from the respective batch header.

The first element of a batch header determines the type of its elements, and the third element contains the number of items in the batch.

Preparation

Before starting the configuration of the component, all the required metadata should be defined. Also, the component should be connected to the succeeding component(s) and the output edge(s) should have metadata assigned.

You may also set the “File URL” property of the component to point to the input file.

Internal metadata (used for parsing the input):

Batch Metadata

Parcel Metadata

Letter Metadata

Output metadata (used for output mapping):

Shipment Metadata

ComplexDataReader Configuration

First, we have to design an automaton, which will guide the component through parsing the input. The automaton may look like this:

ComplexDataReader Automaton

The idea behind it is that we start by reading a batch header, therefore the initial state is set to “$0 – Batch”. Then we can decide, depending on the value of the “type” field, whether to proceed to “$1 – Letter” or “$2 – Parcel”. In either of these states, we read as many records as specified in the “count” field of the previous batch header, then return to “$0 – Batch” and expect a new batch header.

To start building the automaton, open the configuration dialog by double clicking the component and then its “Transform” property.

Create three states by dragging the “batch”, “letter” and “parcel” metadata, respectively, from the list of Available Metadata on the left to the list of States on the right. You can also edit the labels of the states. Set the Initial state to “State $0″ by selecting it from the drop-down list.

Optionally, you may switch to the Overview tab and press the Undock button to get an interactive overview of the automaton being built.

Switch to the State $0 tab. This state represents a new batch. Set the automaton to reset the counters for state $1 and $2 by pressing the Actions button and ticking Reset counter for “State $1″ and “State $2″. Add two rows to the Transition table. Set the condition of the first row to $batch.type == "LETTERS" and the condition of the second row to $batch.type == "PARCELS". Set their target states to “State $1″ and “State $2″, respectively. You may also set the target of the default transition to Fail to detect unexpected batch types.

Note that in state $0, no output mapping is defined; hence no data will be sent to the output.

The configuration of state $1 and $2 will be very similar. In these states we want to produce output, therefore we have to define output mapping. For example, in state $1 we need to send to the output “address” and “charge” fields from internal record $1 (last letter record) and “batchID”, “customerID” and “date” from internal record $0 (last batch header record).

For state $1, define Output mapping by dragging row “$1″ from the left table onto “Port 0″ in the right table. Then expand row $0 on the left and Port 0 on the right and drag “batchID”, “customerID” and “date” from the left onto “$0.batchID”, “$0.customerID” and “$0.date” on the right, respectively.

Add one row to the Transition table and set its condition to counter1 < $batch.count and its target to “State $1″. Also set the target of the default transition to “State $0″.

Letter State

Similarly, for state $2, drag row “$2″ onto “Port 0″ and “batchID”, “customerID” and “date” from row $0 onto “$0.batchID”, “$0.customerID” and “$0.date”. Add one row to the Transition table and set its condition to counter2 < $batch.count and its target to “State $2″. Again, set the target of the default transition to “State $0″.

Parcel State

Download the transformation graph with data

Speed-up Installation of Plugins in Eclipse

Installing plug-ins into Eclipse can in some situations take a long time. This can also affect users of CloverETL Designer if they choose the Online or Offline Eclipse Plugin Installation download type. This blog post describes a workaround for the slow install process.

The reason for the long installation time is that by default Eclipse contacts all available update sites to try to resolve dependencies of the plugin being installed. There can be a large number of update sites, some can be not responding or slow and overall the connection can be bad. To disable contacting of all update sites, uncheck the “Contact all update sites during install to find required software” checkbox in the installation dialog:

Contact all update sites checkbox

This workaround can help not only when installing CloverETL Designer, but also when using Eclipse in general. However, it does have a drawback – some dependencies of the plugin being installed might not be resolved. This can also happen when installing CloverETL Designer, because it depends on the GEF and RSE plugins. The plugins are found in the main Eclipse update site which would not be contacted when using the described workaround. Eclipse will detect that some dependecies of CloverETL Designer are not met and will not proceed with the installation:

CloverETL dependencies not met

In case some dependencies are not resolved, there are 2 options:

  •  find and install the dependencies manually (in case of GEF and RSE they can be found in the main Eclipse update site)
  •  accept the long installation time and enable the checkbox back. Eclipse will resolve the dependencies automatically

Hopefully this hint will help some users of CloverETL Designer or Eclipse with the slow installation. However it’s important to understand that it’s NOT mandatory to use the workaround as the installation is quick in many cases – use it only in case of issues.

Handling of JSON Objects in CloverETL

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is language independent but uses conventions that are familiar to programmers. This format is often used for serializing and transmitting structured data over a network connection. It is primarily used to transmit data between a server and a web application, serving as an alternative to XML. CloverETL doesn’t have any JSONReader/Writer components, but has numerous components that can handle XML files. So all you need to do is to convert the JSON structure to an XML one or vice versa.  If you have a XSLT 1.0 stylesheet that transforms XML file into a JSON object, you can use the XSLTransformer component. I downloaded xml2json-xslt and created a transformation graph with only one component:

That converts an XML file into a JSON structure.

Another solution is to use JSON Java API and create a transformation in Java as an attribute of Reformat component. Then simple transformations need to be written:

  • To transform XML to JSON:

import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataRecord;
import org.jetel.exception.ComponentNotReadyException;
import org.jetel.exception.TransformException;
import org.json.JSONException;
import org.json.XML;

public class XML2JSON extends DataRecordTransform {

    int counter;

    @Override
    public void preExecute() throws ComponentNotReadyException {
        super.preExecute();
        counter = 1;
    }

    @Override
    public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords)
            throws TransformException {
        try {
            outputRecords[0].getField(0).setValue(XML.toJSONObject(inputRecords[0].getField(0).toString()).toString());
        } catch (JSONException e) {
            throw new TransformException("Can't convert XML to JSON.", e, counter, 0);
        }
        counter++;
        return 0;
    }

}
  • To transform JSON to XML:

import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataRecord;
import org.jetel.exception.ComponentNotReadyException;
import org.jetel.exception.TransformException;
import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;

public class JSON2XML extends DataRecordTransform {
    int counter;

    @Override
    public void preExecute() throws ComponentNotReadyException {
        super.preExecute();
        counter = 1;
    }

    @Override
    public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords)
            throws TransformException {
        try {
            outputRecords[0].getField(0).setValue(XML.toString(new JSONObject(inputRecords[0].getField(0).toString())));
        } catch (JSONException e) {
            throw new TransformException("Can't convert XML to JSON.", e, counter, 0);
        }
        counter++;
        return 0;
    }

}

Since CloverETL 3.1 the implementation above is going to be embedded into CloverETL Engine in the form of xml2json and json2xml ctl functions. Then your transformations are easy indeed:

  • To transform XML to JSON:

  • To transform JSON to XML:

Above transformation converts following XML structure:

<?xml version="1.0" encoding="ISO-8859-1"?>
<employees>
    <employee>
        <empID>1</empID>
        <jobID>3</jobID>
           <salary>5000</salary>
        <name>
            <firstname>Mark</firstname>
            <surname>Fish</surname>
        </name>
        <child>
            <chname>Ann</chname>
            <age>4</age>
        </child>
        <child>
            <chname>Mark</chname>
            <age>6</age>
        </child>
        <benefits>
            <car>none</car>
            <mobilephone>yes</mobilephone>
            <financial>
                <monthly_bonus>1000</monthly_bonus>
                <yearly_bonus>0</yearly_bonus>    
            </financial>
        </benefits>
        <project>
            <projName>JSP</projName>
            <projManager>John Smith</projManager>
            <inProjectID>34</inProjectID>
            <Start>06062006</Start>
            <End>in progress</End>
            <customer>
                            <name>Sunny</name>
            </customer>
            <customer>
                <name>Weblea</name>
            </customer>
        </project>
        <project>
            <projName>Data warehouse</projName>
            <projManager>John Major</projManager>
            <inProjectID>51</inProjectID>
            <Start>01062005</Start>
            <End>31052006</End>
            <customer>
                <name>Hanuman</name>
            </customer>
            <customer>
                <name>Weblea</name>
            </customer>
            <customer>
                <name>SomeBank</name>
            </customer>
        </project>
    </employee>
</employees>

into the following JSON object (needs to be formatted manually):

{"employees":
    {"employee":{
        "child":[
            {"chname":"Ann","age":4},
            {"chname":"Mark","age":6}
        ],
        "project":[{
            "End":"in progress",
            "projManager":"John Smith",
            "Start":6062006,
            "inProjectID":34,
            "projName":"JSP",
            "customer":[
                {"name":"Sunny"},
                {"name":"Weblea"}
            ]},{
            "End":31052006,
            "projManager":"John Major",
            "Start":1062005,
            "inProjectID":51,
            "projName":"Data warehouse",
            "customer":[
                {"name":"Hanuman"},
                {"name":"Weblea"},
                {"name":"SomeBank"}
            ]}
        ],
        "jobID":3,
        "empID":1,
        "name":
            {"surname":"Fish","firstname":"Mark"},
        "benefits":{
            "financial":
                {"monthly_bonus":1000,"yearly_bonus":"0"},
            "car":"none",
            "mobilephone":"yes"},
        "salary":5000}
    }
}

Download the example transformation graph

Usage of Internal and External Graph Elements

GRAPH ELEMENTS

In addition to components, edges, notes, etc., any CloverETL graphs may contain the following graph elements:

  • Metadata
  • Connections
  • Lookup tables
  • Sequences
  • Parameters

Each of these graph elements may be either created and written in the graph XML source file, or specified in an external file and linked to the graph. Such elements are called internal, or external, respectively. In case of external elements, graph XML file only contains a link to such file with graph element definition.

Internal Graph Elements

Graph XML file can contain internal graph elements that may look like those in the following examples:

 <Metadata id="Metadata1"> <Record fieldDelimiter="|" name="LUT" recordDelimiter="rn" type="delimited"> <Field name="field1" type="integer"/> <Field name="field2" type="string"/> </Record> </Metadata> <Connection database="POSTGRE" dbURL="jdbc:postgresql://hostname/database" id="JDBC0" jdbcSpecific="POSTGRE" name="NewConnection" password="mypassword" type="JDBC" user="username"/> <LookupTable charset="ISO-8859-1" id="LookupTable0" initialSize="512" key="field1" keyDuplicates="true" metadata="Metadata1" name="simpleLookup0" type="simpleLookup"/> <Sequence cached="1" fileURL="${SEQ_DIR}/seq_withdata.txt" id="Sequence0" name="seq" start="1" step="1" type="SIMPLE_SEQUENCE"/> <Property id="GraphParameter1" name="NUMBER" value="6"/>

(In addition to such internal graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/>

The resulting Outline pane looks like this:

Internal Graph Elements

External Graph Elements

When you want to link an external graph element, you need to right-click any of the five categories (Metadata, Connections, Parameters, Sequences, or Lookups) in the Outline pane and select in the context menu:

  • New metadata → Link shared definition
  • Connections → Link DB connection, Link JMS connection, or Link QuickBase connection
  • Parameters → Link parameter file
  • Sequences → Link shared sequence
  • Lookup tables → Link shared lookup table

After that, the URL Dialog opens.

URL Dialog

The dialog consists of three tabs: Workspace view, Local files, and Remote files tabs. They serve for browsing local files within your workspace, outside your workspace, or files located on a remote computer, respectively.

Files with graph elements are located in the same project as the graph itself.

To browse your project, you need to use the Workspace view tab of the URL Dialog.

The workspace.prm file in the project itself contains the definition of parameters for the conn, meta, lookup, seq directories (among others). They are designed as “CONN_DIR”, “META_DIR”, “SEQ_DIR”, and “LOOKUP_DIR”, respectively. The value of each of these parameters is resolved to its value using a dollar and curly brackets. For example, “${META_DIR}”.

Once you link a file containing graph element definition located in the project, the “_DIR” termination of these names assures that the path to such files located in the conn, meta, lookup, or seq directories is automatically replaced with one of these parameters.

The graph XML file contains the following links to metadata, db connection, simple sequence, simple lookup table, and parameter files:

<Metadata fileURL="${META_DIR}/LUT.fmt" id="Metadata1"/> <Connection dbConfig="${CONN_DIR}/NewConnection.cfg" id="JDBC0" type="JDBC"/> <LookupTable id="LookupTable0" lookupConfig="${LOOKUP_DIR}/simpleLookup0.cfg"/> <Sequence id="Sequence0" seqConfig="${SEQ_DIR}/seq.cfg"> <attr name="type"><![CDATA[SIMPLE_SEQUENCE]]></attr> </Sequence> <Property fileURL="parameters.prm" id="GraphParameter1"/>

(In addition to the mentioned graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/>

The resulting Outline pane looks like this:

External Graph Elements

Files with graph elements are located outside the project containing the graph.

To locate external graph elements OUTSIDE the project, you need to use the Local files tab of the URL Dialog. It allows to browse the file system of your local computer.

You can also define a parameter for the whole path with the mentioned “_DIR” termination. If you have defined such parameter (e.g, “PATH_TO_GRAPH_ELEMENTS_DIR”), the whole path will automatically be replaced with this parameter (as ${PATH_TO_GRAPH_ELEMENTS_DIR}).

Remember that you need to define such parameter BEFORE you link a graph element in the specified location, otherwise, the whole path will not be replaced with the parameter name!

The whole path to the graph elements is specified in the graph XML file. The graph XML file contains the following links to metadata, db connection, simple sequence, simple lookup table, and parameter files:

<Metadata fileURL="D:/ExternalGraphElements/meta/HashJoinInput.fmt" id="Metadata0"/> <Metadata fileURL="${PATH_TO_GRAPH_ELEMENTS_DIR}/meta/LUT.fmt" id="Metadata1"/> <Connection dbConfig="${PATH_TO_GRAPH_ELEMENTS_DIR}/NewConnection.cfg" id="JDBC0" type="JDBC"/> <LookupTable id="LookupTable0" lookupConfig="${PATH_TO_GRAPH_ELEMENTS_DIR}/simpleLookup0.cfg"/> <Sequence id="Sequence0" seqConfig="${_TO_GRAPH_ELEMENTS_DIR}/seq.cfg"> <attr name="type"><![CDATA[SIMPLE_SEQUENCE]]></attr> </Sequence> <Property fileURL="parameters.prm" id="GraphParameter1"/>

The last mentioned parameters.prm file defines two parameters:

NUMBER, whose value is 6

and

PATH_TO_GRAPH_ELEMENTS_DIR, whose value is D:/ExternalGraphElements.

(In addition to the mentioned graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/>

Note:

Note that the value of the PATH_TO_GRAPH_ELEMENTS_DIR parameter is D:/ExternalGraphElements.

  • The first external metadata element (with ”Metadata0” id) was linked BEFORE the PATH_TO_GRAPH_ELEMENTS_DIR parameter was defined. The whole path was NOT converted into ${PATH_TO_GRAPH_ELEMENTS}/…
  • But the next elements (with “Metadata1” id, and all of the other elements) were automatically converted to the ${PATH_TO_GRAPH_ELEMENTS_DIR}/… AS SOON AS they were linked to the graph.

The resulting Outline pane looks like this:

External Outside Project

Files with graph elements are located on a remote computer.

To locate the files on remote computer, you need to use the Remote files tab of the URL Dialog. It allows to specify details of the remote computer file system.

To connect a remote computer, click the Create/Edit URL button at the right side from the Server URL combobox. In the Edit URL Dialog that opens, you need to specify all the authentication details:

Edit URL Dialog

The general structure of a remote path is:

<protocol>://<username>:<password>@<hostname|IP>:<portnumber>/<pathtoexternalelements>

Supported protocols are: http, https, ftp, ftps, sftp. The first two do not allow browsing the remote file system, whereas the other four allow it.

The whole path to the graph elements is specified in the graph XML file. The graph XML file contains the following links to metadata, db connection, simple sequence, simple lookup table, and parameter files:

<Metadata fileURL="sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/meta/LUT.fmt" id="Metadata1"/>

<Connection dbConfig="sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/conn/NewConnection.cfg" id="JDBC0" type="JDBC"/>

<LookupTable id="LookupTable0"
lookupConfig="sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/lookup/simpleLookup0.cfg"/>

<Sequence id="Sequence0" seqConfig="://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/seq/seq.cfg">
<attr name="type"><![CDATA[SIMPLE_SEQUENCE]]></attr>
</Sequence>

<Property fileURL="parameters.prm" id="GraphParameter1"/>

The last mentioned parameters.prm file defines two parameters:

NUMBER, whose value is 6

and

PATH_TO_REMOTE_GRAPH_ELEMENTS_DIR, whose value is sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/ExternalGraphElements.

(In addition to the mentioned graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/> 

The resulting Outline pane without parameter usage looks like this:

Remote Graph Elements

Note:

Remember that the paths to linked external elements located on a remote computer do NOT use automatically the ${PATH_TO_REMOTE_GRAPH_ELEMENTS_DIR} value instead of sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/ExternalGraphElements.

You may switch to the Source tab of your graph and replace the sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/ExternalGraphElements with ${PATH_TO_REMOTE_GRAPH_ELEMENTS_DIR} by hand.

Thus, after replacing the paths in the Source tab of the Graph editor with the value of graph parameter, the Outline pane looks like this:

Remote Graph Elements With Parameter

COMPARISON OF INTERNAL AND EXTERNAL GRAPH ELEMENTS

All the graph elements, both the internal and the external, may be converted into the other form. Any internal element may become an external one, and vice versa.

For more details consult our documentation.

VARIOUS FORMATS OF GRAPH ELEMENTS MAY BE USED AT THE SAME TIME

Remember that you can use all forms of graph elements in a single graph: internal, external, located on local computer, and external elements, located remotely, accessible via various protocols.

Launch Services – Part 2 – Configuration

In the last blog post, you learned what the Launch Services are. In this post you will see how to configure them.

Let us study an example scenario to become acquainted with configuration. We have a database containing the highest mountains on Earth along with their heights. The user enters an elevation above sea-level and hits the enter key. The Excel sheet is then displayed listing all mountains with the given minimal elevation.

Mountains example of Launch Services

How to Configure It?

First, we must create a transformation graph that uses a dictionary to receive parameters and to store results. Create a new graph in CloverETL Designer. In the outline pane, right click on Dictionary and choose Edit. Add a new entry named heightMin: with the “As Input” field set to true, and “type” set to Integer. Then add another entry named mountains.xls of type writable.channel, content type text/csv, and “As Output” set to true.

Dictionary

Now we may build a transformation graph. Components can use a dictionary in three different ways:

  1. Via file URL: data readers and writers may specify a File URL in the format dictionary:field-name. In our example, we set a data writer File URL to dictionary:mountains.xls.URL Dialog
  2. In CTL: anywhere in CTL code, we can use an expression of type dictionary.field-name to read or write the dictionary. In our example we use Filter expression $0.heightM >= dictionary.heightMin
  3. In Java code: using methods transformationGraph.getDictionary().getValue(String fieldName) and transformationGraph.getDictionary().setValue(String fieldName, Object value)

When a transformation graph is designed and ready, we must publish it as a Launch Service. In CloverETL Server administration, go to section Launch Services and click New launch configuration. Now enter a name, a sandbox and a graph name. Then open the Detail page for the new service, and click on Edit Parameters tab. Create a new parameter with heightMin name.

CloverETL Server Interface

Now we may test it. When we click a test link, the server generates a simple form which executes a launch service. We can copy, customize and use this form in a web site.

Test Page