CloverETL's Blog

January 25, 2010

Iteration through the record fields in CTL

Filed under: Using CloverETL — Tags: , , — Vaclav Matous @ 9:52 am

Recently, I have been facing a very common problem. Imagine this scenario: I have two files – the first one with origin records and the second one with slightly modified new records. Each record had a unique key and aproximately 50 fields. My task was to compare these two files and find out how many fields in every pair differ from each other in the corresponding records.

The simplified graph can be seen in following picture:

A comparison of two records can be processed by the CloverETL DataIntersection component which joins the records with the same keys. In the joined records, you can compare fields that are not part of the key. But remember, you have to write the comparison in the CTL transformation ;-) . Of course you could write the following block of code for each pair of compared fields:

int count = 0;
if(nvl($0.field_N, '') != nvl($1.field_N, '')) {
count++;
//but typically more actions :-(
}
//and imagine this block 50times :-(
//final mapping
$0.key := $0.key;
$0.count := count;

But this solution takes too much time when you have to repeat it for many fields (approx. 50 in my case). It is also very slow, uncomfortable and increases the probability of making a mistake in your code (e.g. omitting some fields). Fortunately, CloverETL allows you to iterate through the fields of processed records! :-) The code is then more briefer and more generic:

//declaration of variables for copies of input records
record(Metadata1) myrec1;
record(Metadata1) myrec2;
function transform() {
int i = 0;
int count = 0;
//asign value of input records to local variables
myrec1 = @0; //myrec1 is a copy of a current record on input port 0
myrec2 = @1; //myrec2 is a copy of a current record on input port 1
//iterate through fields, suppose that field with index 0 is the key
for(i = 1; i < length(myrec1); i++) {
if(nvl(myrec1[i], '') != nvl(myrec2[i], '')) {
count++;
}
}
//final mapping
$0.key := $0.key;
$0.count := count;
}

Someone could object to the necessity of making copies of records. In this case I have good news. CTL in CloverETL version 2.9 introduces the possibility of iterating directly through the fields of input records.

@0[i];//i-th field of the input record on port 0

Moreover new functions for getting field names and data types are introduced in version 2.9. Personally, I am looking forward to such features that will make CTL code simplier and clearer.

August 18, 2009

Hidden features: Mutable delimiter

Filed under: Using CloverETL — Tags: , , , — Petr Uher @ 9:38 am

CloverETL provides a very useful feature: mutable delimiter. When you parse a delimited file (eg. CSV) you can specify different delimiter for each field. This isn’t surprising for daily CloverETL users however for users of other ETL tools it can be. It might not be very well known that in CloverETL you can even define more delimiters for one field (so called “mutable delimiter”) and CloverETL chooses the right one. It reveals new ways of file processing with irregular structure in CloverETL. I believe this functionality isn’t provided by any other ETL tool on the market. If I am wrong you can leave me a message in comments. I’m always happy to find “hidden features” of other ETL tools.

Syntax of a mutable delimiter: delimiters have to be separated by ‘\\|‘. For example if you want to define that field delimiter can be ‘;‘ or ‘,‘ or ‘#‘ you have to write ‘;\\|,\\|#‘.
The simple example of using a mutable delimiter you can download here as a zipped CloverETL project. The import of existing CloverETL project to your CloverETL Designer is described in CloverETL documentation.

August 11, 2009

Hidden features: Environment variables in CloverETL transformation

Filed under: Using CloverETL — Tags: , , — Petr Uher @ 12:43 pm

Using environment variables

“Environment variable is named value that can affect the way running process will behave on a computer.”

In daily praxis we usually use environment variables with different syntax depending on operation system. On UNIX-like systems we use them with the syntax: $variable_name, on DOS and Windows systems the syntax is: %variable_name%. To list the variables on UNIX-like system we can use env shell command, on DOS and Windows systems set cmd command. You can find more general information on environment variables at en.wikipedie.org.
But enough of general information. Now how we can use environment variables in CloverETL transformation? It’s very simple, you can use it in the same way as you routinely use graph parameters. So if you want to add username of the user under whom the transformation is running to your processed data, it’s nothing more than adding a new field to metadata and write following in CTL (Clover Transformation Language):

function transform(){
...
$0.username := '${USER}'; //UNIX-like systems
OR
$0.username := '${USERNAME}'; //DOS and Windows systems
...
}

But be careful, value of environment variables can contain “bad characters” (\,") that have to be escaped by ‘\‘ in CloverETL. The safest way to use env variables in CloverETL is to enclose them in quotation marks  ‘'‘.

Overwriting environment variables & priority of parameter definitions

Often it’s very helpful to use environment variables inside CloverETL transformation. But sometimes you want to define your own graph parameter with the same name as the existing environment variable has. And you may ask the question: “Is it possible?”. I answer: “Yes, it is :-) .” Because there is a hierarchy of graph parameter definitions:

  1. Parameter from external parameter file specified at the start of graph execution by -cfg option
  2. Parameter defined at the start of graph execution by -P option
  3. Parameter from external parameter file that is linked to the graph during the graph development
  4. Internal graph parameter
  5. Environment variable

Parameter definitions from the list are sorted by priority (highest to lowest). So if you have internal graph parameter with the same name as the environment variable, the value from internal parameter is always used in CloverETL.

July 23, 2009

Hidden features: Note properties

Filed under: Using CloverETL — Tags: — Petr Uher @ 1:00 am

Do you want to have CloverETL’s graphs using nice descriptive colorful notes? Just resize your note and put the components into the area of the note as you can see on the picture.

Graph with note

And now try to move the note. All components placed inside the note will move together with the note :-) .
The color and the size of note’s font can be set in a Properties view (it is usually showed in the same part of eclipse windows as Console). In the same place you can also set background color of a note.

Properties view

Blog at WordPress.com.