Speed-up Installation of Plugins in Eclipse

Installing plug-ins into Eclipse can in some situations take a long time. This can also affect users of CloverETL Designer if they choose the Online or Offline Eclipse Plugin Installation download type. This blog post describes a workaround for the slow install process.

The reason for the long installation time is that by default Eclipse contacts all available update sites to try to resolve dependencies of the plugin being installed. There can be a large number of update sites, some can be not responding or slow and overall the connection can be bad. To disable contacting of all update sites, uncheck the “Contact all update sites during install to find required software” checkbox in the installation dialog:

Contact all update sites checkbox

This workaround can help not only when installing CloverETL Designer, but also when using Eclipse in general. However, it does have a drawback – some dependencies of the plugin being installed might not be resolved. This can also happen when installing CloverETL Designer, because it depends on the GEF and RSE plugins. The plugins are found in the main Eclipse update site which would not be contacted when using the described workaround. Eclipse will detect that some dependecies of CloverETL Designer are not met and will not proceed with the installation:

CloverETL dependencies not met

In case some dependencies are not resolved, there are 2 options:

  •  find and install the dependencies manually (in case of GEF and RSE they can be found in the main Eclipse update site)
  •  accept the long installation time and enable the checkbox back. Eclipse will resolve the dependencies automatically

Hopefully this hint will help some users of CloverETL Designer or Eclipse with the slow installation. However it’s important to understand that it’s NOT mandatory to use the workaround as the installation is quick in many cases – use it only in case of issues.

Handling of JSON Objects in CloverETL

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is language independent but uses conventions that are familiar to programmers. This format is often used for serializing and transmitting structured data over a network connection. It is primarily used to transmit data between a server and a web application, serving as an alternative to XML. CloverETL doesn’t have any JSONReader/Writer components, but has numerous components that can handle XML files. So all you need to do is to convert the JSON structure to an XML one or vice versa.  If you have a XSLT 1.0 stylesheet that transforms XML file into a JSON object, you can use the XSLTransformer component. I downloaded xml2json-xslt and created a transformation graph with only one component:

That converts an XML file into a JSON structure.

Another solution is to use JSON Java API and create a transformation in Java as an attribute of Reformat component. Then simple transformations need to be written:

  • To transform XML to JSON:

import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataRecord;
import org.jetel.exception.ComponentNotReadyException;
import org.jetel.exception.TransformException;
import org.json.JSONException;
import org.json.XML;

public class XML2JSON extends DataRecordTransform {

    int counter;

    @Override
    public void preExecute() throws ComponentNotReadyException {
        super.preExecute();
        counter = 1;
    }

    @Override
    public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords)
            throws TransformException {
        try {
            outputRecords[0].getField(0).setValue(XML.toJSONObject(inputRecords[0].getField(0).toString()).toString());
        } catch (JSONException e) {
            throw new TransformException("Can't convert XML to JSON.", e, counter, 0);
        }
        counter++;
        return 0;
    }

}
  • To transform JSON to XML:

import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataRecord;
import org.jetel.exception.ComponentNotReadyException;
import org.jetel.exception.TransformException;
import org.json.JSONException;
import org.json.JSONObject;
import org.json.XML;

public class JSON2XML extends DataRecordTransform {
    int counter;

    @Override
    public void preExecute() throws ComponentNotReadyException {
        super.preExecute();
        counter = 1;
    }

    @Override
    public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords)
            throws TransformException {
        try {
            outputRecords[0].getField(0).setValue(XML.toString(new JSONObject(inputRecords[0].getField(0).toString())));
        } catch (JSONException e) {
            throw new TransformException("Can't convert XML to JSON.", e, counter, 0);
        }
        counter++;
        return 0;
    }

}

Since CloverETL 3.1 the implementation above is going to be embedded into CloverETL Engine in the form of xml2json and json2xml ctl functions. Then your transformations are easy indeed:

  • To transform XML to JSON:

  • To transform JSON to XML:

Above transformation converts following XML structure:

<?xml version="1.0" encoding="ISO-8859-1"?>
<employees>
    <employee>
        <empID>1</empID>
        <jobID>3</jobID>
           <salary>5000</salary>
        <name>
            <firstname>Mark</firstname>
            <surname>Fish</surname>
        </name>
        <child>
            <chname>Ann</chname>
            <age>4</age>
        </child>
        <child>
            <chname>Mark</chname>
            <age>6</age>
        </child>
        <benefits>
            <car>none</car>
            <mobilephone>yes</mobilephone>
            <financial>
                <monthly_bonus>1000</monthly_bonus>
                <yearly_bonus>0</yearly_bonus>    
            </financial>
        </benefits>
        <project>
            <projName>JSP</projName>
            <projManager>John Smith</projManager>
            <inProjectID>34</inProjectID>
            <Start>06062006</Start>
            <End>in progress</End>
            <customer>
                            <name>Sunny</name>
            </customer>
            <customer>
                <name>Weblea</name>
            </customer>
        </project>
        <project>
            <projName>Data warehouse</projName>
            <projManager>John Major</projManager>
            <inProjectID>51</inProjectID>
            <Start>01062005</Start>
            <End>31052006</End>
            <customer>
                <name>Hanuman</name>
            </customer>
            <customer>
                <name>Weblea</name>
            </customer>
            <customer>
                <name>SomeBank</name>
            </customer>
        </project>
    </employee>
</employees>

into the following JSON object (needs to be formatted manually):

{"employees":
    {"employee":{
        "child":[
            {"chname":"Ann","age":4},
            {"chname":"Mark","age":6}
        ],
        "project":[{
            "End":"in progress",
            "projManager":"John Smith",
            "Start":6062006,
            "inProjectID":34,
            "projName":"JSP",
            "customer":[
                {"name":"Sunny"},
                {"name":"Weblea"}
            ]},{
            "End":31052006,
            "projManager":"John Major",
            "Start":1062005,
            "inProjectID":51,
            "projName":"Data warehouse",
            "customer":[
                {"name":"Hanuman"},
                {"name":"Weblea"},
                {"name":"SomeBank"}
            ]}
        ],
        "jobID":3,
        "empID":1,
        "name":
            {"surname":"Fish","firstname":"Mark"},
        "benefits":{
            "financial":
                {"monthly_bonus":1000,"yearly_bonus":"0"},
            "car":"none",
            "mobilephone":"yes"},
        "salary":5000}
    }
}

Download the example transformation graph

Usage of Internal and External Graph Elements

GRAPH ELEMENTS

In addition to components, edges, notes, etc., any CloverETL graphs may contain the following graph elements:

  • Metadata
  • Connections
  • Lookup tables
  • Sequences
  • Parameters

Each of these graph elements may be either created and written in the graph XML source file, or specified in an external file and linked to the graph. Such elements are called internal, or external, respectively. In case of external elements, graph XML file only contains a link to such file with graph element definition.

Internal Graph Elements

Graph XML file can contain internal graph elements that may look like those in the following examples:

 <Metadata id="Metadata1"> <Record fieldDelimiter="|" name="LUT" recordDelimiter="rn" type="delimited"> <Field name="field1" type="integer"/> <Field name="field2" type="string"/> </Record> </Metadata> <Connection database="POSTGRE" dbURL="jdbc:postgresql://hostname/database" id="JDBC0" jdbcSpecific="POSTGRE" name="NewConnection" password="mypassword" type="JDBC" user="username"/> <LookupTable charset="ISO-8859-1" id="LookupTable0" initialSize="512" key="field1" keyDuplicates="true" metadata="Metadata1" name="simpleLookup0" type="simpleLookup"/> <Sequence cached="1" fileURL="${SEQ_DIR}/seq_withdata.txt" id="Sequence0" name="seq" start="1" step="1" type="SIMPLE_SEQUENCE"/> <Property id="GraphParameter1" name="NUMBER" value="6"/>

(In addition to such internal graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/>

The resulting Outline pane looks like this:

Internal Graph Elements

External Graph Elements

When you want to link an external graph element, you need to right-click any of the five categories (Metadata, Connections, Parameters, Sequences, or Lookups) in the Outline pane and select in the context menu:

  • New metadata → Link shared definition
  • Connections → Link DB connection, Link JMS connection, or Link QuickBase connection
  • Parameters → Link parameter file
  • Sequences → Link shared sequence
  • Lookup tables → Link shared lookup table

After that, the URL Dialog opens.

URL Dialog

The dialog consists of three tabs: Workspace view, Local files, and Remote files tabs. They serve for browsing local files within your workspace, outside your workspace, or files located on a remote computer, respectively.

Files with graph elements are located in the same project as the graph itself.

To browse your project, you need to use the Workspace view tab of the URL Dialog.

The workspace.prm file in the project itself contains the definition of parameters for the conn, meta, lookup, seq directories (among others). They are designed as “CONN_DIR”, “META_DIR”, “SEQ_DIR”, and “LOOKUP_DIR”, respectively. The value of each of these parameters is resolved to its value using a dollar and curly brackets. For example, “${META_DIR}”.

Once you link a file containing graph element definition located in the project, the “_DIR” termination of these names assures that the path to such files located in the conn, meta, lookup, or seq directories is automatically replaced with one of these parameters.

The graph XML file contains the following links to metadata, db connection, simple sequence, simple lookup table, and parameter files:

<Metadata fileURL="${META_DIR}/LUT.fmt" id="Metadata1"/> <Connection dbConfig="${CONN_DIR}/NewConnection.cfg" id="JDBC0" type="JDBC"/> <LookupTable id="LookupTable0" lookupConfig="${LOOKUP_DIR}/simpleLookup0.cfg"/> <Sequence id="Sequence0" seqConfig="${SEQ_DIR}/seq.cfg"> <attr name="type"><![CDATA[SIMPLE_SEQUENCE]]></attr> </Sequence> <Property fileURL="parameters.prm" id="GraphParameter1"/>

(In addition to the mentioned graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/>

The resulting Outline pane looks like this:

External Graph Elements

Files with graph elements are located outside the project containing the graph.

To locate external graph elements OUTSIDE the project, you need to use the Local files tab of the URL Dialog. It allows to browse the file system of your local computer.

You can also define a parameter for the whole path with the mentioned “_DIR” termination. If you have defined such parameter (e.g, “PATH_TO_GRAPH_ELEMENTS_DIR”), the whole path will automatically be replaced with this parameter (as ${PATH_TO_GRAPH_ELEMENTS_DIR}).

Remember that you need to define such parameter BEFORE you link a graph element in the specified location, otherwise, the whole path will not be replaced with the parameter name!

The whole path to the graph elements is specified in the graph XML file. The graph XML file contains the following links to metadata, db connection, simple sequence, simple lookup table, and parameter files:

<Metadata fileURL="D:/ExternalGraphElements/meta/HashJoinInput.fmt" id="Metadata0"/> <Metadata fileURL="${PATH_TO_GRAPH_ELEMENTS_DIR}/meta/LUT.fmt" id="Metadata1"/> <Connection dbConfig="${PATH_TO_GRAPH_ELEMENTS_DIR}/NewConnection.cfg" id="JDBC0" type="JDBC"/> <LookupTable id="LookupTable0" lookupConfig="${PATH_TO_GRAPH_ELEMENTS_DIR}/simpleLookup0.cfg"/> <Sequence id="Sequence0" seqConfig="${_TO_GRAPH_ELEMENTS_DIR}/seq.cfg"> <attr name="type"><![CDATA[SIMPLE_SEQUENCE]]></attr> </Sequence> <Property fileURL="parameters.prm" id="GraphParameter1"/>

The last mentioned parameters.prm file defines two parameters:

NUMBER, whose value is 6

and

PATH_TO_GRAPH_ELEMENTS_DIR, whose value is D:/ExternalGraphElements.

(In addition to the mentioned graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/>

Note:

Note that the value of the PATH_TO_GRAPH_ELEMENTS_DIR parameter is D:/ExternalGraphElements.

  • The first external metadata element (with ”Metadata0” id) was linked BEFORE the PATH_TO_GRAPH_ELEMENTS_DIR parameter was defined. The whole path was NOT converted into ${PATH_TO_GRAPH_ELEMENTS}/…
  • But the next elements (with “Metadata1” id, and all of the other elements) were automatically converted to the ${PATH_TO_GRAPH_ELEMENTS_DIR}/… AS SOON AS they were linked to the graph.

The resulting Outline pane looks like this:

External Outside Project

Files with graph elements are located on a remote computer.

To locate the files on remote computer, you need to use the Remote files tab of the URL Dialog. It allows to specify details of the remote computer file system.

To connect a remote computer, click the Create/Edit URL button at the right side from the Server URL combobox. In the Edit URL Dialog that opens, you need to specify all the authentication details:

Edit URL Dialog

The general structure of a remote path is:

<protocol>://<username>:<password>@<hostname|IP>:<portnumber>/<pathtoexternalelements>

Supported protocols are: http, https, ftp, ftps, sftp. The first two do not allow browsing the remote file system, whereas the other four allow it.

The whole path to the graph elements is specified in the graph XML file. The graph XML file contains the following links to metadata, db connection, simple sequence, simple lookup table, and parameter files:

<Metadata fileURL="sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/meta/LUT.fmt" id="Metadata1"/>

<Connection dbConfig="sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/conn/NewConnection.cfg" id="JDBC0" type="JDBC"/>

<LookupTable id="LookupTable0"
lookupConfig="sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/lookup/simpleLookup0.cfg"/>

<Sequence id="Sequence0" seqConfig="://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/seq/seq.cfg">
<attr name="type"><![CDATA[SIMPLE_SEQUENCE]]></attr>
</Sequence>

<Property fileURL="parameters.prm" id="GraphParameter1"/>

The last mentioned parameters.prm file defines two parameters:

NUMBER, whose value is 6

and

PATH_TO_REMOTE_GRAPH_ELEMENTS_DIR, whose value is sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/ExternalGraphElements.

(In addition to the mentioned graph elements, graph contains a link to the workspace.prm file defining project parameters – “CONN_DIR”, “META_DIR”, etc.)

<Property fileURL="workspace.prm" id="GraphParameter0"/> 

The resulting Outline pane without parameter usage looks like this:

Remote Graph Elements

Note:

Remember that the paths to linked external elements located on a remote computer do NOT use automatically the ${PATH_TO_REMOTE_GRAPH_ELEMENTS_DIR} value instead of sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/ExternalGraphElements.

You may switch to the Source tab of your graph and replace the sftp://smithjohn:1a2b3c@192.168.1.12/home/smithjohn/ExternalGraphElements with ${PATH_TO_REMOTE_GRAPH_ELEMENTS_DIR} by hand.

Thus, after replacing the paths in the Source tab of the Graph editor with the value of graph parameter, the Outline pane looks like this:

Remote Graph Elements With Parameter

COMPARISON OF INTERNAL AND EXTERNAL GRAPH ELEMENTS

All the graph elements, both the internal and the external, may be converted into the other form. Any internal element may become an external one, and vice versa.

For more details consult our documentation.

VARIOUS FORMATS OF GRAPH ELEMENTS MAY BE USED AT THE SAME TIME

Remember that you can use all forms of graph elements in a single graph: internal, external, located on local computer, and external elements, located remotely, accessible via various protocols.

Launch Services – Part 2 – Configuration

In the last blog post, you learned what the Launch Services are. In this post you will see how to configure them.

Let us study an example scenario to become acquainted with configuration. We have a database containing the highest mountains on Earth along with their heights. The user enters an elevation above sea-level and hits the enter key. The Excel sheet is then displayed listing all mountains with the given minimal elevation.

Mountains example of Launch Services

How to Configure It?

First, we must create a transformation graph that uses a dictionary to receive parameters and to store results. Create a new graph in CloverETL Designer. In the outline pane, right click on Dictionary and choose Edit. Add a new entry named heightMin: with the “As Input” field set to true, and “type” set to Integer. Then add another entry named mountains.xls of type writable.channel, content type text/csv, and “As Output” set to true.

Dictionary

Now we may build a transformation graph. Components can use a dictionary in three different ways:

  1. Via file URL: data readers and writers may specify a File URL in the format dictionary:field-name. In our example, we set a data writer File URL to dictionary:mountains.xls.URL Dialog
  2. In CTL: anywhere in CTL code, we can use an expression of type dictionary.field-name to read or write the dictionary. In our example we use Filter expression $0.heightM >= dictionary.heightMin
  3. In Java code: using methods transformationGraph.getDictionary().getValue(String fieldName) and transformationGraph.getDictionary().setValue(String fieldName, Object value)

When a transformation graph is designed and ready, we must publish it as a Launch Service. In CloverETL Server administration, go to section Launch Services and click New launch configuration. Now enter a name, a sandbox and a graph name. Then open the Detail page for the new service, and click on Edit Parameters tab. Create a new parameter with heightMin name.

CloverETL Server Interface

Now we may test it. When we click a test link, the server generates a simple form which executes a launch service. We can copy, customize and use this form in a web site.

Test Page

Versatility of the File URL Attribute Used in Readers of CloverETL

CloverETL allows users to read several different kinds of files. These files may have various formats, they can be located on a local or remote computer, they can be accessed through a proxy, and they can also be compressed into zip, gzip, or tar archives. Users can also read data from the Console, from an input Port, or from a selected Dictionary entry.

A File URL must be specified using the URL dialog.

File URL Dialog

  • Workspace view tab serves to specify files within the workspace independently of whether the workspace belongs to CloverETL Project or CloverETL Server Project.
  • Local files tab displays the file structure of a local computer.
  • Remote files tab displays the file structure of a remote computer. It allows the user to specify the protocol, username, password, port, server, proxy, username for proxy, password for proxy, identification of proxy server, and the port for the proxy. After specifying these properties, the file structure of the remote computer is displayed (except for http and https protocols).
  • Port tab displays the input fields that are string, byte, or cbyte data types and allows the user to select one of them and also select a processing type from a combo box
  • Dictionary tab displays declared dictionary entries and allows to select one of them and choose the processing type from a combo

Now I will present the list of supported values of the File URL attribute.

Local Files (without compression)

Examples:
/path/file1.txt – reads one file
/path/file1.txt;/path/file2.txt – reads two files in one directory (semicolon separates files that will be read one after another)
/path1/fileA.txt;/path2/fileB.txt – reads two files in two directories (semicolon separates files that will be read one after another)
/path?/file*.txt – reads files in directories, when both the directories and the files must match the specified pattern.
/path/* – reads all files in the specified directory

Local Files (with compression)

Examples:
zip:(/path/file.zip) – reads the first file added to the zip archive
zip:(/path/file.zip)#innerfolder/innerfile.txt – reads the innerfile.txt contained in the innerfolder which has been compressed into the specified zip archive.
zip:(/path/file??.zip)#innerfolder?/innerfile*.txt – reads files contained in the innerfolders which have been compressed into the specified zip archives (each of these files, innerfolders, archive files must match their respective pattern)
gzip:(/path/file.gz) – reads the file compressed in the gzip archive
gzip:(/path/file??.gz) – reads the files compressed into specified gzip archives (each of these archives must match specified pattern)
tar:(/path/file.tar) – reads the first file added to the tar archive
tar:(/path/file.tar)#innerfolder/innerfile.txt – reads the innerfile.txt contained in the innerfolder which is compressed in the specified tar archive.
tar:(/path/file??.tar)#innerfolder?/innerfile*.txt – reads files contained in the innerfolders which have been compressed in the specified tar archives (each of these files, innerfolders, archive files must match their respective pattern)
zip:((zip:/path/file*.zip)#innerfolder/innerfile.zip)#innermostfolder??/innermostfile*.txt –reads innermost files contained in the innermostfolders which have been compressed into the specified innerfile zip archive contained in the innerfolder which has been compressed into the specified external zip archives (each of these innermostfiles, innermostfolders, external zip archives must match their respective pattern) Remember that innerfile.zip and innerfolder may not contain wildcards.

Remote Files  (without compression)

Unlike locally stored files, files on remote computers are accessible using a set of supported protocols. Sometimes it is also necessary to use a proxy server.

The following protocols are supported for accessing a remote server: sftp, ftp, ftps, http, https.

Access without proxy:

The structure of all remote files that are accessible directly, without a proxy, is as follows:
protocol://username:password@serverpassword@server :port/(whole|relative)path/file

Here, the whole path should be used for the sftp protocol, the other four protocols use relative paths.

Examples:
sftp://johnsmith:mypassword@myserver/home/johnsmith/relativepath/filename.txt
ftp://johnsmith:mypassword@myserver/relativepath/filename.txt
ftps://johnsmith:mypassword@myserver/relativepath/filename.txt
http://johnsmith:mypassword@myserver/relativepath/filename.txt
https://johnsmith:mypassword@myserver/relativepath/filename.txt

In the patterns shown above, username, password, and port may be ommitted if possible, whereas the other parts of such File URL are required.

Example (with username, password, and port ommitted):

http://myserver/relativepath/filename.txt

Access through proxy:

The structure of all remote files that are accessible through a proxy is as follows:
protocol:(proxy:proxyuser:proxypassword@proxyserver:proxyport)//username:password@server:port/(whole|relative)path/file

or with SOCKS V4 or V5 proxy:
protocol:(proxysocks:proxyuser:proxypassword@proxyserver:proxyport)//username:password@server:port/(whole|relative)path/file

Example:
ftp:(proxy:proxyuser:proxypassword@proxyserver:proxyport)//johnsmith:mypassword@myserver/relativepath/filename.txt

Also in this case, proxyuser, proxypassword, and proxyport can be ommitted if possible; the other parts of this pattern are required.

With SOCKS V4 or V5 proxy an example follows:
ftp:(proxysocks:proxyuser:proxypassword@proxyserver:proxyport)//johnsmith:mypassword@myserver/relativepath/filename.txt

Remote Files (with compression)

Remote File URLs may also be combined with archiving protocols in a similar manner to local File URLs.

Example:
zip:(ftp://johnsmith:mypassword@myserver/relativepath/myarchive.zip)#innerfolder/filename.txt

Wildcards may also be used in a similar way:

Example:
zip:(ftp://johnsmith:mypassword@myserver/relativepath/myarchive*.zip)#innerfolder??/filename?.txt

Note:

Remember that http and https protocols do not support wildcards in top level files or archives.

Console Input

File URL for Console input will be: – (hyphen character)

User types the input into Console after the graph starts, types data separated by field delimiters, presses Enter to specify end of records, and finishes the input after the last record by pressing Ctrl+Z.

Input Port Reading

CloverETL also supports reading incoming data through the input port of some Readers. Metadata connected to the input port must contain at least one field of string, byte, or cbyte data type. The user selects the field from which data should be read and parsed according to the output metadata. Three processing types can be selected in CloverETL:

  • discrete (the default value)
  • stream
  • source

File URL pattern is the following:

port:$0.fieldname:discrete|stream|source

Discrete processing type:

When the processing type is discrete, each record is parsed separately, according to the output metadata.

Example:
port:$0.customer:discrete

Note:
The colon and the word discrete can be ommitted.

Example:
port:$0.customer

Stream processing type:

When processing type is stream, all records are concatenated and parsed according to the output metadata. If input metadata contains a null value, this null means eof and separates groups of records. All records before such a null are concatenated, but separately from all records after such a null, which are also concatenated into another data source.

Example:
port:$0.customer:stream

Source processing type:

Example:
port:$0.file:source

When processing type is source, values of the selected field ($0.file) are valid URLs. The Reader to which such input is connected takes the file accessible with this URL and reads the contents. Metadata on the output must match the structure of the files specified with the help of these URLs.

Dictionary Entry Reading

Dictionary tab allows the selection of one of the graph dictionary entries. The processing type in the combo box should also be specified.

File URL pattern is:
dict:myentry:discrete|source

Discrete processing type:

Example:
dict:customer:discrete

Reads contents of dictionary entry whose name is customer.

Source processing type:

Example:
dict:file:source

When processing type is source, the value of the selected dictionary entry (file) is a valid URL. The reader with this File URL takes the contents of the file accessible with the help of this dictionary entry and reads the file contents. Metadata on the component’s output must match the structure of the file specified with the help of this URL.

Data sampling with CloverETL

Testing data transformations is generally not an easy task. When creating and testing a transformation you might want to get a data sample to check if your transformation works properly. In this point a question arises: How to create a representative data probe on the full data set? Obviously, the easiest way is to read just part of data from the beginning. But such data sample can be very unreliable.  I’ve prepared a few simple graphs that create a data probe which can be regarded as representative for the full data set.

All graphs were created based on the sampling methods described in the article Sampling (statistics).

Simple random sampling

In this method each record has the same probability of selection. Filtering is based on double value chosen (approximately) uniformly from the range 0.0d (inclusive) to 1.0d (exclusive): record is selected if the drawn number is lower than required sample set size:

Systematic sampling

Systematic sampling relies on arranging the data set according to some ordering scheme and then selecting elements in regular intervals through that ordered list. Systematic sampling involves a random start and then proceeds with the selection of every k-th element from then onwards:

Sorting can be disabled in this graph. Then it is selected just every k-th element from the full data set, starting from a randomly selected record from the interval [1, k].

Stratified sampling

If the data set embraces a number of distinct categories, the frame can be organized by these categories into separate strata. Each stratum is then sampled as an independent sub-population out of which individual elements can be randomly selected. At least one record from each stratum must be selected:

Probability proportional to size sampling

Probability for each record is set to be proportional to its stratum size, up to a maximum of 1. Strata are defined by the value of the selected field. For each group of records it it is used systematic sampling method:

Methods comparison

Simple random sampling method is the simplest and fastest. It is sufficient in most cases. Systematic sampling with disabled sorting is as fast as simple random sampling and produces also strongly representative data probe. The stratified sampling method is the trickiest one. It is useful only if the data set can be split into the separated groups that have reasonable sizes. In other cases the data probe is a lot of bigger than requested.

Please see the attached CloverETL project with the above graphs. It also contains the graph for comparison of samples created with different sampling methods. I’ve done some tests for the file containing 5,000,000 rows with information about financial transactions. Each row contains unique transaction id, id of a customer, transaction amount and currency info. Total number of customers is 50,001; number of possible currencies is 35. I performed two sets of tests: one for the group defined by customer id and one defined by currency id.

Results for the sampling_field = CustomerId

Stratum is defined by id of customer. All data can be split to 50,001 groups with sizes from 61 to 143 transactions.

Following table shows testing results for some groups. Sorting was enabled for systematic sampling method.

defined sample size ratio: 0.01

sampling field (CustomerId) value simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 22 s 772 ms 0 h 1 m 34 s 965 ms 0 h 1 m 33 s 831 ms 0 h 1 m 30 s 973 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 4 71 0.0563 1 71 0.0140 1 71 0.0140 1 71 0.0140
1 0 94 0.0000 1 94 0.0106 1 94 0.0106 1 94 0.0106
10 2 110 0.0181 1 110 0.0090 2 110 0.0181 1 110 0.0090
100 1 93 0.0107 1 93 0.0107 1 93 0.0107 1 93 0.0107
1000 0 83 0.0000 1 83 0.0120 1 83 0.0120 1 83 0.0120
10000 2 101 0.0198 1 101 0.0099 1 101 0.0099 1 101 0.0099
10001 0 99 0.0000 1 99 0.0101 1 99 0.0101 1 99 0.0101
10002 0 109 0.0000 1 109 0.0091 3 109 0.0275 1 109 0.0091
10003 1 86 0.0116 1 86 0.0116 2 86 0.0232 1 86 0.0116
10004 1 86 0.0116 1 86 0.0116 1 86 0.0116 1 86 0.0116
total 49937 5000000 0.0099 50000 5000000 0.0100 68172 5000000 0.0136 50011 5000000 0.0100
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CustomerId simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 28 s 741 ms 0 h 1 m 34 s 474 ms 0 h 1 m 32 s 628 ms 0 h 1 m 33 s 949 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 0 71 0.0000 0 71 0.0000 1 71 0.0140 0 71 0.0000
1 0 94 0.0000 1 94 0.0106 1 94 0.0106 1 94 0.0106
10 1 110 0.0090 1 110 0.0090 3 110 0.0272 1 110 0.0090
100 1 93 0.0107 1 93 0.0107 1 93 0.0107 1 93 0.0107
1000 0 83 0.0000 1 83 0.0120 2 83 0.0240 0 83 0.0000
10000 1 101 0.0099 1 101 0.0099 1 101 0.0099 1 101 0.0099
10001 2 99 0.0202 1 99 0.0101 1 99 0.0101 1 99 0.0101
10002 1 109 0.0091 1 109 0.0091 1 109 0.0091 1 109 0.0091
10003 0 86 0.0000 1 86 0.0116 1 86 0.0116 1 86 0.0116
10004 1 86 0.0116 1 86 0.0116 1 86 0.0116 0 86 0.0000
total 49931 5000000 0.0099 50000 5000000 0.0100 68369 5000000 0.0136 50010 5000000 0.0100
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CustomerId simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 24 s 975 ms 0 h 1 m 37 s 446 ms 0 h 1 m 29 s 98 ms 0 h 1 m 32 s 857 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 0 71 0.0000 1 71 0.0140 1 71 0.0140 1 71 0.0140
1 1 94 0.0106 1 94 0.0106 1 94 0.0106 1 94 0.0106
10 0 110 0.0000 1 110 0.0090 2 110 0.0181 2 110 0.0181
100 0 93 0.0000 1 93 0.0107 1 93 0.0107 1 93 0.0107
1000 1 83 0.0120 1 83 0.0120 1 83 0.0120 1 83 0.0120
10000 2 101 0.0198 1 101 0.0099 1 101 0.0099 1 101 0.0099
10001 1 99 0.0101 1 99 0.0101 1 99 0.0101 1 99 0.0101
10002 0 109 0.0000 1 109 0.0091 3 109 0.0275 1 109 0.0091
10003 1 86 0.0116 1 86 0.0116 1 86 0.0116 1 86 0.0116
10004 0 86 0.0000 1 86 0.0116 1 86 0.0116 1 86 0.0116
total 49983 5000000 0.0099 50000 5000000 0.0100 68258 5000000 0.0136 49900 5000000 0.0099
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CustomerId simple sampling systematic sampling stratified sampling pps sampling

Results for the same test but with data sorting disabled in systematic sampling method:

defined sample size ratio: 0.01

sampling field (CustomerId) value simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 28 s 168 ms 0 h 0 m 23 s 117 ms 0 h 1 m 35 s 414 ms 0 h 1 m 30 s 985 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 1 71 0.0140 0 71 0.0000 1 71 0.0140 1 71 0.0140
1 1 94 0.0106 0 94 0.0000 2 94 0.0212 1 94 0.0106
10 1 110 0.0090 2 110 0.0181 1 110 0.0090 1 110 0.0090
100 0 93 0.0000 0 93 0.0000 1 93 0.0107 1 93 0.0107
1000 0 83 0.0000 0 83 0.0000 1 83 0.0120 1 83 0.0120
10000 2 101 0.0198 0 101 0.0000 1 101 0.0099 1 101 0.0099
10001 0 99 0.0000 1 99 0.0101 1 99 0.0101 1 99 0.0101
10002 3 109 0.0275 1 109 0.0091 3 109 0.0275 1 109 0.0091
10003 1 86 0.0116 2 86 0.0232 1 86 0.0116 1 86 0.0116
10004 1 86 0.0116 0 86 0.0000 2 86 0.0232 0 86 0.0000
total 50081 5000000 0.0100 50000 5000000 0.0100 68227 5000000 0.0136 49966 5000000 0.0099
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CustomerId simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 23 s 78 ms 0 h 0 m 19 s 178 ms 0 h 1 m 33 s 148 ms 0 h 1 m 29 s 261 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 0 71 0.0000 0 71 0.0000 1 71 0.0140 0 71 0.0000
1 0 94 0.0000 0 94 0.0000 1 94 0.0106 1 94 0.0106
10 0 110 0.0000 3 110 0.0272 1 110 0.0090 1 110 0.0090
100 3 93 0.0322 1 93 0.0107 1 93 0.0107 1 93 0.0107
1000 1 83 0.0120 0 83 0.0000 2 83 0.0240 1 83 0.0120
10000 0 101 0.0000 1 101 0.0099 1 101 0.0099 1 101 0.0099
10001 1 99 0.0101 1 99 0.0101 3 99 0.0303 1 99 0.0101
10002 1 109 0.0091 0 109 0.0000 1 109 0.0091 1 109 0.0091
10003 1 86 0.0116 0 86 0.0000 1 86 0.0116 1 86 0.0116
10004 3 86 0.0348 0 86 0.0000 1 86 0.0116 1 86 0.0116
total 50056 5000000 0.0100 50000 5000000 0.0100 68528 5000000 0.0137 50033 5000000 0.0100
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CustomerId simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 28 s 244 ms 0 h 0 m 27 s 52 ms 0 h 1 m 35 s 49 ms 0 h 1 m 27 s 725 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 1 71 0.0140 0 71 0.0000 2 71 0.0281 0 71 0.0000
1 1 94 0.0106 0 94 0.0000 2 94 0.0212 1 94 0.0106
10 0 110 0.0000 2 110 0.0181 4 110 0.0363 1 110 0.0090
100 2 93 0.0215 2 93 0.0215 1 93 0.0107 1 93 0.0107
1000 2 83 0.0240 0 83 0.0000 1 83 0.0120 1 83 0.0120
10000 0 101 0.0000 0 101 0.0000 1 101 0.0099 1 101 0.0099
10001 0 99 0.0000 4 99 0.0404 1 99 0.0101 1 99 0.0101
10002 1 109 0.0091 2 109 0.0183 1 109 0.0091 1 109 0.0091
10003 1 86 0.0116 1 86 0.0116 2 86 0.0232 0 86 0.0000
10004 0 86 0.0000 0 86 0.0000 1 86 0.0116 1 86 0.0116
total 50116 5000000 0.0100 50000 5000000 0.0100 68470 5000000 0.0136 50010 5000000 0.0100
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CustomerId simple sampling systematic sampling stratified sampling pps sampling

Since the groups are really small, there should be selected none or one record from each group and for the smaller groups we should have more often zero selected records. In relation to this criteria the PPS sampling method and systematic sampling method with sorting data enabled give the best results. Data sample created with stratified method is always oversized.

Results for the sampling_field = CurrencyId

Stratum is defined by id of currency. All data can be split to 35 groups with very similar sizes from 142,042 to 143,572 transactions.

The following table shows testing results for some groups. Sorting was enabled for systematic sampling method.

defined sample size ratio: 0.01

sampling field (CurrencyId) value simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 21 s 681 ms 0 h 1 m 26 s 859 ms 0 h 1 m 25 s 970 ms 0 h 1 m 27 s 85 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 1450 142623 0.0101 1427 142623 0.0100 1447 142623 0.0101 1426 142623 0.0099
1 1371 142925 0.0095 1429 142925 0.0099 1430 142925 0.0100 1429 142925 0.0099
10 1420 142897 0.0099 1429 142897 0.0100 1432 142897 0.0100 1429 142897 0.0100
11 1448 142896 0.0101 1429 142896 0.0100 1443 142896 0.0100 1429 142896 0.0100
12 1383 142522 0.0097 1425 142522 0.0099 1488 142522 0.0104 1425 142522 0.0099
13 1468 142461 0.0103 1425 142461 0.0100 1395 142461 0.0097 1424 142461 0.0099
14 1449 142997 0.0101 1430 142997 0.0100 1479 142997 0.0103 1430 142997 0.0100
15 1401 142697 0.0098 1426 142697 0.0099 1438 142697 0.0100 1427 142697 0.0100
16 1396 143137 0.0097 1432 143137 0.0100 1387 143137 0.0096 1431 143137 0.0099
17 1464 142517 0.0102 1425 142517 0.0099 1413 142517 0.0099 1425 142517 0.0099
total 49959 5000000 0.0099 50000 5000000 0.0100 50075 5000000 0.0100 49997 5000000 0.0099
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CurrencyId simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 22 s 949 ms 0 h 1 m 25 s 726 ms 0 h 1 m 27 s 629 ms 0 h 1 m 24 s 537 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 1449 142623 0.0101 1427 142623 0.0100 1496 142623 0.0104 1426 142623 0.0099
1 1468 142925 0.0102 1429 142925 0.0099 1442 142925 0.0100 1429 142925 0.0099
10 1436 142897 0.0100 1429 142897 0.0100 1406 142897 0.0098 1429 142897 0.0100
11 1436 142896 0.0100 1429 142896 0.0100 1402 142896 0.0098 1429 142896 0.0100
12 1410 142522 0.0098 1425 142522 0.0099 1454 142522 0.0102 1425 142522 0.0099
13 1438 142461 0.0100 1425 142461 0.0100 1414 142461 0.0099 1425 142461 0.0100
14 1420 142997 0.0099 1430 142997 0.0100 1450 142997 0.0101 1430 142997 0.0100
15 1412 142697 0.0098 1427 142697 0.0100 1400 142697 0.0098 1427 142697 0.0100
16 1453 143137 0.0101 1431 143137 0.0099 1442 143137 0.0100 1431 143137 0.0099
17 1431 142517 0.0100 1425 142517 0.0099 1372 142517 0.0096 1425 142517 0.0099
total 50163 5000000 0.0100 50000 5000000 0.0100 49709 5000000 0.0099 50000 5000000 0.0100
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CurrencyId simple sampling systematic sampling stratified sampling pps sampling
Sampling time 0 h 0 m 27 s 716 ms 0 h 1 m 26 s 865 ms 0 h 1 m 26 s 657 ms 0 h 1 m 26 s 254 ms
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
0 1488 142623 0.0104 1426 142623 0.0099 1416 142623 0.0099 1426 142623 0.0099
1 1353 142925 0.0094 1429 142925 0.0099 1434 142925 0.0100 1429 142925 0.0099
10 1417 142897 0.0099 1429 142897 0.0100 1390 142897 0.0097 1429 142897 0.0100
11 1448 142896 0.0101 1429 142896 0.0100 1438 142896 0.0100 1429 142896 0.0100
12 1448 142522 0.0101 1425 142522 0.0099 1408 142522 0.0098 1425 142522 0.0099
13 1412 142461 0.0099 1425 142461 0.0100 1432 142461 0.0100 1424 142461 0.0099
14 1440 142997 0.0100 1430 142997 0.0100 1471 142997 0.0102 1430 142997 0.0100
15 1445 142697 0.0101 1427 142697 0.0100 1530 142697 0.0107 1427 142697 0.0100
16 1436 143137 0.0100 1431 143137 0.0099 1456 143137 0.0101 1432 143137 0.0100
17 1381 142517 0.0096 1425 142517 0.0099 1365 142517 0.0095 1426 142517 0.0100
total 50089 5000000 0.0100 50000 5000000 0.0100 49707 5000000 0.0099 49999 5000000 0.0099
sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio sample size dataset size sample size ratio
CurrencyId simple sampling systematic sampling stratified sampling pps sampling

With such large groups all the methods give very good results. Although no doubt we get the best results using the systematic sampling or PPS sampling methods where the sample size is always within the limits 0.0099 to 0.0100.

Download the transformation graph with data

Launch Services – ETL Transformation as a WebService

Transformations on CloverETL Server can be run by users in a very simple way by using just a web browser and a correct link. When a user click on the link, a transformation is triggered on the Server and requested data are generated. These data can be in several different formats as Excel spreadsheet, CSV file, XML, or HTML. One can even place a form on the web page that serves as a source for parameters for the transformation.

In this blog post, I will demonstrate how Launch Services can be used from user point-of-view. The next part will show how Launch Services can be configured.

Each installation of CloverETL Server contains a sandbox named default that has many examples of Launch Services in. So you can immediately experiment with all the examples described below. If you do not have a licensed version of CloverETL Server, you can use an online server demo running at http://www.cloveretl.com/examples/server-demo

Example 1 – Glossary

There is a database with economic terms which is updated everyday on our online demo server. You can put a web form on a web page where user enters an unknown term and press the execute button. A request is sent to the server where CloverETL transformation is run, then matching and similar terms are returned and displayed as an HTML page.

Note: This example is using HTTP POST request. However, Launch Services can handle also GET requests. So you can place anywhere a normal URL that would link to an explanation of a specific term. Format of such link is: <server-url>/launch/<launch-service-name>?<paramenter-name>=<parameter-value>, e.g.: http://server-demo-ec2.cloveretl.com/clover/launch/glossary?term=trend

Example 2 – Mountains

There is a database with the highest mountains on Earth with their heights. User enters an elevation above sea-level and hits an enter key. Excel sheet is then displayed listing all mountains with the given minimal elevation.

Note: It is necessary to provide credentials to trigger Launch Services (using HTTP basic authentification). A transformation may depend on a logged user or a group, the user belongs to. It is thus possible to configure the service in such a way that it will run under one URL but returned data format will depend on a specific user (e.g. users will get results in an Excel sheet, but one specific user can receive results as CSV file).

Example 3 – Mountains Upload

Launch Service can accept also a file containing data as a parameter. These data can be inserted to the server database or can be used for a transformation run. As a variation to the previous example, it is possible to submit a file with a list of highest mountains that is used for a selection.

Example 4 – Everest Ascents

Though input fields are of text type, Launch Services can parse them, according to how they are configured, into other data types as integer, real, date, boolean etc. They can have specific formats and locale settings, even different for each user. If a user enters data in a wrong format, he is alerted and CloverETL transformation can then receive only correct, parsed data.

In our example, we have a list of mountain climbers that reached Mount Everest without using an oxygen mask. Launch Service enables you to select a climber by date of the first ascent, gender, number of successful ascents, and nationality.

Example 5 – Scripting

Up to now, we have been accessing Launch Services from a web browser. However, they can be easily used also from other systems. It is possible to access a given URL through any application supporting HTTP protocol as wget on Unix. You could write a script downloading all women that climbed Mount Everest and sending them to a given email address:

wget –user clover –password clover -q -O – http://localhost:8080/clover/launch/ascents-everest?sex=F | tail -n +2 | cut -f 1 | mail i@somwhe.re -s “Ladies on Everest”

Example 6 – Client Library

The last possibility how to call Launch Services is via a Java library cloverETL-server-client.zip. This package will come handy if you need to integrate Launch Service with your Java application or if you need to trigger Launch Service from a command line with more parameters than wget allows for.

If you need to run Launch Service called acents-everest running on a server http://localhost:8080/clover as a user clover and all results save to a folder acents-everest-result, you can use this command:
java -jar cloverETL-server-client.jar acents-everest -u http://localhost:8080/clover -l clover -p clover -o acents-everest-result

Read the part two: Launch Services – Configuration

2010 in Review

Attractions in 2010

These are the posts and pages that got the most views in 2010.

1

Building DWH with CloverETL: Slowly Changing Dimension Type 2 May 2010

2

Export from a database to Excel March 2010

3

DataDirect’s OracleDB JDBC driver speed test January 2010
1 comment

4

Integration of Clover with PHP June 2010
1 comment

5

How to easily enrich data using CloverETL’s Auto-filling feature June 2010

Spell Checking for Better Data Quality – AspellLookup Table

AspellLookupTable is a commercial lookup table which has been around since CloverETL 2.6. Because Aspell is a free software spell checker, you might  be wondering what it is used for in CloverETL. In fact, AspellLookupTable does not perform any spell checking at all, it “just” allows you to lookup data records with keys similar to the one you provide. This may be useful e.g. when looking for a street whose name is misspelled to a certain extent.

Similarity of String Keys

The similarity of two string keys is measured in terms of edit operations used to “transform” a misspelled key to the correct one. At the moment, CloverETL supports five edit operations: change of character case, character transposition, deletion of a character, insertion of a character, and finally character replacement. Each edit operation is assigned an edit cost which reflects its “seriousness”. By default, edit costs are set to 10, 90, 95, 95, and 100, respectively. As you can see, the difference in character case is not considered as serious as the character transposition. However, it is possible to tweak the edit costs as you wish.

If  you add all edit costs of the minimum number of edit operations required to “transform” the misspelled key to the correct one, you get an edit distance of the two keys. In other words, you get the similarity measure of the keys. However, the edit distance itself is just a number which will vary for different keys. Therefore you need to somehow determine which data records should be included in the result set and which should be thrown away. And that’s why you also need to specify a spelling threshold. This threshold is 230 by default but may be changed to best suit your needs.

An example is worth a thousand words, so let’s assume that one of the keys is "Hello". If you use "helo" for lookup, the edit distance will be 105 because you had to change the case of 'h' (cost = 10) and to insert another 'l' (cost = 95). If you used "Halllo" instead, the edit distance would be 195 ('a' was replaced by 'e' and one 'l' had to be removed). Both these queries would return a non-empty result set as their edit distance is below the spelling threshold. On the other hand, "Bye" as a lookup key would result in an empty result set as its edit distance is way above the threshold.

AspellLookupTable in Action

AspellLookupTable is a great candidate for address cleansing. Imagine you have a huge list of incorrectly spelled street names and you need to determine whether these streets actually exist in a certain town or city. You also have a directory of correctly spelled street names with several additional information, e.g. the town name, its region, etc. In the end, you want a list of all the streets, and corresponding towns, that do exist. Does it seem difficult? No, it’s a piece of cake with CloverETL:

First you need to set up your AspellLookupTable instance to load the street directory. Then you just load the invalid addresses and “pair” them with existing streets using the lookup table. When you have all the streets joined with whatever data you need, you may perform some further processing and then store the result in a flat file. Pretty simple, huh? :-)

Download the transformation graph with data

Connecting to JMS Queue on Glassfish v2 with CloverETL

JMS (Java Message Service) is a Java Enterprise Edition (EE) technology used to allow messages to be sent in a decentralized manner between web clients or end-users. In order to check for new messages or to send outgoing ones, a Java program must successfully connect to the JMS queue in order to proceed with the application. CloverETL graph can be driven or can send such messages. You can connect to any JMS broker from CloverETL using JMSReader or JMSWriter component. In this article you can find instruction, how to create JMS Connection and how to configure the graph to connect to JMS queue on Glassfish application server from CloverETL.

Let’s suppose we’ve defined JMS connection factory test on the Glassfish application server as follows:Jms Connection Factory settings:

and JMS queue jms/testQueue:

JMS Queue settings

Now we need to define JMS connection in CloverETL graph:

JMS Connection in CloverDesigner

“Connection factory JNDI name” and “Destination JNDI” refers to the objects we have just created on our application server.

Library appserv_rt.jar contains com.sun.enterprise.naming.SerialInitContextFactory class. When you add the library to the connection settings, you can choose the factory class from the combo box. But when you know the class name, you don’t need to do it, as the Glassfish server has this library on the class-path or you will add it to the project class-path (see below). Remove the library after connection edition; having it in connection definition and in the class-path can lead to unpredictable errors.

If the graph is running on CloverETL Server within the same Glassfish server instance, you don’t need any further settings as all needed jar files are contained in application server.

When you run the graph as a “client” (from CloverETL Designer, command line or other CloverETL Server) you need to set following jars on the jvm class path:

  • /lib/appserv-rt.jar
  • /lib/appserv-admin.jar
  • /lib/javaee.jar
  • /lib/j2ee.jar
  • /imq/lib/imqjmsra.jar

(You find them in your Glassfish home directory).

In CloverETL Designer you can do it by selecting the project. After right click go to Properties –> Java Build Path:

Project properties

You can add the libraries with “Add JARs…” button if you have copied the files to the Workspace or with “Add External JARs…” if the files are outside the Workspace.

You also need to set the -Dorg.omg.CORBA.ORBInitialHost property if you run the graph on the different host and the naming service port property, if necessary. The default naming service port in the app server is 3700. If the naming service is running on a different port, you’ll need to set it via the -Dorg.omg.CORBA.ORBInitialPort property. You can double-check the actual naming service port for a given server instance by looking in the server instace’s domain.xml for “orb-listener-1″. Alternatively, check the Configuration –> ORB –> IIOP Listeners –> orb-listener-1 in Glassfish Administration Console.

To set these properties, when running the graph from CloverETL Designer, you need to create new graph launch configuration:

  1. Go to Run –> Run Configurations
  2. Create new CloverETL graph configuration
  3. Switch to Arguments tab and set the properties:CloverETL graph run configuration