<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; CloverETL</title>
	<atom:link href="http://blog.cloveretl.com/tag/cloveretl/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; CloverETL</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 2</title>
		<link>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/</link>
		<comments>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/#comments</comments>
		<pubDate>Thu, 27 May 2010 01:39:40 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[dwh]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[scd2]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=668</guid>
		<description><![CDATA[In the last part of our data warehouse (DWH) tutorial, I showed you how to load a dimension table that stores historical data according to the Slowly Changing Dimension Type 1 (SCD1). In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. I think that SCD2 is the most [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/">In the last part</a> of our data warehouse (DWH) tutorial, I showed you how to load a dimension table according to Slowly Changing Dimension Type 1 (SCD1), which overwrites changed attributes instead of keeping their history. In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. I think that SCD2 is the most challenging sub-task of the ETL part of DWH design, and every ETL architect should be able to deal with it.</p>
<p>In contrast to SCD1, an SCD2 table preserves the history of attributes. Once the value of an attribute changes in the external system (OLTP), we have to create a new record in the SCD2 dimension table with the current value, and we also have to mark the old record in the SCD2 table as obsolete. The most common way to mark a record as obsolete is to maintain two additional attributes: valid_from and valid_to. A record is then considered valid on a particular date D when valid_from ≤ D ≤ valid_to, where a null valid_to means the record is still current. You can find a detailed explanation of SCD2 principles in <a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1274661375&amp;sr=8-1">Kimball’s DWH bible</a> or on <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">wikipedia.org</a>.</p>
<p>Let us show how SCD2 works in practice on a small example. We will use the DWH schema introduced in the SCD1 post.</p>
<p style="text-align:left;"><a href="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png"><img class="size-full wp-image-218 aligncenter" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="" width="648" height="273" /></a></p>
<p>It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price.</p>
<p>The Store table is populated as SCD1; here we will load the Customer table, which is marked as an SCD2 dimension table. Let’s imagine that a customer changes his email address. What happens in the OLTP system and in the DWH Customer table, named D_CUSTOMER?</p>
<p><strong>OLTP:</strong></p>
<p>C0001;John;Newman;john.newman@hotmail.com <span style="color:#0000ff;">=&gt;</span> C0001;John;Newman;newman.john@gmail.com</p>
<p><strong>DWH:</strong></p>
<p>0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">null</span> <span style="color:#0000ff;">=&gt;</span><br />
0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">2010-05-20</span><br />
0002;C0001;John;Newman;newman.john@gmail.com;2010-05-21;null</p>
<p>Notice especially the first two attributes (columns) and the last two attributes of the DWH table. The first attribute is a surrogate key: a unique identifier of the record in the D_CUSTOMER table, generated by the ETL process. The second one (C0001) is a natural key, a unique identifier of the customer in OLTP. When you list all records with the same natural key in D_CUSTOMER, you get the complete history of one customer. The last two attributes, valid_from and valid_to, delimit the period during which the record was valid.</p>
<p>Now that the principle of SCD2 is explained, I will describe an implementation of SCD2 in CloverETL. See the CloverETL graph below.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png"><img class="aligncenter size-full wp-image-670" title="D_CUSTOMER_SCD2" src="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png?w=799&#038;h=166" alt="" width="799" height="166" /></a></p>
<p>The basic data flow of the graph is very simple: in the lower branch we read the data from OLTP (as in the previous DWH post, it is a CSV file for us), and in the upper branch we read the data already stored in our SCD2 table. We have to use the Dedup component because we want only one current record for each customer. The two branches meet in the DataIntersection component, which processes the records according to the natural key. The component has three output ports, as there are three possible outcomes:</p>
<ol>
<li>The record exists only in DWH. This should not happen, because it means that the record was deleted in OLTP, and “normal” OLTP systems do not allow records to be deleted. Such records end up in the Trash component.</li>
<li>The DWH table contains a record with the same natural key as the record coming from OLTP. That record goes through the second output port to the ExtFilter component, which identifies whether the record has changed. A changed record is then copied into two records: the first one obsoletes the current record in D_CUSTOMER (identified by its surrogate key), and the second one is inserted into D_CUSTOMER and stores the new values read from OLTP. The first one sets valid_to = today()-1, and the second one is inserted with valid_from = today() and valid_to = null.</li>
<li>The record coming from OLTP is new: there is no record with the same natural key in DWH. In that case the record is sent to the third output port, and the following components insert it into D_CUSTOMER with valid_from = today() and valid_to = null.</li>
</ol>
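<p>The three outcomes can also be sketched in plain Java. This is only an illustration under stated assumptions, not the CloverETL graph itself: the <code>Scd2Sketch</code> class, its <code>Row</code> type and the in-memory list standing in for D_CUSTOMER are hypothetical, only the email attribute is tracked, and both validity bounds are treated as inclusive, which matches valid_to = today()-1 on the old record and valid_from = today() on the new one.</p>

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of D_CUSTOMER; this is not CloverETL code.
final class Scd2Sketch {

    static final class Row {
        final int surrogateKey;
        final String naturalKey;
        final String email;
        final LocalDate validFrom;
        LocalDate validTo; // null means the record is still current

        Row(int surrogateKey, String naturalKey, String email,
            LocalDate validFrom, LocalDate validTo) {
            this.surrogateKey = surrogateKey;
            this.naturalKey = naturalKey;
            this.email = email;
            this.validFrom = validFrom;
            this.validTo = validTo;
        }
    }

    // Assumption: both bounds are inclusive; a null validTo never expires.
    static boolean isValidOn(Row row, LocalDate day) {
        boolean started = !day.isBefore(row.validFrom);
        boolean notEnded = row.validTo == null || !day.isAfter(row.validTo);
        return started && notEnded;
    }

    // Applies one OLTP snapshot (natural key -> email) to the dimension.
    static void apply(List<Row> dimension, Map<String, String> oltpEmails, LocalDate today) {
        // Keep only the current record per natural key (the role of Dedup).
        Map<String, Row> current = new LinkedHashMap<>();
        int maxKey = 0;
        for (Row row : dimension) {
            maxKey = Math.max(maxKey, row.surrogateKey);
            if (row.validTo == null) {
                current.put(row.naturalKey, row);
            }
        }
        for (Map.Entry<String, String> entry : oltpEmails.entrySet()) {
            Row existing = current.get(entry.getKey());
            if (existing != null && existing.email.equals(entry.getValue())) {
                continue; // outcome 2, unchanged: nothing to write
            }
            if (existing != null) {
                existing.validTo = today.minusDays(1); // outcome 2, changed: obsolete the old record
            }
            // Outcome 2 (changed) and outcome 3 (new key): insert a current record.
            dimension.add(new Row(++maxKey, entry.getKey(), entry.getValue(), today, null));
        }
        // Outcome 1 (key only in the dimension) is left untouched, like the Trash branch.
    }

    public static void main(String[] args) {
        List<Row> dimension = new ArrayList<>();
        dimension.add(new Row(1, "C0001", "john.newman@hotmail.com", LocalDate.of(2009, 10, 10), null));
        Map<String, String> oltp = new LinkedHashMap<>();
        oltp.put("C0001", "newman.john@gmail.com");
        apply(dimension, oltp, LocalDate.of(2010, 5, 21));
        for (Row row : dimension) {
            System.out.println(row.surrogateKey + ";" + row.naturalKey + ";" + row.email
                    + ";" + row.validFrom + ";" + row.validTo);
        }
    }
}
```

<p>Running <code>main</code> replays the John Newman example from above: the old record is closed one day before the load date and a new current record is appended.</p>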
<p>If you want to verify that your CloverETL SCD2 graph works correctly, or if you are looking for sample data, you can simply import the example project into your Clover installation. It is bundled with CloverETL Designer as the DWHExample project. For more information on how to import an example project, see the <a href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">online documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png" medium="image">
			<media:title type="html">D_CUSTOMER_SCD2</media:title>
		</media:content>
	</item>
		<item>
		<title>Loop execution of transformation</title>
		<link>http://blog.cloveretl.com/2010/04/28/loop-execution-of-transformation/</link>
		<comments>http://blog.cloveretl.com/2010/04/28/loop-execution-of-transformation/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 15:05:56 +0000</pubDate>
		<dc:creator>mvarecha</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[groovy]]></category>
		<category><![CDATA[ISIR]]></category>
		<category><![CDATA[listener]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[SOA]]></category>
		<category><![CDATA[soap]]></category>
		<category><![CDATA[web service]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=627</guid>
		<description><![CDATA[Case study description Czech Insolvency Registry (http://isir.justice.cz) basically contains data about economic subjects that entered insolvency and have financial difficulties with paying off their debts. The registry allows everybody to download data using public SOAP Web Service. It can be done manually or automatically with the right software. https://isir.justice.cz:8443/isir_ws/services/IsirPub001?wsdl CloverETL can easily help with the [...]]]></description>
			<content:encoded><![CDATA[<h4>Case study description</h4>
<p>The Czech Insolvency Registry (<a href="http://isir.justice.cz/">http://isir.justice.cz</a>) contains data about economic subjects that have entered insolvency and have difficulty paying off their debts. The registry allows anybody to download its data through a public SOAP web service, either manually or automatically with the right software.<br />
<a href="https://isir.justice.cz:8443/isir_ws/services/IsirPub001?wsdl">https://isir.justice.cz:8443/isir_ws/services/IsirPub001?wsdl</a></p>
<p>CloverETL can easily automate the download, which saves time and avoids technical difficulties. A CloverETL graph can get the required data by calling the web service, process the data and store it in the required format. Unfortunately, the Registry’s web service is very poorly designed. It doesn&#8217;t give you the current status of each economic subject but provides the whole history of the requested company. Therefore we have to download not only the current information we need but all the information since 2008, when the registry was founded. That is a lot of data to process: thousands of log records for each company! Moreover, the Registry’s web service „GetIsirPub0012“ returns at most 1000 records per call, so if a company has a few thousand records, you have to make several calls. We therefore have to download the data in batches of a thousand records, without knowing in advance how many batches (records) there are for each company. That makes the whole process quite difficult.</p>
<p>But the solution with CloverETL is simple. CloverETL Server provides the “graph event listener” and “groovy task” features, which help us with all the challenges described above. First, we will design a CloverETL graph that, to begin with, processes just one batch of a thousand records (see the picture below).</p>
<h4>Transformation graph</h4>
<p>This graph has a parameter „startID“, which has the value “0” by default. If we want to process 1000 records starting from, let’s say, no. 2541, we set startID=2541, and that ID identifies the first downloaded record. If we run the graph without parameters, it&#8217;ll download and process the first thousand records (no. 0 – 999).</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/04/isir_graph.png"><img class="aligncenter size-full wp-image-628" title="ISIR_graph" src="http://cloveretl.files.wordpress.com/2010/04/isir_graph.png?w=800&#038;h=345" alt="" width="800" height="345" /></a></p>
<p>The graph also contains a couple of components that store the ID of the last downloaded record, so that the next batch can use it as its startID. The ID is automatically stored in the graph parameters as the lastID parameter. This can be done with in-line Java code in the Reformat component:<br />
<code>String id = GetVal.getString(source[0],"id");<br />
getGraph().getGraphProperties().setProperty("lastID", id );</code></p>
<h4>The loop</h4>
<p>The graph we designed must be executed n times to download and process all records. We don&#8217;t know in advance how many times, but we know that we can stop downloading as soon as there are no more records to read, which means as soon as “startID” and “lastID” are equal.</p>
<p>How do we achieve such a loop?</p>
<h5>Graph event listener</h5>
<p>To make the graph we designed above loop automatically, we&#8217;ll define a graph event listener for the “FINISHED_OK” graph event on CloverETL Server. Every time the transformation finishes without error („FINISHED_OK“), the listener triggers the task that we selected, so we need to specify this task now. Since we want to execute the same graph repeatedly, the obvious choice is the “execute graph” task. That task, however, would keep re-executing the graph indefinitely, and we need to break the loop when the startID and lastID parameters are equal. Therefore it is better to create a “groovy task” instead of „execute graph“.</p>
<h5>Groovy task</h5>
<p>Groovy is a scripting language with Java-like syntax. In addition, Groovy scripts may access Java objects and use Java libraries. See the Groovy project site <a href="http://groovy.codehaus.org/">http://groovy.codehaus.org/</a> for more details.</p>
<p>We&#8217;ll create a simple Groovy script that decides whether to execute the graph again. To decide, we&#8217;ll need the graph properties of the finished graph. These properties are accessible by calling the method <code>event.getProperties()</code>.</p>
<p>Then we&#8217;ll need to execute the graph using the CloverETL Server Java API. This is done by calling the method <code>serverFacade.executeGraph()</code>.</p>
<p>The script may return a String value, which is stored in the „Task history log“.</p>
<p><code>// these variables are predefined:<br />
// sessionToken<br />
// event<br />
// serverFacade</code></p>
<p><code>import com.cloveretl.server.persistent.RunRecord;<br />
import org.apache.log4j.Logger;<br />
import com.cloveretl.server.api.*;<br />
import org.springframework.web.context.WebApplicationContext;<br />
import org.springframework.web.context.support.WebApplicationContextUtils;</code></p>
<p><code>Logger log = Logger.getLogger("groovy-ISIR-graphEventListener");</code></p>
<p><code>Properties eventProps = event.getProperties();<br />
log.info("event properties: " + eventProps);</code></p>
<p><code>// get lastID and startID from previous graph execution<br />
String lastIDString = eventProps.getProperty("lastID");<br />
String startIDString = eventProps.getProperty("startID");<br />
long lastID = Long.valueOf(lastIDString);<br />
long startID = Long.valueOf(startIDString);</code></p>
<p><code>// lastID and startID from last graph execution are equal – break the loop<br />
if (lastID == startID)<br />
return "no more records to download";</code></p>
<p><code>// prepare startID which will be passed to next graph execution<br />
Properties properties = new Properties();<br />
properties.setProperty("startID", lastIDString);</code></p>
<p><code>String SANDBOX = eventProps.getProperty("SANDBOX_CODE");<br />
String GRAPH = eventProps.getProperty("GRAPH_FILE");<br />
GraphExecutionCommand graphExecutionCommand = new GraphExecutionCommand(<br />
null, SANDBOX, GRAPH, null, null, null, true, properties, null, null);<br />
Response respExec = serverFacade.executeGraph(sessionToken, graphExecutionCommand);<br />
String result = "graph "+SANDBOX+"/"+GRAPH+" executed: "+respExec.getBean();<br />
log.info(result);<br />
return result;</code></p>
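<p>The control flow that the listener and the script implement together can be sketched as an ordinary loop in plain Java. This is a simplified model, not the CloverETL Server API: <code>PagedLoopSketch</code>, <code>PageSource</code> and <code>inMemory</code> are hypothetical names, and the sketch assumes that the source returns only records with an ID greater than startID, so that an empty batch leaves lastID equal to startID. The real web service may number its records differently.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the paged download; not the ISIR or CloverETL API.
final class PagedLoopSketch {

    interface PageSource {
        // Assumption: returns at most pageSize record IDs, all greater than startId.
        List<Long> fetch(long startId, int pageSize);
    }

    // Repeats the "graph run" until lastID equals startID; returns the number of runs.
    static int downloadAll(PageSource source, int pageSize, List<Long> sink) {
        long startId = 0;
        int runs = 0;
        while (true) {
            List<Long> page = source.fetch(startId, pageSize); // one graph execution
            runs++;
            sink.addAll(page); // results are appended, never overwritten
            long lastId = page.isEmpty() ? startId : page.get(page.size() - 1);
            if (lastId == startId) {
                break; // no more records to download
            }
            startId = lastId; // passed to the next execution as startID
        }
        return runs;
    }

    // In-memory source holding the IDs 1..totalRecords, for demonstration only.
    static PageSource inMemory(long totalRecords) {
        return (startId, pageSize) -> {
            List<Long> page = new ArrayList<>();
            for (long id = startId + 1; id <= totalRecords && page.size() < pageSize; id++) {
                page.add(id);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        List<Long> sink = new ArrayList<>();
        int runs = downloadAll(inMemory(2500), 1000, sink);
        System.out.println(runs + " runs, " + sink.size() + " records");
    }
}
```

<p>Note that the final run downloads nothing; it exists only to detect that lastID did not move, which is the same price the listener-based loop pays.</p>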
<h4>Graph results</h4>
<p>The results for all batches are stored in a single CSV file. Each run appends to it, so there is no danger that earlier results will be overwritten. When the whole series of transformations has finished, we have one CSV file with all processed records. Alternatively, we can consolidate the records and store them directly in a database, where the data is kept in a friendlier and more usable format.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/04/28/loop-execution-of-transformation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/998f04e59afe9c312019cab4da3f99be?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">mvarecha</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/04/isir_graph.png" medium="image">
			<media:title type="html">ISIR_graph</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL File-URL Dialog</title>
		<link>http://blog.cloveretl.com/2010/02/24/cloveretl-file-url-dialog/</link>
		<comments>http://blog.cloveretl.com/2010/02/24/cloveretl-file-url-dialog/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 15:36:22 +0000</pubDate>
		<dc:creator>jausperger</dc:creator>
				<category><![CDATA[Developing Clover]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[file]]></category>
		<category><![CDATA[ftp]]></category>
		<category><![CDATA[ftps]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[sftp]]></category>
		<category><![CDATA[url]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=447</guid>
		<description><![CDATA[The CloverETL Designer has a brand new File URL Dialog, which was introduced in the version 2.9. The newly designed file dialog is very friendly and intuitive to navigate. There are a lot of new features and improvements. The dialog is separated into several tabs to simplify navigation. They enable users to easily specify resources [...]]]></description>
			<content:encoded><![CDATA[<p>CloverETL Designer has a brand new File URL Dialog, introduced in version 2.9. The newly designed file dialog is friendly and intuitive to navigate, and it brings a lot of new features and improvements. The dialog is separated into several tabs to simplify navigation. They enable users to easily specify resources such as local files, remote files or shared memory (dictionary). The new dialog is more comfortable to use and has a simpler, clearer design, as you can see in the picture below. The dialog window adjusts itself according to the context.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/02/url-dialog1.png"><img class="size-full wp-image-454 alignnone" title="url-dialog" src="http://cloveretl.files.wordpress.com/2010/02/url-dialog1.png?w=800&#038;h=600" alt="" width="800" height="600" /></a></p>
<h3><strong>Clover Server</strong></h3>
<p>In the new File dialog you can also find a new CloverETL Server tab, specially designed for working with files located on CloverETL Server. It is visible only if you open the dialog from an existing CloverETL Server project. It looks very similar to the tab you use on your local computer, but it lets you browse remote CloverETL sandboxes. The names of all sandboxes for which you have permissions are listed in the bookmarks, so you can access them easily.</p>
<h3><strong>File URLs</strong></h3>
<p>This tab handles all types of URLs, but it is mainly designed for browsing remote file systems via the http/https/ftp/ftps/sftp protocols. It also offers a special dialog where you can specify advanced connection parameters such as a proxy server or HTTP properties.</p>
<h3><strong>Port / Dictionary</strong></h3>
<p>The Port and Dictionary tabs are specific to CloverETL. The Port tab is visible only if the component or graph element allows reading/writing data from/to a port. A dictionary is memory shared between parts of the graph. It is identified by a name and a processing type parameter. Both tabs help you specify the URLs visually, so you don’t have to know the exact syntax of CloverETL’s URLs, which makes your work easier and more productive.</p>
<h3><strong>Extensibility</strong></h3>
<p>Thanks to its new modular architecture, the dialog can be extended with additional, purpose-specific tabs if needed.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/02/24/cloveretl-file-url-dialog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/8bd3751f7ffabc9a8bff132f9671bce4?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">jausperger</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/02/url-dialog1.png" medium="image">
			<media:title type="html">url-dialog</media:title>
		</media:content>
	</item>
		<item>
		<title>InfobrightDataWriter component</title>
		<link>http://blog.cloveretl.com/2010/02/03/infobrightdatawriter-component/</link>
		<comments>http://blog.cloveretl.com/2010/02/03/infobrightdatawriter-component/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 08:40:50 +0000</pubDate>
		<dc:creator>Agata Vackova</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[column-oriented]]></category>
		<category><![CDATA[columnar]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Infobright]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=376</guid>
		<description><![CDATA[In response to customer requests we created a new component that writes data into Infobright software, a very popular column-oriented relational database. The CloverETL’s just released version 2.9 offers this new InfobrightDataWriter component. Infobright is a highly compressed column-oriented database, based on MySql engine. In this database data are stored column-by-column instead of more typically [...]]]></description>
			<content:encoded><![CDATA[<p>In response to customer requests, we created a new component that writes data into Infobright, a very popular column-oriented relational database. The just-released CloverETL version 2.9 offers this new InfobrightDataWriter component.</p>
<p><a title="Infobright" href="http://www.infobright.com/Products/">Infobright</a> is a highly compressed column-oriented database based on the MySQL engine. In this database, data is stored column by column instead of the more typical row by row. Column orientation has many advantages, including more efficient data compression and the ability to optimize compression for each particular data type. The higher efficiency can be achieved because each column stores a single data type, as opposed to rows, which typically contain several data types. The main purpose of Infobright, however, is to deliver a scalable data warehouse database optimized for analytic queries.</p>
<p>Our new InfobrightDataWriter component makes it easy to load data into an Infobright database directly from CloverETL transformations. It can be used with both Infobright Community Edition (ICE) and Infobright Enterprise Edition (IEE). With IEE it is possible to use fast data loading in binary format. In binary format, rows aren’t separated by any special character, and there are no value delimiters or qualifiers either.</p>
<p>The connector uses a named pipe for data transfer. On Linux it is a system pipe; on Windows you need a native library placed on the Java library path. The library is part of the infobright-core-vX_X package, which can be downloaded from the Infobright site (<a title="Download Contributed Software" href="http://www.infobright.org/Downloads/Contributed-Software/">Download Contributed Software</a>).</p>
<p>For a remote load, it is necessary to start the Infobright remote load agent (part of infobright-core-v3) on the server where Infobright is running.</p>
<h3><strong>Example graph:</strong></h3>
<p><a href="http://cloveretl.files.wordpress.com/2010/01/infobright_graph.png"><img class="alignnone size-medium wp-image-377" title="infobright_graph" src="http://cloveretl.files.wordpress.com/2010/01/infobright_graph.png?w=600&#038;h=410" alt="" width="600" height="410" /></a></p>
<h3><strong>InfobrightDataWriter configuration:</strong></h3>
<p><a href="http://cloveretl.files.wordpress.com/2010/01/infobright_writer_remote.png"><img class="alignnone size-full wp-image-399" title="infobright_writer_remote" src="http://cloveretl.files.wordpress.com/2010/01/infobright_writer_remote.png?w=742&#038;h=627" alt="" width="742" height="627" /></a></p>
<p>The above graph loads data from selected input fields (“Clover fields” attribute) into the database table <code>test </code>(“Database table” attribute) on a remote server. It uses the fast binary method of data loading (“Data format” attribute). Data loaded into the database can also be sent to the output port. The input metadata can have any format, but the output metadata has to meet certain conditions:</p>
<ul>
<li>comma as a field delimiter</li>
<li>system new line (<code>'\n'</code> for Linux, <code>'\r\n'</code> for Windows) as a record delimiter</li>
<li>date/time formatting: <em>yyyy-MM-dd HH:mm:ss </em>(can be date only, time only or both)</li>
</ul>
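<p>To make these conditions concrete, here is a small Java sketch of what one record on the output port would look like. It is an illustration only and does not use any CloverETL or Infobright API: the <code>OutputRecordSketch</code> class and its three fields (id, name, loadedAt) are hypothetical, and no quoting or escaping of values is attempted.</p>

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the output record layout; not a CloverETL API.
final class OutputRecordSketch {

    // Date/time pattern required for the output metadata.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Comma-delimited fields, terminated by the system line separator.
    static String formatRecord(long id, String name, LocalDateTime loadedAt) {
        return id + "," + name + "," + TIMESTAMP.format(loadedAt) + System.lineSeparator();
    }

    public static void main(String[] args) {
        System.out.print(formatRecord(1L, "test", LocalDateTime.of(2010, 2, 3, 8, 40, 50)));
    }
}
```

<p><code>System.lineSeparator()</code> yields the platform's own record delimiter, so the same code satisfies the second condition on both Linux and Windows.</p>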
<p>As we load data to the remote server, the Infobright agent must run on the server. To start the agent we need to call:</p>
<p><em>java -jar infobright-core-3.0-remote.jar -p 6666 -l all</em></p>
<p>The port the agent is listening on must be the same as the agent port set on the component (“Remote agent port” attribute).</p>
<p>A full description of the component can be found on CloverETL&#8217;s wiki pages (<a href="http://wiki.clovergui.net/doku.php?id=components:bulkloaders:infobright_data_writer">InfobrightDataWriter component</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/02/03/infobrightdatawriter-component/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/934c88184df6c0034450ae00a1695ee8?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">agad</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/infobright_graph.png?w=300" medium="image">
			<media:title type="html">infobright_graph</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/infobright_writer_remote.png" medium="image">
			<media:title type="html">infobright_writer_remote</media:title>
		</media:content>
	</item>
		<item>
		<title>Iteration through the record fields in CTL</title>
		<link>http://blog.cloveretl.com/2010/01/25/iteration-through-the-record-fields-in-ctl/</link>
		<comments>http://blog.cloveretl.com/2010/01/25/iteration-through-the-record-fields-in-ctl/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 09:52:19 +0000</pubDate>
		<dc:creator>Vaclav Matous</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[CTL]]></category>
		<category><![CDATA[Hidden features]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=368</guid>
		<description><![CDATA[Recently, I faced a very common problem. Imagine this scenario: I had two files – the first one with the original records and the second one with slightly modified new records. Each record had a unique key and approximately 50 fields. My task was to compare these two files and find out how many [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I faced a very common problem. Imagine this scenario: I had two files – the first one with the original records and the second one with slightly modified new records. Each record had a unique key and approximately 50 fields. My task was to compare these two files and find out how many fields differ in each pair of corresponding records.</p>
<p>The simplified graph can be seen in the following picture:<a href="http://cloveretl.files.wordpress.com/2010/01/data_intersection.png"><img class="aligncenter size-full wp-image-369" title="data_intersection" src="http://cloveretl.files.wordpress.com/2010/01/data_intersection.png?w=663&#038;h=279" alt="" width="663" height="279" /></a></p>
<p>Two records can be compared with the CloverETL DataIntersection component, which joins records that have the same key. In the joined records, you can compare the fields that are not part of the key. But remember, you have to write the comparison yourself in the CTL transformation <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  . Of course, you could write the following block of code for each pair of compared fields:<br />
<code><br />
int count = 0;<br />
if(nvl($0.field_N, '') != nvl($1.field_N, '')) {<br />
count++;<br />
//but typically more actions :-(<br />
}<br />
//and imagine this block 50 times :-(<br />
//final mapping<br />
$0.key := $0.key;<br />
$0.count := count;</code></p>
<p>But this solution takes too much time when you have to repeat it for many fields (approx. 50 in my case). It is slow to write, uncomfortable to maintain, and increases the probability of making a mistake in your code (e.g. omitting some fields). Fortunately, CloverETL allows you to iterate through the fields of processed records! <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  The code is then shorter and more generic:<br />
<code><br />
//declaration of variables for copies of input records<br />
record(Metadata1) myrec1;<br />
record(Metadata1) myrec2;<br />
function transform() {<br />
int i = 0;<br />
int count = 0;<br />
//assign the input records to local variables<br />
myrec1 = @0; //myrec1 is a copy of a current record on input port 0<br />
myrec2 = @1; //myrec2 is a copy of a current record on input port 1<br />
//iterate through fields, suppose that field with index 0 is the key<br />
for(i = 1; i &lt; length(myrec1); i++) {<br />
if(nvl(myrec1[i], '') != nvl(myrec2[i], '')) {<br />
count++;<br />
}<br />
}<br />
//final mapping<br />
$0.key := $0.key;<br />
$0.count := count;<br />
}</code></p>
<p>Someone could object to the necessity of making copies of records. In this case I have good news. CTL in CloverETL version 2.9 introduces the possibility of iterating directly through the fields of input records.<br />
<code><br />
@0[i];//i-th field of the input record on port 0</code></p>
<p>Moreover, new functions for getting field names and data types are introduced in version 2.9. Personally, I am looking forward to such features, which will make CTL code simpler and clearer.</p>
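<p>For readers less familiar with CTL, the field-by-field comparison above can be sketched in Python. This is only an illustration under assumptions: records are plain tuples of equal length, the key sits at index 0, and <code>count_differing_fields</code> and its <code>nvl</code> helper are hypothetical names, not CloverETL functions.</p>

```python
def count_differing_fields(old, new, key_index=0):
    """Count the non-key positions at which two records differ.

    A missing value (None) is treated like an empty string, mirroring
    the nvl(..., '') calls in the CTL code. Both records are assumed
    to have the same number of fields.
    """
    def nvl(value, default=""):
        return default if value is None else value

    count = 0
    for i in range(len(old)):
        if i == key_index:
            continue  # the key is equal by construction of the join
        if nvl(old[i]) != nvl(new[i]):
            count += 1
    return count


old_record = ("K1", "Alice", None, "Prague")
new_record = ("K1", "Alice", "", "Brno")
print(count_differing_fields(old_record, new_record))  # 1 (only the city differs)
```

<p>As in the CTL version, adding a field to the metadata requires no change to the comparison code.</p>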
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/01/25/iteration-through-the-record-fields-in-ctl/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/73f56f1267c1896b11e3c6df97499559?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">vmatous</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/data_intersection.png" medium="image">
			<media:title type="html">data_intersection</media:title>
		</media:content>
	</item>
		<item>
		<title>DataDirect&#8217;s OracleDB JDBC driver speed test</title>
		<link>http://blog.cloveretl.com/2010/01/12/data-direct-oracle-driver-speed-test/</link>
		<comments>http://blog.cloveretl.com/2010/01/12/data-direct-oracle-driver-speed-test/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 12:39:36 +0000</pubDate>
		<dc:creator>Agata Vackova</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[bulkloader]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[data direct]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[oracle]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[sqlldr]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=345</guid>
		<description><![CDATA[Purpose Compare the speed of data loading into Oracle database (Oracle Database 11g Release 11.1.0.6.0 – Enterprise Edition) with Oracle corp. JDBC driver, DataDirect JDBC Oracle driver and direct data loading (OracleDataWriter component – sqlldr utility) in CloverETL. Test description Graph used for testing: The above graph loads data into database table that contains 3 [...]]]></description>
			<content:encoded><![CDATA[<h2>Purpose</h2>
<p>Compare the speed of loading data into an Oracle database (Oracle Database 11g Release 11.1.0.6.0 – Enterprise Edition) using the Oracle Corp. JDBC driver, the <a href="http://www.datadirect.com/products/jdbc/oracle/index.ssp">DataDirect JDBC Oracle driver</a> and direct data loading (OracleDataWriter component – <strong>sqlldr</strong> utility) in CloverETL.</p>
<h2>Test description</h2>
<h3>Graph used for testing:</h3>
<p><a href="http://cloveretl.files.wordpress.com/2010/01/dddbload.png"><img class="alignnone size-medium wp-image-346" title="DDdbLoad" src="http://cloveretl.files.wordpress.com/2010/01/dddbload.png?w=286&#038;h=300" alt="DDdbLoad.grf" width="286" height="300" /></a></p>
<p>The above graph loads data into a database table that contains 3 number columns and 127 varchar columns.</p>
<p>The target table is truncated before each load by a DBExecute component running the query <code>TRUNCATE TABLE dd_test1 REUSE STORAGE</code>.</p>
<p><em>Phase 1:</em> loading data with the DDBulkLoad (DataDirect) object from a csv file (<code>loader.load(file)</code>)</p>
<p><em>Phase 3:</em> loading data with the DDBulkLoad (DataDirect) object from a ResultSet (<code>loader.load(resultSet)</code>), using a custom ResultSet implementation that reads data from DataRecord objects read from the edge.</p>
<p><em>Phase 5:</em> loading data with DBOutputTable using Oracle Corp.&#8217;s JDBC driver:</p>
<pre>Manifest-Version: 1.0
Specification-Title:    Oracle JDBC driver classes for use with JDK14
Sealed: true
Created-By: 1.4.2_08 (Sun Microsystems Inc.)
Implementation-Title:   ojdbc14.jar
Specification-Vendor:   Oracle Corporation
Specification-Version:  Oracle JDBC Driver version - "10.2.0.1.0XE"
Implementation-Version: Oracle JDBC Driver version - "10.2.0.1.0XE"
Implementation-Vendor:  Oracle Corporation
Implementation-Time:    Wed Jan 25 01:28:31 2006</pre>
<p><em>Phase 7:</em> loading data with DBOutputTable using the DataDirect Oracle JDBC driver with the bulk load feature enabled</p>
<p><em>Phase 9</em>: loading data with OracleDataWriter component from csv file (sqlldr utility)</p>
<p><em>Phase 11: </em>loading data with OracleDataWriter component from edge (sqlldr utility)</p>
<h2>Test processing</h2>
<p>The graph was run 3 times for 10,000,000 records with default DataDirect settings.</p>
<p>The graph was run 3 times for 1,000,000 records with default DataDirect settings.</p>
<p>The graph was run 3 times for 1,000,000 records with the following settings:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/01/connectionproperties.png"><img class="alignnone size-medium wp-image-347" title="Connection Properties" src="http://cloveretl.files.wordpress.com/2010/01/connectionproperties.png?w=215&#038;h=300" alt="" width="215" height="300" /></a></p>
<h2>Test results</h2>
<p>Results in seconds.</p>
<h3>1,000,000 records with default DataDirect settings:</h3>
<p><em>Phase 1:</em> 178, 167, 132 – min: 132, max: 178, average:  159</p>
<p><em>Phase 3:</em> 128, 166, 152 – min: 128, max: 166, average:  149</p>
<p><em>Phase 5:</em> 228, 246, 290 – min: 228, max: 290, average:  255</p>
<p><em>Phase 7</em>: 176, 170, 239 – min: 170, max: 239, average:  195</p>
<p><em>Phase 9:</em> 44, 45, 56 – min: 44, max: 56, average: 48</p>
<p><em>Phase 11:</em> 104, 95, 106 – min: 95, max: 106, average: 102</p>
<h3>1,000,000 records with custom settings:</h3>
<p><em>Phase 1:</em> 163, 152, 142 – min: 142, max: 163, average:  152</p>
<p><em>Phase 3:</em> 166, 133, 134 – min: 133, max: 166, average:  144</p>
<p><em>Phase 5:</em> 278, 263, 260 – min: 260, max: 278, average:  267</p>
<p><em>Phase 7: </em>239, 172, 209 – min: 172, max: 239, average:  207</p>
<h3>10,000,000 records with default DataDirect settings:</h3>
<p><em>Phase 1:</em> 1553, 1818, 1352 – min: 1352, max: 1818, average:  1574</p>
<p><em>Phase 3: </em>1475, 1299, 1298 – min: 1298, max: 1475, average:  1357</p>
<p><em>Phase 5:</em> 3041, 2592, 2550 – min: 2550, max: 3041, average:  2728</p>
<p><em>Phase 7: </em>1824, 1623, 1722 – min: 1623, max: 1824, average:  1723</p>
<p><em>Phase 9: </em>404, 432, 472 – min: 404, max: 472, average: 436</p>
<p><em>Phase 11: </em>1096, 975, 1012 – min: 975, max: 1096, average: 1028</p>
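<p>The min, max and average figures can be recomputed from the raw run times. The following Python sketch does this for the 1,000,000-record run with default settings; the <code>summarize</code> helper is a name introduced here, and averages are rounded to whole seconds, which appears to be the convention used in the results above.</p>

```python
from statistics import mean

# Run times in seconds, copied from the 1,000,000-record results (default settings)
runs = {
    "Phase 1": [178, 167, 132],
    "Phase 3": [128, 166, 152],
    "Phase 5": [228, 246, 290],
    "Phase 7": [176, 170, 239],
    "Phase 9": [44, 45, 56],
    "Phase 11": [104, 95, 106],
}


def summarize(times):
    """Return (min, max, average rounded to whole seconds) for a list of run times."""
    return min(times), max(times), round(mean(times))


for phase, times in runs.items():
    lowest, highest, average = summarize(times)
    print(f"{phase}: min: {lowest}, max: {highest}, average: {average}")
```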
<h2><strong>Summary</strong></h2>
<p>Loading was slowest when DBOutputTable was used with the original Oracle Corp. driver. All loads with the DataDirect driver were faster than with the Oracle Corp. driver, and using the DDBulkLoad object (DataDirect) was clearly faster than setting <em>EnableBulkLoad=true</em> and using DBOutputTable. The results for loading from a csv file and from the edge (ResultSet) are very similar, with a slight advantage for the ResultSet method. For all three methods that use the DataDirect driver, execution times become more stable as the number of records grows.</p>
<p>The fastest way of loading data is unquestionably direct loading with the <strong>sqlldr</strong> utility. Even when the data first has to be passed through a pipe, <strong>sqlldr</strong> is about 50% faster than any other method, although it is less convenient.<br />
<a href="http://cloveretl.files.wordpress.com/2010/01/graph1.png"><img class="alignnone size-medium wp-image-346" title="1,000,000 records" src="http://cloveretl.files.wordpress.com/2010/01/graph1.png?w=744&#038;h=196" alt="1,000,000 records" width="744" height="196" /></a><br />
<a href="http://cloveretl.files.wordpress.com/2010/01/graph2.png"><img class="alignnone size-medium wp-image-346" title="10,000,000 records" src="http://cloveretl.files.wordpress.com/2010/01/graph2.png?w=744&#038;h=196" alt="10,000,000 records" width="744" height="196" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/01/12/data-direct-oracle-driver-speed-test/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/934c88184df6c0034450ae00a1695ee8?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">agad</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/dddbload.png?w=286" medium="image">
			<media:title type="html">DDdbLoad</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/connectionproperties.png?w=215" medium="image">
			<media:title type="html">Connection Properties</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/graph1.png" medium="image">
			<media:title type="html">1,000,000 records</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/01/graph2.png?w=286" medium="image">
			<media:title type="html">10,000,000 records</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL extends ETL Price/Performance Leadership with Launch of CloverETL Cluster</title>
		<link>http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/</link>
		<comments>http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 15:54:59 +0000</pubDate>
		<dc:creator>Lucie Felixova</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[parallel]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=338</guid>
		<description><![CDATA[On December 9, 2009 CloverETL Cluster Edition was launched at PriceWaterhouseCoopers premises. CloverETL Cluster intelligently partitions data and distributes them evenly across multiple nodes in a cluster for execution in parallel. CloverETL Cluster’s ability to load balance large data transformations increases throughput, fault tolerance and flexibility. CloverETL Cluster can be deployed on premise in a [...]]]></description>
			<content:encoded><![CDATA[<p>On December 9, 2009 CloverETL Cluster Edition was launched at PriceWaterhouseCoopers premises. CloverETL Cluster intelligently partitions data and distributes them evenly across multiple nodes in a cluster for execution in parallel. CloverETL Cluster’s ability to load balance large data transformations increases throughput, fault tolerance and flexibility.</p>
<p>CloverETL Cluster can be deployed on premise in a customer’s own data center or in a variety of cloud configurations, such as Amazon EC2, which can drive costs even lower. During the launch in Prague, CloverETL Cluster was demonstrated running on four Amazon EC2 servers.</p>
<p>The following list shows the time CloverETL requires to execute a moderately complex transformation of six million records (725 MB) in a variety of local and cloud configurations:</p>
<ul>
<li>CloverETL Desktop Designer on a MacBook:                            150 seconds</li>
<li>CloverETL Cluster Load Balanced Across Two EC2 servers:        60 seconds</li>
<li>CloverETL Cluster Load Balanced Across Three EC2 servers:      43 seconds</li>
<li>CloverETL Cluster Load Balanced Across Four EC2 servers:        31 seconds</li>
</ul>
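<p>To get a rough feeling for how the cluster scales, the EC2 timings above can be compared with each other. The Python sketch below takes the two-server run as the baseline; this is an assumption made here, because the MacBook figure comes from different hardware and no single-server EC2 time is given. The <code>scaling</code> function is a hypothetical helper.</p>

```python
# Execution times in seconds for 2, 3 and 4 EC2 servers, taken from the list above
timings = {2: 60, 3: 43, 4: 31}

BASE_NODES = 2  # assumption: the two-server run serves as the baseline
BASE_TIME = timings[BASE_NODES]


def scaling(nodes):
    """Return (speedup over the baseline, efficiency relative to linear scaling)."""
    speedup = BASE_TIME / timings[nodes]
    ideal = nodes / BASE_NODES
    return speedup, speedup / ideal


for nodes in sorted(timings):
    speedup, efficiency = scaling(nodes)
    print(f"{nodes} servers: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

<p>An efficiency close to 100% means that adding servers shortens the run almost proportionally.</p>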
<p><span style="text-align:center; display: block;"><a href="http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/"><img src="http://img.youtube.com/vi/Vj8z-quq6r8/2.jpg" alt="" /></a></span></p>
<p><span style="text-align:center; display: block;"><a href="http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/"><img src="http://img.youtube.com/vi/b2LsEzkzRiQ/2.jpg" alt="" /></a></span></p>
<p><span style="text-align:center; display: block;"><a href="http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/"><img src="http://img.youtube.com/vi/1Nmd9h9N46c/2.jpg" alt="" /></a></span></p>
<p><span style="text-align:center; display: block;"><a href="http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/"><img src="http://img.youtube.com/vi/7uYezGiKiKM/2.jpg" alt="" /></a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/12/18/cloveretl-extends-etl-priceperformance-leadership-with-launch-of-cloveretl-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/36840395aaef2cf9186f9dcdc1cb947e?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">Lucie Felixova</media:title>
		</media:content>

		<media:content url="http://img.youtube.com/vi/Vj8z-quq6r8/2.jpg" medium="image" />

		<media:content url="http://img.youtube.com/vi/b2LsEzkzRiQ/2.jpg" medium="image" />

		<media:content url="http://img.youtube.com/vi/1Nmd9h9N46c/2.jpg" medium="image" />

		<media:content url="http://img.youtube.com/vi/7uYezGiKiKM/2.jpg" medium="image" />
	</item>
		<item>
		<title>ParallelReader Versus Competitors Finish</title>
		<link>http://blog.cloveretl.com/2009/12/09/parallelreader-versus-competitors-finish/</link>
		<comments>http://blog.cloveretl.com/2009/12/09/parallelreader-versus-competitors-finish/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 15:09:09 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[ParallelReader]]></category>
		<category><![CDATA[talend]]></category>
		<category><![CDATA[pentaho]]></category>
		<category><![CDATA[CloverETL]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=319</guid>
		<description><![CDATA[As promised, here is a comprehensive comparison of ETL tools: CloverETL, Talend and Pentaho. Short summary of my previous posts: For testing I used two transformations based on the TPCH test and input data generated by the dbgen utility. The transformations were run on my laptop with Windows Vista Home Premium. For detail [...]]]></description>
			<content:encoded><![CDATA[<p>As promised, here is a comprehensive comparison of ETL tools: CloverETL, Talend and Pentaho.</p>
<p>Short summary of my previous posts: For testing I used two transformations based on the TPCH test and input data generated by the dbgen utility. The transformations were run on my laptop with Windows Vista Home Premium. For detailed information, see <a href="http://blog.cloveretl.com/2009/10/26/parallelreader-versus-competitors/">part 1</a> and <a href="http://blog.cloveretl.com/2009/11/11/parallelreader-versus-competitors-part-2/">part 2</a>.</p>
<p><strong>New testing:</strong><br />
To make the comparison complete, all tools were tested both as &#8220;desktop&#8221; and as &#8220;enterprise&#8221; ETL tools. The &#8220;desktop&#8221; tools ran on a laptop computer with a small amount of data. The &#8220;enterprise&#8221; ETL tools ran on a server-class machine with a large amount of data stored both in flat files and in a database. The transformation executed on the server-class machine was the same as the one executed on the desktop; only the size of the input data changed:</p>
<ul>
<li>lineitem.tbl &#8211;  59,986,052 records,  7.24 GB</li>
<li>customers.tbl – 1,500,000 records, 233 MB</li>
<li>orders.tbl &#8211;  15,000,000 records, 1.62 GB</li>
</ul>
<p>The results of flat file reading:</p>
<p style="text-align:center;font-size:large;"><strong>TPCH-Q1</strong></p>
<p><a href="http://cloveretl.files.wordpress.com/2009/12/tpch11.png"><img src="http://cloveretl.files.wordpress.com/2009/12/tpch11.png?w=936&#038;h=334" alt="TPCH-Q1" title="TPCH-Q1" width="936" height="334" class="aligncenter size-full wp-image-328" /></a></p>
<p style="text-align:center;font-size:large;"><strong>TPCH-Q3</strong></p>
<p><a href="http://cloveretl.files.wordpress.com/2009/12/tpch22.png"><img src="http://cloveretl.files.wordpress.com/2009/12/tpch22.png?w=936&#038;h=334" alt="TPCH-Q3" title="TPCH-Q3" width="936" height="334" class="aligncenter size-full wp-image-329" /></a></p>
<p>The new results of database reading, all previously published results, detailed information about the hardware used and a summary are available in <a href="http://www.cloveretl.com/_upload/clover-etl/Comparison_CloverETL_vs_Talend_Pentaho.pdf">this final document</a>.</p>
<p>I also described the main features of all the tools and my experience of working with them. This part of the document expresses my opinions, so it could be biased, since I work mostly with CloverETL. If you disagree with anything, please say so in the comments. I will be pleased to discuss it with you.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/12/09/parallelreader-versus-competitors-finish/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/12/tpch11.png" medium="image">
			<media:title type="html">TPCH-Q1</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/12/tpch22.png" medium="image">
			<media:title type="html">TPCH-Q3</media:title>
		</media:content>
	</item>
		<item>
		<title>ParallelReader Versus Competitors Part 2</title>
		<link>http://blog.cloveretl.com/2009/11/11/parallelreader-versus-competitors-part-2/</link>
		<comments>http://blog.cloveretl.com/2009/11/11/parallelreader-versus-competitors-part-2/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 16:08:24 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[ParallelReader]]></category>
		<category><![CDATA[talend]]></category>
		<category><![CDATA[pentaho]]></category>
		<category><![CDATA[CloverETL]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=271</guid>
		<description><![CDATA[Before we release a complete comparison of open source ETL tools, and following the success of my previous blog post, I decided to publish the second transformation that we used in the comparison. The second transformation is also based on an SQL query that I rewrote as an ETL transformation. I chose Query 3 from http://www.tpc.org/tpch. [...]]]></description>
			<content:encoded><![CDATA[<p>Before we release a complete comparison of open source ETL tools, and following the success of <a href="http://blog.cloveretl.com/2009/10/26/parallelreader-versus-competitors/">my previous blog post</a>, I decided to publish the second transformation that we used in the comparison.</p>
<p>The second transformation is also based on an SQL query that I rewrote as an ETL transformation. I chose <strong>Query 3</strong> from <a href="http://www.tpc.org/tpch" target="_blank">http://www.tpc.org/tpch</a>.</p>
<p><code><strong>select</strong><br />
l_orderkey,<br />
<strong>sum</strong>(l_extendedprice*(1-l_discount)) <strong>as</strong> revenue,<br />
o_orderdate,<br />
o_shippriority<br />
<strong>from</strong> customer, orders, lineitem<br />
<strong>where</strong> c_mktsegment = ‘BUILDING’<br />
<strong>and</strong> c_custkey = o_custkey<br />
<strong>and</strong> l_orderkey = o_orderkey<br />
<strong>and</strong> o_orderdate &lt; date '1995-03-15'<br />
<strong>and</strong> l_shipdate &gt; date '1995-03-15'<br />
<strong>group by</strong> l_orderkey, o_orderdate, o_shippriority<br />
<strong>order by</strong> revenue desc, o_orderdate</code></p>
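<p>As a rough illustration of what rewriting this query as an ETL transformation involves, the sketch below expresses the same filter, join, aggregate and sort steps in plain Python. It is not the CloverETL graph: the function name <code>query3</code> and the dictionary-based record layout are assumptions made only for this example.</p>

```python
from collections import defaultdict
from datetime import date


def query3(customers, orders, lineitems, segment="BUILDING", cutoff=date(1995, 3, 15)):
    """Filter, join, aggregate and sort, mirroring the SQL query above."""
    # Filter customers by market segment and keep their keys for the join.
    cust_keys = {c["c_custkey"] for c in customers if c["c_mktsegment"] == segment}
    # Join orders to the filtered customers; keep orders placed before the cutoff.
    open_orders = {
        o["o_orderkey"]: o
        for o in orders
        if o["o_custkey"] in cust_keys and o["o_orderdate"] < cutoff
    }
    # Join line items to those orders, keep items shipped after the cutoff,
    # and sum the discounted price per (orderkey, orderdate, shippriority) group.
    revenue = defaultdict(float)
    for item in lineitems:
        order = open_orders.get(item["l_orderkey"])
        if order is not None and item["l_shipdate"] > cutoff:
            key = (item["l_orderkey"], order["o_orderdate"], order["o_shippriority"])
            revenue[key] += item["l_extendedprice"] * (1 - item["l_discount"])
    # Sort by revenue descending, then by order date ascending.
    rows = [(k[0], rev, k[1], k[2]) for k, rev in revenue.items()]
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows
```

<p>Each commented step corresponds loosely to one kind of component in an ETL graph: a filter, two joins, an aggregation and a final sort.</p>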
<p>The input data is generated by the <code>dbgen</code> utility and stored in CSV files.</p>
<ul>
<li>lineitem.tbl &#8211;  6,001,215 records, 724 MB</li>
<li>customers.tbl &#8211;  150,000 records, 23.2 MB</li>
<li>orders.tbl &#8211;  1,500,000 records, 163 MB</li>
</ul>
<p>The output should contain 11,620 records.</p>
<p>There is a new item in the results. After the discussion under <a href="http://blog.cloveretl.com/2009/10/26/parallelreader-versus-competitors/">my previous post</a> I added &#8220;Pentaho parallel&#8221;, a Pentaho transformation that reads data in parallel mode. Thanks, Matt, for your transformation <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  Without it I would not have been able to set it up.</p>
<p>Matt Casters also argued that Pentaho was put at a disadvantage because the Pentaho transformation sorts the data before aggregation. Yes, I agree that sorting 6,000,000 records takes a significant share of the transformation&#8217;s execution time. But I had no choice: the Pentaho aggregate component requires sorted input. Today&#8217;s transformation is fairer in this respect. Far fewer records flow into the aggregate component (30,519 records), so they can easily be sorted in memory, and at that volume the sorting doesn&#8217;t noticeably affect the total execution time.</p>
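<p>To show why an aggregate step of this kind needs sorted input, here is a minimal generic sketch of a streaming aggregation. It is not Pentaho&#8217;s implementation; the function name <code>aggregate_sorted</code> is made up for the example. The step keeps only one running total in memory and emits a group as soon as the key changes, so records with the same key must arrive next to each other.</p>

```python
def aggregate_sorted(records):
    """Sum values per key, assuming (key, value) records arrive sorted by key."""
    current_key, total, seen_any = None, 0, False
    for key, value in records:
        if seen_any and key != current_key:
            # Key changed: the previous group is complete and can be emitted.
            yield current_key, total
            total = 0
        current_key, seen_any = key, True
        total += value
    if seen_any:
        # Emit the last group once the input is exhausted.
        yield current_key, total
```

<p>With unsorted input the same key is emitted more than once with partial sums, which is why the sort has to come first.</p>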
<p>The versions of the ETL tools are the same as before: CloverETL Designer 2.8.1, Talend Open Studio 3.1.3 and Pentaho Data Integration 3.2.0.</p>
<p>The hardware configuration and Java runtime parameters are also the same:</p>
<ul>
<li>Intel Core 2 Duo @ 1666 MHz, 2048 MB RAM, 200 GB SATA 5400 RPM, Windows Vista Home Premium 32-bit.</li>
<li><code>-server -Xms256m -Xmx1536m</code></li>
</ul>
<p><strong>Results:</strong></p>
<ol>
<li>CloverETL ParallelReader</li>
<li>Talend</li>
<li>Pentaho parallel</li>
<li>CloverETL UniversalDataReader</li>
<li>Pentaho</li>
</ol>
<p><img class="aligncenter size-full wp-image-276" title="Results" src="http://cloveretl.files.wordpress.com/2009/11/results.png?w=937&#038;h=335" alt="Results" width="937" height="335" /></p>
<p>The transformations and the input data are available on <a href="http://www.filefactory.com/file/a09gc3b/n/ParallelReaderComparisonPart2.zip" target="_blank">filefactory.com</a>. Today&#8217;s transformations are named TPCH2; the transformations from my previous post are named TPCH1.</p>
<p>Please give me feedback, especially on whether the Talend transformation is correct.</p>
<h2>Transformation graphs</h2>
<div id="attachment_278" class="wp-caption aligncenter" style="width: 710px"><img class="size-full wp-image-278" title="CloverETL ParallelReader &amp; UniversalDataReader" src="http://cloveretl.files.wordpress.com/2009/11/cloveretl_tpch2_parallelreader.png?w=700&#038;h=158" alt="CloverETL ParallelReader &amp; UniversalDataReader" width="700" height="158" /><p class="wp-caption-text">CloverETL ParallelReader &amp; UniversalDataReader</p></div>
<div id="attachment_279" class="wp-caption aligncenter" style="width: 710px"><img class="size-full wp-image-279" title="Talend" src="http://cloveretl.files.wordpress.com/2009/11/talend_tpch2.png?w=700&#038;h=302" alt="Talend" width="700" height="302" /><p class="wp-caption-text">Talend</p></div>
<div id="attachment_280" class="wp-caption aligncenter" style="width: 710px"><img class="size-full wp-image-280" title="Pentaho" src="http://cloveretl.files.wordpress.com/2009/11/pentaho_tpch2.png?w=700&#038;h=304" alt="Pentaho" width="700" height="304" /><p class="wp-caption-text">Pentaho</p></div>
<div id="attachment_281" class="wp-caption aligncenter" style="width: 710px"><img class="size-full wp-image-281" title="Pentaho parallel" src="http://cloveretl.files.wordpress.com/2009/11/pentaho_tpch2_parallel.png?w=700&#038;h=258" alt="Pentaho parallel" width="700" height="258" /><p class="wp-caption-text">Pentaho parallel</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/11/11/parallelreader-versus-competitors-part-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/11/results.png" medium="image">
			<media:title type="html">Results</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/11/cloveretl_tpch2_parallelreader.png" medium="image">
			<media:title type="html">CloverETL ParallelReader &#38; UniversalDataReader</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/11/talend_tpch2.png" medium="image">
			<media:title type="html">Talend</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/11/pentaho_tpch2.png" medium="image">
			<media:title type="html">Pentaho</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/11/pentaho_tpch2_parallel.png" medium="image">
			<media:title type="html">Pentaho parallel</media:title>
		</media:content>
	</item>
		<item>
		<title>New level of parallelism in CloverETL</title>
		<link>http://blog.cloveretl.com/2009/11/04/new-level-of-parallelism-in-cloveretl/</link>
		<comments>http://blog.cloveretl.com/2009/11/04/new-level-of-parallelism-in-cloveretl/#comments</comments>
		<pubDate>Wed, 04 Nov 2009 12:28:07 +0000</pubDate>
		<dc:creator>mvarecha</dc:creator>
				<category><![CDATA[Developing Clover]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[load-balancing]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[Server]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=264</guid>
		<description><![CDATA[For the upcoming release of CloverETL 2.9, we are working on improvements in CloverETL Server that will allow transformations to run in parallel on multiple cluster nodes. CloverETL Server already supports clustering, so multiple instances can cooperate with each other. The current stable version already implements common cluster features: fail-over/high availability and scalability for large numbers of requests, which [...] ]]></description>
			<content:encoded><![CDATA[<p>For the upcoming release of CloverETL 2.9, we are working on improvements in CloverETL Server that will allow transformations to run in parallel on multiple cluster nodes.</p>
<p>CloverETL Server already supports clustering, so multiple instances can cooperate with each other. The current stable version already implements common cluster features: fail-over/high availability and scalability for large numbers of requests, which are load-balanced across the available cluster nodes. These features have been available since version 1.3.</p>
<p><strong>The basic concept of the new parallelism</strong><br />
A transformation may be executed automatically in parallel on several cluster nodes, according to the configuration, and each of these &#8220;worker&#8221; transformations processes just its part of the data. Because one &#8220;master&#8221; transformation manages the workers and gathers their tracking data, the parallelism is transparent to the CloverETL Server client. By default the client &#8220;sees&#8221; just one (master) execution and aggregated tracking data. However, logs and tracking data are still kept for each &#8220;worker&#8221; transformation, so it&#8217;s still possible to inspect the details of the parallel execution. The outputs of the &#8220;worker&#8221; transformations are gathered by the &#8220;master&#8221;, so the client gets a single transformation output that can be processed further.</p>
<p><strong>So how do the workers get their parts of the input data?</strong><br />
There are two options. The transformation can process data that is already partitioned, which is the best case because partitioning adds no overhead. Alternatively, CloverETL Server itself can partition input data from a single source and distribute it on the fly (during the transformation) to several cluster nodes over the network. The overhead of this operation depends on the speed of network communication and other conditions.</p>
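<p>The partition, process and gather idea can be sketched in a few lines of Python. This is only an analogy under stated assumptions: threads stand in for cluster nodes, round-robin is just one possible partitioning scheme, and the names <code>partition_round_robin</code>, <code>worker</code> and <code>run_parallel</code> are hypothetical, not part of CloverETL Server.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, parts):
    """Distribute records over `parts` buckets, one record at a time."""
    buckets = [[] for _ in range(parts)]
    for index, record in enumerate(records):
        buckets[index % parts].append(record)
    return buckets


def worker(chunk):
    """Placeholder transformation applied by each worker to its own chunk."""
    return [record * 2 for record in chunk]


def run_parallel(records, parts):
    """Partition the input, run one worker per partition, gather the outputs."""
    buckets = partition_round_robin(records, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(worker, buckets))
    # Gather: concatenate the worker outputs into one single output.
    return [record for chunk in results for record in chunk]
```

<p>Note that the gathered output is ordered by partition, not by the original input order, so a consumer that depends on order has to sort it again.</p>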
<p><strong>Design changes in the graph</strong><br />
We aim to keep the transformation graph almost the same as it would be for &#8220;standalone&#8221; execution. A graph that is intended to run in parallel will therefore contain just a couple of extra components. These components will handle partitioning/departitioning of the data in case it&#8217;s not already partitioned.</p>
<p><strong>Scalability</strong><br />
The new parallelism in CloverETL Server is a giant leap for the scalability of transformations. Once the graph is designed for a parallel run, the number of computers that run the transformation depends only on the cluster configuration. The graph itself stays the same. Configuration of the parallelism requires:</p>
<ul>
<li>a working CloverETL Server cluster; standalone server instances won&#8217;t be able to handle such execution</li>
<li>a &#8220;partitioned&#8221; sandbox (see below) with a list of locations</li>
</ul>
<p><strong>New sandbox types</strong><br />
On the server side, graphs and related files are organized in so-called sandboxes. Until version 2.8, there was just one type: the &#8220;shared&#8221; sandbox, which contains the same files and directory structure on all cluster nodes. From version 2.9 there will be two more types:</p>
<ul>
<li>&#8220;local&#8221; sandbox &#8211; (locally) accessible on just one cluster node. It&#8217;s intended for huge input/output data that should not be shared or replicated among multiple cluster nodes.</li>
<li>&#8220;partitioned&#8221; sandbox &#8211; each of its physical locations contains just a part of the data. It&#8217;s intended as storage for the partitioned input/output data of transformations that are supposed to run in parallel. The list of physical locations also specifies the nodes that will run the &#8220;worker&#8221; transformations.</li>
</ul>
<p><strong>Master &#8211; worker responsibilities</strong><br />
The master observes all related workers, and when a transformation phase has finished on all workers, it&#8217;s the master&#8217;s responsibility to allow the workers to process the next phase. When any of the workers fails for any reason, it&#8217;s the master&#8217;s responsibility to abort all the other workers and mark the whole execution as failed. The terms master and worker have meaning only within the scope of one transformation. Since 2.9 there is no privileged node configured as &#8220;master&#8221; in the cluster, but that doesn&#8217;t mean that all the nodes are equal. Nodes may differ in their access to physical sources, and the configuration of sandboxes should reflect that.</p>
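<p>The coordination rule described above can be sketched as a small simulation. It is deliberately simplified and sequential, whereas a real cluster runs the workers concurrently and talks to them over the network. The <code>Worker</code> class, <code>run_master</code> and the status strings are hypothetical names chosen for this example, not the CloverETL Server API.</p>

```python
class Worker:
    """Simulated worker that can be told to fail in a given phase."""

    def __init__(self, name, fail_in_phase=None):
        self.name = name
        self.fail_in_phase = fail_in_phase
        self.completed = []
        self.aborted = False

    def run_phase(self, phase):
        if phase == self.fail_in_phase:
            raise RuntimeError(f"{self.name} failed in phase {phase}")
        self.completed.append(phase)

    def abort(self):
        self.aborted = True


def run_master(workers, phase_count):
    """Advance all workers phase by phase; abort everything on the first failure."""
    for phase in range(phase_count):
        for worker in workers:
            try:
                worker.run_phase(phase)
            except RuntimeError:
                # One worker failed: abort the others and fail the whole run.
                for other in workers:
                    if other is not worker:
                        other.abort()
                return "FAILED"
        # Every worker finished this phase, so the next phase may start.
    return "FINISHED"
```

<p>The important property is the barrier between phases: no worker starts phase <em>n+1</em> until the master has seen phase <em>n</em> finish everywhere.</p>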
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/11/04/new-level-of-parallelism-in-cloveretl/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/998f04e59afe9c312019cab4da3f99be?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">mvarecha</media:title>
		</media:content>
	</item>
	</channel>
</rss>