<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; ETL</title>
	<atom:link href="http://blog.cloveretl.com/tag/etl/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; ETL</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 2</title>
		<link>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/</link>
		<comments>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/#comments</comments>
		<pubDate>Thu, 27 May 2010 01:39:40 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[dwh]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[scd2]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=668</guid>
		<description><![CDATA[In the last part of our data warehouse (DWH) tutorial, I showed you how to load a dimension table according to Slowly Changing Dimension Type 1 (SCD1), which overwrites old attribute values instead of keeping their history. In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. In my opinion, SCD2 is the most [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/">In the last part</a> of our data warehouse (DWH) tutorial, I showed you how to load a dimension table according to Slowly Changing Dimension Type 1 (SCD1), which overwrites old attribute values instead of keeping their history. In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. In my opinion, SCD2 is the most challenging sub-task of the ETL part of DWH design, and every ETL architect should be able to deal with it.</p>
<p>In contrast to SCD1, an SCD2 table preserves the history of attribute values. Once the value of an attribute changes in the external system (OLTP), we have to insert a new record with the current value into the SCD2 dimension table and also mark the old record as obsolete. The most common way to do this is to maintain two additional attributes, valid_from and valid_to: a record is then considered valid on a particular date D when valid_from ≤ D ≤ valid_to, where a null valid_to means that the record is still current. You can find a detailed explanation of SCD2 principles in <a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1274661375&amp;sr=8-1">Kimball’s DWH bible</a> or on <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">Wikipedia</a>.</p>
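<p>A minimal Python sketch of the validity test, assuming each dimension row is held as a dict with <code>valid_from</code> and <code>valid_to</code> dates, where <code>None</code> stands for a null <code>valid_to</code> (the row is still current). Both bounds are inclusive, which is what lets the two versions in the example below (2009-10-10 to 2010-05-20, then 2010-05-21 onwards) cover every day without a gap or an overlap. The helper name <code>is_valid_on</code> is hypothetical.</p>
<pre>from datetime import date


def is_valid_on(row, d):
    """Return True if an SCD2 row is valid on day d.

    Inclusive rule: valid_from <= d <= valid_to, where a valid_to of None
    marks the current, open-ended version.
    """
    if d < row["valid_from"]:
        return False
    return row["valid_to"] is None or d <= row["valid_to"]


# The two D_CUSTOMER versions of customer C0001 (other columns omitted).
old = {"sk": "0001", "valid_from": date(2009, 10, 10), "valid_to": date(2010, 5, 20)}
new = {"sk": "0002", "valid_from": date(2010, 5, 21), "valid_to": None}

for day in (date(2010, 5, 20), date(2010, 5, 21)):
    print(day, [r["sk"] for r in (old, new) if is_valid_on(r, day)])</pre>
<p>Each date prints exactly one surrogate key: <code>0001</code> for 2010-05-20 and <code>0002</code> for 2010-05-21.</p>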
<p>Let us show how SCD2 works in practice with a small example. We will use the DWH schema introduced in the SCD1 post.</p>
<p style="text-align:left;"><a href="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png"><img class="size-full wp-image-218 aligncenter" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="" width="648" height="273" /></a></p>
<p>It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price.</p>
<p>The Store table is populated as SCD1; now we will load the Customer table, which was designated as an SCD2 dimension table. Let’s imagine that a customer changes his email address. What happens in the OLTP system and in the DWH Customer table, D_CUSTOMER?</p>
<p><strong>OLTP:</strong></p>
<p>C0001;John;Newman;john.newman@hotmail.com <span style="color:#0000ff;">=&gt;</span> C0001;John;Newman;newman.john@gmail.com</p>
<p><strong>DWH:</strong></p>
<p>0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">null</span> <span style="color:#0000ff;">=&gt;</span><br />
0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">2010-05-20</span><br />
0002;C0001;John;Newman;newman.john@gmail.com;2010-05-21;null</p>
<p>Note especially the first two and the last two attributes (columns) of the DWH table. The first attribute is a surrogate key: a unique identifier of the record in the D_CUSTOMER table, generated by the ETL process. The second one (C0001) is a natural key: the unique identifier of the customer in the OLTP system. Listing all records with the same natural key in D_CUSTOMER gives you the complete history of one customer. The last two attributes are valid_from and valid_to.</p>
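<p>The change shown above can be sketched in a few lines of Python. This only illustrates the SCD2 logic, not the CloverETL graph: it assumes D_CUSTOMER is a list of dicts, that <code>nk</code> and <code>sk</code> (hypothetical column names) hold the natural and surrogate keys, and that the caller supplies the next surrogate key. It also assumes the customer already exists in the dimension; new customers are a separate case.</p>
<pre>from datetime import date, timedelta


def apply_scd2_change(dim, oltp_row, new_sk, today):
    """Apply one OLTP record to an SCD2 dimension held as a list of dicts.

    Returns True if a new version was inserted, False if nothing changed.
    """
    # The current version is the one whose valid_to is still null (None).
    current = next(r for r in dim
                   if r["nk"] == oltp_row["nk"] and r["valid_to"] is None)
    if all(current[k] == v for k, v in oltp_row.items()):
        return False  # no attribute changed: keep the current version
    # Obsolete the current version: it stays valid up to and including yesterday.
    current["valid_to"] = today - timedelta(days=1)
    # Insert the new version carrying the values read from OLTP.
    dim.append({"sk": new_sk, **oltp_row, "valid_from": today, "valid_to": None})
    return True


d_customer = [{"sk": "0001", "nk": "C0001", "first": "John", "last": "Newman",
               "email": "john.newman@hotmail.com",
               "valid_from": date(2009, 10, 10), "valid_to": None}]
oltp = {"nk": "C0001", "first": "John", "last": "Newman",
        "email": "newman.john@gmail.com"}

apply_scd2_change(d_customer, oltp, "0002", today=date(2010, 5, 21))
for row in d_customer:
    print(row["sk"], row["nk"], row["email"], row["valid_from"], row["valid_to"])</pre>
<p>The two printed rows correspond to the DWH lines of the example, with Python’s <code>None</code> in place of null.</p>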
<p>Now that the principle of SCD2 has been explained, I will describe its implementation in CloverETL. See the CloverETL graph below.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png"><img class="aligncenter size-full wp-image-670" title="D_CUSTOMER_SCD2" src="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png?w=799&#038;h=166" alt="" width="799" height="166" /></a></p>
<p>The basic data flow of the graph is very simple: the lower branch reads data from OLTP (as in the previous DWH post, a CSV file in our case), and the upper branch reads the data already stored in our SCD2 table. We have to use the Dedup component there, as we want only one current record for each customer. The two branches meet in the DataIntersection component, which matches records by the natural key. The component has three output ports, as there are three possible outcomes:</p>
<ol>
<li>The record exists only in the DWH. This should not happen, as it means that the record was deleted in OLTP, and “normal” OLTP systems do not allow records to be deleted. Such records end up in the Trash component.</li>
<li>The D_CUSTOMER table contains at least one record with the same natural key as the record coming from OLTP. Such a record goes through the second output port to the ExtFilter component, which identifies whether the record has changed. A changed record is then copied into two records: the first one obsoletes the current record in D_CUSTOMER (identified by its surrogate key) by setting valid_to = today()-1, and the second one, which stores the new values read from OLTP, is inserted into D_CUSTOMER with valid_from = today() and valid_to = null.</li>
<li>The record coming from OLTP is new: there is no record with the same natural key in the DWH. In that case, the record is sent to the third output port, and the following components insert it into the D_CUSTOMER table with valid_from = today() and valid_to = null.</li>
</ol>
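<p>The three-way split performed by Dedup, DataIntersection and ExtFilter can be approximated in plain Python as below. This is a sketch of the logic only, not CloverETL code: the function name <code>route</code>, the <code>nk</code> key column and the <code>tracked</code> attribute list are hypothetical, and Dedup is approximated by keeping the DWH rows whose <code>valid_to</code> is null.</p>
<pre>def route(oltp_rows, dwh_rows, tracked):
    """Split records by natural key into the three DataIntersection outcomes.

    Returns (only_in_dwh, changed, only_in_oltp), where changed holds
    (current_dwh_row, oltp_row) pairs whose tracked attributes differ.
    """
    # Dedup role: keep one current row per customer.
    current = {r["nk"]: r for r in dwh_rows if r["valid_to"] is None}
    incoming = {r["nk"]: r for r in oltp_rows}

    # First output: deleted in OLTP, sent to Trash.
    only_in_dwh = [current[k] for k in current if k not in incoming]
    # Second output plus the ExtFilter role: keep only changed records.
    changed = [(current[k], incoming[k]) for k in incoming if k in current
               and any(current[k][a] != incoming[k][a] for a in tracked)]
    # Third output: new customers, to be inserted with valid_from = today.
    only_in_oltp = [incoming[k] for k in incoming if k not in current]
    return only_in_dwh, changed, only_in_oltp


dwh = [
    {"nk": "C0001", "email": "john.newman@hotmail.com", "valid_to": None},
    {"nk": "C0009", "email": "old.customer@example.com", "valid_to": None},
]
oltp = [
    {"nk": "C0001", "email": "newman.john@gmail.com"},
    {"nk": "C0002", "email": "new.customer@example.com"},
]
gone, changed, fresh = route(oltp, dwh, tracked=["email"])
print([r["nk"] for r in gone], [n["nk"] for _, n in changed], [r["nk"] for r in fresh])</pre>
<p>Here C0009 ends in the first group (Trash), C0001 in the second (expire and insert a new version) and C0002 in the third (plain insert).</p>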
<p>If you want to verify that your CloverETL SCD2 graph works correctly, or if you are looking for sample data, you can simply import the example project into your Clover installation. It is bundled with CloverETL Designer as the DWHExample project. For more information on how to import the example project, see the <a href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">online documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png" medium="image">
			<media:title type="html">D_CUSTOMER_SCD2</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL as a high-throughput XML processor</title>
		<link>http://blog.cloveretl.com/2010/05/20/cloveretl-as-a-high-throughput-xml-processor/</link>
		<comments>http://blog.cloveretl.com/2010/05/20/cloveretl-as-a-high-throughput-xml-processor/#comments</comments>
		<pubDate>Thu, 20 May 2010 15:19:34 +0000</pubDate>
		<dc:creator>jlehotsky</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SAX]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XMLSchema]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=654</guid>
		<description><![CDATA[XML is a markup language that has been around for some years now. It originally comes from the world of documents, where it is used in web hypertext, word processors and other representations. Today it is very popular in many areas, including data exchange. The reasons are simple: the format is straightforward, well [...]]]></description>
			<content:encoded><![CDATA[<p>XML is a markup language that has been around for some years now. It originally comes from the world of documents, where it is used in web hypertext, word processors and other representations. Today it is very popular in many areas, including data exchange. The reasons are simple: the format is straightforward, well defined, and easily transferable across platforms. Unlike proprietary and binary formats, XML can be easily read and modified by users. It can also represent structured hierarchical data, which is very difficult to express in a plain CSV format. And because XML is self-descriptive, it makes the data much easier to understand and greatly reduces the need for a separate data format description and parsing instructions.</p>
<p>XML is often used to transport data between potentially incompatible systems, which creates the need to parse and store data in this format and eventually to process it. CloverETL provides powerful tools to accomplish this task.</p>
<p>One of the components that parse XML is XMLXPathReader. The user simply defines the mapping of each data element or attribute to a given CloverETL field. Internally, the component uses a DOM parser, which allows the user to include general XPath expressions in the mapping definition.</p>
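<p>To illustrate the idea of a field-to-XPath mapping over an in-memory tree, here is a Python sketch using the standard <code>xml.etree.ElementTree</code> module. It is an analogy, not XMLXPathReader’s API: ElementTree is not a full DOM and supports only a subset of XPath, and the document, element names and mapping below are hypothetical. The sample tree is built in code; <code>ET.parse()</code> would build the same kind of tree from a file.</p>
<pre>import xml.etree.ElementTree as ET


def extract(root, record_xpath, mapping):
    """Turn each element matched by record_xpath into a flat record.

    mapping goes from output field name to a path relative to the record
    element; a path starting with "@" reads an attribute instead of text.
    """
    records = []
    for elem in root.findall(record_xpath):
        rec = {}
        for field, path in mapping.items():
            if path.startswith("@"):
                rec[field] = elem.get(path[1:])
            else:
                rec[field] = elem.findtext(path)
        records.append(rec)
    return records


# Hypothetical sample document; the whole tree is held in memory, as with DOM.
root = ET.Element("customers")
for cid, kind, name, city in [("1", "person", "John Newman", "Prague"),
                              ("2", "organization", "Acme Ltd", "Brno")]:
    cust = ET.SubElement(root, "customer", id=cid, type=kind)
    ET.SubElement(cust, "name").text = name
    ET.SubElement(ET.SubElement(cust, "address"), "city").text = city

rows = extract(root, ".//customer[@type='person']",
               {"id": "@id", "name": "name", "city": "address/city"})
print(rows)  # [{'id': '1', 'name': 'John Newman', 'city': 'Prague'}]</pre>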
<p>In practice, users often encounter huge XML files with a simple repeating structure: records representing a given entity (a company, a person, etc.) repeated many times. It is quite common for such data sources to reach tens or even hundreds of gigabytes. In that case, DOM parsing is unsuitable, because the whole document cannot be held in memory. This is where another CloverETL XML parsing component, XMLExtract, comes in handy. It processes records one at a time; individual records are usually quite small, at least small enough to be processed in memory.</p>
<p>In XMLExtract, the user defines how elements at every level of the XML structure are mapped to CloverETL records. XMLExtract can also include a parent key at each structure level, so that the entire data structure can later be fully reconstructed. If the XML does not contain a unique key itself, one can easily be generated using a CloverETL sequence.</p>
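<p>The record-at-a-time approach, including parent keys and generated keys, can be sketched with the standard library’s <code>xml.etree.ElementTree.iterparse</code>, which reads the input sequentially. This is not XMLExtract’s implementation or mapping format: the element names are hypothetical, and <code>itertools.count</code> merely stands in for a CloverETL sequence. Clearing each record after use frees its content, although the emptied element shells stay attached to the tree.</p>
<pre>import io
import itertools
import xml.etree.ElementTree as ET


def stream_records(source, record_tag, child_tag):
    """Yield (kind, record) pairs while reading the XML sequentially.

    kind is "parent" for each record_tag element and "child" for each
    child_tag element nested in it; children carry their parent's key.
    """
    seq = itertools.count(1)  # stands in for a CloverETL sequence
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue  # nested elements are read when their record ends
        # Use the record's own id if it has one, otherwise generate a key.
        key = elem.get("id") or f"SEQ{next(seq)}"
        yield "parent", {"key": key, "name": elem.findtext("name")}
        for child in elem.iter(child_tag):
            yield "child", {"parent_key": key, "city": child.findtext("city")}
        elem.clear()  # free the record's content before reading the next one


def demo_source():
    """Hypothetical input: two customers, the second without an id attribute."""
    root = ET.Element("customers")
    for cid, name, city in [("C1", "John", "Prague"), (None, "Jane", "Brno")]:
        cust = ET.SubElement(root, "customer")
        if cid is not None:
            cust.set("id", cid)
        ET.SubElement(cust, "name").text = name
        ET.SubElement(ET.SubElement(cust, "address"), "city").text = city
    return io.BytesIO(ET.tostring(root))


for kind, rec in stream_records(demo_source(), "customer", "address"):
    print(kind, rec)</pre>
<p>The second customer has no <code>id</code>, so it receives the generated key <code>SEQ1</code>, and its address record carries that key as <code>parent_key</code>.</p>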
<p>XML data and their basic integrity rules can be specified precisely with XML Schema, which today is a standard part of well-defined data exchange. If you use XML Schema, CloverETL provides a convenient visual drag&amp;drop editor that helps you build an XML mapping:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attach1.jpg"><img class="aligncenter size-full wp-image-657" title="Mapping" src="http://cloveretl.files.wordpress.com/2010/05/attach1.jpg?w=802&#038;h=561" alt="" width="802" height="561" /></a></p>
<p>The screenshot shows an XML mapping that defines how XML elements are mapped to Clover fields. The mapping can also be displayed as text:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attach2.jpg"><img class="aligncenter size-full wp-image-658" title="Mapping definition" src="http://cloveretl.files.wordpress.com/2010/05/attach2.jpg?w=802&#038;h=561" alt="" width="802" height="561" /></a></p>
<p>As an example of where these methods were essential, CloverETL was successfully used in a master data consolidation and matching project for an international insurance company. The XML Schemas were very complex, containing hundreds of different XML element types. The volume of data exceeded a hundred gigabytes, describing tens of millions of customers that were organizations and 4-5 million customers that were persons. One of the many tasks assigned to CloverETL was to read and store this vast amount of XML data, where its fast sequential XML processing made a substantial difference.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/20/cloveretl-as-a-high-throughput-xml-processor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f39a297af252727614a56914e6c234a4?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">jlehotsky</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attach1.jpg" medium="image">
			<media:title type="html">Mapping</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attach2.jpg" medium="image">
			<media:title type="html">Mapping definition</media:title>
		</media:content>
	</item>
		<item>
		<title>Data profiling</title>
		<link>http://blog.cloveretl.com/2010/03/31/data-profiling/</link>
		<comments>http://blog.cloveretl.com/2010/03/31/data-profiling/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 12:15:15 +0000</pubDate>
		<dc:creator>Agata Vackova</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[data profiling]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[profiler]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=537</guid>
		<description><![CDATA[Before you start developing any data transformation, you should explore your data (that is, profile it). There are many tools on the market that can help you. But why install and learn another piece of software when you can use a tool you are already familiar with? CloverETL is mainly a data transformation tool, but it [...]]]></description>
			<content:encoded><![CDATA[<p>Before you start developing any data transformation, you should explore your data (that is, profile it). There are many tools on the market that can help you. But why install and learn another piece of software when you can use a tool you are already familiar with? CloverETL is mainly a data transformation tool, but it can easily be used for data profiling as well, as I will show in this blog post.</p>
<p>Computing data statistics is very easy with the latest version of CloverETL. You can find the <em>DataProfiling</em> project among the <a title="CloverETL Examples Project" href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">CloverETL Examples Projects</a>. The project consists of two graphs: <em>BasicStatistic</em> and <em>AdvancedStatistic</em>.</p>
<p>The first one computes basic statistics for an input data file:</p>
<ul>
<li>minimum value for numeric fields or minimum length of data for string and byte fields</li>
<li>maximum value for numeric fields or maximum length of data for string and byte fields</li>
<li>average value for numeric fields or average length of data for string and byte fields</li>
<li>number of records in data file</li>
<li>number of non-null values for each data field</li>
<li>number of null values for each data field</li>
</ul>
<p>Additionally, for string data fields, it finds:</p>
<ul>
<li>the first non-null value</li>
<li>whether all values are ASCII</li>
</ul>
<p>The second one calculates for each data field:</p>
<ul>
<li>number of records in data file</li>
<li>number of non-null values</li>
<li>number of unique values</li>
<li>minimum value</li>
<li>maximum value</li>
<li>average value for numeric fields</li>
<li>median value</li>
<li>modus (most frequent value)</li>
</ul>
<p>It also finds frequency counts for fields with few unique values (the threshold is defined by the parameter <code>HISTOGRAM_THRESHOLD</code>).</p>
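<p>For a single column, the statistics listed above map closely onto Python’s standard <code>statistics</code> and <code>collections</code> modules. The following is a sketch under assumptions, not the graph’s exact definitions: for an even number of values it takes the lower median (an actual value, so it also works for strings), and ties for the modus go to the value seen first. The function name and the sample column are hypothetical.</p>
<pre>import statistics
from collections import Counter


def advanced_stats(values):
    """Profile one column given as a list, with None marking a null.

    Assumes at least one non-null value. The median is the lower median
    and the modus is the most frequent non-null value.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    numeric = all(isinstance(v, (int, float)) for v in non_null)
    return {
        "count": len(values),
        "count_not_null": len(non_null),
        "count_unique": len(counts),
        "min": min(non_null),
        "max": max(non_null),
        "average": statistics.fmean(non_null) if numeric else None,
        "median": statistics.median_low(non_null),
        "modus": counts.most_common(1)[0][0],
    }


dept_no = ["600", "000", "600", None, "623", "623", "623"]
print(advanced_stats(dept_no))</pre>
<p>As in the report, string fields get no average; their minimum and maximum are compared as text.</p>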
<h4><strong>BasicStatistic graph</strong></h4>
<p>The graphs in the project are set up to analyze data from the Excel file <code>ORDERS.xls</code> in the <code>data-in</code> directory. For the purpose of this post, however, we will analyze data stored in the flat file <code>employees.list.dat</code> (also in the <code>data-in</code> directory).</p>
<p>To do that, we need to set the following graph parameters:</p>
<pre>input_file=${DATAIN_DIR}/employees.list.dat
metadata=${META_DIR}/employees.fmt
READER_TYPE=DATA_READER</pre>
<p>The metadata file (<code>employees.fmt</code>) must describe the records of <code>employees.list.dat</code>:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;Record name="EMPLOYEE" recordDelimiter="\n" recordSize="-1" type="delimited"&gt;
&lt;Field delimiter="," format="#" name="EMP_NO" nullable="true" shift="0" type="integer"/&gt;
&lt;Field delimiter="," name="FIRST_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="LAST_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="PHONE_EXT" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," format="dd/MM/yyyy" name="HIRE_DATE" nullable="true" shift="0" type="date"/&gt;
&lt;Field delimiter="," name="DEPT_NO" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="JOB_CODE" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="JOB_GRADE" nullable="true" shift="0" type="numeric"/&gt;
&lt;Field delimiter="," name="JOB_COUNTRY" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="SALARY" nullable="true" shift="0" type="numeric"/&gt;
&lt;Field name="FULL_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;/Record&gt;</pre>
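<p>The metadata above is what lets a reader split each line correctly. In particular, <code>FULL_NAME</code> has no <code>delimiter</code> attribute, so it runs to the record delimiter and may itself contain a comma (as in <code>Nelson, Robert</code>). A minimal Python sketch of that parsing rule follows; the sample line is made up, and treating an empty field as null is an assumption about how nullable fields are read.</p>
<pre>from datetime import datetime

# (name, type) pairs transcribed from employees.fmt.
FIELDS = [
    ("EMP_NO", "integer"), ("FIRST_NAME", "string"), ("LAST_NAME", "string"),
    ("PHONE_EXT", "string"), ("HIRE_DATE", "date"), ("DEPT_NO", "string"),
    ("JOB_CODE", "string"), ("JOB_GRADE", "numeric"), ("JOB_COUNTRY", "string"),
    ("SALARY", "numeric"), ("FULL_NAME", "string"),
]
CONVERT = {
    "integer": int,
    "numeric": float,
    "date": lambda s: datetime.strptime(s, "%d/%m/%Y"),  # format="dd/MM/yyyy"
    "string": str,
}


def parse_line(line):
    """Parse one record: ten comma-delimited fields, then FULL_NAME up to the newline."""
    # maxsplit keeps any commas inside the last field (FULL_NAME).
    parts = line.rstrip("\n").split(",", len(FIELDS) - 1)
    # Assumption: an empty field is read as null (None).
    return {name: CONVERT[ftype](raw) if raw else None
            for (name, ftype), raw in zip(FIELDS, parts)}


print(parse_line("2,Robert,Nelson,250,28/12/1988,600,VP,2,USA,105900,Nelson, Robert\n"))</pre>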
<p>Let us look at the graph with its intermediate results and output:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png"><img class="alignnone size-full wp-image-538" title="BasicStatistic" src="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png?w=1024&#038;h=668" alt="" width="1024" height="668" /></a></p>
<p>DataReader parses data from the input file. For every field of every input record, Normalizer creates a record with three basic fields: the original field name, the field type and the &#8220;normalized&#8221; value, which is the original value for numeric fields, the time in milliseconds for date fields and the data length for string or byte fields. This component also collects basic information about string data: it finds the first non-null value and checks whether the string contains only ASCII characters and whether it can be converted to a number. The next component (Rollup – Statistic) calculates the minimum, maximum and average value for each group of records with the same field name. It also propagates the first non-null value, checks that none of the isAscii and isNumber flags in the group is false, and sets the resulting value for the whole group. The fourth and fifth components are purely “cosmetic”: they convert times in milliseconds back to a user-friendly form and sort the output records. Finally, the writer turns the results into a report.</p>
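<p>A rough Python sketch of the Normalizer and Rollup steps described above, assuming records are dicts and field types come from a <code>{name: type}</code> map. The function names and sample records are hypothetical, and only some of the report columns are computed (there is no <em>is number</em> check). Dates are normalized to milliseconds since 1970-01-01 and can be turned back with <code>EPOCH + timedelta(milliseconds=...)</code>, which is the job of the cosmetic components.</p>
<pre>from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def normalize(record, types):
    """Normalizer: one (field, type, normalized value, raw value) row per field."""
    rows = []
    for field, ftype in types.items():
        raw = record.get(field)
        if raw is None:
            norm = None
        elif ftype == "date":
            norm = (raw - EPOCH) // timedelta(milliseconds=1)  # ms since epoch
        elif ftype in ("string", "byte"):
            norm = len(raw)  # data length
        else:
            norm = raw  # numeric fields keep their value
        rows.append((field, ftype, norm, raw))
    return rows


def rollup(rows):
    """Rollup: min, max, average and counts per field name, in one pass."""
    stats = {}
    for field, ftype, norm, raw in rows:
        s = stats.setdefault(field, {"type": ftype, "count": 0, "not_null": 0,
                                     "min": None, "max": None, "sum": 0,
                                     "first_not_null": None, "is_ascii": True})
        s["count"] += 1
        if norm is None:
            continue
        s["not_null"] += 1
        s["min"] = norm if s["min"] is None else min(s["min"], norm)
        s["max"] = norm if s["max"] is None else max(s["max"], norm)
        s["sum"] += norm
        if ftype == "string":
            if s["first_not_null"] is None:
                s["first_not_null"] = raw
            s["is_ascii"] = s["is_ascii"] and raw.isascii()
    for s in stats.values():
        total = s.pop("sum")
        s["average"] = total / s["not_null"] if s["not_null"] else None
        s["count_null"] = s["count"] - s["not_null"]
    return stats


types = {"EMP_NO": "integer", "FIRST_NAME": "string", "HIRE_DATE": "date"}
employees = [
    {"EMP_NO": 2, "FIRST_NAME": "Robert", "HIRE_DATE": datetime(1988, 12, 28)},
    {"EMP_NO": 4, "FIRST_NAME": "Bruce", "HIRE_DATE": None},
]
report = rollup([row for rec in employees for row in normalize(rec, types)])
print(report["FIRST_NAME"])</pre>
<p>For <code>FIRST_NAME</code>, the lengths 6 and 5 give a minimum of 5, a maximum of 6 and an average of 5.5, mirroring how string fields are reported in the table below.</p>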
<p>By default, the final report is a plain HTML file:</p>
<p><strong>Data statistic for data-in/employees.list.dat</strong></p>
<table border="1" cellspacing="3" cellpadding="2" width="1143">
<col span="1" width="122"></col>
<col span="1" width="73"></col>
<col span="1" width="138"></col>
<col span="1" width="138"></col>
<col span="1" width="146"></col>
<col span="1" width="44"></col>
<col span="1" width="100"></col>
<col span="1" width="74"></col>
<col span="1" width="102"></col>
<col span="1" width="53"></col>
<col span="1" width="73"></col>
<tbody>
<tr>
<th width="122">Field name</th>
<th width="73">Field type</th>
<th width="138">min</th>
<th width="138">max</th>
<th width="146">average</th>
<th width="44">count</th>
<th width="100">count not null</th>
<th width="74">count null</th>
<th width="102">first not null</th>
<th width="53">is Ascii</th>
<th width="73">is number</th>
</tr>
<tr>
<td width="122">DEPT_NO</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">3.0</td>
<td width="146">3.0</td>
<td width="44">51</td>
<td width="100">43</td>
<td width="74">8</td>
<td width="102">600</td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">EMP_NO</td>
<td width="73">integer</td>
<td width="138">0.0</td>
<td width="138">145.0</td>
<td width="146">58.392156862745104</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">FIRST_NAME</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">11.0</td>
<td width="146">5.549019607843137</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Robert</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">FULL_NAME</td>
<td width="73">string</td>
<td width="138">10.0</td>
<td width="138">22.0</td>
<td width="146">14.549019607843137</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Nelson, Robert</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">HIRE_DATE</td>
<td width="73">date</td>
<td width="138">28/12/1988 00:00:00</td>
<td width="138">15/11/1994 00:00:00</td>
<td width="146"> </td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53"> </td>
<td width="73"> </td>
</tr>
<tr>
<td width="122">JOB_CODE</td>
<td width="73">string</td>
<td width="138">2.0</td>
<td width="138">24.0</td>
<td width="146">6.1568627450980395</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">VP</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">JOB_COUNTRY</td>
<td width="73">string</td>
<td width="138">2.0</td>
<td width="138">11.0</td>
<td width="146">3.6470588235294117</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">USA</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">JOB_GRADE</td>
<td width="73">number</td>
<td width="138">0.0</td>
<td width="138">5.0</td>
<td width="146">3.4117647058823524</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">LAST_NAME</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">12.0</td>
<td width="146">6.8431372549019605</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Nelson</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">PHONE_EXT</td>
<td width="73">string</td>
<td width="138">1.0</td>
<td width="138">5.0</td>
<td width="146">2.8627450980392157</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">250</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">SALARY</td>
<td width="73">number</td>
<td width="138">0.0</td>
<td width="138">9.9E7</td>
<td width="146">2267125.137254902</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
</tbody>
</table>
<p>but it can easily be changed to an Excel file (just set the graph parameter <code>WRITER_TYPE=XLS_WRITER</code>).</p>
<h4><strong>AdvancedStatistic graph</strong></h4>
<p><a href="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png"><img class="alignnone size-full wp-image-539" title="AdvancedStatistic" src="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png?w=1024&#038;h=561" alt="" width="1024" height="561" /></a></p>
<p>Phase 0 of the <em>AdvancedStatistic</em> graph works similarly to the <em>BasicStatistic</em> graph, but it uses Aggregators instead of Rollup for the statistics calculations. Pay particular attention to the component called Simplification in phase 0: it stores the number of records in the file, together with the names of the fields whose number of unique values is under the threshold, in the dictionary (marked by red ellipses in the picture above). The Histogram filter component then reads these field names and skips the records that do not belong to the fields selected for frequency calculation (green ellipses in the picture). In phase 1, Aggregators count the frequencies for the fields that passed the previous component.</p>
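<p>The two-phase idea, where a first pass decides which fields get histograms and a second pass counts frequencies only for them, can be sketched in Python as follows. It works on normalized <code>(field name, value)</code> pairs, a plain dict plays the role of the graph dictionary, and whether the threshold is inclusive is an assumption. The bar drawing is likewise an assumption: one <code>#</code> per whole percent of records, padded with <code>·</code>, which agrees with the bars in the histogram below (for example, 8 of 51 records gives 15 characters).</p>
<pre>from collections import Counter, defaultdict


def phase0(rows, threshold):
    """Like Simplification: store the record count and the names of fields
    with at most `threshold` distinct non-null values in a shared dict."""
    per_field = Counter()
    distinct = defaultdict(set)
    for field, value in rows:
        per_field[field] += 1
        if value is not None:
            distinct[field].add(value)
    return {
        "record_count": max(per_field.values()),  # each field occurs once per record
        "histogram_fields": {f for f in per_field if len(distinct[f]) <= threshold},
    }


def _sort_key(item):
    (field, value), _count = item
    # Nulls first, then values in text order.
    return field, value is not None, "" if value is None else str(value)


def phase1(rows, dictionary, width=100):
    """Like Histogram filter + Aggregators: count value frequencies only for
    the fields chosen in phase 0 and draw a bar (width is an assumption)."""
    wanted = dictionary["histogram_fields"]
    total = dictionary["record_count"]
    freq = Counter((f, v) for f, v in rows if f in wanted)  # filter, then aggregate
    report = []
    for (field, value), count in sorted(freq.items(), key=_sort_key):
        hashes = 100 * count // total  # one '#' per whole percent
        report.append((field, value, count, "#" * hashes + "·" * (width - hashes)))
    return report


dept_no = [None] * 8 + ["000"] * 2 + ["120"] * 3 + ["121"] + ["600"] * 37
rows = [("DEPT_NO", v) for v in dept_no] + [("EMP_NO", i) for i in range(51)]
dictionary = phase0(rows, threshold=20)  # hypothetical HISTOGRAM_THRESHOLD
for field, value, count, bar in phase1(rows, dictionary):
    print(field, value, count, bar[:20])</pre>
<p>With 51 distinct values, <code>EMP_NO</code> stays above the threshold and gets no histogram, while <code>DEPT_NO</code> with four distinct non-null values does.</p>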
<p>Resulting file looks as follows:</p>
<p><strong>Advanced data statistic and histograms for ./data-in/employees.list.dat</strong></p>
<p>Statistics</p>
<table border="1">
<tbody>
<tr>
<th>Field name</th>
<th>Field type</th>
<th>min</th>
<th>max</th>
<th>average number</th>
<th>count</th>
<th>count not null</th>
<th>count unique</th>
<th>median</th>
<th>modus</th>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>000</td>
<td>900</td>
<td> </td>
<td>51</td>
<td>43</td>
<td>20</td>
<td>600</td>
<td>623</td>
</tr>
<tr>
<td>EMP_NO</td>
<td align="center">integer</td>
<td>0.0</td>
<td>145.0</td>
<td>58.3921568627451</td>
<td>51</td>
<td>51</td>
<td>47</td>
<td>45.0</td>
<td>2.0</td>
</tr>
<tr>
<td>FIRST_NAME</td>
<td align="center">string</td>
<td>Andrew</td>
<td>Yuki</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>44</td>
<td>Mark</td>
<td>Robert</td>
</tr>
<tr>
<td>FULL_NAME</td>
<td align="center">string</td>
<td>Baldwin, Janet</td>
<td>Young, Katherine</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>50</td>
<td>Lee, Terri</td>
<td>Sutherland, Claudia</td>
</tr>
<tr>
<td>HIRE_DATE</td>
<td align="center">date</td>
<td>28/12/1988 00:00:00</td>
<td>15/11/1994 00:00:00</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>44</td>
<td>20/04/1992 00:00:00</td>
<td>02/01/1994 00:00:00</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Admin</td>
<td>Vice President</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>16</td>
<td>Mktg</td>
<td>Eng</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Canada</td>
<td>USA</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>9</td>
<td>USA</td>
<td>USA</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>0.0</td>
<td>5.0</td>
<td>3.411764705882353</td>
<td>51</td>
<td>51</td>
<td>6</td>
<td>4.0</td>
<td>4.0</td>
</tr>
<tr>
<td>LAST_NAME</td>
<td align="center">string</td>
<td>Baldwin</td>
<td>Young</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>48</td>
<td>Lambert</td>
<td>Johnson</td>
</tr>
<tr>
<td>PHONE_EXT</td>
<td align="center">string</td>
<td>1</td>
<td>null</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>46</td>
<td>3355</td>
<td>null</td>
</tr>
<tr>
<td>SALARY</td>
<td align="center">number</td>
<td>0.0</td>
<td>9.9E7</td>
<td>2267125.137254902</td>
<td>51</td>
<td>51</td>
<td>40</td>
<td>61637.8125</td>
<td>0.0</td>
</tr>
</tbody>
</table>
<p>Histograms</p>
<table border="1">
<tbody>
<tr>
<th>Field name</th>
<th>Field type</th>
<th>value</th>
<th>count</th>
<th>count %</th>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td> </td>
<td>8</td>
<td>
<code style="color:#333333;">###############·····················································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>000</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>100</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>110</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>115</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>120</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>121</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>123</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>125</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>130</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>140</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>180</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>600</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>621</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>622</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>623</td>
<td>5</td>
<td>
<code style="color:#333333;">#########···························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>670</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>671</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>672</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>900</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Admin</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>CEO</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>CFO</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Dir</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Doc</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Eng</td>
<td>15</td>
<td>
<code style="color:#333333;">#############################·······································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Finan</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Inside Sales Coordinator</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Mktg</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Mngr</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>PRel</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>SRep</td>
<td>9</td>
<td>
<code style="color:#333333;">#################···················································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Sales</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Sales Representative</td>
<td>6</td>
<td>
<code style="color:#333333;">###########·························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>VP</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Vice President</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Canada</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>England</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>France</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Italy</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Japan</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Sales</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Switzerland</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>UK</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>USA</td>
<td>36</td>
<td>
<code style="color:#333333;">######################################################################······························</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>0.0</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>1.0</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>2.0</td>
<td>8</td>
<td>
<code style="color:#333333;">###############·····················································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>3.0</td>
<td>14</td>
<td>
<code style="color:#333333;">###########################·········································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>4.0</td>
<td>16</td>
<td>
<code style="color:#333333;">###############################·····································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>5.0</td>
<td>10</td>
<td>
<code style="color:#333333;">###################·················································································</code>
</td>
</tr>
</tbody>
</table>
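<p>The bars in the “count %” column appear to contain one <code>#</code> per whole percentage point of the 51 profiled records, padded with dots to a fixed width. For example, 16 of 51 records is 31.4 % and gets 31 marks. The Python sketch below reproduces that rendering for the JOB_GRADE counts from the table. The 100-character bar width, the round-down rule and the function name <code>histogram_rows</code> are assumptions inferred from the numbers shown. This is not CloverETL's actual profiling code.</p>

```python
from collections import Counter

BAR_WIDTH = 100  # assumed: one character per percentage point


def histogram_rows(values):
    """Return (value, count, bar) rows sorted by value.

    Each bar holds floor(100 * count / total) '#' characters padded with
    '·' up to BAR_WIDTH (an assumption inferred from the table above).
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    for value in sorted(counts):
        count = counts[value]
        filled = (100 * count) // total  # whole percentage points, rounded down
        bar = "#" * filled + "·" * (BAR_WIDTH - filled)
        rows.append((value, count, bar))
    return rows


# JOB_GRADE counts taken from the histogram table (51 records in total).
job_grades = [0.0] * 1 + [1.0] * 2 + [2.0] * 8 + [3.0] * 14 + [4.0] * 16 + [5.0] * 10
for value, count, bar in histogram_rows(job_grades):
    print(f"{value:4} {count:3} {bar}")
```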
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/03/31/data-profiling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/934c88184df6c0034450ae00a1695ee8?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">agad</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png" medium="image">
			<media:title type="html">BasicStatistic</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png" medium="image">
			<media:title type="html">AdvancedStatistic</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL version 2.9 was released. It adds Infobright Data Writer, Web Services component and other new features.</title>
		<link>http://blog.cloveretl.com/2010/02/01/cloveretl-version-2-9-was-released-it-adds-infobright-data-writer-web-services-component-and-other-new-features/</link>
		<comments>http://blog.cloveretl.com/2010/02/01/cloveretl-version-2-9-was-released-it-adds-infobright-data-writer-web-services-component-and-other-new-features/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 10:18:59 +0000</pubDate>
		<dc:creator>Lucie Felixova</dc:creator>
				<category><![CDATA[Developing Clover]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[LDAP]]></category>
		<category><![CDATA[release]]></category>
		<category><![CDATA[webservice]]></category>
		<category><![CDATA[XLS]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=412</guid>
		<description><![CDATA[New CloverETL version 2.9 was just released. This version brings a new Infobright Data Writer component, enhances connectivity by adding a Web Services component, and adds features that simplify common data transformation tasks. New Features and Components: Infobright Data Writer In response to customer requests, this component writes data into Infobright, a column-oriented relational database. [...]]]></description>
			<content:encoded><![CDATA[<p>New CloverETL version 2.9 was just released. This version brings a new Infobright Data Writer component, enhances connectivity by adding a Web Services component, and adds features that simplify common data transformation tasks.</p>
<p><strong><span style="font-size:larger;text-decoration:underline;">New Features and Components:</span></strong><br />
<strong>Infobright Data Writer </strong><br />
In response to customer requests, this component writes data into Infobright, a column-oriented relational database. Infobright provides solutions designed to deliver a scalable data warehouse optimized for analytic queries.</p>
<p><strong>Web Services component </strong><br />
The new component makes communication with Web Services easier than ever. It provides a user-friendly graphical interface for mapping your data onto Web Service fields, automatically generates requests and processes responses. It offers a faster, easier and more comfortable way to interact with remote Web Services.</p>
<p><strong>Reading formatted values from XLS</strong><br />
In addition to reading plain data from Microsoft<sup>TM</sup> Excel<sup>TM</sup> sheets, the Excel component can now also read user-formatted values such as currencies, dates or numbers.</p>
<p><strong>New tracking option</strong><br />
Customers can now see the absolute speed rates of all finished data transformations, which makes it easier to compare runs and look for process improvements.</p>
<p><strong>New Aspell Lookup table</strong><br />
A brand-new implementation of this component brings better performance, improved configuration and more customization options.</p>
<p><strong>Improved treatment of empty (NULL) values</strong><br />
Developers can now specify special strings that should be treated as empty (NULL) values when data is parsed. This simplifies processing of typical application export files, which often contain placeholder values that are insignificant for ETL processing. It may also improve processing throughput and lower the memory consumption of the data transformation.</p>
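<p>The following Python sketch illustrates the idea outside CloverETL. While parsing delimited text, any field that matches one of a configured set of null strings is returned as <code>None</code>. The names <code>NULL_STRINGS</code> and <code>parse_records</code> and the sample values are hypothetical. This is not the CloverETL configuration syntax.</p>

```python
import csv
import io

# Hypothetical set of placeholder strings that should be read as NULL.
NULL_STRINGS = {"", "N/A", "null", "-"}


def parse_records(text, delimiter=";", null_strings=NULL_STRINGS):
    """Parse delimited text and replace every configured null string with None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [
        [None if field.strip() in null_strings else field for field in row]
        for row in reader
    ]


rows = parse_records("1001;N/A;Smith;null;42\n2002;Jones;-;;7\n")
print(rows)  # [['1001', None, 'Smith', None, '42'], ['2002', 'Jones', None, None, '7']]
```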
<p>A more user-friendly <strong>File URL dialog</strong> and improved <strong>LDAP</strong> functionality.</p>
<p>Customers can evaluate these new features, along with CloverETL’s other leading capabilities, with a free 30-day trial of CloverETL Designer Pro, which is available at <a href="http://www.cloveretl.com/">www.cloveretl.com</a>. Information management professionals can also evaluate the enterprise integration features of CloverETL Server via an online demo, which is also available at <a href="http://www.cloveretl.com/">www.cloveretl.com</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/02/01/cloveretl-version-2-9-was-released-it-adds-infobright-data-writer-web-services-component-and-other-new-features/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/36840395aaef2cf9186f9dcdc1cb947e?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">Lucie Felixova</media:title>
		</media:content>
	</item>
		<item>
		<title>Parallel reader</title>
		<link>http://blog.cloveretl.com/2009/10/23/parallel-reader/</link>
		<comments>http://blog.cloveretl.com/2009/10/23/parallel-reader/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 11:36:51 +0000</pubDate>
		<dc:creator>Martin Zatopek</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[clover]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[ParallelReader]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=252</guid>
		<description><![CDATA[In the October release 2.8.1 of Clover we introduced a new component that should definitely attract your attention: the Parallel Reader. The name itself already suggests the goal of the component: to improve reading speed by going parallel. In function, the component is very similar to Universal Data Reader: it reads delimited flat files [...]]]></description>
			<content:encoded><![CDATA[<p>In the October release 2.8.1 of Clover we introduced a new component that should definitely attract your attention: the Parallel Reader. The name itself already suggests the goal of the component: to improve reading speed by going parallel. In function, the component is very similar to Universal Data Reader: it reads delimited flat files such as CSV, tab-delimited, etc., and not much has changed there. The real difference lies under the hood.</p>
<p>Two major optimizations allow Parallel Reader to deliver excellent performance, especially on server-class machines with fast modern disks or, better yet, disk arrays. The first optimization is, of course, reading the file in parallel. The input file is divided into a set of virtual data chunks, which are fed to reading threads. These threads all work at the same time, each one parsing data records only from its own part of the file. The number of threads is set by the component parameter “Level Of Parallelism” and should reflect the hardware setup (e.g. the number of disks in a striped RAID) to harness the full power of Parallel Reader. The second performance gain comes simply from simplifying the data parser inside. This parser is as simple as possible: it offers limited validation, error handling and functionality, but it is really, really fast.</p>
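<p>The chunking idea can be illustrated with a simplified Python sketch. This is an assumption about the general technique, not the actual Parallel Reader code. The file is split into byte ranges. Each worker skips the partial line at the start of its range and then parses every line that begins inside it, so each record is read exactly once. The parameter <code>level_of_parallelism</code> only stands in for the component's “Level Of Parallelism” setting, and the <code>;</code> delimiter is an assumption. Records are collected in the order in which the workers finish, so the original record order is not preserved.</p>

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def chunk_bounds(file_size, level_of_parallelism):
    """Split the byte range [0, file_size) into at most level_of_parallelism chunks."""
    step = max(1, -(-file_size // level_of_parallelism))  # ceiling division
    return [(start, min(start + step, file_size)) for start in range(0, file_size, step)]


def read_chunk(path, start, end, delimiter=b";"):
    """Parse every line that begins inside the byte range [start, end)."""
    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and skip to the end of that line: a line that
            # began before `start` belongs to the previous chunk.
            f.seek(start - 1)
            f.readline()
        # The last line read may run past `end`; the next chunk skips it.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            fields = line.rstrip(b"\r\n").split(delimiter)
            records.append([field.decode("utf-8") for field in fields])
    return records


def parallel_read(path, level_of_parallelism=4):
    """Read a delimited file with one worker thread per chunk."""
    bounds = chunk_bounds(os.path.getsize(path), level_of_parallelism)
    records = []
    with ThreadPoolExecutor(max_workers=level_of_parallelism) as pool:
        futures = [pool.submit(read_chunk, path, start, end) for start, end in bounds]
        # Chunks are collected as they finish, so record order is not guaranteed.
        for future in as_completed(futures):
            records.extend(future.result())
    return records
```

<p>In CPython the global interpreter lock prevents CPU-bound parsing from running truly in parallel. This sketch therefore demonstrates the partitioning logic rather than the speed-up.</p>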
<p>Although the new reader has a few limitations that stem from its nature, its extreme speed in common use cases compensates for these drawbacks. If you are processing large amounts of data (hundreds of megabytes or more) and your transformation does not depend on data records being read in their original order, Parallel Reader might be just the right choice for you. Why not give it a try?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/10/23/parallel-reader/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a09e97c9dfaf07365d2353d0ed474b28?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">mzatopek</media:title>
		</media:content>
	</item>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 1</title>
		<link>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/</link>
		<comments>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 10:21:56 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[clover]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SCD1]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=213</guid>
		<description><![CDATA[A very typical use of ETL tools is loading a data warehouse (DWH). So I decided to write a tutorial that describes typical data warehouse tasks (slowly changing dimensions, date dimension, filling fact tables) and proposes solutions using CloverETL. If you are new to data warehousing, I recommend reading some [...]]]></description>
			<content:encoded><![CDATA[<p>A very typical use of ETL tools is loading a data warehouse (DWH). So I decided to write a tutorial that describes typical data warehouse tasks (slowly changing dimensions, date dimension, filling fact tables) and proposes solutions using CloverETL.</p>
<p>If you are new to data warehousing, I recommend reading some of the books by <a href="http://eu.wiley.com/WileyCDA/Section/id-302479.html?query=Ralph+Kimball">Ralph Kimball</a> or <a href="http://eu.wiley.com/WileyCDA/Section/id-302479.html?query=W.+H.+Inmon">W. H. Inmon</a>.</p>
<p>The sample data warehouse collects information about sales for a small store chain that sells electronics such as iPods, MP3 players, laptops, etc.</p>
<p>The DB schema of my data warehouse is very simple. It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price. You can see the complete DB schema in the figure below.</p>
<div id="attachment_218" class="wp-caption aligncenter" style="width: 658px"><img class="size-full wp-image-218" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="DB schema of sample DWH" width="648" height="273" /><p class="wp-caption-text">DB schema of sample DWH</p></div>
<h2>Store dimension</h2>
<p>One thing you will surely face when building a data warehouse is working with several types of slowly changing dimensions (SCD). In this part of the tutorial I use the simplest SCD type.</p>
<p>The simplest and surely the most popular SCD type among ETL developers is <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_1">slowly changing dimension type 1</a>. It doesn&#8217;t store any history: once a value has changed in the online transaction processing (OLTP) system, the corresponding value in the DWH is simply overwritten during the next load.</p>
<p>I decided to use SCD1 for the store dimension, which collects basic information about stores: address, store identifier, store manager, etc. In the OLTP system each store is identified by a unique store number (the natural key), but for the DWH I have to generate a separate surrogate identifier, ID_D_STORE.</p>
<p>The basic idea of processing SCD1 is very simple: compare the records in the DWH and the OLTP system, insert missing records into the DWH, and update the DWH records according to the OLTP data. For all these tasks, the attribute that lets us find corresponding records is the natural key, STORE_NUMBER.</p>
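<p>A minimal Python sketch of this comparison assumes that both the DWH dimension and the OLTP snapshot fit in memory as lists of dictionaries. It looks up each OLTP record by STORE_NUMBER and sorts it into one of three groups: insert, update or unchanged. The MANAGER attribute, the sample values and the use of <code>itertools.count</code> as a stand-in for the CloverETL sequence are illustrative assumptions.</p>

```python
from itertools import count


def apply_scd1(dwh_rows, oltp_rows, next_id):
    """Compare OLTP rows with the DWH dimension by the natural key STORE_NUMBER.

    Returns (inserts, updates). New stores get a surrogate key ID_D_STORE from
    next_id; changed stores are overwritten with the OLTP values (no history).
    """
    dwh_by_key = {row["STORE_NUMBER"]: row for row in dwh_rows}
    inserts, updates = [], []
    for src in oltp_rows:
        current = dwh_by_key.get(src["STORE_NUMBER"])
        if current is None:
            # Missing in the DWH: insert it with a freshly generated surrogate key.
            inserts.append({"ID_D_STORE": next(next_id), **src})
        elif any(current.get(attr) != value for attr, value in src.items()):
            # Present but changed: overwrite the attributes, keep the surrogate key.
            updates.append({**current, **src})
        # Otherwise the record is unchanged and nothing needs to be written.
    return inserts, updates


dwh = [
    {"ID_D_STORE": 1, "STORE_NUMBER": "S01", "MANAGER": "Novak"},
    {"ID_D_STORE": 2, "STORE_NUMBER": "S02", "MANAGER": "Svoboda"},
]
oltp = [
    {"STORE_NUMBER": "S01", "MANAGER": "Novak"},   # unchanged
    {"STORE_NUMBER": "S02", "MANAGER": "Dvorak"},  # manager changed
    {"STORE_NUMBER": "S03", "MANAGER": "Cerny"},   # new store
]
inserts, updates = apply_scd1(dwh, oltp, count(3))  # hypothetical sequence starting at 3
```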
<p>So let&#8217;s build the CloverETL graph. For better portability, all input and output data are stored in CSV files, so you don&#8217;t have to configure any database. The DWH store dimension is stored in the D_STORE.tbl file, and the current data from the OLTP system are in Store_25092009.csv. We read both files, sort them on the natural key STORE_NUMBER and then use the DataIntersection component to find the records that are not yet in D_STORE (the third output of DataIntersection). At the same time, the second output of DataIntersection gives us the records present in both the OLTP system and the DWH, which may differ. These records are filtered so that only those with at least one changed attribute value are processed, and their new values are written to the D_STORE_update.tbl file. New records are written to the D_STORE_insert.tbl file once the ID_D_STORE attribute has been added. ID_D_STORE gets its value from a sequence defined in CloverETL in advance. And that&#8217;s all. You can see the resulting graph below.</p>
<div id="attachment_217" class="wp-caption aligncenter" style="width: 1034px"><img class="size-full wp-image-217" title="D_STORE_SCD1" src="http://cloveretl.files.wordpress.com/2009/09/d_store_scd1_cut1.png?w=1024&#038;h=260" alt="CloverETL graph D_STORE_SCD1" width="1024" height="260" /><p class="wp-caption-text">CloverETL graph D_STORE_SCD1</p></div>
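<p>The graph above relies on both inputs being sorted on STORE_NUMBER. The sort-merge step behind a component like DataIntersection can be sketched in Python as follows. This is a simplified illustration of the technique that assumes the key is unique in each input. It is not the component's actual implementation. The first input plays the role of D_STORE and the second the OLTP data, so the third output contains the new stores.</p>

```python
from operator import itemgetter


def sorted_intersection(left, right, key):
    """Merge two lists already sorted on `key` into three outputs.

    Returns (only_left, both, only_right), where `both` holds (left, right)
    pairs with equal keys. Assumes the key is unique within each input.
    """
    only_left, both, only_right = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key == right_key:
            both.append((left[i], right[j]))  # candidate for an SCD1 update
            i += 1
            j += 1
        elif left_key < right_key:
            only_left.append(left[i])  # in D_STORE but not in the OLTP data
            i += 1
        else:
            only_right.append(right[j])  # new store, missing in D_STORE
            j += 1
    # Whatever remains in either input has no counterpart in the other one.
    only_left.extend(left[i:])
    only_right.extend(right[j:])
    return only_left, both, only_right


by_store = itemgetter("STORE_NUMBER")
d_store = sorted(
    [{"ID_D_STORE": 1, "STORE_NUMBER": "S01"}, {"ID_D_STORE": 2, "STORE_NUMBER": "S03"}],
    key=by_store,
)
oltp = sorted([{"STORE_NUMBER": "S03"}, {"STORE_NUMBER": "S02"}], key=by_store)
only_dwh, matched, new_stores = sorted_intersection(d_store, oltp, by_store)
```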
<p>If you want to read data from or write data to a database, simply replace the UniversalDataReader components with DBInputTable components and the UniversalDataWriter components with DBOutputTable components.</p>
<p>You can download complete CloverETL project <a href="http://drop.io/cbhirdh/asset/scd1-example-zip">here</a>.</p>
<p>To be continued. In the next part we will deal with <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">slowly changing dimension type 2</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/d_store_scd1_cut1.png" medium="image">
			<media:title type="html">D_STORE_SCD1</media:title>
		</media:content>
	</item>
		<item>
		<title>New QuickBase components</title>
		<link>http://blog.cloveretl.com/2009/09/02/new-quickbase-components/</link>
		<comments>http://blog.cloveretl.com/2009/09/02/new-quickbase-components/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 08:34:40 +0000</pubDate>
		<dc:creator>Martin Zatopek</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[clover]]></category>
		<category><![CDATA[connector]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[QuickBase]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=161</guid>
		<description><![CDATA[We have great news for users of the online database QuickBase from Intuit. CloverETL is now one more tool you can use to manipulate data in this database: you can work with data in QuickBase without restrictions and with the full power of CloverETL. We have introduced several specialized components for this purpose. QuickBaseQueryReader [...]]]></description>
			<content:encoded><![CDATA[<p>We have great news for users of the online database <a href="http://quickbase.intuit.com/">QuickBase</a> from Intuit. CloverETL is now one more tool you can use to manipulate data in this database: you can work with data in QuickBase without restrictions and with the full power of CloverETL. We have introduced several specialized components for this purpose.</p>
<p><a href="http://wiki.cloveretl.org/doku.php?id=components:readers#quickbasequeryreader">QuickBaseQueryReader</a> reads records returned by a query on a table. You only need to specify the name of the table to read from and a query in the <a href="https://www.quickbase.com/up/6mztyxu8/g/rc7/en/#_Toc126580074">QuickBase query language</a>. It is that simple. The second reader, <a href="http://wiki.cloveretl.org/doku.php?id=components:readers#quickbaserecordreader">QuickBaseRecordReader</a>, is even simpler: again you specify a table name, but instead of a somewhat confusing query you just list the identifiers of the records you need.</p>
<p>Writing data is likewise handled by a pair of components, <a href="http://wiki.cloveretl.org/doku.php?id=components:writers#quickbaserecordwriter">QuickBaseRecordWriter</a> and <a href="http://wiki.cloveretl.org/doku.php?id=components:writers#quickbaseimportcsv">QuickBaseImportCSV</a>. The first is a more complex writer that stores incoming data records in a given table; the second is simpler, but it is a much faster bulk loader into QuickBase.</p>
<p>So we have prepared a whole family of components that fully supports data manipulation in QuickBase. Don't hesitate to try it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/09/02/new-quickbase-components/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a09e97c9dfaf07365d2353d0ed474b28?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">mzatopek</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL x Talend evaluation by Axege</title>
		<link>http://blog.cloveretl.com/2009/03/31/cloveretl-x-talend-evaluation-by-axege/</link>
		<comments>http://blog.cloveretl.com/2009/03/31/cloveretl-x-talend-evaluation-by-axege/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 19:48:00 +0000</pubDate>
		<dc:creator>dpavlis</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://cloveretl.wordpress.com/?p=16</guid>
		<description><![CDATA[For those looking for an independent comparison of CloverETL and Talend Open Studio, look at this one conducted by the French organization Axege (in French). We have prepared an English translation of the original document, with comments.]]></description>
			<content:encoded><![CDATA[<p>For those looking for an independent comparison of CloverETL and Talend Open Studio, <a title="CloverETL versus Talend Open Studio by Axege" href="http://www.axege.com/Evaluation-de-deux-ETL-Clover-ETL-vs-Talend-Open-Studio.html" target="_blank">look at this one</a> conducted by the French organization <a href="http://www.axege.com" target="_blank">Axege</a> [in French <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />]</p>
<p><em>We have prepared an <a href="http://www.cloveretl.com/_upload/clover-etl/Evaluation_of_two_ETL_Axege.pdf" target="_blank">English translation</a> of the original document, with comments.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/03/31/cloveretl-x-talend-evaluation-by-axege/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a2fe7d61b2e2fa3b0adea8b6aeca574a?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">dadik</media:title>
		</media:content>
	</item>
	</channel>
</rss>