<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; Using CloverETL</title>
	<atom:link href="http://blog.cloveretl.com/category/using-cloveretl/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; Using CloverETL</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Integration of Clover with PHP</title>
		<link>http://blog.cloveretl.com/2010/06/22/integration-of-clover-with-php/</link>
		<comments>http://blog.cloveretl.com/2010/06/22/integration-of-clover-with-php/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 09:00:39 +0000</pubDate>
		<dc:creator>Vaclav Matous</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=742</guid>
		<description><![CDATA[To witness the power of Clover, consider the following scenario. Our customer required an application whose main purpose was generating reports in xls format. What seemed like a simple task presented several difficult challenges: Reports had many different formats specified by many parameters. Reports were generated from heterogeneous sources – xls files, database [...]]]></description>
			<content:encoded><![CDATA[<p>To witness the power of Clover, consider the following scenario. Our customer required an application whose main purpose was generating reports in xls format. What seemed like a simple task presented several difficult challenges:</p>
<ul>
<li>Reports had many different formats specified by many parameters.</li>
<li>Reports were generated from heterogeneous sources – xls files, database tables and IBM iSeries files (formerly AS/400).</li>
<li>Application users modified data (mainly added records) in a report before its final confirmation.</li>
<li>The confirmation of a report invoked updates in several database tables and generated log records.</li>
<li>Short response times were needed for most operations.</li>
</ul>
<p>We knew right away that we would build a web application in PHP, because a web application architecture gave us many of the advantages we needed. But how can we quickly and reliably integrate heterogeneous sources? This is exactly the type of task that ETL tools were developed for. That’s why we built the whole solution using CloverETL Server with the Launch Services (an equivalent of web services) feature. The following diagram illustrates the architecture of our solution.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/06/php_blog.png"><img class="aligncenter size-full wp-image-743" title="php_blog" src="http://cloveretl.files.wordpress.com/2010/06/php_blog.png?w=642&#038;h=542" alt="" width="642" height="542" /></a></p>
<p>Users of the application communicate only with the web application layer (1) that uses its own database for storing temporary data, setup information, and a history of reported records. This part also manages access permissions based on user roles.</p>
<p>When a user wants to view a report, he or she sets the parameters and other restrictions (e.g. the type of records to be reported or the date range) in the web application, using any web browser (IE, Firefox …). Confirming the parameters invokes a call to a Clover Launch Service (2), which runs the transformation connected with this service. The transformation integrates data from heterogeneous sources and stores it in a database dedicated to the web application. All stored records are identified by a runID (the transformation execution’s unique identifier within Clover Server), and the service returns the same runID in its response.</p>
<p>The web application allows users to modify only certain data (according to partial runID). Users can insert, change or delete records (3). These changes take effect only within the database of the web application and remain temporary until the user confirms the report. The confirmation again invokes a call to the Clover Launch Service, which then generates the needed reports, logs records about the processed changes, and propagates the changes into the history of reported records (4).</p>
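<p>The runID-based flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the table names (staged_report, report_history), the columns, and the helper functions are hypothetical, an in-memory SQLite database stands in for the web application's database, and the Launch Service call itself is not modelled. Clearing the staged rows on confirmation is also an assumption of the sketch.</p>

```python
import sqlite3

# Hypothetical staging and history tables; the real schema is not described in the post.

def stage_rows(db, run_id, rows):
    """Store rows produced by one transformation run, tagged with its run ID."""
    db.executemany(
        "INSERT INTO staged_report (run_id, item, amount) VALUES (?, ?, ?)",
        [(run_id, item, amount) for item, amount in rows],
    )

def add_user_row(db, run_id, item, amount):
    """A user edit touches only the rows of the run being reviewed."""
    db.execute(
        "INSERT INTO staged_report (run_id, item, amount) VALUES (?, ?, ?)",
        (run_id, item, amount),
    )

def confirm(db, run_id):
    """On confirmation, copy the run's rows to the history and clear the staged rows."""
    db.execute(
        "INSERT INTO report_history (run_id, item, amount) "
        "SELECT run_id, item, amount FROM staged_report WHERE run_id = ?",
        (run_id,),
    )
    db.execute("DELETE FROM staged_report WHERE run_id = ?", (run_id,))

def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staged_report (run_id INTEGER, item TEXT, amount REAL)")
    db.execute("CREATE TABLE report_history (run_id INTEGER, item TEXT, amount REAL)")
    stage_rows(db, 101, [("widgets", 10.0), ("gadgets", 5.5)])
    stage_rows(db, 102, [("widgets", 7.0)])
    add_user_row(db, 101, "manual correction", 1.5)
    confirm(db, 101)
    history = db.execute(
        "SELECT COUNT(*) FROM report_history WHERE run_id = 101"
    ).fetchone()[0]
    staged = db.execute("SELECT COUNT(*) FROM staged_report").fetchone()[0]
    return history, staged
```

<p>Because every row carries its run ID, two users working on different reports never see or overwrite each other's temporary records.</p>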
<p>To summarize this solution, we moved the entire application logic into CloverETL Server, which allows us to solve such problems with ease. Furthermore, we took advantage of Clover Launch Services – a combination of ETL integration power and online processing.</p>
<p>In my opinion, there are many similar problems where the concept described above cuts the time and effort to reach a solution.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/06/22/integration-of-clover-with-php/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/73f56f1267c1896b11e3c6df97499559?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">vmatous</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/06/php_blog.png" medium="image">
			<media:title type="html">php_blog</media:title>
		</media:content>
	</item>
		<item>
		<title>ExtSort vs. FastSort – which one is better for me?</title>
		<link>http://blog.cloveretl.com/2010/06/15/extsort-vs-fastsort-%e2%80%93-which-one-is-better-for-me/</link>
		<comments>http://blog.cloveretl.com/2010/06/15/extsort-vs-fastsort-%e2%80%93-which-one-is-better-for-me/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 12:08:59 +0000</pubDate>
		<dc:creator>bigpavel</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[sort]]></category>
		<category><![CDATA[sort components]]></category>
		<category><![CDATA[sorting]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=704</guid>
		<description><![CDATA[I often get asked why CloverETL offers two sort components instead of just one, and how to decide which one is better for a particular purpose.

The reason for having two sort components in CloverETL is simply to keep things as easy as possible. Since the inner workings of ExtSort and FastSort are quite different, it would be really difficult to implement a nice and clean universal one.

Luckily, the decision is simple and straightforward. If you can dedicate enough system resources (CPU cores and/or memory) to the graph doing the sorting, FastSort is the clear option. On the other hand, if you’re short on resources and want more conservative behavior, pick ExtSort, which will give you steady performance at minimum system requirements.]]></description>
			<content:encoded><![CDATA[<p>I often get asked why CloverETL offers two sort components instead of just one, and how to decide which one is better for a particular purpose.</p>
<p>The reason for having two sort components in CloverETL is simply to keep things as easy as possible. Since the inner workings of ExtSort and FastSort are quite different, it would be really difficult to implement a nice and clean universal one.</p>
<p>Luckily, the decision is simple and straightforward. If you can dedicate enough system resources (CPU cores and/or memory) to the graph doing the sorting, FastSort is the clear option. On the other hand, if you’re short on resources and want more conservative behavior, pick ExtSort, which will give you steady performance at minimum system requirements.</p>
<p>FastSort is a very powerful tool, but to truly witness its power, users must set it up correctly to use their hardware&#8217;s maximum potential. We will now dive into the settings behind this impressive component and learn how to max out its ability while being careful to avoid crashes.</p>
<h4>Tweaking FastSort</h4>
<p>FastSort is greedy for both memory and CPU cores. If the system does not have enough of either, FastSort can quite easily crash with an out-of-memory error, especially if the records you’re going to sort are big (long string fields, tens or hundreds of fields, etc.).</p>
<h5>Parallelism</h5>
<p>Unlike ExtSort, FastSort can utilize a potentially unlimited number of CPU cores to do its job. You can control how many worker threads are used by overriding the default value of “Concurrency (threads)”. My experience shows, however, that unless you’re able to use really fast disk drives, going for more than 2 threads does not necessarily help and can even slow the process down a bit. So basically you don’t need to worry about parallelism at all unless you have the hardware to take advantage of it. Remember that each additional thread adds extra memory load!</p>
<h5>Memory</h5>
<p>FastSort can be a bit tricky with memory, since multiple settings affect it. The most important is “Run size (records)”, which denotes the size of the data chunk sorted at a time. Note that the actual record size and the level of parallelism increase the overall memory consumption, so be careful with this setting. The default is 20k records. If you set “Estimated record count”, which is your rough guess of the total number of records to be sorted, the Run size is computed for you using an experimentally derived formula. This formula tries to find the right “Run size” based on the number of records and the amount of available memory (which you can limit with “Maximum memory”; the default is unlimited). This “computed guess” works in most cases but can fail under certain conditions, so you need to test and tweak it on your data to get the best result. Run size is definitely a parameter worth playing with!</p>
<p>Be sure to dedicate enough memory to your JVM. With large or numerous records, you want to give FastSort plenty of free memory: going from 512 MB up to 2 GB is worth it (e.g. -Xmx1536m). With a lot of memory, FastSort will do an amazing job. However, with the default 64 MB heap space setting, FastSort can crash.</p>
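<p>To get a feeling for how run size, record size and parallelism add up, a back-of-the-envelope estimate can help. The sketch below is not FastSort's internal formula; it simply assumes that each worker thread holds one run of records in memory, and the headroom factor (the share of the heap left for the sort) is an arbitrary assumption.</p>

```python
def estimate_sort_buffer_bytes(run_size_records, avg_record_bytes, threads):
    """Rough lower bound: one run of records held in memory per worker thread."""
    return run_size_records * avg_record_bytes * threads

def fits_in_heap(run_size_records, avg_record_bytes, threads, heap_mb, headroom=0.5):
    """Compare the estimate with a share of the JVM heap (headroom is an assumption)."""
    budget = heap_mb * 1024 * 1024 * headroom
    return estimate_sort_buffer_bytes(run_size_records, avg_record_bytes, threads) <= budget
```

<p>For example, the default run size of 20,000 records with 1,000-byte records and 2 threads gives an estimate of 40,000,000 bytes, which already exceeds half of a 64 MB heap but fits comfortably into a 1536 MB heap.</p>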
<p>&#8216;In memory only sorting&#8217; is an option you can use if you’re sure that all the data will fit into memory. You can either force it (and then possibly crash with an out-of-memory error) or leave it at the default, auto. Auto means that FastSort first tries to sort the data in memory and, if that fails, uses on-disk sorting instead.</p>
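<p>The general idea of sorting in memory when possible and falling back to disk otherwise can be illustrated with a textbook external merge sort. This is a generic sketch, not FastSort's actual implementation: it is single-threaded, decides by a fixed run size rather than by available memory, and the function names are made up for the example.</p>

```python
import heapq
import pickle
import tempfile

def _spill(sorted_run):
    """Write one sorted run to a temporary file and return the open file."""
    f = tempfile.TemporaryFile()
    for record in sorted_run:
        pickle.dump(record, f)
    f.seek(0)
    return f

def _read_run(f):
    """Yield records back from a spilled run until the file is exhausted."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

def sort_records(records, run_size):
    """Sort in memory if everything fits in one run, otherwise spill runs and merge them."""
    runs, current = [], []
    for record in records:
        current.append(record)
        if len(current) == run_size:
            runs.append(_spill(sorted(current)))
            current = []
    if not runs:
        return sorted(current)  # everything fit into a single in-memory run
    if current:
        runs.append(_spill(sorted(current)))
    try:
        # Merge all sorted runs; each run is read back lazily from its temp file.
        return list(heapq.merge(*(_read_run(f) for f in runs)))
    finally:
        for f in runs:
            f.close()
```

<p>In this sketch every spilled run is one open temp file during the merge, which shows why a smaller run size means more files to keep open.</p>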
<h5>Other limits and valuable parameters</h5>
<p>Apart from memory settings, you can impose more limits on FastSort to reflect your needs. For example, if your system works with disk quotas that limit the number of open files, you can cap the number of FastSort’s temp files with “Max open files”. Note that FastSort uses LOTS of files – hundreds, even thousands. If you cap it too low (500 or less), FastSort will continue to work, but its performance will decrease significantly. So should you need to limit the number of open files, consider switching to ExtSort.</p>
<h5>Settings you can forget</h5>
<p>There are other advanced options for FastSort, but you can leave them at their default values unless you are really trying to optimize your sort. Number of read buffers defines how many chunks of data are held in memory at a time. It must be at least equal to the Concurrency setting, otherwise some of the workers would have no data to work on. If you use too large a number, you’ll end up with an out-of-memory error. The default is based on the current concurrency setting and is just fine.</p>
<p>Average record size is nothing more than a hint about the average byte size of records in the data. If it is not set, FastSort computes it automatically from the real data, which is usually more precise than setting an explicit value.</p>
<p>Tape buffer is a buffer used by each worker for filling the output. It slightly affects performance, but the default is fine in almost all cases.</p>
<p>The last two options control how temp files are created: they can be compressed (defaults to false), and you can even control the charset of string fields (default UTF-16). Both are there to save space (the space occupied by temp files during graph execution), and both decrease performance.</p>
<h4>The Decision</h4>
<p>FastSort is a very powerful sorting component and can significantly speed up your transformation process. But it has to be set up correctly. So, if you are not sure and you want the always safe and simple sort, go with ExtSort. On the other hand, if you know your hardware and want to utilize it to optimize your sort for speed, dive into FastSort and explore it a bit. The results can be extraordinary.</p>
<p><em>To be continued</em> … (Part 2 will discuss ExtSort component)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/06/15/extsort-vs-fastsort-%e2%80%93-which-one-is-better-for-me/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0fb57473985720d4d29eac8a52337a73?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">bigpavel</media:title>
		</media:content>
	</item>
		<item>
		<title>How to easily enrich data using CloverETL’s Auto-filling feature</title>
		<link>http://blog.cloveretl.com/2010/06/03/how-to-easily-enrich-data-using-cloveretl%e2%80%99s-auto-filling-feature/</link>
		<comments>http://blog.cloveretl.com/2010/06/03/how-to-easily-enrich-data-using-cloveretl%e2%80%99s-auto-filling-feature/#comments</comments>
		<pubDate>Thu, 03 Jun 2010 07:25:14 +0000</pubDate>
		<dc:creator>tomaswaller</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[autofilling]]></category>
		<category><![CDATA[timestamp]]></category>
		<category><![CDATA[XLS]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=676</guid>
		<description><![CDATA[Users often need information that relates to a data source but is not contained in it, even though it is easily defined. Thus, it is important to be able to add further information to your source that is not already present in the file (e.g. time stamp, name of Excel sheet). Such additional information can simplify [...]]]></description>
			<content:encoded><![CDATA[<p>Users often need information that relates to a data source but is not contained in it, even though it is easily defined. Thus, it is important to be able to add further information to your source that is not already present in the file (e.g. time stamp, name of Excel sheet). Such additional information can simplify further data processing.</p>
<p>For example:</p>
<ul>
<li>Each file has a time of creation, its size, name and the path where it is located, and (in case of an XLS file) the name of the sheet that is read.</li>
<li>When a data source is being read, the reader starts to work at a specific time; each record is also read at a specific time.</li>
<li>Records can be numbered in the order in which they are read.</li>
<li>Information about errors may be available (in DBExecute and DBOutputTable).</li>
</ul>
<p>All this information can easily enrich the read data in CloverETL by using the <strong>auto-filling</strong> functionality.</p>
<p>Auto-filling is a feature that is available on the metadata definition level. When you open the Metadata Editor and select any field of the metadata, there is an Autofilling property (under the Advanced tab). You can select one of the following values:</p>
<table style="height:408px;width:800px;border-collapse:collapse;" border="0" cellspacing="0" cellpadding="6">
<tbody>
<tr>
<td style="border:1px solid black;" width="198"><strong>Name</strong></td>
<td style="border:1px solid black;" width="92"><strong>Data type</strong></td>
<td style="border:1px solid black;" width="367"><strong>Description</strong></td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">default_value</td>
<td style="border:1px solid;" width="92" valign="center">any type</td>
<td style="border:1px solid;" width="367" valign="center">When a null value is assigned to the field and the field is marked as non-nullable, the null is replaced by the value of the Default property of the field.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">global_row_count</td>
<td style="border:1px solid;" width="92" valign="center">any numeric</td>
<td style="border:1px solid;" width="367" valign="center">Sequence number of the record in a data source, starting from 0. The number <strong>isn’t</strong> reset for each input file when wildcards are used in the File URL (${DATAIN_DIR}/input*.txt).</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">source_row_count</td>
<td style="border:1px solid;" width="92" valign="center">any numeric</td>
<td style="border:1px solid;" width="367" valign="center">Sequence number of the record in a data source, starting from 0. The number <strong>is</strong> reset to 0 for each input file when wildcards are used in the File URL (${DATAIN_DIR}/input*.txt).</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">metadata_row_count</td>
<td style="border:1px solid;" width="92" valign="center">any numeric</td>
<td style="border:1px solid;" width="367" valign="center">Similar to global_row_count, but when the reader component supports more than one type of output metadata (XMLExtract, XMLXPathReader), each metadata has a separate counter.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">metadata_source_row_count</td>
<td style="border:1px solid;" width="92" valign="center">any numeric</td>
<td style="border:1px solid;" width="367" valign="center">Similar to source_row_count, but when the reader component supports more than one type of output metadata (XMLExtract, XMLXPathReader), each metadata has a separate counter.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">source_name</td>
<td style="border:1px solid;" width="92" valign="center">string</td>
<td style="border:1px solid;" width="367" valign="center">Name of the data source. For file readers it’s the fully qualified path (e.g. /home/user/input.csv), for DataGenerator it’s the ID of the graph component, for DBInputTable the SQL query, and for JMSReader the fully qualified class name.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">source_size</td>
<td style="border:1px solid;" width="92" valign="center">any numeric</td>
<td style="border:1px solid;" width="367" valign="center">Size of the file in bytes. 0 for non-file readers.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">source_timestamp</td>
<td style="border:1px solid;" width="92" valign="center">date</td>
<td style="border:1px solid;" width="367" valign="center">Date and time of the creation of the file. Empty (null) in all non-file readers.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">reader_timestamp</td>
<td style="border:1px solid;" width="92" valign="center">date</td>
<td style="border:1px solid;" width="367" valign="center">Date and time when the reader starts reading data.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">row_timestamp</td>
<td style="border:1px solid;" width="92" valign="center">date</td>
<td style="border:1px solid;" width="367" valign="center">Date and time when the reader starts reading the record.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">sheet_name</td>
<td style="border:1px solid;" width="92" valign="center">string</td>
<td style="border:1px solid;" width="367" valign="center">Name of the sheet; only in XLSDataReader.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">ErrCode</td>
<td style="border:1px solid;" width="92" valign="center">any numeric</td>
<td style="border:1px solid;" width="367" valign="center">Error code returned by the database engine; only in DBExecute and DBOutputTable.</td>
</tr>
<tr>
<td style="border:1px solid;" width="198" valign="center">ErrText</td>
<td style="border:1px solid;" width="92" valign="center">string</td>
<td style="border:1px solid;" width="367" valign="center">Error message returned by the database engine; only in DBExecute and DBOutputTable.</td>
</tr>
</tbody>
</table>
<p>Remember that any of these functions can be applied to a field that is not present in the source file, database table, generated data, or JMS message.</p>
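<p>The difference between global_row_count and source_row_count from the table is easiest to see on a tiny example. The following sketch only imitates the two counters in plain Python; the function name and the dictionary of sources are made up for the illustration and have nothing to do with CloverETL's own code.</p>

```python
def read_with_counters(sources):
    """Yield (source_name, global_row_count, source_row_count, record) for each record.

    `sources` maps a source name to its list of records; both counters start from 0.
    """
    global_row_count = 0
    for source_name, records in sources.items():
        # source_row_count restarts at 0 for every source, global_row_count does not.
        for source_row_count, record in enumerate(records):
            yield source_name, global_row_count, source_row_count, record
            global_row_count += 1
```

<p>With two input files of two and one records, the global counter runs 0, 1, 2, while the per-source counter runs 0, 1 and then starts again at 0 for the second file.</p>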
<h4>Use case</h4>
<p>Imagine that you have an export of customers from a database in an Excel file. The Excel file is organized into many sheets; each sheet is named after a state abbreviation and contains only customers from that state. See the figure below.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/06/blog01_data.png"><img class="aligncenter size-full wp-image-691" title="blog01_data" src="http://cloveretl.files.wordpress.com/2010/06/blog01_data.png?w=739&#038;h=506" alt="" width="739" height="506" /></a></p>
<p>Now you want to merge all customers into one CSV file, but for each customer you also want to store the state in a separate column. It looks very easy <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />. For CloverETL it is; not necessarily so for other ETL tools.</p>
<p>The final graph is very simple: there are only two components.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/06/blog01_graph.png"><img class="aligncenter size-full wp-image-692" title="blog01_graph" src="http://cloveretl.files.wordpress.com/2010/06/blog01_graph.png?w=341&#038;h=73" alt="" width="341" height="73" /></a></p>
<p>The most important part of this graph is hidden in the definition of the metadata on the edge. We have to enrich the metadata, which we created from the Excel file by using the proper wizard in CloverETL Designer, with a new field “state” whose Autofilling property is set to sheet_name, as you can see below. And that’s all.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/06/blog01_metadataeditorsheetname.png"><img class="aligncenter size-full wp-image-693" title="blog01_metadataEditorSheetName" src="http://cloveretl.files.wordpress.com/2010/06/blog01_metadataeditorsheetname.png?w=900&#038;h=600" alt="" width="900" height="600" /></a></p>
<p>The Autofilling field can be placed at any position in a metadata definition. It is convenient to place the autofilling fields at the end of the metadata definition, after the field with the record delimiter. But this does not work when the same metadata is also used for writing to a flat file (as in our case). Thus we moved the autofilling fields to the position before the field with the record delimiter.</p>
<p>When you run the graph, you will get the following results:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/06/blog01_output.png"><img class="aligncenter size-full wp-image-694" title="blog01_output" src="http://cloveretl.files.wordpress.com/2010/06/blog01_output.png?w=507&#038;h=432" alt="" width="507" height="432" /></a></p>
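<p>For readers who want to see the effect of the sheet_name function outside of CloverETL, here is a minimal plain-Python sketch of the same merge. The sheets are given as an in-memory dictionary instead of a real Excel file, and the column names (name, city) are hypothetical; only the idea of appending the sheet name as a “state” column comes from the use case above.</p>

```python
import csv
import io

def merge_sheets(sheets):
    """Merge rows from all sheets into one CSV text, adding the sheet name as `state`."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "city", "state"])  # hypothetical column names
    for sheet_name, rows in sheets.items():
        for name, city in rows:
            # The sheet name plays the role of the auto-filled field.
            writer.writerow([name, city, sheet_name])
    return out.getvalue()
```

<p>Calling merge_sheets with one sheet per state returns a single CSV text in which every customer row ends with the name of the sheet it came from.</p>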
<p>You can use multiple autofilling fields with different functions at the same time to get more useful information from the file. For example, you can get the file name and the sheet names at the same time, or you can get the number of records, etc. When the following auto-filling functions are used (sheet_name, source_size, row_timestamp, global_row_count), the result will look like this:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/06/blog01_output02.png"><img class="aligncenter size-full wp-image-695" title="blog01_output02" src="http://cloveretl.files.wordpress.com/2010/06/blog01_output02.png?w=792&#038;h=456" alt="" width="792" height="456" /></a></p>
<p>Remember that only the edges connected to the output port(s) of the following reader components can use the auto-filling functionality:</p>
<ul>
<li>UniversalDataReader</li>
<li>CloverDataReader</li>
<li>XLSDataReader</li>
<li>DBFDataReader</li>
<li>MultiLevelReader</li>
<li>XMLExtract</li>
<li>XMLXPathReader</li>
<li>DBInputTable</li>
<li>DataGenerator</li>
<li>JMSReader</li>
</ul>
<p>DBOutputTable and DBExecute can use only the two error auto-filling functions – ErrCode and ErrText.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/06/03/how-to-easily-enrich-data-using-cloveretl%e2%80%99s-auto-filling-feature/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2c39e475cb676a45a1a1e3a6ee1bf9b1?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tomaswaller</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/06/blog01_data.png" medium="image">
			<media:title type="html">blog01_data</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/06/blog01_graph.png" medium="image">
			<media:title type="html">blog01_graph</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/06/blog01_metadataeditorsheetname.png" medium="image">
			<media:title type="html">blog01_metadataEditorSheetName</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/06/blog01_output.png" medium="image">
			<media:title type="html">blog01_output</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/06/blog01_output02.png" medium="image">
			<media:title type="html">blog01_output02</media:title>
		</media:content>
	</item>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 2</title>
		<link>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/</link>
		<comments>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/#comments</comments>
		<pubDate>Thu, 27 May 2010 01:39:40 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[dwh]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[scd2]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=668</guid>
		<description><![CDATA[In the last part of our data warehouse (DWH) tutorial, I showed you how to load a dimension table that stores historical data according to the Slowly Changing Dimension Type 1 (SCD1). In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. I think that SCD2 is the most [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/">In the last part</a> of our data warehouse (DWH) tutorial, I showed you how to load a dimension table that is maintained according to the Slowly Changing Dimension Type 1 (SCD1) approach. In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. I think that SCD2 is the most challenging sub-task of the ETL part of DWH design, and every ETL architect should be able to deal with it.</p>
<p>In contrast to SCD1, an SCD2 table preserves the history of attributes. Once the value of an attribute changes in the external system (OLTP), we have to create a new record in the SCD2 dimension table with the current value, and we also have to mark the old record in the SCD2 table as obsolete. The most common way to do this is to maintain two additional attributes: valid_from and valid_to. A record is then considered valid on a particular date D when valid_from ≤ D ≤ valid_to, where an empty (null) valid_to means that the record is still current. You can find a detailed explanation of SCD2 principles in <a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1274661375&amp;sr=8-1">Kimball’s DWH bible</a> or on <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">wikipedia.org</a>.</p>
<p>Let us show how SCD2 works in practice on a small example. We will use the DWH schema introduced in the SCD1 post.</p>
<p style="text-align:left;"><a href="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png"><img class="size-full wp-image-218 aligncenter" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="" width="648" height="273" /></a></p>
<p>It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price.</p>
<p>The Store table is populated as SCD1, and we will load the Customer table, which was marked as an SCD2 dimension table. Let’s imagine that a customer changed his email. What will happen in OLTP and in the DWH Customer table named D_CUSTOMER?</p>
<p><strong>OLTP:</strong></p>
<p>C0001;John;Newman;john.newman@hotmail.com <span style="color:#0000ff;">=&gt;</span> C0001;John;Newman;newman.john@gmail.com</p>
<p><strong>DWH:</strong></p>
<p>0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">null</span> <span style="color:#0000ff;">=&gt;</span><br />
0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">2010-05-20</span><br />
0002;C0001;John;Newman;newman.john@gmail.com;2010-05-21;null</p>
<p>Notice especially the first two attributes (columns) and the last two attributes of the DWH table. The first attribute is a surrogate key, a unique identifier of the record in the D_CUSTOMER table that is generated by the ETL process. The second one (C0001) is a natural key, a unique identifier of the customer in OLTP. The last two attributes are valid_from and valid_to. When you list all records with the same natural key in D_CUSTOMER, you get the complete history of one customer.</p>
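<p>To make the role of the two validity columns concrete, here is a minimal Java sketch that finds the version of a customer that was valid on a given date. It assumes day granularity with inclusive bounds, as in the sample rows above (one version ends on 2010-05-20 and the next one starts on 2010-05-21), and it treats a null valid_to as "still current". The class, field and method names are hypothetical and are not part of CloverETL.</p>

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory model of D_CUSTOMER rows; not a CloverETL API.
class Scd2Lookup {

    // One version of a customer; validTo == null means "still current".
    static final class CustomerVersion {
        final int surrogateKey;
        final String naturalKey;
        final String email;
        final LocalDate validFrom;
        final LocalDate validTo;

        CustomerVersion(int surrogateKey, String naturalKey, String email,
                        LocalDate validFrom, LocalDate validTo) {
            this.surrogateKey = surrogateKey;
            this.naturalKey = naturalKey;
            this.email = email;
            this.validFrom = validFrom;
            this.validTo = validTo;
        }

        // Assumption: both bounds are inclusive at day granularity.
        boolean isValidOn(LocalDate day) {
            boolean started = !day.isBefore(validFrom);
            boolean notEnded = validTo == null || !day.isAfter(validTo);
            return started && notEnded;
        }
    }

    // Returns the version of the given natural key that was valid on the given day, if any.
    static Optional<CustomerVersion> versionOn(List<CustomerVersion> rows,
                                               String naturalKey, LocalDate day) {
        return rows.stream()
                .filter(r -> r.naturalKey.equals(naturalKey))
                .filter(r -> r.isValidOn(day))
                .findFirst();
    }

    // The two John Newman rows from the example above.
    static List<CustomerVersion> sampleRows() {
        return Arrays.asList(
                new CustomerVersion(1, "C0001", "john.newman@hotmail.com",
                        LocalDate.of(2009, 10, 10), LocalDate.of(2010, 5, 20)),
                new CustomerVersion(2, "C0001", "newman.john@gmail.com",
                        LocalDate.of(2010, 5, 21), null));
    }
}
```

<p>With the sample rows, a lookup for C0001 on 2010-05-20 yields surrogate key 1 and a lookup on 2010-05-21 yields surrogate key 2, so every day is covered by exactly one version.</p>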
<p>Now that the principle of SCD2 is explained, I will describe its implementation in CloverETL. See the CloverETL graph below.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png"><img class="aligncenter size-full wp-image-670" title="D_CUSTOMER_SCD2" src="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png?w=799&#038;h=166" alt="" width="799" height="166" /></a></p>
<p>The basic data flow of the graph is very simple: in the lower branch we read the data from OLTP (as in the previous DWH post, it is a CSV file), and in the upper branch we read the data that is already stored in our SCD2 table. We use the Dedup component there because we want only one record per customer, the current one. The two branches meet in the DataIntersection component, which matches the records on the natural key. The component has three output ports, as there are three possible outcomes:</p>
<ol>
<li>The record exists only in DWH. This should not normally happen, because it means that the record was deleted in OLTP, and “normal” OLTP systems do not allow records to be deleted. Such records end in the Trash component.</li>
<li>The DWH table contains at least one record with the same natural key as the record coming from OLTP. That record goes through the second output port to the ExtFilter component, which identifies whether the record has changed. A changed record is then copied into two records: the first one obsoletes the current record in D_CUSTOMER (identified by its surrogate key), and the second one is inserted into D_CUSTOMER and stores the new values read from OLTP. The first one sets valid_to = today()-1, and the second one is inserted with valid_from = today() and valid_to = null.</li>
<li>The record coming from OLTP is a new one; there is no record with the same natural key in DWH. In that case the record is sent to the third output port, and the following components insert it into the D_CUSTOMER table with valid_from = today() and valid_to = null.</li>
</ol>
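<p>The three outcomes above can be sketched in plain Java. This is not how the DataIntersection component is implemented; the sketch only mirrors the decision logic on in-memory maps keyed by the natural key, with the e-mail address as the single tracked attribute. All class, field and method names are hypothetical, and the attribute comparison stands in for the ExtFilter step.</p>

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch of the SCD2 decision logic; not CloverETL code.
class Scd2Merge {

    // A planned change to D_CUSTOMER: close an old version or insert a new one.
    static final class Action {
        final String kind;          // "OBSOLETE" or "INSERT"
        final String naturalKey;
        final String email;         // new value for INSERT, null for OBSOLETE
        final LocalDate validFrom;  // set for INSERT only
        final LocalDate validTo;    // set for OBSOLETE only; null (open) for INSERT

        Action(String kind, String naturalKey, String email,
               LocalDate validFrom, LocalDate validTo) {
            this.kind = kind;
            this.naturalKey = naturalKey;
            this.email = email;
            this.validFrom = validFrom;
            this.validTo = validTo;
        }
    }

    // current: natural key -> e-mail of the currently valid DWH record
    // oltp:    natural key -> e-mail read from the source system
    static List<Action> plan(Map<String, String> current,
                             Map<String, String> oltp, LocalDate today) {
        List<Action> actions = new ArrayList<>();
        // TreeMap only makes the output order deterministic for the example.
        for (Map.Entry<String, String> e : new TreeMap<>(oltp).entrySet()) {
            String key = e.getKey();
            String newEmail = e.getValue();
            if (!current.containsKey(key)) {
                // Outcome 3: new customer, insert an open-ended version.
                actions.add(new Action("INSERT", key, newEmail, today, null));
            } else if (!Objects.equals(newEmail, current.get(key))) {
                // Outcome 2: changed customer, close the old version and insert a new one.
                actions.add(new Action("OBSOLETE", key, null, null, today.minusDays(1)));
                actions.add(new Action("INSERT", key, newEmail, today, null));
            }
            // An unchanged customer produces no action.
        }
        // Outcome 1: keys present only in DWH are ignored, like records sent to Trash.
        return actions;
    }
}
```

<p>For the John Newman example, running the plan on 2010-05-21 produces one OBSOLETE action with valid_to = 2010-05-20 followed by one INSERT action with valid_from = 2010-05-21.</p>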
<p>If you want to verify that your CloverETL SCD2 graph works correctly, or if you are looking for sample data, you can simply import the example project into your Clover installation. It is bundled with CloverETL Designer as the DWHExample project. For more information on how to import the example project, see the <a href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">online documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png" medium="image">
			<media:title type="html">D_CUSTOMER_SCD2</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL as a high-throughput XML processor</title>
		<link>http://blog.cloveretl.com/2010/05/20/cloveretl-as-a-high-throughput-xml-processor/</link>
		<comments>http://blog.cloveretl.com/2010/05/20/cloveretl-as-a-high-throughput-xml-processor/#comments</comments>
		<pubDate>Thu, 20 May 2010 15:19:34 +0000</pubDate>
		<dc:creator>jlehotsky</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SAX]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XMLSchema]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=654</guid>
		<description><![CDATA[XML is a markup language that has been around for some years now. Originally, it comes from the world of documents &#8211; used in web hypertext, word processors and other representations. Today, it is very popular in many areas, including the world of data exchange. The reasons are simple &#8211; the format is straightforward, well [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.cloveretl.com&blog=7070972&post=654&subd=cloveretl&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<p>XML is a markup language that has been around for some years now. Originally, it comes from the world of documents &#8211; used in web hypertext, word processors and other representations. Today, it is very popular in many areas, including the world of data exchange. The reasons are simple &#8211; the format is straightforward, well defined, and easily transferable across platforms. In contrast to proprietary and binary formats, XML can be easily read and modified by users. It also represents structured hierarchical data, which can be very difficult to express in plain CSV format. XML is self-descriptive, which greatly improves the user&#8217;s ability to understand the data and reduces the need for a separate data format description and parsing instructions.</p>
<p>XML is often used to transport data between potentially incompatible systems, so the task is to parse and store data in this format and later to process it. CloverETL provides powerful tools to accomplish this task.</p>
<p>One of the components that provides XML parsing is XMLXPathReader. The user simply defines the mapping of each data element or attribute to a given CloverETL field. In the background of the component there is a DOM parser which allows the user to include general XPath expressions in the mapping definition.</p>
<p>In practice, users will often encounter vast XML files, which typically follow a standard structure. This structure contains records which represent a given entity (company, person, etc.) and which can be repeated many times in a large XML data source. It is quite common that these sources of data come in sizes of tens or even hundreds of gigabytes. When this happens, DOM parsing is unsuitable, as all this data cannot be held in memory. For this reason, another CloverETL XML parsing component comes in handy &#8211; XMLExtract. It handles records individually, and individual records are usually small enough to be processed in memory.</p>
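<p>The streaming idea can be illustrated outside CloverETL with the standard Java StAX API (javax.xml.stream), which reads an XML document as a sequence of events, so only the current record has to be held in memory. This is a sketch of the general technique, not the implementation of XMLExtract; the element name <code>name</code> and the class and method names are made up for the example.</p>

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical streaming reader; illustrates sequential XML processing in general.
class StreamingXmlSketch {

    // Passes the text of every <name> element to the sink as soon as it is read.
    // Nothing is accumulated here, so memory use does not grow with the input size.
    static void forEachName(Reader input, Consumer<String> sink) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(input);
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "name".equals(reader.getLocalName())) {
                        // getElementText() consumes the text up to the closing tag.
                        sink.accept(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so that callers do not have to handle a checked exception.
            throw new IllegalStateException("Malformed XML input", e);
        }
    }
}
```

<p>A caller would typically pass a file reader and a sink that writes each value to the next processing step, instead of collecting the values in a list.</p>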
<p>In XMLExtract, the user can define how each element is mapped to a CloverETL record at every level of the XML structure. XMLExtract also makes it possible to include a parent key at each structure level, which allows the entire data structure to be reconstructed later. If the XML does not contain a unique key itself, one can easily be generated using a CloverETL sequence object.</p>
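<p>The parent key idea can also be sketched with the standard Java StAX API. The example below flattens a two-level structure into separate customer and order rows, generates the keys from simple counters (a stand-in for a CloverETL sequence), and copies the customer key into each order row. The element and attribute names, the row layout and the class itself are assumptions made for this illustration and do not reflect the XMLExtract mapping syntax.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical flattening of nested XML into rows linked by a generated parent key.
class ParentKeySketch {

    // Produces rows of the form "customer;<id>;<name>" and "order;<id>;<customerId>;<item>".
    static List<String> flatten(String xml) {
        List<String> rows = new ArrayList<>();
        long customerSeq = 0;       // counter standing in for a sequence object
        long orderSeq = 0;
        long currentCustomer = -1;  // key of the enclosing <customer> element
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if ("customer".equals(r.getLocalName())) {
                    currentCustomer = ++customerSeq;
                    rows.add("customer;" + currentCustomer + ";"
                            + r.getAttributeValue(null, "name"));
                } else if ("order".equals(r.getLocalName())) {
                    // The child row carries the key of its parent.
                    rows.add("order;" + (++orderSeq) + ";" + currentCustomer + ";"
                            + r.getAttributeValue(null, "item"));
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML input", e);
        }
        return rows;
    }
}
```

<p>Because every order row stores the key of its customer, the original hierarchy can be rebuilt later with an ordinary join on that key.</p>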
<p>XML data and their basic integrity rules can be very well specified using XML Schema which today is a standard part of well defined data exchange. If you use XML Schema, CloverETL provides a very convenient visual drag&amp;drop editor which helps the user build an XML mapping:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attach1.jpg"><img class="aligncenter size-full wp-image-657" title="Mapping" src="http://cloveretl.files.wordpress.com/2010/05/attach1.jpg?w=802&#038;h=561" alt="" width="802" height="561" /></a></p>
<p>This screenshot represents an XML mapping which defines how XML and Clover fields are mapped. This mapping can also be displayed as text:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attach2.jpg"><img class="aligncenter size-full wp-image-658" title="Mapping definition" src="http://cloveretl.files.wordpress.com/2010/05/attach2.jpg?w=802&#038;h=561" alt="" width="802" height="561" /></a></p>
<p>To give an example where these methods were essential: CloverETL successfully completed a master data consolidation and matching project for an international insurance company. The XML Schemas were very complex, containing hundreds of different XML element types. The volume of data was over a hundred gigabytes, describing tens of millions of customers as organizations and 4 to 5 million customers as persons. One of the many tasks assigned to CloverETL was to read and store this vast amount of XML data, a task at which it performed substantially better thanks to fast sequential processing of the XML.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/20/cloveretl-as-a-high-throughput-xml-processor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f39a297af252727614a56914e6c234a4?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">jlehotsky</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attach1.jpg" medium="image">
			<media:title type="html">Mapping</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attach2.jpg" medium="image">
			<media:title type="html">Mapping definition</media:title>
		</media:content>
	</item>
		<item>
		<title>Sending E-mails from CloverETL (2) Attachments</title>
		<link>http://blog.cloveretl.com/2010/05/12/sending-e-mails-from-cloveretl-2-attachments/</link>
		<comments>http://blog.cloveretl.com/2010/05/12/sending-e-mails-from-cloveretl-2-attachments/#comments</comments>
		<pubDate>Wed, 12 May 2010 16:15:19 +0000</pubDate>
		<dc:creator>bigpavel</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=639</guid>
		<description><![CDATA[In my previous post I talked about using the EmailSender component featured in CloverETL 2.8 and later to send messages from inside a running transformation graph. EmailSender is used in cases when you need to compose a message from the data that you process in your graph. For example, bulk mailing computed reports, reporting faulty data to administrators, etc. Read my previous post on EmailSender to learn the basics.<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.cloveretl.com&blog=7070972&post=639&subd=cloveretl&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://blog.cloveretl.com/2010/04/08/sending-e-mails-from-cloveretl-1-basics/">previous post</a> I talked about using the EmailSender component featured in CloverETL 2.8 and later to send messages from inside a running transformation graph. EmailSender is used in cases when you need to compose a message from the data that you process in your graph. For example, bulk mailing computed reports, reporting faulty data to administrators, etc. Read my previous post on EmailSender to learn the basics.</p>
<p>In this post, I will focus just on composing messages with attachments using CloverETL&#8217;s EmailSender component.</p>
<p>You can add as many attachments to your message as you want. Any attachment can be, as everything in EmailSender, either taken from your input data record or passed as a parameter or static text.</p>
<p>EmailSender distinguishes between two kinds of attachments – files (on the computer where the transformation runs) and data attachments.</p>
<p><strong>File attachments </strong>work as one would expect – the file is read from your local computer and attached to the message. In CloverETL you can choose how to specify the file name – either pick a static path or pass it in the input data record.</p>
<p><strong>Data attachments</strong> are composed directly from data coming to the input of EmailSender. This way you can assemble, for example, XML file attachments, texts and even binary attachments. In such a case you need to specify additional information so that EmailSender knows how to send your data – specifically its MIME type and the attachment name.</p>
<p>To start with attachments, open the EmailSender edit dialog and go to the “Attachments” attribute editor.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attachment1.png"><img class="aligncenter size-full wp-image-641" title="Attachments attribute editor" src="http://cloveretl.files.wordpress.com/2010/05/attachment1.png?w=643&#038;h=259" alt="" width="643" height="259" /></a></p>
<p>The easiest way to add an attachment is to use the “-&gt;” button or drag&amp;drop from Fields to Attachments.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attachment2.png"><img class="aligncenter size-full wp-image-645" title="attachment2" src="http://cloveretl.files.wordpress.com/2010/05/attachment2.png?w=643&#038;h=259" alt="" width="643" height="259" /></a></p>
<p>This way you create a file attachment whose path is passed from an input data record (in field “attchFile” in the example above). Please notice the description column which tells you how the attachment will be handled.</p>
<p>You can also add a new attachment using the “+” button and then edit the newly created item with the “&#8230;” editor button. You may find this a bit confusing at first: just hit the plus “+” button and a new line appears in the Attachments table. Then click in the first column of the new row and click the three-dot edit button “&#8230;” &#8211; and bingo, you&#8217;re there!</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attachment3.png"><img class="aligncenter size-full wp-image-646" title="attachment3" src="http://cloveretl.files.wordpress.com/2010/05/attachment3.png?w=547&#038;h=400" alt="" width="547" height="400" /></a></p>
<p>In the dialog, you can either browse for a local file or pick a field to use as file name. Notice the screenshot – I can even use the field as just a part of some predefined path. This allows me to compose the attachment path from static path, field value or even parameters (the full example might look like “${PROJECT}\attachments\$attchFile”).</p>
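<p>How such a path is put together can be illustrated with a small Java sketch that substitutes ${NAME} parameters and $field references in a template. It only imitates the notation described above; the real resolution is done by CloverETL itself, and the class, the method and the sample values used here are hypothetical.</p>

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for "${PARAM}" and "$field" references in an attachment path.
class AttachmentPathSketch {

    // Group 1 captures a parameter name, group 2 captures a field name.
    private static final Pattern REF = Pattern.compile("\\$\\{(\\w+)\\}|\\$(\\w+)");

    static String resolve(String template, Map<String, String> params,
                          Map<String, String> record) {
        Matcher m = REF.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(1) != null
                    ? params.get(m.group(1))
                    : record.get(m.group(2));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved reference: " + m.group());
            }
            // quoteReplacement keeps backslashes and dollar signs in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

<p>With PROJECT set to C:\projects\mail and a record whose attchFile field contains report.pdf, the template from the example resolves to C:\projects\mail\attachments\report.pdf.</p>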
<p>Data attachments can be defined with the last option – Attachment data from record. Simply pick the field where your attachment data will be, then specify the attachment name and mime type – both can again be either taken from a field or static. See the following screenshot to get the hang of it.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/attachment4.png"><img class="aligncenter size-full wp-image-647" title="attachment4" src="http://cloveretl.files.wordpress.com/2010/05/attachment4.png?w=484&#038;h=118" alt="" width="484" height="118" /></a></p>
<p>That&#8217;s it! You can now send attachments with CloverETL&#8217;s EmailSender. Just remember that you can set any value to a constant, a parameter or a field value using the $field notation. With this in mind, you can set up pretty much everything you&#8217;ll ever need.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/12/sending-e-mails-from-cloveretl-2-attachments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0fb57473985720d4d29eac8a52337a73?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">bigpavel</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attachment1.png" medium="image">
			<media:title type="html">Attachments attribute editor</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attachment2.png" medium="image">
			<media:title type="html">attachment2</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attachment3.png" medium="image">
			<media:title type="html">attachment3</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/attachment4.png" medium="image">
			<media:title type="html">attachment4</media:title>
		</media:content>
	</item>
		<item>
		<title>Loop execution of transformation</title>
		<link>http://blog.cloveretl.com/2010/04/28/loop-execution-of-transformation/</link>
		<comments>http://blog.cloveretl.com/2010/04/28/loop-execution-of-transformation/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 15:05:56 +0000</pubDate>
		<dc:creator>mvarecha</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[groovy]]></category>
		<category><![CDATA[ISIR]]></category>
		<category><![CDATA[listener]]></category>
		<category><![CDATA[Server]]></category>
		<category><![CDATA[SOA]]></category>
		<category><![CDATA[soap]]></category>
		<category><![CDATA[web service]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=627</guid>
		<description><![CDATA[Case study description Czech Insolvency Registry (http://isir.justice.cz) basically contains data about economic subjects that entered insolvency and have financial difficulties with paying off their debts. The registry allows everybody to download data using public SOAP Web Service. It can be done manually or automatically with the right software. https://isir.justice.cz:8443/isir_ws/services/IsirPub001?wsdl CloverETL can easily help with the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.cloveretl.com&blog=7070972&post=627&subd=cloveretl&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<h4>Case study description</h4>
<p>The Czech Insolvency Registry (<a href="http://isir.justice.cz/">http://isir.justice.cz</a>) contains data about economic subjects that have entered insolvency and have difficulties paying off their debts. The registry allows everybody to download data using a public SOAP web service. This can be done manually or automatically with the right software.<br />
<a href="https://isir.justice.cz:8443/isir_ws/services/IsirPub001?wsdl">https://isir.justice.cz:8443/isir_ws/services/IsirPub001?wsdl</a></p>
<p>CloverETL can easily help with the automatic download, which saves time and avoids technical difficulties. A CloverETL graph can get the required data by calling the web service, process the data and store it in the required format. Unfortunately, the Registry’s web service is very poorly designed. The service doesn&#8217;t give you the current status of each economic subject, but provides the whole history of the requested company. Therefore we have to download not only the current information we need but all information since the year 2008 (when the registry was founded). That is a lot of data to process – actually thousands of log records for each company! Moreover, the Registry’s web service „GetIsirPub0012“ returns a maximum of 1000 records per call. If a company has a few thousand records, you have to make several calls. So we have to download the data in batches of a thousand records, but we don&#8217;t know in advance how many batches there are for each company. That makes the whole process quite difficult.</p>
<p>But the solution with CloverETL is simple. CloverETL Server provides the features “graph event listener” and “groovy task”, which help us with all the challenges described above. First, we will of course design a CloverETL graph that, to begin with, processes just one batch of a thousand records (see the picture below).</p>
<h4>Transformation graph</h4>
<p>This graph has a parameter „startID“, which has the value “0” by default. If we want to process 1000 records starting from, let’s say, no. 2541, we set startID=2541 and the download starts with that record. If we run the graph without parameters, it&#8217;ll download and process the first thousand records (no. 0 – 999).</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/04/isir_graph.png"><img class="aligncenter size-full wp-image-628" title="ISIR_graph" src="http://cloveretl.files.wordpress.com/2010/04/isir_graph.png?w=800&#038;h=345" alt="" width="800" height="345" /></a></p>
<p>The graph also contains a couple of components that store the ID of the last downloaded record, so that the next execution can use it as its startID. The ID is automatically stored in the graph parameters as the lastID parameter. This can be done with in-line Java code in the Reformat component:<br />
<code>String id = GetVal.getString(source[0],"id");<br />
getGraph().getGraphProperties().setProperty("lastID", id );</code></p>
<h4>The loop</h4>
<p>The graph we designed must be executed n times to download and process all records. We don&#8217;t know n in advance, but we know that we can stop downloading as soon as there are no more records to read, that is, as soon as “startID” and “lastID” are equal.</p>
<p>How can we achieve such a loop?</p>
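<p>Before turning to CloverETL Server, the stopping rule itself can be sketched in plain Java. The fetchBatch function is a hypothetical stand-in for one execution of the graph: it receives the start ID and returns the ID of the last record it downloaded, which equals the start ID when nothing new was found. The sketch illustrates the control flow only, not the server mechanism.</p>

```java
import java.util.function.LongUnaryOperator;

// Hypothetical driver loop; fetchBatch stands in for one run of the download graph.
class DownloadLoopSketch {

    // Calls fetchBatch repeatedly and returns the number of executions performed.
    static int runUntilExhausted(long firstStartId, LongUnaryOperator fetchBatch) {
        long startId = firstStartId;
        int runs = 0;
        while (true) {
            long lastId = fetchBatch.applyAsLong(startId);
            runs++;
            if (lastId == startId) {
                // No new records were read, so the loop ends.
                return runs;
            }
            // The last downloaded ID becomes the start of the next batch.
            startId = lastId;
        }
    }
}
```

<p>Note that the loop always performs one final execution that reads nothing; that empty run is what signals the end of the data.</p>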
<h5>Graph event listener</h5>
<p>To loop automatically over the graph designed above, we&#8217;ll define a graph event listener for the “FINISHED_OK” graph event on CloverETL Server. Every time the transformation finishes without error („FINISHED_OK“), the listener triggers the task that we selected. We need to specify this task now. Since we want to execute the same graph repeatedly, we could specify the “execute graph” task, but that task would keep re-executing the graph indefinitely. We need to break the loop when the startID and lastID parameters are equal, so it is better to create a “groovy task” instead of „execute graph“.</p>
<h5>Groovy task</h5>
<p>Groovy is a scripting language with Java-like syntax. In addition, Groovy scripts can access Java objects and use Java libraries. See the Groovy project site <a href="http://groovy.codehaus.org/">http://groovy.codehaus.org/</a> for more details.</p>
<p>We&#8217;ll create a simple Groovy script which decides whether to execute the graph again or not. To decide, we&#8217;ll need to get the graph properties from the finished graph. These properties are accessible by calling the method event.getProperties().</p>
<p>Then, we&#8217;ll need to execute the graph using the CloverETL Server Java API. This is done by calling the method <code>serverFacade.executeGraph()</code>.</p>
<p>The script may return a String value, which is stored in the „Task history log“.</p>
<p><code>// these variables are predefined:<br />
// sessionToken<br />
// event<br />
// serverFacade</code></p>
<p><code>import com.cloveretl.server.persistent.RunRecord;<br />
import org.apache.log4j.Logger;<br />
import com.cloveretl.server.api.*;<br />
import org.springframework.web.context.WebApplicationContext;<br />
import org.springframework.web.context.support.WebApplicationContextUtils;</code></p>
<p><code>Logger log = Logger.getLogger("groovy-ISIR-graphEventListener");</code></p>
<p><code>Properties eventProps = event.getProperties();<br />
log.info("event properties: " + eventProps);</code></p>
<p><code>// get lastID and startID from previous graph execution<br />
String lastIDString = eventProps.getProperty("lastID");<br />
String startIDString = eventProps.getProperty("startID");<br />
long lastID = Long.valueOf(lastIDString);<br />
long startID = Long.valueOf(startIDString);</code></p>
<p><code>// lastID and startID from last graph execution are equal – break the loop<br />
if (lastID == startID)<br />
return "no more records to download";</code></p>
<p><code>// prepare startID which will be passed to next graph execution<br />
Properties properties = new Properties();<br />
properties.setProperty("startID", lastIDString);</code></p>
<p><code>String SANDBOX = eventProps.getProperty("SANDBOX_CODE");<br />
String GRAPH = eventProps.getProperty("GRAPH_FILE");<br />
GraphExecutionCommand graphExecutionCommand = new GraphExecutionCommand(<br />
null, SANDBOX, GRAPH, null, null, null, true, properties, null, null);<br />
Response respExec = serverFacade.executeGraph(sessionToken, graphExecutionCommand);<br />
String result = "graph "+SANDBOX+"/"+GRAPH+" executed: "+respExec.getBean();<br />
log.info(result);<br />
return result;</code></p>
<h4>Graph results</h4>
<p>The results for all batches of data are stored in a single CSV file. Each run appends to it, so there is no danger that earlier results will be overwritten <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . When the whole series of transformations is finished, we have a single CSV file with all processed records. Alternatively, we can consolidate the records and store them directly in a database, where the data is kept in a friendlier and more usable format.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/04/28/loop-execution-of-transformation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/998f04e59afe9c312019cab4da3f99be?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">mvarecha</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/04/isir_graph.png" medium="image">
			<media:title type="html">ISIR_graph</media:title>
		</media:content>
	</item>
		<item>
		<title>Sending E-mails from CloverETL (1) Basics</title>
		<link>http://blog.cloveretl.com/2010/04/08/sending-e-mails-from-cloveretl-1-basics/</link>
		<comments>http://blog.cloveretl.com/2010/04/08/sending-e-mails-from-cloveretl-1-basics/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 08:57:31 +0000</pubDate>
		<dc:creator>bigpavel</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=590</guid>
		<description><![CDATA[If you ever have a situation where you need to send an email from your data transformation, CloverETL gives you several options to do it. If you need to monitor your transformation&#8217;s health and status, you would be better off with CloverETL Server, which offers plenty of monitoring and reporting features. These can be hooked [...]]]></description>
			<content:encoded><![CDATA[<p>If you ever need to send an e-mail from your data transformation, CloverETL gives you several ways to do it. If you need to monitor your transformation&#8217;s health and status, you are better off with CloverETL Server, which offers plenty of monitoring and reporting features. These can be hooked to an “e-mail” action that sends you e-mail alerts based on predefined rules. This is a very useful feature in an enterprise environment. We will hopefully cover these options in later posts.</p>
<p>Today, I would like to focus on a simpler yet powerful way of sending e-mail messages from CloverETL: sending them directly from the transformation graph. A dedicated component, the EmailSender, sends e-mail messages from inside a graph.</p>
<p>The component falls into the Writers category, which means that it takes data on its input and writes it somewhere; in this case, into an e-mail message. You can map your input onto any part of the message: “To”, “Cc”, “Subject”, etc., the message body, or one or more attachments. EmailSender reads input records and composes and sends a single e-mail message for each one.</p>
<p>A small but quite nice feature is that you don&#8217;t need to map all message properties to your input. You only need to prepare the parts that vary with each record; the rest can be left to defaults or set through parameters. For example, if you want to compose your message body from input data records but want the message recipient to be fixed, you can use a parameter for “To” and keep it separate from the data records.</p>
<p>The basic setup of EmailSender requires the SMTP settings: the host, plus a username and password if the server requires them. You can also set up SSL and TLS if the server supports them.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/04/croppercapture11.png"><img class="alignnone size-full wp-image-591" title="EmailSender attributes window" src="http://cloveretl.files.wordpress.com/2010/04/croppercapture11.png?w=740&#038;h=537" alt="" width="740" height="537" /></a></p>
<blockquote><address>Please note that it is a common CloverETL approach to use parameters to pass sensitive data, such as the password in this case (see screenshot).</address>
</blockquote>
<p>Let&#8217;s now take a peek at the Message attribute, which defines how the message is composed and mapped onto the input. It opens an editor that displays the message headers and your input fields.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/04/croppercapture13.png"><img class="alignnone size-full wp-image-593" title="EmailSender Message Editor" src="http://cloveretl.files.wordpress.com/2010/04/croppercapture13.png?w=783&#038;h=319" alt="" width="783" height="319" /></a></p>
<p>The mapping is actually extremely simple: you use fields from your input as variables to fill in the values. Each field is referenced as $field (see the use of $text in MessageBody). Also note the Alternative column: for records where the “text” field in the example is empty, this alternative value is used instead (it can be a constant or defined in a parameter).</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/04/croppercapture14.png"><img class="alignnone size-full wp-image-594" title="EmailSender Message Editor (2)" src="http://cloveretl.files.wordpress.com/2010/04/croppercapture14.png?w=783&#038;h=319" alt="" width="783" height="319" /></a></p>
<p>You can even combine several fields into one to form any text, not only in the message body but also in “Subject”, “To”, etc.</p>
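<p>As a rough illustration of the substitution described above, the following Java sketch fills $field placeholders from a record and falls back to an alternative value when the field is empty. It is not EmailSender&#8217;s actual implementation: the class name MessageTemplateSketch, the fill method and the sample field names are made up for this example.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: mimics the $field mapping idea, not EmailSender's real code.
class MessageTemplateSketch {

    // A placeholder is a dollar sign followed by a field name, e.g. $text.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)");

    // Replaces every $field with the record value, or with the alternative
    // value when the record value is missing or empty.
    static String fill(String template, Map<String, String> record,
                       Map<String, String> alternatives) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = record.get(name);
            if (value == null || value.isEmpty()) {
                value = alternatives.getOrDefault(name, "");
            }
            // quoteReplacement keeps "$" and "\" in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("name", "Robert");
        record.put("text", "");                      // empty, so the alternative is used
        Map<String, String> alternatives = new HashMap<>();
        alternatives.put("text", "(no message)");

        System.out.println(fill("Report for $name", record, alternatives));
        System.out.println(fill("$text", record, alternatives));
    }
}
```

<p>Several placeholders can appear in one template, which corresponds to combining several fields into a single “Subject” or body text.</p>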
<p>If you need a more advanced approach, you can use the “+” button to add additional keys and values. They are passed as-is to your e-mail message as extra headers (e.g. for “X-Mailer: CloverETL” you set key Name to “X-Mailer” and set its Value to “CloverETL”).</p>
<p>That&#8217;s all for now. I hope you will find many cases where this component is a useful tool.</p>
<p>Also, wait for my next post, where I explain how you can use EmailSender to send messages with attachments!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/04/08/sending-e-mails-from-cloveretl-1-basics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/0fb57473985720d4d29eac8a52337a73?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">bigpavel</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/04/croppercapture11.png" medium="image">
			<media:title type="html">EmailSender attributes window</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/04/croppercapture13.png" medium="image">
			<media:title type="html">EmailSender Message Editor</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/04/croppercapture14.png" medium="image">
			<media:title type="html">EmailSender Message Editor (2)</media:title>
		</media:content>
	</item>
		<item>
		<title>Data profiling</title>
		<link>http://blog.cloveretl.com/2010/03/31/data-profiling/</link>
		<comments>http://blog.cloveretl.com/2010/03/31/data-profiling/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 12:15:15 +0000</pubDate>
		<dc:creator>Agata Vackova</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[data profiling]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[profiler]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=537</guid>
		<description><![CDATA[Before you start developing any data transformation, you should explore your data (that is, profile it). There are a lot of tools on the market that can help you. But why install and learn another piece of software when you can use a tool you are already familiar with? CloverETL is mainly a data transformation tool but it [...]]]></description>
			<content:encoded><![CDATA[<p>Before you start developing any data transformation, you should explore your data (that is, profile it). There are a lot of tools on the market that can help you. But why install and learn another piece of software when you can use a tool you are already familiar with? CloverETL is mainly a data transformation tool, but it can easily be used for data profiling as well, as I will show you in this blog post.</p>
<p>It is very easy to compute data statistics with the latest version of CloverETL. You can find the <em>DataProfiling</em> project in the <a title="CloverETL Examples Project" href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">CloverETL Examples Projects</a>. The project consists of two graphs: <em>BasicStatistic</em> and <em>AdvancedStatistic</em>.</p>
<p>The first one computes basic statistics for the input data file:</p>
<ul>
<li>minimum value for numeric fields or minimum length of data for string and byte fields</li>
<li>maximum value for numeric fields or maximum length of data for string and byte fields</li>
<li>average value for numeric fields or average length of data for string and byte fields</li>
<li>number of records in data file</li>
<li>number of not null values for each data field</li>
<li>number of null values for each data field</li>
</ul>
<p>Additionally, for string data fields, it finds:</p>
<ul>
<li>first not null value</li>
<li>whether all values are ASCII</li>
</ul>
<p>The second one calculates for each data field:</p>
<ul>
<li>number of records in data file</li>
<li>number of not null values</li>
<li>number of unique values</li>
<li>minimum value</li>
<li>maximum value</li>
<li>average value for numeric fields</li>
<li>median value</li>
<li>modus (the most frequent value)</li>
</ul>
<p>It also computes frequency counts for fields with only a few unique values (the threshold is defined by the <code>HISTOGRAM_THRESHOLD</code> parameter).</p>
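<p>To make the less common statistics in the list above concrete, here is a minimal Java sketch of the unique count, median and modus for the non-null values of one string field. It is only an illustration, not the code used by the <em>AdvancedStatistic</em> graph: the class and method names are made up, and the choice of the lower middle element as the median and of the smallest value on a modus tie are assumptions.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the code used by the AdvancedStatistic graph.
class AdvancedStatisticSketch {

    // Counts how often each distinct value occurs (values must be non-null).
    static Map<String, Integer> frequencies(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Number of distinct values.
    static int uniqueCount(List<String> values) {
        return frequencies(values).size();
    }

    // Middle element of the sorted values; for an even count the lower
    // middle is taken (an assumption). The list must not be empty.
    static String median(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get((sorted.size() - 1) / 2);
    }

    // Most frequent value; on a tie the smallest value wins (an assumption).
    static String mode(List<String> values) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : frequencies(values).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample values.
        List<String> countries = Arrays.asList("USA", "UK", "USA", "Japan", "USA");
        System.out.println(uniqueCount(countries));
        System.out.println(median(countries));
        System.out.println(mode(countries));
    }
}
```

<p>Numeric and date fields would be compared by value instead of alphabetically; the sketch sticks to strings to stay short.</p>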
<h4><strong>BasicStatistic graph</strong></h4>
<p>The graphs in the project are prepared to analyze data from the Excel file <code>ORDERS.xls</code> placed in the <code>data-in</code> directory. For the purpose of this post, however, we will analyze data stored in the flat file <code>employees.list.dat</code> (also placed in the <code>data-in</code> directory).</p>
<p>To do that, we need to set the following parameters:</p>
<p>input_file=${DATAIN_DIR}/employees.list.dat</p>
<p>metadata=${META_DIR}/employees.fmt</p>
<p>READER_TYPE=DATA_READER</p>
<p>The metadata file (<code>employees.fmt</code>) has to contain the metadata for <code>employees.list.dat</code>:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;Record name="EMPLOYEE" recordDelimiter="\n" recordSize="-1" type="delimited"&gt;
&lt;Field delimiter="," format="#" name="EMP_NO" nullable="true" shift="0" type="integer"/&gt;
&lt;Field delimiter="," name="FIRST_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="LAST_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="PHONE_EXT" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," format="dd/MM/yyyy" name="HIRE_DATE" nullable="true" shift="0" type="date"/&gt;
&lt;Field delimiter="," name="DEPT_NO" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="JOB_CODE" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="JOB_GRADE" nullable="true" shift="0" type="numeric"/&gt;
&lt;Field delimiter="," name="JOB_COUNTRY" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="SALARY" nullable="true" shift="0" type="numeric"/&gt;
&lt;Field name="FULL_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;/Record&gt;</pre>
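<p>To show how a record laid out like this can be read, here is a small Java sketch that splits one line into the eleven fields and parses <code>HIRE_DATE</code> with the <code>dd/MM/yyyy</code> format from the metadata. It is a simplified illustration, not how DataReader works internally; the class name and the sample line are made up, since the content of <code>employees.list.dat</code> is not shown in this post.</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of reading one record laid out as in employees.fmt.
class DelimitedRecordSketch {

    static final int FIELD_COUNT = 11;

    // Splits on "," at most FIELD_COUNT - 1 times, so the last field
    // (FULL_NAME, which has no field delimiter of its own) keeps its commas.
    static String[] split(String line) {
        return line.split(",", FIELD_COUNT);
    }

    // HIRE_DATE uses the format declared in the metadata: dd/MM/yyyy.
    static LocalDate parseHireDate(String value) {
        return LocalDate.parse(value, DateTimeFormatter.ofPattern("dd/MM/yyyy"));
    }

    public static void main(String[] args) {
        // Made-up sample line for illustration only.
        String line = "2,Robert,Nelson,250,28/12/1988,600,VP,2,USA,105900,Nelson, Robert";
        String[] fields = split(line);
        System.out.println(fields.length);
        System.out.println(fields[10]);
        System.out.println(parseHireDate(fields[4]));
    }
}
```

<p>Limiting the number of splits is what lets a value such as “Nelson, Robert” survive in the last field even though it contains the field delimiter.</p>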
<p>Let&#8217;s look at the graph with its intermediate results and output:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png"><img class="alignnone size-full wp-image-538" title="BasicStatistic" src="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png?w=1024&#038;h=668" alt="" width="1024" height="668" /></a></p>
<p>DataReader parses data from the input file. Normalizer creates records that consist of three basic fields: the original field name, the field type and a &#8220;normalized&#8221; value, which is the original value for numeric fields, the time in milliseconds for date fields and the data length for string or byte fields. This component also collects basic information about string data: it finds the first non-null value and checks whether the string contains only ASCII characters and whether it can be converted to a number. The next component (Rollup – Statistic) calculates the minimum, maximum and average value for each group of records (records with the same field name). It also propagates the first non-null value and sets the isAscii and isNumber results for the whole group, which are true only if no record in the group has them set to false. The fourth and fifth components have only a “cosmetic” purpose: they convert times in milliseconds back to a user-friendly form and sort the output records. The writer converts the results into a report.</p>
<p>By default, the final report is a plain HTML file:</p>
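<p>The normalize-then-roll-up idea can be sketched in a few lines of Java. The example below handles a single string field, whose &#8220;normalized&#8221; value is its length, and accumulates the same kind of numbers the report shows. It is an assumption-laden illustration rather than the graph&#8217;s real transformation code; the names BasicStatisticSketch, Stats and rollup are invented here.</p>

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the per-field statistics for one string field.
class BasicStatisticSketch {

    // Accumulates the statistics over all values of one field.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        int count = 0;
        int countNotNull = 0;
        String firstNotNull = null;
        boolean isAscii = true;

        void add(String value) {
            count++;
            if (value == null) {
                return;                          // nulls are only counted
            }
            countNotNull++;
            if (firstNotNull == null) {
                firstNotNull = value;
            }
            double normalized = value.length();  // normalized value of a string is its length
            min = Math.min(min, normalized);
            max = Math.max(max, normalized);
            sum += normalized;
            // The group stays ASCII only while every value is ASCII.
            isAscii = isAscii && value.chars().allMatch(c -> c < 128);
        }

        // Average over the non-null values only.
        double average() {
            return countNotNull == 0 ? Double.NaN : sum / countNotNull;
        }

        int countNull() {
            return count - countNotNull;
        }
    }

    // Rolls up all values of one field into a single Stats object.
    static Stats rollup(List<String> values) {
        Stats stats = new Stats();
        for (String v : values) {
            stats.add(v);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Made-up sample values for a JOB_CODE-like field.
        Stats s = rollup(Arrays.asList("VP", null, "Admin", "Eng"));
        System.out.println(s.min + " " + s.max + " " + s.average());
        System.out.println(s.count + " " + s.countNotNull + " " + s.countNull());
        System.out.println(s.firstNotNull + " " + s.isAscii);
    }
}
```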
<p><strong>Data statistic for data-in/employees.list.dat</strong></p>
<table border="1" cellspacing="3" cellpadding="2" width="1143">
<col span="1" width="122"></col>
<col span="1" width="73"></col>
<col span="1" width="138"></col>
<col span="1" width="138"></col>
<col span="1" width="146"></col>
<col span="1" width="44"></col>
<col span="1" width="100"></col>
<col span="1" width="74"></col>
<col span="1" width="102"></col>
<col span="1" width="53"></col>
<col span="1" width="73"></col>
<tbody>
<tr>
<th width="122">Field name</th>
<th width="73">Field type</th>
<th width="138">min</th>
<th width="138">max</th>
<th width="146">average</th>
<th width="44">count</th>
<th width="100">count not null</th>
<th width="74">count null</th>
<th width="102">first not null</th>
<th width="53">is Ascii</th>
<th width="73">is number</th>
</tr>
<tr>
<td width="122">DEPT_NO</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">3.0</td>
<td width="146">3.0</td>
<td width="44">51</td>
<td width="100">43</td>
<td width="74">8</td>
<td width="102">600</td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">EMP_NO</td>
<td width="73">integer</td>
<td width="138">0.0</td>
<td width="138">145.0</td>
<td width="146">58.392156862745104</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">FIRST_NAME</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">11.0</td>
<td width="146">5.549019607843137</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Robert</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">FULL_NAME</td>
<td width="73">string</td>
<td width="138">10.0</td>
<td width="138">22.0</td>
<td width="146">14.549019607843137</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Nelson, Robert</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">HIRE_DATE</td>
<td width="73">date</td>
<td width="138">28/12/1988 00:00:00</td>
<td width="138">15/11/1994 00:00:00</td>
<td width="146"> </td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53"> </td>
<td width="73"> </td>
</tr>
<tr>
<td width="122">JOB_CODE</td>
<td width="73">string</td>
<td width="138">2.0</td>
<td width="138">24.0</td>
<td width="146">6.1568627450980395</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">VP</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">JOB_COUNTRY</td>
<td width="73">string</td>
<td width="138">2.0</td>
<td width="138">11.0</td>
<td width="146">3.6470588235294117</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">USA</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">JOB_GRADE</td>
<td width="73">number</td>
<td width="138">0.0</td>
<td width="138">5.0</td>
<td width="146">3.4117647058823524</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">LAST_NAME</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">12.0</td>
<td width="146">6.8431372549019605</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Nelson</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">PHONE_EXT</td>
<td width="73">string</td>
<td width="138">1.0</td>
<td width="138">5.0</td>
<td width="146">2.8627450980392157</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">250</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">SALARY</td>
<td width="73">number</td>
<td width="138">0.0</td>
<td width="138">9.9E7</td>
<td width="146">2267125.137254902</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
</tbody>
</table>
<p>The report can easily be changed to an Excel file: just set the graph parameter <code>WRITER_TYPE=XLS_WRITER</code>.</p>
<h4><strong>AdvancedStatistic graph</strong></h4>
<p><a href="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png"><img class="alignnone size-full wp-image-539" title="AdvancedStatistic" src="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png?w=1024&#038;h=561" alt="" width="1024" height="561" /></a></p>
<p>Phase 0 of the <em>AdvancedStatistic</em> graph works similarly to the <em>BasicStatistic</em> graph, but it uses Aggregators instead of Rollup for the statistic calculations. Pay particular attention to the component called Simplification in phase 0: it stores in a dictionary the number of records in the file and the names of the fields whose number of unique values is under the threshold (marked by red ellipses in the picture above). The Histogram filter component can then read these field names and skip the records that do not belong to the fields selected for frequency calculation (green ellipses in the picture). The phase 1 Aggregators count frequencies for the fields that passed through the previous component.</p>
<p>The resulting file looks as follows:</p>
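<p>The two-phase flow can be mimicked outside CloverETL with a short Java sketch: a first pass collects the names of fields with few unique values (the role the dictionary plays in the graph), and a second pass skips all other fields and counts value frequencies. This is only a sketch under assumptions: the class, the Cell type and both method names are invented, a plain set stands in for the dictionary, and values are assumed to be non-null.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the two phases; not the graph's real components.
class HistogramPhasesSketch {

    // One normalized record: a field name and one of its (non-null) values.
    static final class Cell {
        final String field;
        final String value;

        Cell(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    // Phase 0: names of the fields whose number of unique values is under the threshold.
    static Set<String> selectFields(List<Cell> cells, int threshold) {
        Map<String, Set<String>> unique = new TreeMap<>();
        for (Cell c : cells) {
            unique.computeIfAbsent(c.field, k -> new HashSet<>()).add(c.value);
        }
        Set<String> selected = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : unique.entrySet()) {
            if (e.getValue().size() < threshold) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    // Phase 1: skip cells of unselected fields and count frequencies for the rest.
    static Map<String, Map<String, Integer>> countFrequencies(List<Cell> cells,
                                                              Set<String> selected) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Cell c : cells) {
            if (!selected.contains(c.field)) {
                continue;
            }
            result.computeIfAbsent(c.field, k -> new TreeMap<>())
                  .merge(c.value, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: JOB_COUNTRY has 2 unique values, EMP_NO has 3.
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("JOB_COUNTRY", "USA"));
        cells.add(new Cell("JOB_COUNTRY", "USA"));
        cells.add(new Cell("JOB_COUNTRY", "UK"));
        cells.add(new Cell("EMP_NO", "1"));
        cells.add(new Cell("EMP_NO", "2"));
        cells.add(new Cell("EMP_NO", "3"));

        Set<String> selected = selectFields(cells, 3);
        System.out.println(countFrequencies(cells, selected));
    }
}
```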
<p><strong>Advanced data statistic and histograms for ./data-in/employees.list.dat</strong></p>
<p>Statistics</p>
<table border="1">
<tbody>
<tr>
<th>Field name</th>
<th>Field type</th>
<th>min</th>
<th>max</th>
<th>average number</th>
<th>count</th>
<th>count not null</th>
<th>count unique</th>
<th>median</th>
<th>modus</th>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>000</td>
<td>900</td>
<td> </td>
<td>51</td>
<td>43</td>
<td>20</td>
<td>600</td>
<td>623</td>
</tr>
<tr>
<td>EMP_NO</td>
<td align="center">integer</td>
<td>0.0</td>
<td>145.0</td>
<td>58.3921568627451</td>
<td>51</td>
<td>51</td>
<td>47</td>
<td>45.0</td>
<td>2.0</td>
</tr>
<tr>
<td>FIRST_NAME</td>
<td align="center">string</td>
<td>Andrew</td>
<td>Yuki</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>44</td>
<td>Mark</td>
<td>Robert</td>
</tr>
<tr>
<td>FULL_NAME</td>
<td align="center">string</td>
<td>Baldwin, Janet</td>
<td>Young, Katherine</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>50</td>
<td>Lee, Terri</td>
<td>Sutherland, Claudia</td>
</tr>
<tr>
<td>HIRE_DATE</td>
<td align="center">date</td>
<td>28/12/1988 00:00:00</td>
<td>15/11/1994 00:00:00</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>44</td>
<td>20/04/1992 00:00:00</td>
<td>02/01/1994 00:00:00</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Admin</td>
<td>Vice President</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>16</td>
<td>Mktg</td>
<td>Eng</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Canada</td>
<td>USA</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>9</td>
<td>USA</td>
<td>USA</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>0.0</td>
<td>5.0</td>
<td>3.411764705882353</td>
<td>51</td>
<td>51</td>
<td>6</td>
<td>4.0</td>
<td>4.0</td>
</tr>
<tr>
<td>LAST_NAME</td>
<td align="center">string</td>
<td>Baldwin</td>
<td>Young</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>48</td>
<td>Lambert</td>
<td>Johnson</td>
</tr>
<tr>
<td>PHONE_EXT</td>
<td align="center">string</td>
<td>1</td>
<td>null</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>46</td>
<td>3355</td>
<td>null</td>
</tr>
<tr>
<td>SALARY</td>
<td align="center">number</td>
<td>0.0</td>
<td>9.9E7</td>
<td>2267125.137254902</td>
<td>51</td>
<td>51</td>
<td>40</td>
<td>61637.8125</td>
<td>0.0</td>
</tr>
</tbody>
</table>
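<p>In the histogram part of the report, the “count %” column is drawn as a text bar. Judging only from the sample output (for example 8 of 51 records gives 15 # characters and 36 of 51 gives 70), the bar appears to hold one # per whole percent, padded with dots. The Java sketch below reproduces that inferred rule; it is a guess based on the output, not the graph&#8217;s documented behaviour, and the class name and the width of 100 characters are assumptions.</p>

```java
// Hypothetical sketch of the inferred bar rendering.
class HistogramBarSketch {

    static final int WIDTH = 100;          // assumed bar width: one character per percent
    static final char EMPTY = '\u00B7';    // the middle dot used as padding

    // Draws one '#' per whole percent of count/total; the rest is padding.
    static String bar(int count, int total) {
        int filled = count * WIDTH / total;   // integer division truncates the percentage
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = 0; i < WIDTH; i++) {
            sb.append(i < filled ? '#' : EMPTY);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bar(8, 51));
        System.out.println(bar(36, 51));
    }
}
```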
<p>Histograms</p>
<table border="1">
<tbody>
<tr>
<th>Field name</th>
<th>Field type</th>
<th>value</th>
<th>count</th>
<th>count %</th>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td> </td>
<td>8</td>
<td>
<code style="color:#333333;">###############·····················································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>000</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>100</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>110</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>115</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>120</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>121</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>123</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>125</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>130</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>140</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>180</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>600</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>621</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>622</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>623</td>
<td>5</td>
<td>
<code style="color:#333333;">#########···························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>670</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>671</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>672</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>900</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Admin</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>CEO</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>CFO</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Dir</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Doc</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Eng</td>
<td>15</td>
<td>
<code style="color:#333333;">#############################·······································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Finan</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Inside Sales Coordinator</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Mktg</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Mngr</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>PRel</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>SRep</td>
<td>9</td>
<td>
<code style="color:#333333;">#################···················································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Sales</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Sales Representative</td>
<td>6</td>
<td>
<code style="color:#333333;">###########·························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>VP</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Vice President</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Canada</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>England</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>France</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Italy</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Japan</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Sales</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Switzerland</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>UK</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>USA</td>
<td>36</td>
<td>
<code style="color:#333333;">######################################################################······························</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>0.0</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>1.0</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>2.0</td>
<td>8</td>
<td>
<code style="color:#333333;">###############·····················································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>3.0</td>
<td>14</td>
<td>
<code style="color:#333333;">###########################·········································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>4.0</td>
<td>16</td>
<td>
<code style="color:#333333;">###############################·····································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>5.0</td>
<td>10</td>
<td>
<code style="color:#333333;">###################·················································································</code>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/03/31/data-profiling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/934c88184df6c0034450ae00a1695ee8?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">agad</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png" medium="image">
			<media:title type="html">BasicStatistic</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png" medium="image">
			<media:title type="html">AdvancedStatistic</media:title>
		</media:content>
	</item>
		<item>
		<title>Working with CloverETL as a new user</title>
		<link>http://blog.cloveretl.com/2010/03/22/working-with-cloveretl-as-a-new-user/</link>
		<comments>http://blog.cloveretl.com/2010/03/22/working-with-cloveretl-as-a-new-user/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 03:45:14 +0000</pubDate>
		<dc:creator>sswezey</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[GATech]]></category>
		<category><![CDATA[Georgia Institute of Technology]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=527</guid>
		<description><![CDATA[I would like to share my experience with CloverETL as an outsider. I study at the Georgia Institute of Technology in Atlanta, GA, majoring in Computer Science. While searching for interesting internships in Europe, I found out about Javlin a.s., a company based in Prague. As I wanted to get hands-on experience [...]]]></description>
			<content:encoded><![CDATA[<p>I would like to share my experience with CloverETL as an outsider. I study at the <em>Georgia Institute of Technology</em> in Atlanta, GA, majoring in <em>Computer Science</em>. While searching for interesting internships in Europe, I found out about Javlin a.s., a company based in Prague. As I wanted to get hands-on experience with programming and software development, I was immediately interested in this company. It sounded like an appealing opportunity to gain a lot of useful knowledge and work experience.</p>
<p>I arrived in Prague in January and started working with CloverETL as soon as my internship with Javlin began. At that point I had no clue what CloverETL ‘did’, much less what data warehousing and extract-transform-load (ETL) tools were. After the first week or so, I understood what ETL tools are and, especially, what CloverETL is. I played around with CloverETL and was intrigued by what it can do. As a student I don’t have any need for it right now, but I can see how incredibly powerful it can be for certain people or companies. I really like how simple it is to connect components in a GUI: you can read a client XLS sheet, compare it with your Outlook address book (exported as a CSV file), remove duplicate records, collect all the data for each client into one record, and then produce a new XLS sheet or CSV file for Outlook. I also like that the engine itself is open source, so anyone can download and use it, but in my opinion the Designer is much quicker and easier to use.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/03/22/working-with-cloveretl-as-a-new-user/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/06756ec4b63f280183ecf1298517ee44?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">sswezey</media:title>
		</media:content>
	</item>
	</channel>
</rss>