<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; statistics</title>
	<atom:link href="http://blog.cloveretl.com/tag/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; statistics</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Data profiling</title>
		<link>http://blog.cloveretl.com/2010/03/31/data-profiling/</link>
		<comments>http://blog.cloveretl.com/2010/03/31/data-profiling/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 12:15:15 +0000</pubDate>
		<dc:creator>Agata Vackova</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[data profiling]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[profiler]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=537</guid>
		<description><![CDATA[Before you start developing any data transformation, you should explore your data (perform data profiling). There are many tools on the market that can help you. But why install and learn another tool when you can use one you already know? CloverETL is mainly a data transformation tool but it [...]]]></description>
			<content:encoded><![CDATA[<p>Before you start developing any data transformation, you should explore your data (perform data profiling). There are many tools on the market that can help you. But why install and learn another tool when you can use one you already know? CloverETL is mainly a data transformation tool, but it can easily be used for data profiling as well, as I will show in this blog post.</p>
<p>Computing data statistics is very easy with the latest version of CloverETL. You can find the <em>DataProfiling</em> project in the <a title="CloverETL Examples Project" href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">CloverETL Examples Projects</a>. The project consists of two graphs: <em>BasicStatistic</em> and <em>AdvancedStatistic</em>.</p>
<p>The first graph computes basic statistics for the input data file:</p>
<ul>
<li>minimum value for numeric fields or minimum length of data for string and byte fields</li>
<li>maximum value for numeric fields or maximum length of data for string and byte fields</li>
<li>average value for numeric fields or average length of data for string and byte fields</li>
<li>number of records in data file</li>
<li>number of not null values for each data field</li>
<li>number of null values for each data field</li>
</ul>
<p>Additionally, for string data fields, it finds:</p>
<ul>
<li>the first non-null value</li>
<li>whether all values are ASCII</li>
</ul>
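<p>The statistics listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code the graph runs: the function name <code>basic_stats</code>, its <code>is_text</code> flag and the sample values are assumptions made for this example.</p>

```python
from statistics import mean

def basic_stats(values, is_text):
    """Summarise one field; for text fields the measured quantity is the value length."""
    present = [v for v in values if v is not None]
    measures = [float(len(v)) if is_text else float(v) for v in present]
    stats = {
        "min": min(measures) if measures else None,
        "max": max(measures) if measures else None,
        "average": mean(measures) if measures else None,
        "count": len(values),
        "count_not_null": len(present),
        "count_null": len(values) - len(present),
    }
    if is_text:
        # Extra information collected only for string fields.
        stats["first_not_null"] = present[0] if present else None
        stats["is_ascii"] = all(v.isascii() for v in present)
    return stats

# Hypothetical sample columns (None stands for a null value).
dept = basic_stats(["600", None, "621", "900"], is_text=True)
salary = basic_stats([0.0, 61637.8125, None], is_text=False)
```

<p>For the string column the minimum, maximum and average describe value lengths, which is why a field such as DEPT_NO reports 3.0 for all three.</p>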
<p>The second graph calculates the following for each data field:</p>
<ul>
<li>number of records in data file</li>
<li>number of not null values</li>
<li>number of unique values</li>
<li>minimum value</li>
<li>maximum value</li>
<li>average value for numeric fields</li>
<li>median value</li>
<li>mode, the most frequent value (reported as <em>modus</em>)</li>
</ul>
<p>It also computes frequency counts for fields with few unique values; the threshold is defined by the <code>HISTOGRAM_THRESHOLD</code> parameter.</p>
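<p>A rough Python sketch of these per-field calculations follows. It rests on several assumptions that the post does not confirm: the median is taken as the lower middle element of the sorted values (so it also works for strings and dates), the threshold is treated as inclusive, the threshold values used are made up, and null values are left out of the frequency counts for brevity.</p>

```python
from collections import Counter
from statistics import mean

HISTOGRAM_THRESHOLD = 3  # hypothetical value; the project defines its own

def advanced_stats(values, numeric, threshold=HISTOGRAM_THRESHOLD):
    """Summarise one field and add frequency counts when it has few distinct values."""
    present = sorted(v for v in values if v is not None)
    counts = Counter(present)
    return {
        "count": len(values),
        "count_not_null": len(present),
        "count_unique": len(counts),
        "min": present[0] if present else None,
        "max": present[-1] if present else None,
        "average": mean(present) if numeric and present else None,
        # Assumption: the lower middle element is used as the median.
        "median": present[(len(present) - 1) // 2] if present else None,
        "modus": counts.most_common(1)[0][0] if present else None,
        # Assumption: the threshold is an inclusive upper bound.
        "histogram": dict(counts) if threshold >= len(counts) else None,
    }

# Hypothetical sample columns.
grades = advanced_stats([4.0, 2.0, None, 4.0, 5.0, 3.0], numeric=True, threshold=4)
names = advanced_stats(["Nelson", "Young", "Lambert"], numeric=False, threshold=2)
```

<p>Here <code>grades</code> has four distinct values and gets a histogram, while <code>names</code> has three distinct values against a threshold of two and gets none.</p>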
<h4><strong>BasicStatistic graph</strong></h4>
<p>The graphs in the project are prepared to analyze data from the Excel file <code>ORDERS.xls</code>, placed in the <code>data-in</code> directory. For the purpose of this post, however, we will analyze data stored in the flat file <code>employees.list.dat</code> (also placed in the <code>data-in</code> directory).</p>
<p>To do that, we need to set the following parameters:</p>
<p>input_file=${DATAIN_DIR}/employees.list.dat</p>
<p>metadata=${META_DIR}/employees.fmt</p>
<p>READER_TYPE=DATA_READER</p>
<p>The metadata file (<code>employees.fmt</code>) has to contain the metadata for <code>employees.list.dat</code>:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;Record name="EMPLOYEE" recordDelimiter="\n" recordSize="-1" type="delimited"&gt;
&lt;Field delimiter="," format="#" name="EMP_NO" nullable="true" shift="0" type="integer"/&gt;
&lt;Field delimiter="," name="FIRST_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="LAST_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="PHONE_EXT" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," format="dd/MM/yyyy" name="HIRE_DATE" nullable="true" shift="0" type="date"/&gt;
&lt;Field delimiter="," name="DEPT_NO" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="JOB_CODE" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="JOB_GRADE" nullable="true" shift="0" type="numeric"/&gt;
&lt;Field delimiter="," name="JOB_COUNTRY" nullable="true" shift="0" type="string"/&gt;
&lt;Field delimiter="," name="SALARY" nullable="true" shift="0" type="numeric"/&gt;
&lt;Field name="FULL_NAME" nullable="true" shift="0" type="string"/&gt;
&lt;/Record&gt;</pre>
<p>Let's look at the graph with its intermediate results and output:</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png"><img class="alignnone size-full wp-image-538" title="BasicStatistic" src="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png?w=1024&#038;h=668" alt="" width="1024" height="668" /></a></p>
<p>DataReader parses data from the input file. Normalizer creates a record with three basic fields: the original field name, the field type, and a &#8220;normalized&#8221; value, which is the original value for numeric fields, the time in milliseconds for date fields, and the data length for string or byte fields. This component also collects basic information about string data: it finds the first non-null value and checks whether the string contains only ASCII characters and whether it can be converted to a number. The next component (Rollup – Statistic) calculates the minimum, maximum and average value for each group of records (those with the same field name). It also propagates the first non-null value, checks that none of the isAscii and isNumber flags in the group is false, and sets the result for the whole group. The fourth and fifth components are purely “cosmetic”: they convert times in milliseconds back to a user-friendly form and sort the output records. The writer converts the results into a report.</p>
<p>By default the final report is a plain HTML file:</p>
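<p>The normalize-then-roll-up flow can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the transformation code of the graph: the function names, the <code>TYPES</code> mapping and the two sample records are made up for the example, dates are treated as UTC, and only the minimum and maximum are rolled up.</p>

```python
from datetime import datetime, timezone

# Hypothetical subset of the metadata: field name -> field type.
TYPES = {"EMP_NO": "integer", "LAST_NAME": "string", "HIRE_DATE": "date"}

def normalize(record):
    """Emit one (field name, field type, normalized value) row per field."""
    for name, value in record.items():
        kind = TYPES[name]
        if value is None:
            measure = None
        elif kind == "date":
            # Dates become milliseconds since the epoch (UTC assumed).
            measure = value.replace(tzinfo=timezone.utc).timestamp() * 1000
        elif kind == "string":
            measure = float(len(value))
        else:
            measure = float(value)
        yield name, kind, measure

def readable(kind, measure):
    """The 'cosmetic' step: turn milliseconds back into a date string."""
    if kind != "date":
        return measure
    moment = datetime.fromtimestamp(measure / 1000, tz=timezone.utc)
    return moment.strftime("%d/%m/%Y %H:%M:%S")

def rollup(rows):
    """Group rows by field name and reduce each group to its min and max."""
    groups = {}
    for name, kind, measure in rows:
        group = groups.setdefault(name, {"type": kind, "measures": []})
        if measure is not None:
            group["measures"].append(measure)
    summary = {}
    for name in sorted(groups):  # sorted output, as in the report
        kind, measures = groups[name]["type"], groups[name]["measures"]
        if not measures:
            continue
        summary[name] = {
            "type": kind,
            "min": readable(kind, min(measures)),
            "max": readable(kind, max(measures)),
        }
    return summary

records = [
    {"EMP_NO": 2, "LAST_NAME": "Nelson", "HIRE_DATE": datetime(1988, 12, 28)},
    {"EMP_NO": 145, "LAST_NAME": "Young", "HIRE_DATE": datetime(1994, 11, 15)},
]
summary = rollup(row for record in records for row in normalize(record))
```

<p>Because every field is reduced to a number first, a single roll-up step can handle numeric, date and string fields alike.</p>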
<p><strong>Data statistic for data-in/employees.list.dat</strong></p>
<table border="1" cellspacing="3" cellpadding="2" width="1143">
<col span="1" width="122"></col>
<col span="1" width="73"></col>
<col span="1" width="138"></col>
<col span="1" width="138"></col>
<col span="1" width="146"></col>
<col span="1" width="44"></col>
<col span="1" width="100"></col>
<col span="1" width="74"></col>
<col span="1" width="102"></col>
<col span="1" width="53"></col>
<col span="1" width="73"></col>
<tbody>
<tr>
<th width="122">Field name</th>
<th width="73">Field type</th>
<th width="138">min</th>
<th width="138">max</th>
<th width="146">average</th>
<th width="44">count</th>
<th width="100">count not null</th>
<th width="74">count null</th>
<th width="102">first not null</th>
<th width="53">is Ascii</th>
<th width="73">is number</th>
</tr>
<tr>
<td width="122">DEPT_NO</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">3.0</td>
<td width="146">3.0</td>
<td width="44">51</td>
<td width="100">43</td>
<td width="74">8</td>
<td width="102">600</td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">EMP_NO</td>
<td width="73">integer</td>
<td width="138">0.0</td>
<td width="138">145.0</td>
<td width="146">58.392156862745104</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">FIRST_NAME</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">11.0</td>
<td width="146">5.549019607843137</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Robert</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">FULL_NAME</td>
<td width="73">string</td>
<td width="138">10.0</td>
<td width="138">22.0</td>
<td width="146">14.549019607843137</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Nelson, Robert</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">HIRE_DATE</td>
<td width="73">date</td>
<td width="138">28/12/1988 00:00:00</td>
<td width="138">15/11/1994 00:00:00</td>
<td width="146"> </td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53"> </td>
<td width="73"> </td>
</tr>
<tr>
<td width="122">JOB_CODE</td>
<td width="73">string</td>
<td width="138">2.0</td>
<td width="138">24.0</td>
<td width="146">6.1568627450980395</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">VP</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">JOB_COUNTRY</td>
<td width="73">string</td>
<td width="138">2.0</td>
<td width="138">11.0</td>
<td width="146">3.6470588235294117</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">USA</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">JOB_GRADE</td>
<td width="73">number</td>
<td width="138">0.0</td>
<td width="138">5.0</td>
<td width="146">3.4117647058823524</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
<tr>
<td width="122">LAST_NAME</td>
<td width="73">string</td>
<td width="138">3.0</td>
<td width="138">12.0</td>
<td width="146">6.8431372549019605</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">Nelson</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">PHONE_EXT</td>
<td width="73">string</td>
<td width="138">1.0</td>
<td width="138">5.0</td>
<td width="146">2.8627450980392157</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102">250</td>
<td width="53">true</td>
<td width="73">false</td>
</tr>
<tr>
<td width="122">SALARY</td>
<td width="73">number</td>
<td width="138">0.0</td>
<td width="138">9.9E7</td>
<td width="146">2267125.137254902</td>
<td width="44">51</td>
<td width="100">51</td>
<td width="74">0</td>
<td width="102"> </td>
<td width="53">true</td>
<td width="73">true</td>
</tr>
</tbody>
</table>
<p>The report format can easily be changed to an Excel file: just set the graph parameter <code>WRITER_TYPE=XLS_WRITER</code>.</p>
<h4><strong>AdvancedStatistic graph</strong></h4>
<p><a href="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png"><img class="alignnone size-full wp-image-539" title="AdvancedStatistic" src="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png?w=1024&#038;h=561" alt="" width="1024" height="561" /></a></p>
<p>Phase 0 of the <em>AdvancedStatistic</em> graph works much like the <em>BasicStatistic</em> graph, but it uses Aggregators instead of Rollup for the statistic calculations. Pay particular attention to the component called Simplification in phase 0: it stores in the dictionary the number of records in the file and the names of the fields whose number of unique values is under the threshold (marked by red ellipses in the picture above). The Histogram filter component can then read these field names and skip the records that do not belong to the fields selected for frequency calculations (green ellipses in the picture). The phase 1 Aggregators count frequencies for the fields that passed the filter.</p>
<p>The resulting file looks as follows:</p>
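<p>The hand-over between the two phases can be pictured with a small Python sketch. A plain <code>dict</code> stands in for the graph dictionary; the function names, the dictionary keys and the sample records are assumptions made for this illustration, and the threshold is treated as inclusive.</p>

```python
from collections import Counter

def phase0(records, dictionary, threshold):
    """Store the record count and the names of fields with few unique values."""
    unique = {}
    for record in records:
        for name, value in record.items():
            unique.setdefault(name, set()).add(value)
    dictionary["record_count"] = len(records)
    dictionary["histogram_fields"] = sorted(
        name for name, seen in unique.items() if threshold >= len(seen)
    )

def phase1(records, dictionary):
    """Count frequencies only for the fields selected in phase 0."""
    wanted = set(dictionary["histogram_fields"])
    counts = Counter(
        (name, value)
        for record in records
        for name, value in record.items()
        if name in wanted  # the filter step: other fields are skipped
    )
    total = dictionary["record_count"]
    return {key: (count, 100.0 * count / total) for key, count in counts.items()}

# Hypothetical sample records.
records = [
    {"LAST_NAME": "Nelson", "JOB_COUNTRY": "USA"},
    {"LAST_NAME": "Young", "JOB_COUNTRY": "USA"},
    {"LAST_NAME": "Lambert", "JOB_COUNTRY": "USA"},
    {"LAST_NAME": "Johnson", "JOB_COUNTRY": "UK"},
]
dictionary = {}
phase0(records, dictionary, threshold=2)
frequencies = phase1(records, dictionary)
```

<p>Splitting the work this way means the second pass only has to count values for the few fields worth a histogram.</p>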
<p><strong>Advanced data statistic and histograms for ./data-in/employees.list.dat</strong></p>
<p>Statistics</p>
<table border="1">
<tbody>
<tr>
<th>Field name</th>
<th>Field type</th>
<th>min</th>
<th>max</th>
<th>average number</th>
<th>count</th>
<th>count not null</th>
<th>count unique</th>
<th>median</th>
<th>modus</th>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>000</td>
<td>900</td>
<td> </td>
<td>51</td>
<td>43</td>
<td>20</td>
<td>600</td>
<td>623</td>
</tr>
<tr>
<td>EMP_NO</td>
<td align="center">integer</td>
<td>0.0</td>
<td>145.0</td>
<td>58.3921568627451</td>
<td>51</td>
<td>51</td>
<td>47</td>
<td>45.0</td>
<td>2.0</td>
</tr>
<tr>
<td>FIRST_NAME</td>
<td align="center">string</td>
<td>Andrew</td>
<td>Yuki</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>44</td>
<td>Mark</td>
<td>Robert</td>
</tr>
<tr>
<td>FULL_NAME</td>
<td align="center">string</td>
<td>Baldwin, Janet</td>
<td>Young, Katherine</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>50</td>
<td>Lee, Terri</td>
<td>Sutherland, Claudia</td>
</tr>
<tr>
<td>HIRE_DATE</td>
<td align="center">date</td>
<td>28/12/1988 00:00:00</td>
<td>15/11/1994 00:00:00</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>44</td>
<td>20/04/1992 00:00:00</td>
<td>02/01/1994 00:00:00</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Admin</td>
<td>Vice President</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>16</td>
<td>Mktg</td>
<td>Eng</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Canada</td>
<td>USA</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>9</td>
<td>USA</td>
<td>USA</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>0.0</td>
<td>5.0</td>
<td>3.411764705882353</td>
<td>51</td>
<td>51</td>
<td>6</td>
<td>4.0</td>
<td>4.0</td>
</tr>
<tr>
<td>LAST_NAME</td>
<td align="center">string</td>
<td>Baldwin</td>
<td>Young</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>48</td>
<td>Lambert</td>
<td>Johnson</td>
</tr>
<tr>
<td>PHONE_EXT</td>
<td align="center">string</td>
<td>1</td>
<td>null</td>
<td> </td>
<td>51</td>
<td>51</td>
<td>46</td>
<td>3355</td>
<td>null</td>
</tr>
<tr>
<td>SALARY</td>
<td align="center">number</td>
<td>0.0</td>
<td>9.9E7</td>
<td>2267125.137254902</td>
<td>51</td>
<td>51</td>
<td>40</td>
<td>61637.8125</td>
<td>0.0</td>
</tr>
</tbody>
</table>
<p>Histograms</p>
<table border="1">
<tbody>
<tr>
<th>Field name</th>
<th>Field type</th>
<th>value</th>
<th>count</th>
<th>count %</th>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td> </td>
<td>8</td>
<td>
<code style="color:#333333;">###############·····················································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>000</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>100</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>110</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>115</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>120</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>121</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>123</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>125</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>130</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>140</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>180</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>600</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>621</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>622</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>623</td>
<td>5</td>
<td>
<code style="color:#333333;">#########···························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>670</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>671</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>672</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>DEPT_NO</td>
<td align="center">string</td>
<td>900</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Admin</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>CEO</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>CFO</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Dir</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Doc</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Eng</td>
<td>15</td>
<td>
<code style="color:#333333;">#############################·······································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Finan</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Inside Sales Coordinator</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Mktg</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Mngr</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>PRel</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>SRep</td>
<td>9</td>
<td>
<code style="color:#333333;">#################···················································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Sales</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Sales Representative</td>
<td>6</td>
<td>
<code style="color:#333333;">###########·························································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>VP</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_CODE</td>
<td align="center">string</td>
<td>Vice President</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Canada</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>England</td>
<td>3</td>
<td>
<code style="color:#333333;">#####·······························································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>France</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Italy</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Japan</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Sales</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>Switzerland</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>UK</td>
<td>4</td>
<td>
<code style="color:#333333;">#######·····························································································</code>
</td>
</tr>
<tr>
<td>JOB_COUNTRY</td>
<td align="center">string</td>
<td>USA</td>
<td>36</td>
<td>
<code style="color:#333333;">######################################################################······························</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>0.0</td>
<td>1</td>
<td>
<code style="color:#333333;">#···································································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>1.0</td>
<td>2</td>
<td>
<code style="color:#333333;">###·································································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>2.0</td>
<td>8</td>
<td>
<code style="color:#333333;">###############·····················································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>3.0</td>
<td>14</td>
<td>
<code style="color:#333333;">###########################·········································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>4.0</td>
<td>16</td>
<td>
<code style="color:#333333;">###############################·····································································</code>
</td>
</tr>
<tr>
<td>JOB_GRADE</td>
<td align="center">number</td>
<td>5.0</td>
<td>10</td>
<td>
<code style="color:#333333;">###################·················································································</code>
</td>
</tr>
</tbody>
</table>
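<p>The text bars in the <em>count %</em> column can be reproduced with a short Python function. The rule used here is an assumption inferred from the rows above (for example, 8 of 51 records gives 15 <code>#</code> characters and 36 of 51 gives 70): the percentage is rounded down and the bar is padded with dots. The total width of 100 characters is also an assumption.</p>

```python
def bar(count, total, width=100):
    """Render count/total as a fixed-width bar of '#' padded with '·'."""
    filled = count * width // total  # percentage rounded down (assumed rule)
    return "#" * filled + "·" * (width - filled)

# The DEPT_NO row with 8 null values out of 51 records.
dept_null_bar = bar(8, 51)
usa_bar = bar(36, 51)
```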
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/03/31/data-profiling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/934c88184df6c0034450ae00a1695ee8?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">agad</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/03/basicstatistic.png" medium="image">
			<media:title type="html">BasicStatistic</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/03/advancedstatistic.png" medium="image">
			<media:title type="html">AdvancedStatistic</media:title>
		</media:content>
	</item>
		<item>
		<title>CloverETL in non-enterprise action</title>
		<link>http://blog.cloveretl.com/2009/07/17/cloveretl-in-non-enterprise-action/</link>
		<comments>http://blog.cloveretl.com/2009/07/17/cloveretl-in-non-enterprise-action/#comments</comments>
		<pubDate>Fri, 17 Jul 2009 09:29:40 +0000</pubDate>
		<dc:creator>Vaclav Matous</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[results]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=52</guid>
		<description><![CDATA[CloverETL can be used not only in an enterprise environment, but also in the sport and entertainment industry. Prague hosted the 10th FIMBA World Maxibasketball Championship in the first week of July. More than 160 teams from 31 countries took part in this popular event. Match data (results and statistics) was transmitted in xls format, the most popular [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-55" title="logo_maxibasketball" src="http://cloveretl.files.wordpress.com/2009/07/logo_maxibasketball.gif?w=90&#038;h=90" alt="logo_maxibasketball" width="90" height="90" /><span lang="en-US">CloverETL can be used not only in an enterprise environment, but also in the sport and entertainment industry. Prague hosted the <a href="http://maxibasketballprague2009.com" target="_blank">10<sup>th</sup> FIMBA World Maxibasketball Championship</a> in the first week of July. More than 160 teams from 31 countries took part in this popular event.</span></p>
<p><span lang="en-US">Match data (results and statistics) was transmitted in xls format, probably the most popular format for these purposes. Although Excel files are user-friendly, they are somewhat inconvenient for automated processing. CloverETL transformed the xls data and stored it in a database, where it can be used more comfortably from a data engineer&#8217;s point of view.</span></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/07/17/cloveretl-in-non-enterprise-action/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/73f56f1267c1896b11e3c6df97499559?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">vmatous</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/07/logo_maxibasketball.gif" medium="image">
			<media:title type="html">logo_maxibasketball</media:title>
		</media:content>
	</item>
	</channel>
</rss>