<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; data warehousing</title>
	<atom:link href="http://blog.cloveretl.com/tag/data-warehousing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; data warehousing</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 1</title>
		<link>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/</link>
		<comments>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 10:21:56 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[clover]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SCD1]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=213</guid>
		<description><![CDATA[One of the most typical uses of ETL tools is loading a data warehouse (DWH). I therefore decided to write a tutorial that describes typical data warehouse tasks (slowly changing dimensions, the date dimension, filling fact tables) and proposes solutions using CloverETL. If you are new to data warehousing, I recommend reading some [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most typical uses of ETL tools is loading a data warehouse (DWH). I therefore decided to write a tutorial that describes typical data warehouse tasks (slowly changing dimensions, the date dimension, filling fact tables) and proposes solutions using CloverETL.</p>
<p>If you are new to data warehousing, I recommend reading some of the books by <a href="http://eu.wiley.com/WileyCDA/Section/id-302479.html?query=Ralph+Kimball">Ralph Kimball</a> or <a href="http://eu.wiley.com/WileyCDA/Section/id-302479.html?query=W.+H.+Inmon">W. H. Inmon</a>.</p>
<p>The sample data warehouse collects sales information for a small chain of stores selling electronics such as iPods, MP3 players and laptops.</p>
<p>The DB schema of my data warehouse is very simple. It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price. You can see the complete DB schema in the figure below.</p>
<div id="attachment_218" class="wp-caption aligncenter" style="width: 658px"><img class="size-full wp-image-218" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="DB schema of sample DWH" width="648" height="273" /><p class="wp-caption-text">DB schema of sample DWH</p></div>
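<p>To illustrate what &#8220;additive&#8221; means here, the following minimal Python sketch (not part of the downloadable project) sums the two facts over a few Sales rows. The row layout and all values are hypothetical and are not taken from the sample DWH. Because units and total price are additive, they can be summed along any dimension, and the grand totals agree whichever dimension you group by.</p>

```python
from collections import defaultdict

# Hypothetical fact rows: (store, product, units, total_price).
# The layout and values are illustrative only, not the sample DWH's data.
sales = [
    ("Store A", "iPod", 2, 398.0),
    ("Store A", "Laptop", 1, 899.0),
    ("Store B", "iPod", 3, 597.0),
]


def totals_by(rows, key_index):
    """Sum the additive facts (units, total price) grouped by one dimension column."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = row[key_index]
        totals[key][0] += row[2]  # units
        totals[key][1] += row[3]  # total price
    return {k: tuple(v) for k, v in totals.items()}


by_store = totals_by(sales, 0)    # group along the Store dimension
by_product = totals_by(sales, 1)  # group along the Product dimension
```

<p>A non-additive measure (a unit price, for instance) could not be rolled up this way, which is why facts like these are usually stored at the finest grain and summed at query time.</p>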
<h2>Store dimension</h2>
<p>One thing you will surely face when building a data warehouse is working with several types of slowly changing dimensions (SCDs). In this part of the tutorial I use the simplest SCD type.</p>
<p>The simplest and surely the most popular SCD type among ETL developers is <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_1">slowly changing dimension type 1</a>. It doesn&#8217;t store any history: once a value changes in the online transaction processing (OLTP) system, the corresponding value in the DWH is overwritten as well, and the old value is lost.</p>
<p>I decided to use SCD1 for the store dimension, which holds basic information about stores: address, store identifier, store manager, etc. In the OLTP system each store is identified by a unique store number (the natural key), but in the DWH I have to generate a separate surrogate key, ID_D_STORE.</p>
<p>The basic idea of processing SCD1 is very simple: compare the records in the DWH with those in the OLTP system, insert the missing records into the DWH, and update the existing DWH records to match the OLTP. In all of these steps, corresponding records are matched on the natural key, STORE_NUMBER.</p>
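<p>The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not CloverETL code: it assumes both sources are lists of dictionaries, matches them on STORE_NUMBER, and uses a hypothetical STORE_MANAGER column with made-up values.</p>

```python
def scd1_diff(dwh_rows, oltp_rows, key="STORE_NUMBER"):
    """Classify OLTP rows as inserts (unknown natural key) or updates (changed values)."""
    dwh_by_key = {row[key]: row for row in dwh_rows}
    inserts, updates = [], []
    for row in oltp_rows:
        current = dwh_by_key.get(row[key])
        if current is None:
            # The store does not exist in the DWH yet
            inserts.append(row)
        elif any(current.get(col) != value for col, value in row.items()):
            # SCD1: the DWH row will simply be overwritten, no history is kept
            updates.append(row)
    return inserts, updates


# Hypothetical sample data; STORE_MANAGER is an illustrative column name
dwh = [
    {"ID_D_STORE": 1, "STORE_NUMBER": "S01", "STORE_MANAGER": "Novak"},
    {"ID_D_STORE": 2, "STORE_NUMBER": "S02", "STORE_MANAGER": "Svoboda"},
]
oltp = [
    {"STORE_NUMBER": "S01", "STORE_MANAGER": "Novak"},
    {"STORE_NUMBER": "S02", "STORE_MANAGER": "Dvorak"},
    {"STORE_NUMBER": "S03", "STORE_MANAGER": "Cerny"},
]
inserts, updates = scd1_diff(dwh, oltp)
```

<p>Here S03 ends up as an insert, S02 as an update, and the unchanged S01 is ignored. Only the OLTP columns are compared, so the DWH-only surrogate key ID_D_STORE never counts as a difference.</p>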
<p>Now let&#8217;s build the CloverETL graph. For better portability, all input and output data are stored in CSV files, so you don&#8217;t have to configure any database. The store dimension of the DWH is stored in the D_STORE.tbl file and the current OLTP data in Store_25092009.csv. Both files are read and sorted on the natural key STORE_NUMBER, and then the DataIntersection component finds the OLTP records that are not yet in D_STORE (its third output). At the same time, its second output delivers the records present in both sources, which may differ between the OLTP and the DWH. These records are filtered so that only those with at least one changed attribute value pass, and their new values are written to the D_STORE_update.tbl file. New records are written to the D_STORE_insert.tbl file once the ID_D_STORE attribute has been added; ID_D_STORE takes its value from a sequence defined in CloverETL in advance. And that&#8217;s all. You can see the resulting graph below.</p>
<div id="attachment_217" class="wp-caption aligncenter" style="width: 1034px"><img class="size-full wp-image-217" title="D_STORE_SCD1" src="http://cloveretl.files.wordpress.com/2009/09/d_store_scd1_cut1.png?w=1024&#038;h=260" alt="CloverETL graph D_STORE_SCD1" width="1024" height="260" /><p class="wp-caption-text">CloverETL graph D_STORE_SCD1</p></div>
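<p>The data flow of this graph can be approximated in plain Python. The sketch below is a stand-in under stated assumptions, not CloverETL&#8217;s implementation: it sorts both inputs on STORE_NUMBER, merges them in one pass into three streams that loosely mirror the DataIntersection outputs described above (DWH only, both, OLTP only), keeps only the matched records with a changed attribute, and assigns ID_D_STORE to new records from a counter playing the role of the CloverETL sequence. The parameter next_id and the column STORE_MANAGER are hypothetical.</p>

```python
import heapq
import itertools


def intersect_sorted(dwh_rows, oltp_rows, key):
    """Single pass over two inputs sorted on key; returns (dwh_only, pairs, oltp_only)."""
    tagged = heapq.merge(
        (("dwh", row) for row in dwh_rows),
        (("oltp", row) for row in oltp_rows),
        key=lambda item: item[1][key],
    )
    dwh_only, pairs, oltp_only = [], [], []
    for _, group in itertools.groupby(tagged, key=lambda item: item[1][key]):
        sides = dict(group)  # assumes the natural key is unique within each input
        if "dwh" in sides and "oltp" in sides:
            pairs.append((sides["dwh"], sides["oltp"]))
        elif "oltp" in sides:
            oltp_only.append(sides["oltp"])
        else:
            dwh_only.append(sides["dwh"])
    return dwh_only, pairs, oltp_only


def scd1_load(dwh_rows, oltp_rows, next_id, key="STORE_NUMBER"):
    """Return (update rows, insert rows) for an SCD1 store dimension."""
    dwh_sorted = sorted(dwh_rows, key=lambda row: row[key])
    oltp_sorted = sorted(oltp_rows, key=lambda row: row[key])
    _, pairs, new_rows = intersect_sorted(dwh_sorted, oltp_sorted, key)

    # Matched stores with at least one changed attribute: keep the DWH surrogate
    # key and overwrite the other values with the OLTP ones (D_STORE_update.tbl)
    updates = [
        {**old, **new}
        for old, new in pairs
        if any(old.get(col) != value for col, value in new.items())
    ]

    # New stores get a surrogate key from a counter standing in for the sequence
    # (D_STORE_insert.tbl)
    sequence = itertools.count(next_id)
    inserts = [{**row, "ID_D_STORE": next(sequence)} for row in new_rows]
    return updates, inserts


# Hypothetical, deliberately unsorted sample data
dwh = [
    {"ID_D_STORE": 2, "STORE_NUMBER": "S02", "STORE_MANAGER": "Svoboda"},
    {"ID_D_STORE": 1, "STORE_NUMBER": "S01", "STORE_MANAGER": "Novak"},
]
oltp = [
    {"STORE_NUMBER": "S03", "STORE_MANAGER": "Cerny"},
    {"STORE_NUMBER": "S01", "STORE_MANAGER": "Novak"},
    {"STORE_NUMBER": "S02", "STORE_MANAGER": "Dvorak"},
]
updates, inserts = scd1_load(dwh, oltp, next_id=3)
```

<p>A single-pass merge like this only works when both inputs are sorted on the key, which corresponds to the sort step placed before DataIntersection in the graph.</p>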
<p>If you want to read data from and write data to a database instead, simply replace the UniversalDataReader components with DBInputTable components and the UniversalDataWriter components with DBOutputTable components.</p>
<p>You can download the complete CloverETL project <a href="http://drop.io/cbhirdh/asset/scd1-example-zip">here</a>.</p>
<p>To be continued. In the next part we will deal with <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">slowly changing dimension type 2</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/d_store_scd1_cut1.png" medium="image">
			<media:title type="html">D_STORE_SCD1</media:title>
		</media:content>
	</item>
		<item>
		<title>Partitioning output records into m Excel files with n sheets</title>
		<link>http://blog.cloveretl.com/2009/04/02/partitioning-output-records-into-m-excel-files-with-n-sheets/</link>
		<comments>http://blog.cloveretl.com/2009/04/02/partitioning-output-records-into-m-excel-files-with-n-sheets/#comments</comments>
		<pubDate>Thu, 02 Apr 2009 12:14:46 +0000</pubDate>
		<dc:creator>Vaclav Matous</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data transformation]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[Excel]]></category>
		<category><![CDATA[XLS]]></category>

		<guid isPermaLink="false">http://cloveretl.wordpress.com/?p=21</guid>
		<description><![CDATA[Customers often have obscure requirements. In a recent project we faced an interesting one: output records had to be split into an unknown number of Excel files according to their category. In addition, the records within each file had to be written to separate sheets according to their subcategory. The number of subcategories varied from 1 [...]]]></description>
			<content:encoded><![CDATA[
<p style="margin-bottom:0;">Customers often have obscure requirements. In a recent project we faced an interesting one: output records had to be split into an unknown number of Excel files according to their category. In addition, the records within each file had to be written to separate sheets according to their subcategory. The number of subcategories varied from 1 to 1024, so the whole solution seemed to me quite impractical.</p>
<p style="margin-bottom:0;">Fortunately, CloverETL met the customer’s requirement very easily. Suppose that the metadata coming into XLSWriter contain, among others, two fields: category and subcategory. If you set <em>File URL</em> to the form <em>filename_#.xls</em>, <em>Data sheet</em> to <em>$subcategory</em> and <em>Partition key</em> to <em>category</em>, the writer splits the records into files according to their categories and into sheets according to their subcategories.</p>
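<p>The partitioning described above amounts to a two-level grouping, sketched here in Python. This is only an illustration of the idea, not XLSWriter&#8217;s code. In particular, replacing # with the category value is an assumption; the writer may name the files differently (for example with a running number). The field names category and subcategory follow the example above, and the sample records are invented.</p>

```python
from collections import defaultdict


def partition_records(records, url_pattern="filename_#.xls"):
    """Group records into {file name: {sheet name: [records]}}."""
    files = defaultdict(lambda: defaultdict(list))
    for record in records:
        # Assumption: '#' in the file URL is replaced by the partition key value
        file_name = url_pattern.replace("#", str(record["category"]))
        # One sheet per subcategory, as with Data sheet set to $subcategory
        files[file_name][record["subcategory"]].append(record)
    return {name: dict(sheets) for name, sheets in files.items()}


# Invented sample records
records = [
    {"id": 1, "category": "audio", "subcategory": "iPod"},
    {"id": 2, "category": "audio", "subcategory": "MP3"},
    {"id": 3, "category": "computers", "subcategory": "laptops"},
    {"id": 4, "category": "audio", "subcategory": "iPod"},
]
layout = partition_records(records)
```

<p>With 1 to 1024 subcategories per category, the inner dictionaries show how quickly the number of sheets can grow.</p>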
<p style="margin-bottom:0;font-style:normal;" lang="en-US">In the end, the customer concluded that one file with many records is better than dozens of sheets spread across dozens of files, each with very few records.</p>
<div id="attachment_20" class="wp-caption aligncenter" style="width: 667px"><img class="size-full wp-image-20" title="xls_partitioning" src="http://cloveretl.files.wordpress.com/2009/04/xls_partitioning.png?w=657&#038;h=535" alt="Settings of XLS_WRITER" width="657" height="535" /><p class="wp-caption-text">Settings of XLS_WRITER</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/04/02/partitioning-output-records-into-m-excel-files-with-n-sheets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/73f56f1267c1896b11e3c6df97499559?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">vmatous</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/04/xls_partitioning.png" medium="image">
			<media:title type="html">xls_partitioning</media:title>
		</media:content>
	</item>
	</channel>
</rss>