<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; slowly changing dimension</title>
	<atom:link href="http://blog.cloveretl.com/tag/slowly-changing-dimension/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; slowly changing dimension</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 2</title>
		<link>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/</link>
		<comments>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/#comments</comments>
		<pubDate>Thu, 27 May 2010 01:39:40 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[CloverETL]]></category>
		<category><![CDATA[dwh]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[scd2]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=668</guid>
		<description><![CDATA[In the last part of our data warehouse (DWH) tutorial, I showed you how to load a dimension table according to Slowly Changing Dimension Type 1 (SCD1), which keeps no history. In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. I think that SCD2 is the most [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/">In the last part</a> of our data warehouse (DWH) tutorial, I showed you how to load a dimension table according to Slowly Changing Dimension Type 1 (SCD1), which keeps no history. In today’s post, I will focus on a Slowly Changing Dimension Type 2 (SCD2) dimension table. I think that SCD2 is the most challenging sub-task of the ETL part of DWH design, and every ETL architect should be able to deal with it.</p>
<p>In contrast to SCD1, an SCD2 table preserves the history of attributes. Once the value of an attribute changes in the external system (OLTP), we have to create a new record in the SCD2 dimension table with the current value, and we also have to mark the old record in the SCD2 table as obsolete. The most common way to obsolete a record is to maintain two additional attributes: valid_from and valid_to. A record is then considered valid on a particular date D when valid_from ≤ D ≤ valid_to, where a null valid_to marks the current, still open record. You can find a detailed explanation of SCD2 principles in <a href="http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1274661375&amp;sr=8-1">Kimball’s DWH bible</a> or on <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">wikipedia.org</a>.</p>
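<p>The validity test can be written down as a small Python sketch. It assumes that both bounds are inclusive and that a missing valid_to means "still current", which is what the valid_to = today()-1 and valid_from = today() convention used later in this post implies. The function name is_valid_on is made up for this illustration.</p>

```python
from datetime import date
from typing import Optional


def is_valid_on(day: date, valid_from: date, valid_to: Optional[date]) -> bool:
    """Return True if a record with these validity bounds is valid on `day`.

    Both bounds are treated as inclusive (an assumption of this sketch);
    valid_to=None marks the current, still open record.
    """
    if valid_from > day:
        return False
    return valid_to is None or valid_to >= day


# Validity bounds of an obsoleted record and of its successor.
old = (date(2009, 10, 10), date(2010, 5, 20))
new = (date(2010, 5, 21), None)
```

<p>With these bounds, exactly one of the two records is valid on any day from 2009-10-10 onwards, so a lookup by date never returns two versions of the same customer.</p>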
<p>Let us see how SCD2 works in practice with a small example. We will use the DWH schema introduced in the SCD1 post.</p>
<p style="text-align:left;"><a href="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png"><img class="size-full wp-image-218 aligncenter" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="" width="648" height="273" /></a></p>
<p>It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price.</p>
<p>The Store table is populated as SCD1; here we will load the Customer table, which is marked as an SCD2 dimension table. Let’s imagine that a customer changed his email. What happens in the OLTP and in the DWH Customer table, named D_CUSTOMER?</p>
<p><strong>OLTP:</strong></p>
<p>C0001;John;Newman;john.newman@hotmail.com <span style="color:#0000ff;">=&gt;</span> C0001;John;Newman;newman.john@gmail.com</p>
<p><strong>DWH:</strong></p>
<p>0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">null</span> <span style="color:#0000ff;">=&gt;</span><br />
0001;C0001;John;Newman;john.newman@hotmail.com;2009-10-10;<span style="color:#ff0000;">2010-05-20</span><br />
0002;C0001;John;Newman;newman.john@gmail.com;2010-05-21;null</p>
<p>Notice especially the first two attributes (columns) and the last two attributes of the DWH table. The first attribute is a surrogate key: a unique identifier of the record in the D_CUSTOMER table, generated by the ETL process. The second one (C0001) is a natural key, a unique identifier of the customer in OLTP. The last two attributes are valid_from and valid_to. When you list all records with the same natural key in D_CUSTOMER, you get the complete history of one customer.</p>
<p>Now that the principle of SCD2 is explained, I will describe an implementation of SCD2 in CloverETL. See the CloverETL graph below.</p>
<p><a href="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png"><img class="aligncenter size-full wp-image-670" title="D_CUSTOMER_SCD2" src="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png?w=799&#038;h=166" alt="" width="799" height="166" /></a></p>
<p>The basic data flow of the graph is very simple: in the lower branch we read the data from OLTP (as in the previous DWH post, it is a CSV file for us), and in the upper branch we read the data that is already stored in our SCD2 table. We have to use the Dedup component, as we want only the one current record for each customer. The two branches meet in the DataIntersection component, which matches the records by natural key. The component has three output ports, as there are three possible outcomes:</p>
<ol>
<li>The record exists only in DWH. This should not happen, because it means that the record was deleted in OLTP, and a “normal” OLTP does not allow records to be deleted. Such records end up in the Trash component.</li>
<li>The DWH table contains at least one record with the same natural key as the record coming from OLTP. That record goes through the second output port to the component that identifies whether the record has changed (the ExtFilter component). A changed record is then copied into two records: the first one obsoletes the current record in D_CUSTOMER (identified by its surrogate key), and the second one is inserted into D_CUSTOMER and stores the new values read from OLTP. The first one sets the column valid_to = today()-1, and the second one is inserted with valid_from = today() and valid_to = null.</li>
<li>The record coming from OLTP is a new one: there is no record with the same natural key in DWH. In that case the record is sent to the third output port, and the following components insert it into the D_CUSTOMER table with valid_from = today() and valid_to = null.</li>
</ol>
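<p>The three outcomes above can be sketched in a few lines of Python. This only illustrates the logic and is not the CloverETL graph itself: the function name load_scd2, the dictionary layout and the column names (id, customer_id, first_name, last_name, email) are assumptions made for this sketch.</p>

```python
from datetime import date, timedelta

ATTRIBUTES = ("first_name", "last_name", "email")  # tracked columns (assumed)


def load_scd2(dimension, source, today, next_id):
    """Merge OLTP `source` rows into the SCD2 `dimension` list in place.

    Returns the natural keys found only in the DWH (outcome 1).
    """
    # Keep only the current version of each customer (the Dedup step).
    current = {row["customer_id"]: row
               for row in dimension if row["valid_to"] is None}
    incoming = {row["customer_id"]: row for row in source}

    # Outcome 1: the record exists only in the DWH; nothing is written.
    only_in_dwh = sorted(set(current) - set(incoming))

    for key, src in incoming.items():
        old = current.get(key)
        if old is not None:
            # Outcome 2: same natural key on both sides; act only on a change.
            if all(old[a] == src[a] for a in ATTRIBUTES):
                continue
            old["valid_to"] = today - timedelta(days=1)  # obsolete old version
        # Outcome 2 (changed) and outcome 3 (new): insert a fresh version.
        dimension.append({"id": next_id, "customer_id": key,
                          **{a: src[a] for a in ATTRIBUTES},
                          "valid_from": today, "valid_to": None})
        next_id += 1
    return only_in_dwh


# The email change from the example earlier in this post.
dim = [{"id": 1, "customer_id": "C0001", "first_name": "John",
        "last_name": "Newman", "email": "john.newman@hotmail.com",
        "valid_from": date(2009, 10, 10), "valid_to": None}]
src = [{"customer_id": "C0001", "first_name": "John",
        "last_name": "Newman", "email": "newman.john@gmail.com"}]
only_in_dwh = load_scd2(dim, src, today=date(2010, 5, 21), next_id=2)
```

<p>After the call, the first record in dim is closed with valid_to = 2010-05-20 and a second record with the new email is open from 2010-05-21, which mirrors the D_CUSTOMER rows shown earlier.</p>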
<p>If you want to verify that your CloverETL SCD2 graph works correctly, or if you are looking for sample data, you can simply import the example project into your Clover installation. It is bundled with CloverETL Designer as the DWHExample project. For more information on how to import the example project, see the <a href="http://www.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/cloveretl-examples-project.html">online documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2010/05/27/building-dwh-with-cloveretl-scd2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2010/05/d_customer_scd2.png" medium="image">
			<media:title type="html">D_CUSTOMER_SCD2</media:title>
		</media:content>
	</item>
		<item>
		<title>Building DWH with CloverETL: Slowly Changing Dimension Type 1</title>
		<link>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/</link>
		<comments>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 10:21:56 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[clover]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[SCD1]]></category>
		<category><![CDATA[slowly changing dimension]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=213</guid>
		<description><![CDATA[A very typical use of ETL tools is loading a data warehouse (DWH). So I decided to write a tutorial that describes typical data warehouse tasks (slowly changing dimensions, date dimension, filling fact tables) and proposes solutions using CloverETL. If you are a newbie in data warehousing, I recommend reading some [...]]]></description>
			<content:encoded><![CDATA[<p>A very typical use of ETL tools is loading a data warehouse (DWH). So I decided to write a tutorial that describes typical data warehouse tasks (slowly changing dimensions, date dimension, filling fact tables) and proposes solutions using CloverETL.</p>
<p>If you are a newbie in data warehousing, I recommend reading some of the books by <a href="http://eu.wiley.com/WileyCDA/Section/id-302479.html?query=Ralph+Kimball">Ralph Kimball</a> or <a href="http://eu.wiley.com/WileyCDA/Section/id-302479.html?query=W.+H.+Inmon">W. H. Inmon</a>.</p>
<p>The sample data warehouse collects information about sales for a small store chain that sells electronics such as iPods, MP3 players, and laptops.</p>
<p>The DB schema of my data warehouse is very simple. It consists of four dimensions (Customer, Product, Store and Date), one degenerate dimension (invoice number) and one fact table (Sales). The fact table stores two additive facts: units and total price. You can see the complete DB schema in the figure below.</p>
<div id="attachment_218" class="wp-caption aligncenter" style="width: 658px"><img class="size-full wp-image-218" title="DB schema of sample DWH" src="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png?w=648&#038;h=273" alt="DB schema of sample DWH" width="648" height="273" /><p class="wp-caption-text">DB schema of sample DWH</p></div>
<h2>Store dimension</h2>
<p>One thing you will surely face when you build a data warehouse is working with several types of slowly changing dimension (SCD). In this part of the tutorial I use the simplest SCD type.</p>
<p>The simplest and surely the most popular SCD type among ETL developers is <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_1">slowly changing dimension type 1</a>. It doesn&#8217;t store any history, so once a value in the online transaction processing system (OLTP) has been changed, the value in DWH is overwritten on the next load as well.</p>
<p>I decided to use SCD1 for the store dimension, which collects basic information about stores: address, store identifier, store manager etc. Each store is identified in OLTP by a unique store number (natural key). For DWH, however, I have to generate its own surrogate identifier, ID_D_STORE.</p>
<p>The basic idea of processing SCD1 is very simple: compare the records in DWH and OLTP, insert the missing records into DWH, and update the DWH records according to OLTP. For all these tasks, the attribute that helps us find corresponding records is the natural key – STORE_NUMBER.</p>
<p>So let&#8217;s develop the CloverETL graph. For better portability, all input and output data are stored in CSV files, so you don&#8217;t have to configure any database. The store dimension of DWH is stored in the D_STORE.tbl file, and the current data from OLTP is stored in Store_25092009.csv. Both of these files have to be read and sorted on the natural key STORE_NUMBER. The DataIntersection component then finds the records that aren&#8217;t in D_STORE yet (its third output). At the same time, its second output delivers the records present on both sides, which may differ between OLTP and DWH. These records are filtered, so that only the records with a different value in at least one attribute are processed, and their new values are stored in the D_STORE_update.tbl file. New records are written to the D_STORE_insert.tbl file once the ID_D_STORE attribute has been added. ID_D_STORE gets its value from a sequence that we have defined in CloverETL in advance. And that&#8217;s all. You can see the resulting graph below.</p>
<div id="attachment_217" class="wp-caption aligncenter" style="width: 1034px"><img class="size-full wp-image-217" title="D_STORE_SCD1" src="http://cloveretl.files.wordpress.com/2009/09/d_store_scd1_cut1.png?w=1024&#038;h=260" alt="CloverETL graph D_STORE_SCD1" width="1024" height="260" /><p class="wp-caption-text">CloverETL graph D_STORE_SCD1</p></div>
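<p>For readers who prefer code to diagrams, the same SCD1 logic can be sketched in Python. This is an illustration under assumptions, not what the graph executes: the function name load_scd1 and the MANAGER column are made up, while STORE_NUMBER and ID_D_STORE are the keys described above.</p>

```python
def load_scd1(dimension, source, next_id):
    """Split OLTP `source` rows into inserts and updates against `dimension`.

    Returns (inserts, updates); neither input list is modified.
    """
    by_key = {row["STORE_NUMBER"]: row for row in dimension}
    inserts, updates = [], []
    for src in source:
        existing = by_key.get(src["STORE_NUMBER"])
        if existing is None:
            # Not in D_STORE yet: assign a surrogate key from the sequence.
            inserts.append({"ID_D_STORE": next_id, **src})
            next_id += 1
        elif any(existing[col] != val for col, val in src.items()):
            # Present but different: overwrite, keeping the surrogate key.
            updates.append({**existing, **src})
    return inserts, updates


# Hypothetical sample rows: one changed store and one new store.
d_store = [{"ID_D_STORE": 1, "STORE_NUMBER": "S01", "MANAGER": "Alice"}]
oltp = [{"STORE_NUMBER": "S01", "MANAGER": "Bob"},
        {"STORE_NUMBER": "S02", "MANAGER": "Carol"}]
inserts, updates = load_scd1(d_store, oltp, next_id=2)
```

<p>Here inserts plays the role of D_STORE_insert.tbl and updates the role of D_STORE_update.tbl. Unchanged records appear in neither list.</p>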
<p>If you want to read and write data from and to a database instead, simply replace the UniversalDataReader components with DBInputTable and the UniversalDataWriter components with DBOutputTable.</p>
<p>You can download the complete CloverETL project <a href="http://drop.io/cbhirdh/asset/scd1-example-zip">here</a>.</p>
<p>To be continued. In the next part we will deal with <a href="http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2">slowly changing dimension type 2</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/10/08/building-dwh-with-cloveretl-scd1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/dwh-example.png" medium="image">
			<media:title type="html">DB schema of sample DWH</media:title>
		</media:content>

		<media:content url="http://cloveretl.files.wordpress.com/2009/09/d_store_scd1_cut1.png" medium="image">
			<media:title type="html">D_STORE_SCD1</media:title>
		</media:content>
	</item>
	</channel>
</rss>