<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; parsing</title>
	<atom:link href="http://blog.cloveretl.com/tag/parsing/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; parsing</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Parallel reader</title>
		<link>http://blog.cloveretl.com/2009/10/23/parallel-reader/</link>
		<comments>http://blog.cloveretl.com/2009/10/23/parallel-reader/#comments</comments>
		<pubDate>Fri, 23 Oct 2009 11:36:51 +0000</pubDate>
		<dc:creator>Martin Zatopek</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[clover]]></category>
		<category><![CDATA[ETL]]></category>
		<category><![CDATA[parallel]]></category>
		<category><![CDATA[ParallelReader]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=252</guid>
		<description><![CDATA[In the October release of Clover, version 2.8.1, we introduced a new component that should definitely attract your attention – the Parallel Reader. The name itself suggests the goal of the component – to improve reading speed by going parallel. The component is functionally very similar to Universal Data Reader – it reads delimited flat files [...]]]></description>
			<content:encoded><![CDATA[<p>In the October release of Clover, version 2.8.1, we introduced a new component that should definitely attract your attention – the Parallel Reader. The name itself suggests the goal of the component – to improve reading speed by going parallel. The component is functionally very similar to Universal Data Reader – it reads delimited flat files such as CSV or tab-delimited files &#8211; so not much has changed there. The real difference is under the hood.</p>
<p>Two major optimizations allow Parallel Reader to perform very well, especially on server-class machines with fast modern disks or, better yet, disk arrays. The first optimization is, of course, reading the file in parallel. The input file is divided into a set of virtual data chunks, which are fed to reading threads. The threads work at the same time, each parsing data records only from its own part of the file. The number of threads is set by the component parameter “Level Of Parallelism” and should reflect the hardware setup (e.g. the number of disks in a striped RAID) to get the most out of Parallel Reader. The second optimization is a simplified data parser. It is as simple as possible: validation, error handling, and some functionality are limited, but it is really fast.</p>
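<p>The chunking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code inside Parallel Reader: the function names <code>chunk_bounds</code>, <code>read_chunk</code> and <code>parallel_read</code> are hypothetical, records are assumed to be newline-terminated with <code>;</code> as the field delimiter, and each record is assigned to the chunk that contains its first byte.</p>

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(size, parts):
    """Split the byte range [0, size) into at most `parts` contiguous chunks."""
    if size == 0:
        return []
    step = -(-size // parts)  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def read_chunk(path, start, end, delimiter=b";"):
    """Parse every record whose first byte lies inside [start, end)."""
    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and read through the next newline. This skips
            # a record that began in the previous chunk, and skips nothing but
            # the newline itself when a record begins exactly at `start`.
            f.seek(start - 1)
            f.readline()
        while end > f.tell():
            line = f.readline()
            if not line:
                break
            fields = line.rstrip(b"\r\n").split(delimiter)
            records.append([field.decode("utf-8") for field in fields])
    return records


def parallel_read(path, level_of_parallelism=4):
    """Read a delimited file with one worker thread per chunk."""
    bounds = chunk_bounds(os.path.getsize(path), level_of_parallelism)
    with ThreadPoolExecutor(max_workers=level_of_parallelism) as pool:
        parts = pool.map(lambda b: read_chunk(path, *b), bounds)
        return [rec for part in parts for rec in part]
```

<p>Because this sketch collects the chunks in order, the result happens to keep the original record order. A reader that emits records as soon as any thread has parsed them gives no such guarantee, which is why the transformation should not depend on record order.</p>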
<p>Although the new reader has a few limitations that follow from its design, its speed in common use cases compensates for these drawbacks. If you are processing large amounts of data (hundreds of megabytes or more) and your transformation does not depend on records being read in their original order, Parallel Reader might be the right choice for you – why not give it a try?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/10/23/parallel-reader/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a09e97c9dfaf07365d2353d0ed474b28?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">mzatopek</media:title>
		</media:content>
	</item>
		<item>
		<title>Hidden features: Mutable delimiter</title>
		<link>http://blog.cloveretl.com/2009/08/18/hidden-features-mutable-delimiter/</link>
		<comments>http://blog.cloveretl.com/2009/08/18/hidden-features-mutable-delimiter/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 09:38:10 +0000</pubDate>
		<dc:creator>Petr Uher</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[delimited file]]></category>
		<category><![CDATA[delimiter]]></category>
		<category><![CDATA[Hidden features]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=133</guid>
		<description><![CDATA[CloverETL provides a very useful feature: mutable delimiter. When you parse a delimited file (e.g. CSV), you can specify a different delimiter for each field. This isn&#8217;t surprising for daily CloverETL users, but for users of other ETL tools it can be. It might not be very well known that in CloverETL you can even define [...]]]></description>
			<content:encoded><![CDATA[<p>CloverETL provides a very useful feature: <strong>mutable delimiter</strong>. When you parse a delimited file (e.g. CSV), you can specify a different delimiter for each field. This isn&#8217;t surprising for daily CloverETL users, but for users of other ETL tools it can be. It might not be very well known that in CloverETL you can even define several delimiters for one field (a so-called &#8220;mutable delimiter&#8221;), and CloverETL chooses the right one. This opens up new ways of processing files with irregular structure. I believe this functionality isn&#8217;t provided by any other ETL tool on the market. If I am wrong, leave me a message in the comments. I&#8217;m always happy to discover &#8220;hidden features&#8221; of other ETL tools.</p>
<p>Syntax of a mutable delimiter: the alternative delimiters have to be separated by &#8216;<code>\\|</code>&#8217;. For example, to define that the field delimiter can be &#8216;<code>;</code>&#8217;, &#8216;<code>,</code>&#8217; or &#8216;<code>#</code>&#8217;, you have to write &#8216;<code>;\\|,\\|#</code>&#8217;.<br />
You can download a simple example of using a mutable delimiter <a href="http://drop.io/cbhirdh/asset/mutabledelimiter-zip" target="_blank">here</a> as a zipped CloverETL project. Importing an existing CloverETL project into <a href="http://www.cloveretl.com/clover-gui/" target="_blank">CloverETL Designer</a> is described in the <a href="http://www.cloveretl.com/_upload/clover-gui/docs/html/manual_html_chunk/ch04.html" target="_blank">CloverETL documentation</a>.</p>
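<p>To make the syntax concrete, here is a minimal Python sketch of how a specification such as <code>;\\|,\\|#</code> could be interpreted. It illustrates the idea only and is not how CloverETL implements it: the helper names <code>split_alternatives</code> and <code>split_record</code> are hypothetical, and for brevity the same set of alternatives is applied to every field, whereas CloverETL lets you set delimiters field by field.</p>

```python
import re

# The separator between alternatives, written exactly as in the post:
# two backslashes followed by a pipe.
ALTERNATIVE_SEPARATOR = r"\\|"


def split_alternatives(spec):
    """Turn a mutable-delimiter spec into the list of alternative delimiters."""
    return [d for d in spec.split(ALTERNATIVE_SEPARATOR) if d]


def split_record(line, spec):
    """Split a line on whichever of the alternative delimiters occurs."""
    # Try longer delimiters first so that a short alternative cannot
    # shadow a longer one that starts with the same characters.
    alternatives = sorted(split_alternatives(spec), key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in alternatives)
    return re.split(pattern, line)
```

<p>For instance, <code>split_record("a;b,c#d", r";\\|,\\|#")</code> splits the line at each of the three delimiters and yields the four fields a, b, c and d.</p>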
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/08/18/hidden-features-mutable-delimiter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/cec19d3aab248f6f4591a97277323f2c?s=96&#38;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">puher</media:title>
		</media:content>
	</item>
		<item>
		<title>Parsing of an Apache access log</title>
		<link>http://blog.cloveretl.com/2009/05/07/parsing-of-an-apache-access-log/</link>
		<comments>http://blog.cloveretl.com/2009/05/07/parsing-of-an-apache-access-log/#comments</comments>
		<pubDate>Thu, 07 May 2009 07:21:44 +0000</pubDate>
		<dc:creator>Vaclav Matous</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=33</guid>
		<description><![CDATA[The UniversalDataReader is designed for reading files in various formats. We use this component for many purposes. One of them is parsing an Apache access log. The file normally contains records in the commonly used combined log format, e.g.: 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I [...]]]></description>
			<content:encoded><![CDATA[
<p style="margin-bottom:0;"><span lang="cs-CZ">The UniversalDataReader is designed for reading files in various formats. We use this component for many purposes. One of them is parsing an Apache access log. The file normally contains records in the commonly used combined log format, e.g.:</span></p>
<p style="margin-bottom:0;" lang="cs-CZ"><em>127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"</em></p>
<p style="margin-bottom:0;" lang="cs-CZ">Fields in the record are delimited by a space. But a space can also occur inside some quoted fields, such as <em>"GET /apache_pb.gif HTTP/1.0"</em>, so a single space does not work as the delimiter for every field. Fortunately, CloverETL allows you to define a different delimiter for each field in metadata, so parsing the log depends only on properly setting the metadata on the output edge of the reader. In our case we defined the following delimiters: space, space, space + left square bracket, right square bracket + space + quotation mark, quotation mark + space, etc.</p>
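<p>The per-field delimiter approach can be sketched in Python as follows. This is an illustration, not CloverETL metadata: the field names are hypothetical, the first five delimiters follow the list above, and the remaining four are an assumption about what the "etc." stands for in the combined log format.</p>

```python
# (field name, delimiter that terminates the field). The field names are
# hypothetical; delimiters after "request" are an assumed continuation of
# the list given in the post.
FIELDS = [
    ("host", " "),
    ("ident", " "),
    ("user", " ["),
    ("time", '] "'),
    ("request", '" '),
    ("status", " "),
    ("bytes", ' "'),
    ("referer", '" "'),
    ("user_agent", '"'),
]


def parse_line(line):
    """Cut one log line into fields, each ended by its own delimiter."""
    record = {}
    pos = 0
    for name, delimiter in FIELDS:
        # str.index raises ValueError when the delimiter is missing,
        # which flags a line that does not match the expected layout.
        end = line.index(delimiter, pos)
        record[name] = line[pos:end]
        pos = end + len(delimiter)
    return record
```

<p>Applied to the sample line above, the request field comes out as <code>GET /apache_pb.gif HTTP/1.0</code> with its inner spaces intact, because that field ends only at a quotation mark followed by a space. The sketch does not handle quotation marks escaped inside a field.</p>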
<p style="margin-bottom:0;" lang="cs-CZ">The complete example, which also computes the most visited pages and the IP addresses with the most visits, can be found in <a href="http://www.cloveretl.com/download/examples/cloverETL.examples.rel-2-7-0.zip">Advanced Examples from release 2-7-0</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/05/07/parsing-of-an-apache-access-log/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/73f56f1267c1896b11e3c6df97499559?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">vmatous</media:title>
		</media:content>
	</item>
	</channel>
</rss>