<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>CloverETL&#039;s Blog &#187; apache</title>
	<atom:link href="http://blog.cloveretl.com/tag/apache/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cloveretl.com</link>
	<description>Life, the Universe, CloverETL and everything ...</description>
	<lastBuildDate>Thu, 15 Jul 2010 14:12:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.cloveretl.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/dd4c2411bcdf90b36e88bda58e3fce7c?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>CloverETL&#039;s Blog &#187; apache</title>
		<link>http://blog.cloveretl.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.cloveretl.com/osd.xml" title="CloverETL&#039;s Blog" />
	<atom:link rel='hub' href='http://blog.cloveretl.com/?pushpress=hub'/>
		<item>
		<title>Parsing of an Apache access log</title>
		<link>http://blog.cloveretl.com/2009/05/07/parsing-of-an-apache-access-log/</link>
		<comments>http://blog.cloveretl.com/2009/05/07/parsing-of-an-apache-access-log/#comments</comments>
		<pubDate>Thu, 07 May 2009 07:21:44 +0000</pubDate>
		<dc:creator>Vaclav Matous</dc:creator>
				<category><![CDATA[Using CloverETL]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[log]]></category>
		<category><![CDATA[parsing]]></category>

		<guid isPermaLink="false">http://blog.cloveretl.com/?p=33</guid>
		<description><![CDATA[The UniversalDataReader is designed for reading files in various formats. We use this component for many purposes. One of them is parsing an Apache access log. The file normally contains records in the widely used combined log format, e.g.: 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I [...]]]></description>
			<content:encoded><![CDATA[
<p style="margin-bottom:0;"><span lang="cs-CZ">The UniversalDataReader is designed for reading files in various formats. We </span><span lang="cs-CZ">use this component for many purposes. One of them is parsing an Apache access log. The file normally contains records in the widely used combined log format, e.g.: </span></p>
<p style="margin-bottom:0;" lang="cs-CZ"><em>127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"</em></p>
<p style="margin-bottom:0;" lang="cs-CZ">Fields in the record are delimited by a space. However, a space can also appear inside quoted fields, such as <em>"GET /apache_pb.gif HTTP/1.0"</em>, so a single space on its own does not work as a delimiter. Fortunately, CloverETL allows you to define a separate delimiter for each field in metadata, so parsing the log depends only on setting the metadata correctly on the output edge from the reader. In our case we defined the following delimiters: space, space, space + left square bracket, right square bracket + space + quotation mark, quotation mark + space, etc.</p>
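<p style="margin-bottom:0;">The per-field delimiter idea can be illustrated outside CloverETL with a short Python sketch. This is not CloverETL metadata and not the way the reader is implemented. The field names and the <em>parse_line</em> helper are hypothetical, and the delimiters after the first five are our assumption about what the "etc." covers for the sample record above.</p>

```python
# A minimal sketch of per-field delimiters, assuming the combined log format.
# Field names are hypothetical; each field is paired with the delimiter that ends it.
FIELDS = [
    ("ip", " "),           # space
    ("ident", " "),        # space
    ("user", " ["),        # space + left square bracket
    ("time", '] "'),       # right square bracket + space + quotation mark
    ("request", '" '),     # quotation mark + space
    ("status", " "),       # assumed: space
    ("bytes", ' "'),       # assumed: space + quotation mark
    ("referer", '" "'),    # assumed: quotation mark + space + quotation mark
    ("user_agent", '"'),   # assumed: closing quotation mark at end of record
]


def parse_line(line):
    """Split one record by consuming a different delimiter after each field."""
    record = {}
    rest = line.rstrip("\n")
    for name, delimiter in FIELDS:
        # partition() returns an empty separator when the delimiter is missing
        value, found, rest = rest.partition(delimiter)
        if not found:
            raise ValueError(f"delimiter {delimiter!r} not found after field {name!r}")
        record[name] = value
    return record


SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" '
          '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

if __name__ == "__main__":
    for key, value in parse_line(SAMPLE).items():
        print(key, "=", value)
```

<p style="margin-bottom:0;">Because the request field ends at quotation mark + space rather than at a plain space, the spaces inside it stay part of the value. A record that does not match the expected layout raises an error instead of being silently misparsed.</p>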
<p style="margin-bottom:0;" lang="cs-CZ">The complete example, which also computes the most visited pages and the most frequently visiting IP addresses, can be found in <a href="http://www.cloveretl.com/download/examples/cloverETL.examples.rel-2-7-0.zip">Advanced Examples from release 2-7-0</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cloveretl.com/2009/05/07/parsing-of-an-apache-access-log/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/73f56f1267c1896b11e3c6df97499559?s=96&#38;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">vmatous</media:title>
		</media:content>
	</item>
	</channel>
</rss>