Tag Archives: XLS

Integration of CloverETL with PHP

To witness the power of Clover, consider the following scenario. Our customer required an application whose main purpose was generating reports in xls format. What would seem like a simple task was presented with difficult challenges:

  • Reports had  many different formats specified by many parameters.
  • Reports were generated from heterogeneous sources – xls files, database tables and IBM iSeries files (former AS/400).
  • Application users modified data (mainly added records) in a report before its final confirmation.
  • The confirmation of a report invoked updates in several database tables and generated log records.
  • Short response times were needed for the most of operations.

We knew right away we would approach this situation with a web application written in PHP because web application architecture brought us plenty of needed advantages. But how can we quickly and reliably integrate heterogeneous sources? This is exactly the type of task that ETL tools were developed for. That’s why we built the whole solution using CloverETL Server with the Launch Services (equivalent of web services) feature. The following schema should illustrate the architecture of our solution.

Users of the application communicate only with the web application layer (1) that uses its own database for storing temporary data, setup information, and a history of reported records. This part also manages access permissions based on user roles.

When a user wants to view some report, he or she sets the proper parameters and other restrictions (e.g. type of records to be reported or dates scope) in the web application using his/her preferred web browser (IE, Firefox …). Confirming the filled parameters invokes a call to a Clover Launch Service (2) and a transformation connected with this service. The transformation integrates data from heterogeneous sources and stores it into a database dedicated to the web application. All stored records are identified by runID (transformation execution’s unique identifier within Clover Server) and the service returns the same runID as a response from its call.

The web application allows users to modify only certain data (according to partial runID). Users can insert, change or delete records (3). These changes have an effect only within the database of the web application and play only a temporary role until the user confirms the report. The confirmation again invokes  a call to the Clover Launch Service that then generates the needed reports, logs records about processed changes, and propagates changes into a history of reported records (4).

To summarize this solution, we moved the entire application logic into a CloverETL Server which allows us to solve such problems with ease. Furthermore we took advantage of Clover Launch Services – a combination of ETL integration power and online processing.

In my opinion, there are many similar problems where the concept described above cuts the time and effort to reach a solution.

How to Easily Enrich Data Using CloverETL’s Auto-filling feature

Users often need to retrieve data from a data source which does not contain this related data but is easily defined. Thus, it is important to be able to add further information to your source that is not already present in the file (e.g. time stamp, name of excel sheet). Such additional information can simplify further data processing.

For example:

  • Each file has a time of creation, its size, name and the path where it is located, and (in case of an XLS file) the name of the sheet that is read.
  • When a data source is being read, the reader starts to work at an explicit time; each record is also read at an explicit time.
  • Records can be numbered in the order in which they are read.
  • Information about errors may be available (in DBExecute and DBOutputTable).

All this information can easily enrich the read data in CloverETL by using the auto-filling functionality.

Auto-filling is a feature that is available on the metadata definition level. When you open the Metadata Editor and select any field of the metadata there is an Autofilling property (under the Advance tab). You can select one of the following values:

Name Date type Description
default_value any type When the null value is assign to the field and the field is marked as non-nullable, the null is replaced by the value of Default property of the field
global_row_count any numeric Sequence number of the record in a data source starting from 0. The number isn’t reset for each input file while the wildcards are used in File URL (${DATAIN_DIR}/input*.txt)
source_row_count any numeric Sequence number of the record in data source, starting from 0. The number is reset to 0 for each input file while the wildcards are used in File URL (${DATAIN_DIR}/input*.txt)
metadata_row_count any numeric Similar to global_row_count. But when the reader component supports more than one type of output metadata (XMLExtract, XMLXPathReader) each metadata has a separate counter.
metadata_source_row_count any numeric Similar to source_row_count. But when the reader component supports more than one type of output metadata (XMLExtract, XMLXPathReader) each metadata has a separate counter.
source_name string Name of data source. For file readers it’s fully qualified path (ex. /home/user/input.csv), for DataGenerator it’s ID of graph component, SQL query for DBInputTable, fully qualified class name for JMSReader
source_size any numeric Size of the file in bytes. 0 for non-file readers.
source_timestamp date Date and time of the creation of the file. Empty (null) in all non-file readers.
reader_timestamp date Date and time when reader starts read data
row_timestamp date Date and time when the reader starts read the record
sheet_name string Name of the sheet, only in XLSDataReader
ErrCode any numeric Error code returned by database engine, only in DBExecute and DBOutputTable
ErrText string Error message returned by database engine, only in DBExecute and DBOutputTable

Remember that any of these functions can be applied to a field not contained in a file, database table, generated data, or JMS message.

Use case

Imagine that you have an export of customers from a database in an Excel file. The Excel file is organized into many sheets; each sheet is named by a state abbreviation and contains only customers from one state. See figure bellow.

Now you want to merge all customers to one CSV file but for each customer you want to also store the state in separate column. It looks very easy :-) . For CloverETL it is, not necessarily so for other ETL tools.

The final graph is very simple, there are only two components.

The most important part of this graph is hidden in the definition of the metadata on the edge. We have to enrich the metadata, which we created from the Excel file by using the proper wizard in CloverETL Designer, with a new field “state” by setting the Autofilling property to sheet_name as you can see below. And that’s all.

The Autofilling field can be placed on any position in a metadata definition. It’s conveniant that we place the autofilling fields at the end of the metadata definition, after the field with the record delimiter. But this conveniance does not apply when the same metadata are also used for writing to a flat file (in our case). Thus we moved the autofilling fields to the position before the field with a record delimiter.

When you run the graph, you will get the following results:

You can use multiple autofilling fields with different functions at the same time to get more useful information from the file. For example, you can get the file name and the sheet names at the same time. Or you can get the number the records, etc. For example, when the following auto-filling functions are used (sheet_name, source_size, row_timestamp, global_row_count), the result will look like this:

Remember that only the edges connected to the output port(s) of the following reader components can use the auto-filling functionality:

  • UniversalDataReader
  • CloverDataReader
  • XLSDataReader
  • DBFDataReader
  • MultiLevelReader
  • XMLExtract
  • XMLXPathReader
  • DBInputTable
  • DataGenerator
  • JMSReader

DBOutputTable and DBExecute can use only two error auto-filling functions – ErrCode and ErrText.

Export from a Database to Excel

I was motivated to write this post by inquiries such as this one on MySQL forum about how to export data from database to Excel.  MySQL is not the only one though, similar requests can be found on Oracle or Postgres forums too. Of course a CSV format or doing some coding is always an option, but Clover lets you generate XLS/XLS(X) effortlessly and with some advantages:

  • Clover does NOT require Excel to do be installed on machine where we are doing the export. (Yes, this means you can export even on your Linux MySQL installation.)
  • We can export from any database that has a JDBC driver.(MySQL, Oracle, Informix, DB2, Postgres)
  • Export can run on any platform – Windows, OS X, Linux.
  • It is possible to do additional validation and manipulation before writing to Excel.

To keep things simple we will work just with a single database table called customers stored in MySQL:

Our goals in this post are:

  1. Connect to the database and download the customer data.
  2. Write the customer data to Excel sheet, including the header with column names.

And finally, to make it more interesting

3. Split customers into multiple sheets based on the state of their origin.

Let’s build the graphs now:

Step 1: Connecting to MySQL database

Create a new project and a new graph (see the videos if you don’t know what I am talking about), go to Outline window and new database connection.

Connection to MySQL database

Connection to my MySQL database

Step 2: Building the graph

Build the graph from DBInputTable and XLSDataWriter components. DBInputTable reads data from database table, or result of SQL query, while ExcelDataWriter writes it to Excel spreadsheet.

Graph exporting from MySQL to Excel

Graph exporting from MySQL to Excel

You will need a metadata definition to describe the structure of your database records. The easiest way to do this is to use a wizard from Outline window. Right click on Metadata entry and select Metadata > Extract from database. Once done, assign your metadata on the link (edge) between the DBInputTable and XLSDataWriter. Basic structure of our export graph is laid out.

Metadata extracted from database

Metadata extracted from database

We’ll configure the components now.

Configure the DBInputTable to use the MySQL connection we created in the previous step. Supply the SQL query to retrieve the data. Mine is just simple “select *”, but it can be any statement – joins, aggregation, while-filtering etc.

DBInputTable Configuration

DBInputTable Configuration

Next, configure the XLSDataWriter to write into desired output file. Notice the property settings. To include header on the very first line, we set the Metadata row property is to 1. Data must be appended after the header row. That is why the Start row property is set to 2.

XLSDataWriter configuration

XLSDataWriter Configuration

Step 4. Run the graph

Start the graph now and check the exported data in your output file. If you just wanted to export the data you are done!

Exported data in Excel format

Exported data in Excel format

Optionally: Split data to sheets based on field value

Clover has a nifty feature of being able to write data into multiple sheets, splitting them based on a field value. Let’s split our customers into sheets based on the state of their origin. To do this we only need to change a setting  in XLSDataWriter configuration. Set the Sheet name property to value $state. This tells the writer to examine the value of state field of each record to be written our and place it to a matching sheet.

Export to multiple=

The resulting Excel file now contains multiple worksheets (one for each US state) and all records in the same worksheet have the same state.

Data exported to multiple=

After the reading is set up, you can further extend the graph with some more components – filters, sorters, duplicate removals, or reformat to compute new values or change structure of output. Of course, designing a reverse process – import from Excel to a database is similarly straightforward. Happy exporting!

CloverETL 2.9 Released: Infobright Data Writer, Web Services Component and Other New Features.

New CloverETL version 2.9. was just released. This version brings a new Infobright Data Writer component, enhances the connectivity by adding Web Services component and adds features that simplify common data transformation tasks.

New Features and Components:
Infobright Data Writer
In response to customer requests, this component writes data into Infobright software, a column-oriented relational database. Infobright is a provider of solutions designed to deliver a scalable data warehouse optimized for analytic queries.

Web Services component
The new component makes communication with Web Services easier than ever. It provides user friendly graphical interface for mapping your data into Web Service fields, automatically generates requests and process responses. It offers faster, easier and more comfortable way to interact with remote Web Services.

Reading formatted values from XLS
Additionally to reading plain data from MicrosoftTM ExcelTM sheets, the Excel component is now also capable of reading user-formatted values such as currencies, dates or numbers.

New tracking option
Customers can now see all absolute speed rates for finished data transformations, facilitating comparative analysis in pursuit of process improvements.

New Aspell Lookup table
Brand new implementation of this component brings better performance, improved configuration and better customization.

Improved treatment of empty (NULL) values
Developers can now specify special strings that should be treated as empty (NULL) when data is being parsed. This feature simplifies processing of typical application export files which often contain values insignificant for ETL processing. Additionally it may lead to improved processing throughput and lower memory consumption of data transformation.

More user friendly File URL dialog and improved LDAP functionality.

Customers can evaluate these new features along with CloverETL’s other leading capabilities with a free 30-day trial of the CloverETL Designer Pro evaluation, which is available at www.cloveretl.com Information management professionals can also evaluate the enterprise integration features of CloverETL Server via an online demo, which is also available at www.cloveretl.com.

Partitioning output records into m excel files with n sheets

Customers often tend to have obscure requirements. In a recent project we faced an interesting issue. Output records had to be split into unknown number of excel files according to their category. In addition, records within each file should have been written in datasheets according to their subcategory. The number of subcategories varied from 1 to 1024, so the whole solution seemed to me quite impractical.

Fortunately, we could solve the customer’s requirement very easily using CloverETL. For example, there are (among others) two fields – category and subcategory – in your metadata coming into XLSWriter. Then, if you set File URL in the form of filename_#.xls, Data sheet set as $subcategory and Partition key as category, the writer will split records into files according to the categories and into datasheets according to the subcategories.

Finally, the customer came to a conclusion that one file with many records is better than dozens datasheets within dozens files with very few records.

Settings of XLS_WRITER

Settings of XLS_WRITER