Tag Archives: oracle

Export from a Database to Excel

I was motivated to write this post by inquiries such as this one on MySQL forum about how to export data from database to Excel.  MySQL is not the only one though, similar requests can be found on Oracle or Postgres forums too. Of course a CSV format or doing some coding is always an option, but Clover lets you generate XLS/XLS(X) effortlessly and with some advantages:

  • Clover does NOT require Excel to do be installed on machine where we are doing the export. (Yes, this means you can export even on your Linux MySQL installation.)
  • We can export from any database that has a JDBC driver.(MySQL, Oracle, Informix, DB2, Postgres)
  • Export can run on any platform – Windows, OS X, Linux.
  • It is possible to do additional validation and manipulation before writing to Excel.

To keep things simple we will work just with a single database table called customers stored in MySQL:

Our goals in this post are:

  1. Connect to the database and download the customer data.
  2. Write the customer data to Excel sheet, including the header with column names.

And finally, to make it more interesting

3. Split customers into multiple sheets based on the state of their origin.

Let’s build the graphs now:

Step 1: Connecting to MySQL database

Create a new project and a new graph (see the videos if you don’t know what I am talking about), go to Outline window and new database connection.

Connection to MySQL database

Connection to my MySQL database

Step 2: Building the graph

Build the graph from DBInputTable and XLSDataWriter components. DBInputTable reads data from database table, or result of SQL query, while ExcelDataWriter writes it to Excel spreadsheet.

Graph exporting from MySQL to Excel

Graph exporting from MySQL to Excel

You will need a metadata definition to describe the structure of your database records. The easiest way to do this is to use a wizard from Outline window. Right click on Metadata entry and select Metadata > Extract from database. Once done, assign your metadata on the link (edge) between the DBInputTable and XLSDataWriter. Basic structure of our export graph is laid out.

Metadata extracted from database

Metadata extracted from database

We’ll configure the components now.

Configure the DBInputTable to use the MySQL connection we created in the previous step. Supply the SQL query to retrieve the data. Mine is just simple “select *”, but it can be any statement – joins, aggregation, while-filtering etc.

DBInputTable Configuration

DBInputTable Configuration

Next, configure the XLSDataWriter to write into desired output file. Notice the property settings. To include header on the very first line, we set the Metadata row property is to 1. Data must be appended after the header row. That is why the Start row property is set to 2.

XLSDataWriter configuration

XLSDataWriter Configuration

Step 4. Run the graph

Start the graph now and check the exported data in your output file. If you just wanted to export the data you are done!

Exported data in Excel format

Exported data in Excel format

Optionally: Split data to sheets based on field value

Clover has a nifty feature of being able to write data into multiple sheets, splitting them based on a field value. Let’s split our customers into sheets based on the state of their origin. To do this we only need to change a setting  in XLSDataWriter configuration. Set the Sheet name property to value $state. This tells the writer to examine the value of state field of each record to be written our and place it to a matching sheet.

Export to multiple=

The resulting Excel file now contains multiple worksheets (one for each US state) and all records in the same worksheet have the same state.

Data exported to multiple=

After the reading is set up, you can further extend the graph with some more components – filters, sorters, duplicate removals, or reformat to compute new values or change structure of output. Of course, designing a reverse process – import from Excel to a database is similarly straightforward. Happy exporting!

Oracle Uses CloverETL for Data Integration in its OIA Server

We are very happy to have learned that Oracle uses CloverETL for data manipulation in its OIA server. Oracle Identity Analytics (OIA) is a solution that provides enterprise role lifecycle management and identity compliance functionality. See Oracle whitepaper http://www.oracle.com/technology/products/id_mgmt/oia/pdf/oia_wp.pdf

Originally Sun Microsystems used CloverETL in its Sun Role Manager. It is a great pleasure to see that Oracle decided to continue using CloverETL for its software after aquiring Sun Microsystems.

DataDirect’s OracleDB JDBC Driver Speed Test

Purpose

Compare the speed of data loading into Oracle database (Oracle Database 11g Release 11.1.0.6.0 – Enterprise Edition) with Oracle corp. JDBC driver, DataDirect JDBC Oracle driver and direct data loading (OracleDataWriter component – sqlldr utility) in CloverETL.

 

Test description

 

Graph used for testing:

DDdbLoad.grf

The above graph loads data into database table that contains 3 number columns and 127 varchar columns.

Database table for storing data is truncated before each data loading – DBExecute components, each with query: TRUNCATE TABLE dd_test1 REUSE STORAGE

Phase 1: loading data with DDBulkLoad (DataDirect) object from csv file (loader.load(file))

Phase 3: loading data with DDBulkLoad (DataDirect) object from ResultSet (loader.load(resultSet)) – created ResultSet implementation, that reads data from DataRecord (read from the edge).

Phase 4: loading data with DBOutputTable with Oracle corp’s  JDBC driver:

Manifest-Version: 1.0
Specification-Title:    Oracle JDBC driver classes for use with JDK14
Sealed: true
Created-By: 1.4.2_08 (Sun Microsystems Inc.)
Implementation-Title:   ojdbc14.jar
Specification-Vendor:   Oracle Corporation
Specification-Version:  Oracle JDBC Driver version - "10.2.0.1.0XE"
Implementation-Version: Oracle JDBC Driver version - "10.2.0.1.0XE"
Implementation-Vendor:  Oracle Corporation
Implementation-Time:    Wed Jan 25 01:28:31 2006

Phase 7: loading data with DBOutputTable with DataDirect Oracle JDBC driver – enabled bulk load feature

Phase 9: loading data with OracleDataWriter component from csv file (sqlldr utility)

Phase 11: loading data with OracleDataWriter component from edge (sqlldr utility)

Test processing

Graph run 3 times for 10,000,000 records with default DataDirect settings.

Graph run 3 times for 1,000,000 records with default DataDirect settings.

Graph run 3 times for 1,000,000 records with following settings:

 

Test results

Results in seconds.

1,000,000 records with default DataDirect settings:

Phase 1: 178, 167, 132 – min: 132, max: 178, average: 159

Phase 3: 128, 166, 152 – min: 128, max: 166, average: 149

Phase 5: 228, 246, 290 – min: 228, max: 290, average: 255

Phase 7: 176, 170, 239 – min: 170, max: 239, average: 195

Phase 9: 44, 45, 56 – min: 44, max: 56, average: 48

Phase 11: 104, 95, 106 – min: 95, max: 104, average: 102

1,000,000 records with custom settings:

Phase 1: 163, 152, 142 – min: 142, max: 163, average: 152

Phase 3: 166, 133, 134 – min: 133, max: 166, average: 144

Phase 5: 278, 263, 260 – min: 260, max: 278, average: 267

Phase 7: 239, 172, 209 – min: 172, max: 239, average: 207

10,000,000 records with default DataDirect settings:

Phase 1: 1553, 1818, 1352 – min: 1352, max: 1818, average: 1574

Phase 3: 1475, 1299, 1298 – min: 1298, max: 1475, average: 1357

Phase 5: 3041, 2592, 2550 – min: 2550, max: 3041, average: 2728

Phase 7: 1824, 1623, 1722 – min: 1722, max: 1824, average: 1723

Phase 9: 404, 432, 472 – min: 404, max: 472, average: 436

Phase 11: 1096, 975, 1012 – min: 975, max: 1096, average: 1028

Summary

Loading data was slowest when DBOutputTable with original Oracle corp. driver was used. All loadings with DataDirect driver were faster than with Oracle corp. driver but the usage of DDBulkLoad object (DataDirect) plainly increases the speed of loading data in comparison with setting EnableBulkLoad=true and using DBOutputTable. The results for loading data from csv file and edge (result set) are very similar with slight predomination of ResultSet method. All three methods with DataDirect driver usage, get to more steady execution times with number of records to load.

The fastest way of loading data is unquestionably direct data loading with sqlldr utility. Even when inter-storing data in pipe, the sqlldr utility is about 50% faster than any other method, but is less convenient.
1,000,000 records
10,000,000 records