CloverETL's Blog

February 24, 2010

CloverETL File-URL Dialog

Filed under: Developing Clover — jausperger @ 3:36 pm

The CloverETL Designer has a brand new File URL Dialog, introduced in version 2.9. The newly designed dialog is friendly and intuitive to navigate, and it brings many new features and improvements. It is divided into several tabs that let users easily specify resources such as local files, remote files or shared memory (the dictionary). The new dialog is more comfortable to use and has a simplified, clear design, as you can see in the picture below. The dialog window adjusts itself according to the context.

Clover Server

The new File URL dialog also contains a new CloverETL Server tab, designed specifically for working with files located on a CloverETL Server. It is visible only if you open the dialog from an existing CloverETL Server project. It looks very similar to the tab for your local computer, but it lets you browse remote CloverETL sandboxes. The names of all sandboxes for which you have permissions appear in the bookmarks, so you can easily access them.

File URLs

This tab handles all types of URLs, but it is mainly designed for browsing remote file systems via the http, https, ftp, ftps and sftp protocols. It also offers a special dialog where you can specify advanced connection parameters such as a proxy server or HTTP properties.

Port / Dictionary

The Port and Dictionary tabs are specific to CloverETL. The Port tab is visible only if the component or graph element allows reading data from, or writing data to, a port. The dictionary is memory shared between parts of the graph. It is identified by name and a processing type parameter. Both tabs let you specify URLs visually, so you don’t have to know the exact syntax of CloverETL’s URLs, and your work becomes easier and more productive.

Extensibility

Thanks to the new modular architecture, the dialog itself can be extended with additional tabs if needed.

February 1, 2010

CloverETL version 2.9 was released. It adds Infobright Data Writer, Web Services component and other new features.

Filed under: Developing Clover — Lucie Felixova @ 10:18 am

CloverETL version 2.9 has just been released. This version brings a new Infobright Data Writer component, enhances connectivity by adding a Web Services component, and adds features that simplify common data transformation tasks.

New Features and Components:
Infobright Data Writer
In response to customer requests, this component writes data into Infobright software, a column-oriented relational database. Infobright is a provider of solutions designed to deliver a scalable data warehouse optimized for analytic queries.

Web Services component
The new component makes communication with Web Services easier than ever. It provides a user-friendly graphical interface for mapping your data into Web Service fields, automatically generates requests and processes responses. It offers a faster, easier and more comfortable way to interact with remote Web Services.

Reading formatted values from XLS
In addition to reading plain data from Microsoft™ Excel™ sheets, the Excel component is now also capable of reading user-formatted values such as currencies, dates or numbers.

New tracking option
Customers can now see all absolute speed rates for finished data transformations, facilitating comparative analysis in pursuit of process improvements.

New Aspell Lookup table
A brand new implementation of this lookup table brings better performance, improved configuration and better customization.

Improved treatment of empty (NULL) values
Developers can now specify special strings that should be treated as empty (NULL) when data is being parsed. This feature simplifies the processing of typical application export files, which often contain values that are insignificant for ETL processing. Additionally, it may lead to improved processing throughput and lower memory consumption of the data transformation.
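To illustrate the idea only, here is a minimal Python sketch of treating configured strings as NULL while parsing delimited data. It is not CloverETL code or configuration: the function name `parse_with_nulls`, the `NULL_STRINGS` set and the semicolon delimiter are all assumptions made for this example.

```python
import csv
import io

# Hypothetical list of strings to treat as NULL (an assumption for this
# example, not a CloverETL default).
NULL_STRINGS = {"", "NULL", "N/A", "-"}


def parse_with_nulls(text, null_strings=NULL_STRINGS, delimiter=";"):
    """Parse delimited text and map every configured string to None."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        # Surrounding whitespace is ignored when comparing against the list.
        rows.append(
            [None if value.strip() in null_strings else value for value in record]
        )
    return rows


if __name__ == "__main__":
    export = "1;N/A;Alice\n2;-;NULL\n"
    for row in parse_with_nulls(export):
        print(row)
```

Mapping such placeholders to a single empty value at parse time means the later transformation steps only need to handle one representation of "no value".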

More user friendly File URL dialog and improved LDAP functionality.

Customers can evaluate these new features along with CloverETL’s other leading capabilities with a free 30-day trial of CloverETL Designer Pro, which is available at www.cloveretl.com. Information management professionals can also evaluate the enterprise integration features of CloverETL Server via an online demo, which is also available at www.cloveretl.com.

November 4, 2009

New level of parallelism in CloverETL

Filed under: Developing Clover — mvarecha @ 12:28 pm

For the upcoming release of CloverETL 2.9, we are working on improvements in CloverETL Server which will allow transformations to run in parallel on multiple cluster nodes.

CloverETL Server already supports clustering, so several instances may cooperate with each other. The current stable version already implements the common cluster features: fail-over/high availability, and scalability for large numbers of requests, which are load-balanced across the available cluster nodes. These features have been implemented since version 1.3.

The basic concept of new parallelism
A transformation may be automatically executed in parallel on several cluster nodes, according to configuration, and each of these “worker” transformations processes just its part of the data. Because there is one “master” transformation, which manages the other transformations and gathers tracking data from the “worker” transformations, the parallelism is transparent to the CloverETL Server client. By default, the client “sees” just one (master) execution and aggregated tracking data. However, logs and tracking data still exist for each of the “worker” transformations, so it’s still possible to inspect the details of the parallel execution. The outputs of the “worker” transformations are gathered to the “master”, so the client has one single transformation output, which may be processed further.

So how to get parts of input data?
Basically, a transformation can process data that is already partitioned, which is the best case because partitioning adds no overhead. Alternatively, CloverETL Server itself can partition input data from a single source and distribute it on the fly (during the transformation) to several cluster nodes over the network. The overhead of this operation depends on the speed of network communication and other conditions.
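As a rough illustration of splitting a single input among nodes, the following Python sketch distributes records round-robin. The post does not say which partitioning strategy CloverETL Server uses, so round-robin is an assumption here, as are the function name `partition_round_robin` and the node names.

```python
from itertools import cycle


def partition_round_robin(records, node_names):
    """Assign records to nodes one by one, in rotating order."""
    if not node_names:
        raise ValueError("at least one node is required")
    parts = {name: [] for name in node_names}
    # zip stops as soon as the records run out; cycle repeats the node list.
    for record, name in zip(records, cycle(node_names)):
        parts[name].append(record)
    return parts


if __name__ == "__main__":
    # Hypothetical node names, used only for this example.
    for node, part in partition_round_robin(range(5), ["node1", "node2"]).items():
        print(node, part)
```

In a real cluster each part would travel over the network to its node, which is where the overhead mentioned above comes from.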

Design changes in the graph
We aim to keep the transformation graph almost the same as it would be for “standalone” execution. Thus there will be just a couple of extra components in the graph which is intended to run in parallel. These components will handle partitioning/departitioning of data in case it’s not already partitioned.

Scalability
The new parallelism in CloverETL Server is a giant leap for the scalability of transformations. Once the graph is designed for parallel run, the number of computers that run the transformation depends only on the cluster configuration. The graph itself stays the same. Configuration of the parallelism includes:

  • a working CloverETL Server cluster; standalone server instances won’t be able to handle such execution
  • a “partitioned” sandbox (see below) with a list of locations

New sandbox types
On the server side, graphs and related files are organized in so-called sandboxes. Up to version 2.8, there was just one type: the “shared” sandbox, which contains the same files and directory structure on all cluster nodes. Starting with version 2.9, there will be two more types:

  • “local” sandbox – (locally) accessible on just one cluster node. It’s intended for huge input/output data that should not be shared or replicated among multiple cluster nodes.
  • “partitioned” sandbox – each of its physical locations contains just a part of the data. It’s intended as storage for partitioned input/output data of transformations that are supposed to run in parallel. The list of physical locations actually specifies the nodes which will run “worker” transformations.
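The three sandbox types can be modelled in a few lines of Python. This is only a sketch of the rules described above, not the CloverETL Server configuration format: the `Sandbox` class, the `nodes_with_access` function and the node names and paths are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class SandboxType(Enum):
    SHARED = "shared"            # same content on every cluster node
    LOCAL = "local"              # content on exactly one node
    PARTITIONED = "partitioned"  # each location holds a part of the data


@dataclass
class Sandbox:
    name: str
    type: SandboxType
    # Hypothetical mapping of node name to the physical path on that node.
    locations: dict = field(default_factory=dict)


def nodes_with_access(sandbox, cluster_nodes):
    """Return the sorted nodes that hold the sandbox's data.

    For a partitioned sandbox these are the nodes that would run the
    "worker" transformations.
    """
    if sandbox.type is SandboxType.SHARED:
        return sorted(cluster_nodes)
    if sandbox.type is SandboxType.LOCAL and len(sandbox.locations) != 1:
        raise ValueError("a local sandbox must have exactly one location")
    return sorted(node for node in sandbox.locations if node in cluster_nodes)


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3"]
    data = Sandbox(
        "data",
        SandboxType.PARTITIONED,
        {"node2": "/data/part2", "node1": "/data/part1"},
    )
    print(nodes_with_access(data, cluster))
```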

Master – worker responsibilities
The master observes all related workers, and when a transformation phase has finished on all workers, it’s the master’s responsibility to allow the workers to process the next phase. When any of the workers fails for any reason, it’s the master’s responsibility to abort all the other workers and mark the whole execution as failed. The terms master and worker have meaning only in the scope of one transformation. Since 2.9 there is no privileged node configured as “master” in the cluster, but that doesn’t mean that all the nodes are equal. Nodes may differ in their access to physical sources, and the configuration of sandboxes should reflect that.
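The phase-by-phase coordination can be sketched with Python threads standing in for cluster nodes. This is a simplified illustration, not the CloverETL Server implementation: the function `run_master`, the result dictionary and the worker callables are assumptions, and "aborting" is reduced to not starting any further phase, because running threads cannot be interrupted here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_master(workers, phases):
    """Run each phase on all workers; stop at the first phase with a failure.

    workers maps a worker name to a callable that takes the phase number.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        for phase in phases:
            futures = {name: pool.submit(fn, phase) for name, fn in workers.items()}
            # exception() blocks until the worker has finished this phase,
            # so the loop doubles as a barrier between phases.
            failed = sorted(
                name for name, future in futures.items()
                if future.exception() is not None
            )
            if failed:
                return {"status": "FAILED", "phase": phase, "failed_workers": failed}
    return {"status": "FINISHED", "phase": None, "failed_workers": []}


if __name__ == "__main__":
    def healthy(phase):
        return phase

    def broken(phase):
        if phase == 1:
            raise RuntimeError("worker failed")

    print(run_master({"a": healthy, "b": broken}, [0, 1, 2]))
```

Waiting for every worker before starting the next phase is what keeps the phases aligned across nodes, and a single failure is enough to mark the whole execution as failed.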

April 23, 2009

CloverETL version 2.7 released

Filed under: Developing Clover — dpavlis @ 9:33 pm

OpenSys today released a new version of CloverETL Engine – 2.7.

According to the ChangeLog, there have been more than 300 changes – some of them new functionality, some fixes for reported problems.
Together with the new engine, CloverETL Designer (previously CloverETL GUI) version 2.2 is also available.

Details about changes of both CloverETL Engine and Designer can be found on CloverETL’s main site.

March 31, 2009

Upcoming 2.7 release of Clover – faster & better

Filed under: Developing Clover — dpavlis @ 4:37 pm

As of today (Mar 31st), the Clover Engine 2.7 branch has been created and the testing/QA process has started. Within approximately two weeks, a brand new version of CloverETL will be ready. It brings many small new features and bug fixes, but also several significant improvements – mostly in speed.

The aging ExtSort component is being replaced by the new FastSort, which can deliver up to 2.5 times the performance of the old ExtSort. I am sure there will be a special post on this blog by FastSort’s developer, Pavel Najvar, who will explain in detail where he found those hidden 250% of speed.

There are also speed improvements in our Universal Data Reader (a reader of text data, delimited or fixed-length). We thoroughly profiled its code and were able to find 20-25% of additional speed. This puts us even farther in front of the competition!
