CloverETL 4.0 is now out, and with it comes a ton of new features. You can read the complete release notes here. In our previous article, we briefly introduced the main new features in 4.0. Today, we are going to dive deeper into subgraph functionality, explaining why subgraphs are a very important feature of CloverETL 4.0 and how they will help you with your data jobs.
What are subgraphs?
In technical terms, subgraphs are reusable, user-defined components whose logic is composed of other components or nested subgraphs rather than Java code. They can “receive” or “provide” metadata, use edges to interchange data in memory, and offer a graphical configuration that closely resembles that of regular Java-based components.
Put more simply, subgraphs are components that can have a whole graph nested inside of them. You are able to create or customize these components easily, set their parameters, set their outputs and inputs, and attach metadata to them that will spread through the whole graph.
To help demonstrate this concept, let’s take a look at an example:
In this first picture, we’ve created a reusable postal address validator subgraph that uses Google GEO API behind the scenes. The functionality requires three ETL components to work. However, this is completely hidden from the subgraph's end users. All they need to do is drag and drop a subgraph component, attach the edges, and voilà! The validated addresses are sent out!
Our second example contains a more advanced subgraph. As you can see, a string of data logic consisting of multiple components is visually reduced to a single component. Did you notice that the reader component is part of the subgraph as well? Subgraphs can contain preconfigured readers with data, which is very useful for adding lookup tables within the subgraph or for enriching the data being processed. Once the segmentation logic is converted into a subgraph, it can be reused in multiple graphs without the need to copy it. To allow for more flexibility when the logic is reused, the subgraph developer may decide to expose some of the filtering conditions (as we see in the Partitioner or Filter), so that subgraph users can set their own segmentation rules.
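To make the idea of wrapped logic with an exposed parameter more concrete, here is a minimal conceptual sketch in Python. It is not CloverETL syntax and does not use any CloverETL API; the names filter_step, enrich_step, segmentation_subgraph and min_spend, as well as the sample records, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Step = Callable[[Iterable[Record]], Iterator[Record]]


def filter_step(predicate: Callable[[Record], bool]) -> Step:
    """Build a step that keeps only the records matching the predicate."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if predicate(r))
    return run


def enrich_step(lookup: dict) -> Step:
    """Build a step that adds a 'region' field from a lookup table."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, "region": lookup.get(r["country"], "unknown")}
    return run


def segmentation_subgraph(min_spend: float = 100.0) -> Step:
    """Hypothetical 'subgraph': several steps hidden behind one callable.

    min_spend plays the role of an exposed filtering condition; the lookup
    table stands in for a preconfigured reader bundled inside the subgraph.
    """
    lookup = {"US": "Americas", "DE": "EMEA"}  # assumed sample lookup data
    steps = [
        filter_step(lambda r: r["spend"] >= min_spend),
        enrich_step(lookup),
    ]

    def run(records: Iterable[Record]) -> Iterator[Record]:
        for step in steps:
            records = step(records)
        return iter(records)
    return run


customers = [
    {"name": "Ann", "country": "US", "spend": 250.0},
    {"name": "Bob", "country": "DE", "spend": 40.0},
    {"name": "Cy", "country": "JP", "spend": 500.0},
]

# Default rule: only customers who spent at least 100 pass through.
default_segment = list(segmentation_subgraph()(customers))

# A user of the "subgraph" overrides the exposed condition without
# touching the steps inside it.
relaxed_segment = list(segmentation_subgraph(min_spend=10.0)(customers))
```

The caller sees a single unit with one setting, much as a subgraph user sees a single component with a few exposed parameters, while the filter, the lookup data, and the enrichment stay hidden inside.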
You can find other examples of subgraphs in the trial version of CloverETL - download it over here.
What can subgraphs do for you?
A subgraph can make your data integration jobs much easier. Here are the main benefits that subgraphs bring:
Visually simplify your graphs. Hide advanced logic.
Complex graphs can be hard to understand and navigate. Subgraphs solve this issue by hiding advanced logic inside subgraph components, making graphs much easier to read.
Look at the example above. By using a subgraph, we were able to reduce the number of components by almost 50%, and data flow in the graph is now easy to follow and understand.
Improve collaboration. Share logic and split work within your team.
If your team is working on complex graphs, subgraphs let you plan your work better. You can split complex graphs into smaller subgraphs and assign those subgraphs to members of your team based on their skills or availability. Also, if you create a subgraph repository, your team can take advantage of previously created subgraphs, which can save a lot of time.
Increase the speed of development of new graphs. Reuse already-created content.
Because you can reuse parts of graphs you’ve previously created, the time needed to develop new graphs is shortened significantly.
Take, for example, this graph, which analyzes Twitter sentiment. It can read tweets, analyze them, and then give you results.
The important thing is that the reading of the tweets is wrapped as a subgraph.
So, if you need to analyze another aspect of Twitter, you can reuse this subgraph for reading tweets and build only the part that processes the other details you are curious about.
Also, a change in a subgraph is applied to all graphs that use that subgraph, which simplifies change management. If the Twitter API changes (as it frequently does), you only need to update the subgraph once, and every graph that uses it picks up the change.
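The propagation described above works because graphs refer to a shared subgraph rather than holding their own copy of it. The following is a small conceptual sketch of that idea in Python, not CloverETL code: the SUBGRAPHS registry, the two "graphs", and the "text" and "full_text" field names are all hypothetical and do not describe the real Twitter API.

```python
# Hypothetical shared repository: graphs look up subgraphs by name
# instead of copying their logic.
SUBGRAPHS = {}


def register(name, fn):
    """Publish (or replace) a shared subgraph under a name."""
    SUBGRAPHS[name] = fn


def read_tweets_v1(payload):
    """Assumed original payload shape: each tweet has a 'text' field."""
    return [tweet["text"] for tweet in payload]


def read_tweets_v2(payload):
    """Assumed payload shape after an API change: 'full_text' instead."""
    return [tweet["full_text"] for tweet in payload]


def sentiment_graph(payload):
    """First graph: counts tweets containing the word 'love'."""
    tweets = SUBGRAPHS["read_tweets"](payload)
    return sum(1 for t in tweets if "love" in t.lower())


def hashtag_graph(payload):
    """Second graph: collects the distinct hashtags."""
    tweets = SUBGRAPHS["read_tweets"](payload)
    return sorted({w for t in tweets for w in t.split() if w.startswith("#")})


# Both graphs share the same reader.
register("read_tweets", read_tweets_v1)
old_payload = [{"text": "I love #ETL"}, {"text": "meh #data"}]
old_results = (sentiment_graph(old_payload), hashtag_graph(old_payload))

# The payload format changes: replace the shared reader once, and both
# graphs keep working without being edited themselves.
register("read_tweets", read_tweets_v2)
new_payload = [{"full_text": "I love #ETL"}, {"full_text": "meh #data"}]
new_results = (sentiment_graph(new_payload), hashtag_graph(new_payload))
```

Neither sentiment_graph nor hashtag_graph had to change when the reader was replaced, which is the same benefit a shared subgraph gives to the graphs that use it.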
Extend functionality. Move beyond the standard components and connectors.
The data environment is rapidly changing, with new technologies and new data sources appearing almost every day. Although we at Clover try to support as many sources and data types as possible, it isn’t possible to cover them all. But now, thanks to subgraphs, you are able to create any connector you need.
Here is an example of a custom-built connector for Salesforce.com. This connector is not a standard component of CloverETL. However, with the use of subgraphs, you are able to create a component that can tap into Salesforce.com and retrieve whatever you need.
After opening the connector, you can see the inner workings of the subgraph.
You could also create another subgraph that takes care of logins to your data sources and to the components that process your data requests. In fact, you are able to create a connector for any application, any data source, or whatever else you need.