Optional ports, introduced in CloverETL 4.1, let you design generic, versatile subgraphs that replace what would otherwise be several near-identical variants of the same subgraph, each offering a different combination of inputs and outputs.
However, giving users such freedom of choice means you have to deal with numerous design challenges to handle the missing edge connections.
In this blog post, I will share three key concepts that will help you master the wizardry behind creating versatile subgraphs.
Let's assume we're building a subgraph called “Lean DataIntersection”: a “user-friendly” version of the DataIntersection component that is forgiving about what's connected to it (remember, the standard DataIntersection needs all ports connected) and, as a bonus, takes care of pre-sorting the inputs (the two FastSort components).
Here are two sample scenarios showing how such a subgraph can be used:
Setting Optional Ports
We want to set the second input port and the two output ports as optional so that users can freely use the “forgiving” component.
You can choose between two modes of optional ports (right-click the port in the vertical bar, or in the Outline):
Optional port (edge receives zero records)
If we select this mode, the second FastSort will receive zero input records (no edge connected). See the illustration below to view the results. Not exactly what we want, right?
Optional port (edge is removed)
Unfortunately, the second option is not much better at solving our dilemma either. In this case, instead of zero records, the edge would be removed completely at runtime and we would end up with a crippled subgraph illustrated below.
Either way, merely setting ports to optional does not give you the results you'd expect. There's more to set than just the ports.
Key Concept I: Dynamically Enabling Components
One key to versatile subgraphs is having CloverETL dynamically enable or disable the parts of the subgraph that are affected by a missing input. In this case, we want to disable the second FastSort and DataIntersection completely whenever the optional input is not connected.
To enable a component only when certain inputs or outputs are connected, go to the “Enable” menu of the component (right-click). In our case, we're using “When Input Port 1 Is Connected”.
The Enable condition is relative to the parent graph of our subgraph, not the component itself! Thus, you should read it as follows:
"When Input Port 1 (of this subgraph) Is Connected (in the calling parent graph)"
The “?” icon indicates the component is set to enable/disable dynamically.
Wondering why the top FastSort is also dynamically disabled? There's no optional port on that branch, so that part will always work, right? Well, yes. Leaving it always enabled would work just fine. But in the “one input port SimpleCopy” mode, it would sort all the data for no real purpose, and a good design avoids such costly operations whenever possible.
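To make the rule concrete, here is a minimal Python sketch of how such an Enable condition could be evaluated. It is only a model of the idea, not CloverETL's implementation: the `Component` class, the `enabled_components` function, and the assumption that all three components share the “When Input Port 1 Is Connected” condition are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Component:
    name: str
    # Subgraph input port that must be connected in the PARENT graph for
    # this component to run; None means "always enabled". (Hypothetical model.)
    enable_when_input_connected: Optional[int] = None


def enabled_components(components, connected_inputs):
    """Return names of components that stay enabled, given the set of
    subgraph input ports that the parent graph actually connected."""
    return [
        c.name
        for c in components
        if c.enable_when_input_connected is None
        or c.enable_when_input_connected in connected_inputs
    ]


# Assumed setup: every component uses "When Input Port 1 Is Connected".
LEAN_DATA_INTERSECTION = [
    Component("FastSort (top)", 1),
    Component("FastSort (second)", 1),
    Component("DataIntersection", 1),
]
```

Calling `enabled_components(LEAN_DATA_INTERSECTION, {0})` models a parent graph that connects only the first input: nothing stays enabled, so the subgraph degrades to a plain copy. With `{0, 1}`, all three components run.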
Key Concept II: Component Pass-through
When CloverETL disables a component, it needs to know how to bypass it. A disabled component is always replaced by a single edge, and often the replacement is obvious (e.g. disabling a sorter simply creates a “short-circuit”).
For complicated components like DataIntersection, or for subgraphs with multiple inputs and outputs, you need to tell CloverETL which ports to connect.
You can set pass-through for any component or subgraph by going to Edit and scrolling all the way down to Common properties and setting Pass Through Input Port and Pass Through Output Port.
Why do I need to set pass-through?
If we didn't set the DataIntersection pass-through properly, CloverETL would connect the first input port with the first output port (Port 0 > Port 0) by default, which is not what we want.
However, you can typically ignore pass-through as the default behavior mostly works just fine.
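The bypass can be pictured with a small Python model. This is a sketch under assumptions, not CloverETL code: `DisabledComponent` and `bypass_edge` are hypothetical names, and the node names and port numbers in the usage note are illustrative rather than the ones used in the subgraph above. The only behavior taken from the text is that a disabled component becomes a single edge and that the default is Port 0 > Port 0.

```python
from dataclasses import dataclass


@dataclass
class DisabledComponent:
    name: str
    inputs: dict   # input port number -> name of the upstream node
    outputs: dict  # output port number -> name of the downstream node
    pass_through_input: int = 0   # default described in the post: port 0
    pass_through_output: int = 0  # default described in the post: port 0


def bypass_edge(component):
    """Return the single (source, target) edge that replaces a disabled
    component, or None if either pass-through port has nothing attached."""
    source = component.inputs.get(component.pass_through_input)
    target = component.outputs.get(component.pass_through_output)
    if source is None or target is None:
        return None
    return (source, target)
```

For example, with `inputs={0: "FastSort (top)"}` and `outputs={0: "out0", 1: "out1"}`, the default settings yield the edge `("FastSort (top)", "out0")`. Setting `pass_through_output=1` reroutes the same data to `"out1"` instead, which is the kind of correction the Pass Through properties let you make.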
Key Concept III: Metadata Propagation
Versatile generic graphs tend to depend heavily on metadata propagation rather than having everything predefined. For example, our Lean DataIntersection has no internal metadata whatsoever; everything is driven by what the parent graph provides by connecting edges with metadata to it.
This is where you can easily fall into a trap. Do not rely on metadata propagating from ports that are set as optional. Keep in mind that in some use cases there will be no edge connected, and thus no metadata!
The solution is to either use manually assigned metadata or use edge metadata (Select Metadata from Another Edge) so that your metadata is propagated from edges that are guaranteed to receive metadata at all times.
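The trap can be expressed as a simple check. The sketch below is a hypothetical helper, not a CloverETL feature: it assumes you can describe, for each internal edge, which subgraph port its metadata is propagated from, and it flags the edges that depend on an optional port. The edge and port names are made up for illustration.

```python
def risky_edges(metadata_source_by_edge, optional_ports):
    """Return the internal edges whose metadata comes from an optional
    port, i.e. edges left without metadata when that port is unconnected.

    metadata_source_by_edge: dict mapping edge name -> source port name
    optional_ports: set of port names marked as optional
    """
    return sorted(
        edge
        for edge, port in metadata_source_by_edge.items()
        if port in optional_ports
    )


# Illustrative layout: input "in1" and both outputs are optional.
sources = {
    "sort_top_out": "in0",     # propagated from the mandatory input: safe
    "sort_second_out": "in1",  # propagated from an optional input: risky
}
flagged = risky_edges(sources, {"in1", "out0", "out1"})
```

Here `flagged` contains only `"sort_second_out"`, the edge that should get manually assigned metadata or take its metadata from an edge that is always connected.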
Dynamically disabling components connected to optional ports and setting correct pass-through will save you a lot of trouble with metadata propagation. In fact, if you do everything else right, you likely won't have problems with metadata propagation at all!
These three basic concepts will help you change the way you design versatile subgraphs. You can read about more advanced use cases for optional ports in our follow-up blog. Remember to watch for the final version of CloverETL 4.1!