ComplexDataReader is a powerful new component in CloverETL meant for reading elaborate heterogeneous data. However, not all data can be read easily, even if you spend a lot of time configuring the component. Sometimes you need to think in advance: what if you come across metadata you have not handled? Normally, the graph crashes.
This post will examine a way of preventing that or, more specifically, how to handle errors in input data.
Example Input Data
What We Will Do
We can instantly distinguish three kinds of metadata on the input: product, product_range and service. ComplexDataReader is the best component to parse these using three states of a state machine. As you can see, there is one line that does not fit into the data. The magic trick of this example lies in preparing one extra state – the error state. The state will be responsible for “catching” all incorrect data which would cause the component to fail. In order to be able to decide which data are “bad,” or, more precisely, when to switch to the error state, you have to write a custom Selector class in Java. The idea behind the code is very simple and will be explained below:
“Prep Work”
First, we need to prepare metadata for all three states of the state machine plus one extra. The extra metadata will represent error lines on the input we need to “throw away.”
Second, do not forget to connect the component to its succeeding components and assign metadata to output edges.
Third, set the “File URL” property to point the component to the input file.
Here are the three aforementioned metadata:
And one extra metadata for error lines:
Designing State Machine
We are going to create four states:
Note: There are no transition edges to be seen in the graph. This is because the Selector itself will decide when to change between states.
Start configuring the component via the “Transform” property. Create four states corresponding to the metadata and set “Initial state” to “Let selector decide”:
Switch to state “$0 product” and define its output mapping. In this state, we will send all fields to the output. Thus, drag state $0 to the “Value” column in the right-hand pane. You will produce the “$0.*” directive. In the “Transition table”, switch “Target state” to “Let selector decide”:
Repeat the same procedure for all remaining states (including the error state). Always send everything to the output port and “Let selector decide” about the target state:
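To make the "Let selector decide" idea concrete, here is a minimal, self-contained sketch in plain Java. It does not use the CloverETL API. The class name, the semicolon-delimited sample lines and the prefix rules are all invented for illustration, and it assumes that each state simply sends its record to the output port with the same number. Before every record the selector is asked for the next state, and nothing else drives the transitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Hypothetical stand-in for the reader; not part of CloverETL.
class SelectorDrivenReaderSketch {

    /** Routes each line to the port whose number equals the state the selector picks. */
    static Map<Integer, List<String>> route(List<String> lines, ToIntFunction<String> selector) {
        Map<Integer, List<String>> ports = new TreeMap<>();
        for (String line : lines) {
            // "Let selector decide": the selector is consulted before every record.
            int state = selector.applyAsInt(line);
            ports.computeIfAbsent(state, k -> new ArrayList<>()).add(line);
        }
        return ports;
    }

    /** Invented prefix rules: 0 product, 1 service, 2 product_range, 3 error. */
    static int selectByPrefix(String line) {
        if (line.startsWith("product_range;")) return 2;
        if (line.startsWith("service;")) return 1;
        if (line.startsWith("product;")) return 0;
        return 3; // anything unrecognized goes to the error state
    }

    public static void main(String[] args) {
        // Invented sample input; the third line fits none of the known record types.
        List<String> lines = Arrays.asList(
                "product;1;Widget",
                "service;2;Delivery",
                "this line does not fit",
                "product_range;3;Garden");
        System.out.println(route(lines, SelectorDrivenReaderSketch::selectByPrefix));
    }
}
```

Because every state hands control back to the selector, adding a new record type in this sketch only means adding one more rule to the selector, not redrawing transitions.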
Writing Custom Selector
We are now going to prepare a Java class that will do the magic of this example – switch between states “$0 product”, “$1 service”, “$2 product_range” and the “$3 error” state in case there are errors on reading. This particular prefix Selector will assume there is another record on the following line(s) and will try to read it. If there really is a new record, we can recover from the error line and carry on reading.
You can prepare the Java class in any editor of your choice. After writing it, just remember to place it into the “trans” folder of your project. If you do, CloverETL will automatically compile the class for you.
The Selector class will look like this:
public class CustomPrefixInputMetadataSelector1 extends com.opensys.cloveretl.component.complexdatareader.PrefixInputMetadataSelector {

    // Number of the error state ("$3 error")
    private static final int DEFAULT = 3;

    @Override
    public int select(int prevState) {
        // Let the default prefix selector pick the next state
        int result = super.select(prevState);
        // The default selector could not decide: fall back to the error state
        if (result == org.jetel.component.RecordTransform.ALL) {
            return DEFAULT;
        }
        return result;
    }
}
A few comments concerning the code:
int result = super.select(prevState);
First, we try to call the default selector and store the number of the next state into result.
if(result == org.jetel.component.RecordTransform.ALL)
And if the default selector cannot decide…
return DEFAULT;
We return the default state number – number 3. This is the error state.
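If you want to try the fallback pattern without a CloverETL installation, the following self-contained sketch mimics it in plain Java. Everything in it is a stand-in: PrefixSelectorSketch only imitates a prefix-based selector, UNDECIDED is a hypothetical marker playing the role of RecordTransform.ALL, and the prefixes and sample lines are invented. Only the shape of the override (call the parent, check for "cannot decide", return the error state) is taken from the class above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a prefix-based selector; not the CloverETL class.
class PrefixSelectorSketch {
    /** Hypothetical "cannot decide" marker, playing the role of RecordTransform.ALL. */
    static final int UNDECIDED = Integer.MAX_VALUE;

    private final Map<String, Integer> prefixToState = new LinkedHashMap<>();

    void register(String prefix, int state) {
        prefixToState.put(prefix, state);
    }

    /** Returns the state registered for the line's prefix, or UNDECIDED if none matches. */
    int select(String nextLine) {
        for (Map.Entry<String, Integer> e : prefixToState.entrySet()) {
            if (nextLine.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return UNDECIDED;
    }
}

// Same idea as the custom selector: delegate first, fall back to the error state.
class ErrorFallbackSelectorSketch extends PrefixSelectorSketch {
    static final int ERROR_STATE = 3;

    @Override
    int select(String nextLine) {
        int result = super.select(nextLine);
        return result == UNDECIDED ? ERROR_STATE : result;
    }
}

class SelectorDemo {
    /** Builds a selector with invented prefixes for the three record types. */
    static ErrorFallbackSelectorSketch newSelector() {
        ErrorFallbackSelectorSketch s = new ErrorFallbackSelectorSketch();
        s.register("product;", 0);
        s.register("service;", 1);
        s.register("product_range;", 2);
        return s;
    }

    public static void main(String[] args) {
        ErrorFallbackSelectorSketch s = newSelector();
        System.out.println(s.select("service;2;Delivery"));     // a known prefix
        System.out.println(s.select("this line does not fit")); // falls back to the error state
    }
}
```

The prefixes in the sketch include the delimiter on purpose. Without it, "product" would also match lines that start with "product_range".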
Now that you are done with the code, switch to the “Selector” tab in “State transitions”. In “Selector URL”, browse for your custom Selector. Notice that after you specify its location, the “Selector properties” area changes:
Conclusions & Pitfalls
In this article, we have presented a way of handling flaws in the input data. We were able to handle the situation in which the selector looks ahead at the following metadata and cannot decide which state comes next.
However, there are numerous cases in which you cannot prevent reading errors. For instance, if the selector recognizes the following metadata but the reader then fails to parse the record, we cannot react and the graph fails. Imagine a file whose field types suddenly change, e.g. from integer to date: the reader starts parsing an integer and crashes. Another known case we cannot handle is a varying number of fields in one record. If new fields appear or fields are missing, the graph execution fails. The only exception is fields added at the end of a record, which can be handled with the help of the lenient data policy.
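The first limitation can be illustrated with a small plain-Java analogy. It is not CloverETL code: the record layout "product;<integer id>;<name>" and all names are invented. The point is only that state selection looks at the prefix while the field values are parsed afterwards, so a bad value surfaces too late for the selector to redirect the record to the error state.

```java
// Hypothetical analogy; not part of CloverETL.
class ParseAfterSelectSketch {
    static final int PRODUCT_STATE = 0;
    static final int ERROR_STATE = 3;

    /** Chooses a state from the prefix only; the remaining fields are not inspected. */
    static int select(String line) {
        return line.startsWith("product;") ? PRODUCT_STATE : ERROR_STATE;
    }

    /** Parses the id of an invented product line "product;<integer id>;<name>". */
    static int parseProductId(String line) {
        String[] fields = line.split(";");
        return Integer.parseInt(fields[1]); // throws NumberFormatException for a non-integer
    }

    public static void main(String[] args) {
        String line = "product;2012-05-01;Widget"; // the id field unexpectedly holds a date
        int state = select(line);                   // the prefix is fine, so state 0 is chosen
        try {
            parseProductId(line);
        } catch (NumberFormatException e) {
            // Caught here only to print a message; in the graph this failure stops execution.
            System.out.println("state " + state + " selected, but parsing failed: " + e.getMessage());
        }
    }
}
```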
Download a complete CloverETL project – error handling in ComplexDataReader























