CloverETL allows users to read several different kinds of files. These files may have various formats; they can be located on a local or remote computer, accessed through a proxy, and compressed into zip, gzip, or tar archives. Users can also read data from the Console, from an input Port, or from a selected Dictionary entry.
The File URL is specified in the URL dialog, which has the following tabs:
- Workspace view tab specifies files within the workspace, regardless of whether the workspace belongs to a CloverETL Project or a CloverETL Server Project.
- Local files tab displays the file structure of a local computer.
- Remote files tab displays the file structure of a remote computer. It allows the user to specify the protocol, username, password, port, server, proxy, username for proxy, password for proxy, identification of proxy server, and the port for the proxy. After specifying these properties, the file structure of the remote computer is displayed (except for http and https protocols).
- Port tab displays the input fields of the string, byte, or cbyte data types and allows the user to select one of them and to select a processing type from a combo box.
- Dictionary tab displays the declared dictionary entries and allows the user to select one of them and to choose the processing type from a combo box.
The supported values of the File URL attribute are listed below.
Local Files (without compression)
Examples:
/path/file1.txt – reads one file
/path/file1.txt;/path/file2.txt – reads two files in one directory (semicolon separates files that will be read one after another)
/path1/fileA.txt;/path2/fileB.txt – reads two files in two directories (semicolon separates files that will be read one after another)
/path?/file*.txt – reads files in directories, where both the directory names and the file names must match the specified patterns
/path/* – reads all files in the specified directory
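The semicolon and wildcard rules above can be sketched in a few lines of Python. This is only an illustration of the pattern semantics, not CloverETL's own implementation: the helper name expand_file_url is hypothetical, and Python's glob module is assumed to treat * and ? closely enough to the patterns described here (it differs in details, for example it skips hidden files and also gives square brackets a special meaning).

```python
import glob


def expand_file_url(file_url):
    """Split a File URL on semicolons and expand * and ? in each part.

    Hypothetical helper for illustration; not part of CloverETL.
    """
    paths = []
    for part in file_url.split(";"):
        part = part.strip()
        if not part:
            continue
        if "*" in part or "?" in part:
            # Wildcards may appear in directory names and in file names.
            paths.extend(sorted(glob.glob(part)))
        else:
            # A plain path is kept as written, in the order given.
            paths.append(part)
    return paths
```

Calling expand_file_url("/path/file1.txt;/path/file2.txt") returns the two paths in the order in which they would be read, one after another.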
Local Files (with compression)
Examples:
zip:(/path/file.zip) – reads the first file added to the zip archive
zip:(/path/file.zip)#innerfolder/innerfile.txt – reads the innerfile.txt contained in the innerfolder which has been compressed into the specified zip archive.
zip:(/path/file??.zip)#innerfolder?/innerfile*.txt – reads files contained in the innerfolders which have been compressed into the specified zip archives (each of these files, innerfolders, archive files must match their respective pattern)
gzip:(/path/file.gz) – reads the file compressed in the gzip archive
gzip:(/path/file??.gz) – reads the files compressed into specified gzip archives (each of these archives must match specified pattern)
tar:(/path/file.tar) – reads the first file added to the tar archive
tar:(/path/file.tar)#innerfolder/innerfile.txt – reads the innerfile.txt contained in the innerfolder which is compressed in the specified tar archive.
tar:(/path/file??.tar)#innerfolder?/innerfile*.txt – reads files contained in the innerfolders which have been compressed in the specified tar archives (each of these files, innerfolders, archive files must match their respective pattern)
zip:(zip:(/path/file*.zip)#innerfolder/innerfile.zip)#innermostfolder??/innermostfile*.txt – reads the innermost files contained in the innermost folders of the innerfile.zip archive, which is itself contained in the innerfolder of the specified external zip archives (each of the innermost files, innermost folders, and external zip archives must match its respective pattern). Note that innerfile.zip and innerfolder must not contain wildcards.
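To make the structure of these archive URLs explicit, the following Python sketch splits a single-level URL into its three parts: the archive type, the archive location inside the parentheses, and the optional inner path after the # character. The function name parse_archive_url and the regular expression are assumptions made for this illustration; they are not taken from CloverETL.

```python
import re

# archive type, then "(" location ")", then an optional "#inner/path".
ARCHIVE_URL = re.compile(r"^(zip|gzip|tar):\((.*)\)(?:#(.*))?$")


def parse_archive_url(file_url):
    """Return (archive_type, location, inner_pattern), or None for a plain path.

    Hypothetical helper for illustration; not part of CloverETL.
    inner_pattern is None when the URL has no "#..." part.
    """
    match = ARCHIVE_URL.match(file_url)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```

For example, parse_archive_url("zip:(/path/file.zip)#innerfolder/innerfile.txt") yields the type "zip", the location "/path/file.zip", and the inner path "innerfolder/innerfile.txt", while a plain path such as "/path/file1.txt" yields None.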
Remote Files (without compression)
Unlike locally stored files, files on remote computers are accessible using a set of supported protocols. Sometimes it is also necessary to use a proxy server.
The following protocols are supported for accessing a remote server: sftp, ftp, ftps, http, https.
Access without proxy:
The structure of all remote files that are accessible directly, without a proxy, is as follows:
protocol://username:password@server:port/(whole|relative)path/file
Here, the whole path should be used for the sftp protocol; the other four protocols use relative paths.
Examples:
sftp://johnsmith:mypassword@myserver/home/johnsmith/relativepath/filename.txt
ftp://johnsmith:mypassword@myserver/relativepath/filename.txt
ftps://johnsmith:mypassword@myserver/relativepath/filename.txt
http://johnsmith:mypassword@myserver/relativepath/filename.txt
https://johnsmith:mypassword@myserver/relativepath/filename.txt
In the patterns shown above, username, password, and port may be omitted if possible, whereas the other parts of such a File URL are required.
Example (with username, password, and port omitted):
http://myserver/relativepath/filename.txt
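The parts of such a remote File URL can be inspected with Python's standard urllib.parse module, as in the sketch below. The helper name describe_remote_url is hypothetical. The sketch applies only to URLs without wildcards, because urlsplit interprets a ? character as the start of a query string.

```python
from urllib.parse import urlsplit


def describe_remote_url(file_url):
    """Break protocol://username:password@server:port/path/file into its parts.

    Hypothetical helper for illustration; omitted parts come back as None.
    """
    parts = urlsplit(file_url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "server": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }
```

For http://myserver/relativepath/filename.txt, the username, password, and port entries are all None, which matches the case where these optional parts are left out.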
Access through proxy:
The structure of all remote files that are accessible through a proxy is as follows:
protocol:(proxy:proxyuser:proxypassword@proxyserver:proxyport)//username:password@server:port/(whole|relative)path/file
or with SOCKS V4 or V5 proxy:
protocol:(proxysocks:proxyuser:proxypassword@proxyserver:proxyport)//username:password@server:port/(whole|relative)path/file
Example:
ftp:(proxy:proxyuser:proxypassword@proxyserver:proxyport)//johnsmith:mypassword@myserver/relativepath/filename.txt
In this case too, proxyuser, proxypassword, and proxyport can be omitted if possible; the other parts of this pattern are required.
Example with a SOCKS V4 or V5 proxy:
ftp:(proxysocks:proxyuser:proxypassword@proxyserver:proxyport)//johnsmith:mypassword@myserver/relativepath/filename.txt
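The proxy part of these URLs can be separated from the target with a regular expression, as in the following Python sketch. The function name parse_proxy_url, the dictionary keys, and the numeric port 3128 used in the usage note are assumptions made for illustration; the expression also assumes that the proxy user name and password contain no colon, at sign, or closing parenthesis.

```python
import re

# protocol:(proxy|proxysocks:[user[:password]@]server[:port])//target
PROXY_URL = re.compile(
    r"^(?P<protocol>[a-z]+):"
    r"\((?P<kind>proxy|proxysocks):"
    r"(?:(?P<user>[^:@)]+)(?::(?P<password>[^@)]*))?@)?"
    r"(?P<server>[^:@)]+)(?::(?P<port>\d+))?\)"
    r"(?P<target>//.*)$"
)


def parse_proxy_url(file_url):
    """Split a proxied File URL into proxy settings and the plain target URL.

    Hypothetical helper for illustration; returns None if the URL has no proxy part.
    """
    match = PROXY_URL.match(file_url)
    if match is None:
        return None
    port = match.group("port")
    return {
        "proxy_type": match.group("kind"),
        "proxy_user": match.group("user"),
        "proxy_password": match.group("password"),
        "proxy_server": match.group("server"),
        "proxy_port": int(port) if port is not None else None,
        # The target is the same URL with the parenthesized proxy part removed.
        "target": match.group("protocol") + ":" + match.group("target"),
    }
```

With the input ftp:(proxy:proxyuser:proxypassword@proxyserver:3128)//johnsmith:mypassword@myserver/relativepath/filename.txt, the target entry is the direct form ftp://johnsmith:mypassword@myserver/relativepath/filename.txt.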
Remote Files (with compression)
Remote File URLs may also be combined with archiving protocols in a similar manner to local File URLs.
Example:
zip:(ftp://johnsmith:mypassword@myserver/relativepath/myarchive.zip)#innerfolder/filename.txt
Wildcards may also be used in a similar way:
Example:
zip:(ftp://johnsmith:mypassword@myserver/relativepath/myarchive*.zip)#innerfolder??/filename?.txt
Note:
The http and https protocols do not support wildcards in top-level files or archives.
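The restriction on http and https can be checked before a File URL is used. The Python sketch below strips any archive wrapper and then looks for wildcards in the remaining top-level location. Both function names are hypothetical, and the check is deliberately simple: it would also flag an http URL that contains a genuine query string, since that also uses the ? character.

```python
import re

# Matches zip:(...), gzip:(...), or tar:(...) with an optional "#inner" part.
ARCHIVE = re.compile(r"^(?:zip|gzip|tar):\((.*)\)(?:#.*)?$")


def top_level_location(file_url):
    """Strip archive wrappers until only the top-level location remains."""
    match = ARCHIVE.match(file_url)
    while match:
        file_url = match.group(1)
        match = ARCHIVE.match(file_url)
    return file_url


def has_unsupported_wildcard(file_url):
    """True if the top-level location is http(s) and contains * or ?."""
    location = top_level_location(file_url)
    is_http = location.startswith(("http://", "https://"))
    return is_http and any(ch in location for ch in "*?")
```

Wildcards in the inner path after the # character are not flagged, because the restriction concerns only the top-level file or archive.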
Console Input
The File URL for Console input is: - (hyphen character)
After the graph starts, the user types the input into the Console: data separated by field delimiters, with Enter pressed at the end of each record. The input is finished after the last record by pressing Ctrl+Z.
Input Port Reading
CloverETL also supports reading incoming data through the input port of some Readers. Metadata connected to the input port must contain at least one field of string, byte, or cbyte data type. The user selects the field from which data should be read and parsed according to the output metadata. Three processing types can be selected in CloverETL:
- discrete (the default value)
- stream
- source
The File URL pattern is as follows:
port:$0.fieldname:discrete|stream|source
Discrete processing type:
When the processing type is discrete, each record is parsed separately, according to the output metadata.
Example:
port:$0.customer:discrete
Note:
The colon and the word discrete can be omitted.
Example:
port:$0.customer
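The port pattern, including the default processing type, can be expressed as a small parser. The Python sketch below is an illustration only: the function name parse_port_url is hypothetical, and the expression assumes that field names consist of letters, digits, and underscores.

```python
import re

# port:$<port number>.<field name>[:discrete|stream|source]
PORT_URL = re.compile(
    r"^port:\$(?P<port>\d+)\.(?P<field>\w+)"
    r"(?::(?P<mode>discrete|stream|source))?$"
)


def parse_port_url(file_url):
    """Return (port_number, field_name, processing_type) for a port File URL.

    Hypothetical helper for illustration; not part of CloverETL.
    """
    match = PORT_URL.match(file_url)
    if match is None:
        raise ValueError("not a port File URL: " + file_url)
    # The processing type falls back to discrete when it is left out.
    mode = match.group("mode") or "discrete"
    return int(match.group("port")), match.group("field"), mode
```

Both port:$0.customer:discrete and port:$0.customer parse to the same result, port 0, field customer, and processing type discrete.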
Stream processing type:
When the processing type is stream, all records are concatenated and parsed according to the output metadata. If the selected input field contains a null value, this null marks the end of file (EOF) and separates groups of records. All records before such a null are concatenated into one data source, and all records after it are concatenated separately into another data source.
Example:
port:$0.customer:stream
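The grouping behavior of the stream processing type can be pictured with the following Python sketch, in which None stands for a null field value. The function name stream_groups is hypothetical, and skipping empty groups (for example, two nulls in a row) is a choice made for this sketch rather than documented behavior.

```python
def stream_groups(values):
    """Concatenate consecutive field values; a None value closes the current group.

    Hypothetical helper for illustration; each returned string represents
    one data source that would then be parsed according to the output metadata.
    """
    groups = []
    current = []
    for value in values:
        if value is None:
            # A null acts as end of file for the group collected so far.
            if current:
                groups.append("".join(current))
            current = []
        else:
            current.append(value)
    if current:
        groups.append("".join(current))
    return groups
```

For the field values "a;b\n", "c;d\n", null, "e;f\n", the sketch returns two data sources: the first two values concatenated, and the last value on its own.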
Source processing type:
Example:
port:$0.file:source
When the processing type is source, the values of the selected field ($0.file) are expected to be valid URLs. The Reader connected to this input opens the file at each URL and reads its contents. The metadata on the output must match the structure of the files these URLs point to.
Dictionary Entry Reading
The Dictionary tab allows the user to select one of the graph dictionary entries. The processing type should also be specified in the combo box.
The File URL pattern is:
dict:myentry:discrete|source
Discrete processing type:
Example:
dict:customer:discrete
Reads the contents of the dictionary entry named customer.
Source processing type:
Example:
dict:file:source
When the processing type is source, the value of the selected dictionary entry (file) is expected to be a valid URL. The Reader with this File URL opens the file at that URL and reads its contents. The metadata on the component's output must match the structure of the file this URL points to.



