Versatility of the File URL Attribute Used in Readers of CloverETL

CloverETL allows users to read several different kinds of files. These files may have various formats, they can be located on a local or remote computer, they can be accessed through a proxy, and they can also be compressed into zip, gzip, or tar archives. Users can also read data from the Console, from an input Port, or from a selected Dictionary entry.

A File URL must be specified using the URL dialog.

File URL Dialog

  • Workspace view tab lets the user specify files within the workspace, regardless of whether the workspace belongs to a CloverETL Project or a CloverETL Server Project.
  • Local files tab displays the file structure of a local computer.
  • Remote files tab displays the file structure of a remote computer. It allows the user to specify the protocol, username, password, port, server, proxy, username for proxy, password for proxy, identification of proxy server, and the port for the proxy. After specifying these properties, the file structure of the remote computer is displayed (except for http and https protocols).
  • Port tab displays the input fields of the string, byte, or cbyte data types and allows the user to select one of them and choose a processing type from a combo box.
  • Dictionary tab displays the declared dictionary entries and allows the user to select one of them and choose the processing type from a combo box.

Now I will present the list of supported values of the File URL attribute.

Local Files (without compression)

Examples:
/path/file1.txt – reads one file
/path/file1.txt;/path/file2.txt – reads two files in one directory (semicolon separates files that will be read one after another)
/path1/fileA.txt;/path2/fileB.txt – reads two files in two directories (semicolon separates files that will be read one after another)
/path?/file*.txt – reads files in directories, where both the directories and the files must match the specified patterns (? stands for one character, * for any number of characters)
/path/* – reads all files in the specified directory
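To make the semicolon and wildcard rules above concrete, here is a minimal Python sketch that expands such a File URL into a list of local paths. It only illustrates the matching rules and is not CloverETL's own implementation. The function name expand_file_url is hypothetical, and Python's glob module also treats square brackets as special, which the File URL syntax described here does not mention.

```python
import glob
import os


def expand_file_url(file_url):
    """Split a local File URL on semicolons and expand ? and * in each part."""
    paths = []
    for part in file_url.split(";"):
        part = part.strip()
        if not part:
            continue
        # glob matches ? (exactly one character) and * (any run of characters)
        # in directory names as well as in file names.
        matches = sorted(glob.glob(part))
        # Keep regular files only, so that /path/* does not return subdirectories.
        paths.extend(p for p in matches if os.path.isfile(p))
    return paths
```

The parts are kept in the order in which they appear in the File URL, mirroring the "read one after another" behaviour described above.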

Local Files (with compression)

Examples:
zip:(/path/file.zip) – reads the first file added to the zip archive
zip:(/path/file.zip)#innerfolder/innerfile.txt – reads the innerfile.txt contained in the innerfolder which has been compressed into the specified zip archive.
zip:(/path/file??.zip)#innerfolder?/innerfile*.txt – reads files contained in the innerfolders which have been compressed into the specified zip archives (each of these files, innerfolders, archive files must match their respective pattern)
gzip:(/path/file.gz) – reads the file compressed in the gzip archive
gzip:(/path/file??.gz) – reads the files compressed into specified gzip archives (each of these archives must match specified pattern)
tar:(/path/file.tar) – reads the first file added to the tar archive
tar:(/path/file.tar)#innerfolder/innerfile.txt – reads the innerfile.txt contained in the innerfolder which is compressed in the specified tar archive.
tar:(/path/file??.tar)#innerfolder?/innerfile*.txt – reads files contained in the innerfolders which have been compressed in the specified tar archives (each of these files, innerfolders, archive files must match their respective pattern)
zip:(zip:(/path/file*.zip)#innerfolder/innerfile.zip)#innermostfolder??/innermostfile*.txt – reads the innermost files contained in the innermost folders of the innerfile.zip archive, which is itself stored in the innerfolder of the specified external zip archives (each of the innermost files, innermost folders, and external zip archives must match its respective pattern). Remember that innerfile.zip and innerfolder must not contain wildcards.
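The archive File URLs above all follow the same shape: an archive prefix, an inner location in parentheses, and an optional entry pattern after the # character. The following Python sketch splits such a string into those three parts. It is only an illustration under the assumption that each archive prefix is immediately followed by a parenthesized inner location. The function name split_archive_url is hypothetical and not part of CloverETL.

```python
def split_archive_url(file_url):
    """Split 'scheme:(inner)#entry' into (scheme, inner, entry).

    Returns None if file_url does not start with zip:(, gzip:( or tar:(.
    entry is None when there is no '#...' part.
    """
    for scheme in ("zip", "gzip", "tar"):
        prefix = scheme + ":("
        if file_url.startswith(prefix):
            break
    else:
        return None
    depth = 0
    # Start at the opening parenthesis and look for its matching close,
    # so that nested parentheses inside the inner part are skipped.
    for i in range(len(scheme) + 1, len(file_url)):
        ch = file_url[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                inner = file_url[len(prefix):i]
                rest = file_url[i + 1:]
                entry = rest[1:] if rest.startswith("#") else None
                return scheme, inner, entry
    raise ValueError("unbalanced parentheses in " + file_url)
```

Because the parentheses are matched by depth, an archive stored inside another archive can be handled by calling the function again on the returned inner part.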

Remote Files (without compression)

Unlike locally stored files, files on remote computers are accessible using a set of supported protocols. Sometimes it is also necessary to use a proxy server.

The following protocols are supported for accessing a remote server: sftp, ftp, ftps, http, https.

Access without proxy:

The structure of all remote files that are accessible directly, without a proxy, is as follows:
protocol://username:password@server:port/(whole|relative)path/file

Here, the whole path must be used for the sftp protocol; the other four protocols use relative paths.

Examples:
sftp://johnsmith:mypassword@myserver/home/johnsmith/relativepath/filename.txt
ftp://johnsmith:mypassword@myserver/relativepath/filename.txt
ftps://johnsmith:mypassword@myserver/relativepath/filename.txt
http://johnsmith:mypassword@myserver/relativepath/filename.txt
https://johnsmith:mypassword@myserver/relativepath/filename.txt

In the patterns shown above, username, password, and port may be omitted if possible, whereas the other parts of such a File URL are required.

Example (with username, password, and port omitted):

http://myserver/relativepath/filename.txt
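Since these remote File URLs use the common URL layout, their parts can be inspected with a standard URL parser. The sketch below uses Python's urllib.parse.urlsplit to show which parts are present and which were omitted. The function name describe_remote_url is hypothetical, and this is only a way to check a URL by hand, not how CloverETL itself parses it.

```python
from urllib.parse import urlsplit


def describe_remote_url(file_url):
    """Return the parts of a remote File URL; omitted parts come back as None."""
    parts = urlsplit(file_url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "server": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }
```

For http://myserver/relativepath/filename.txt the username, password, and port entries are None, which corresponds to the omitted parts in the example above.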

Access through proxy:

The structure of all remote files that are accessible through a proxy is as follows:
protocol:(proxy:proxyuser:proxypassword@proxyserver:proxyport)//username:password@server:port/(whole|relative)path/file

or with SOCKS V4 or V5 proxy:
protocol:(proxysocks:proxyuser:proxypassword@proxyserver:proxyport)//username:password@server:port/(whole|relative)path/file

Example:
ftp:(proxy:proxyuser:proxypassword@proxyserver:proxyport)//johnsmith:mypassword@myserver/relativepath/filename.txt

In this case too, proxyuser, proxypassword, and proxyport can be omitted if possible; the other parts of this pattern are required.

Example with a SOCKS V4 or V5 proxy:
ftp:(proxysocks:proxyuser:proxypassword@proxyserver:proxyport)//johnsmith:mypassword@myserver/relativepath/filename.txt
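The proxy part in parentheses is not standard URL syntax, so a generic URL parser will not understand it. The following Python sketch separates the proxy settings from the target URL with a regular expression written from the two patterns above. The names PROXY_URL and parse_proxy_url are hypothetical, the port is kept as text, and user names or passwords containing colons, at signs, or parentheses are not handled.

```python
import re

# protocol:(proxy|proxysocks:[user[:password]@]server[:port])//rest-of-url
PROXY_URL = re.compile(
    r"^(?P<protocol>[a-z]+):"
    r"\((?P<kind>proxy|proxysocks):"
    r"(?:(?P<proxyuser>[^:@)]+)(?::(?P<proxypassword>[^@)]*))?@)?"
    r"(?P<proxyserver>[^:@)]+)"
    r"(?::(?P<proxyport>[^:@)]+))?\)"
    r"(?P<target>//.*)$"
)


def parse_proxy_url(file_url):
    """Return the proxy settings and the direct target URL, or None if no proxy part."""
    match = PROXY_URL.match(file_url)
    if match is None:
        return None
    fields = match.groupdict()
    # Rebuild the URL that would be used without the proxy part.
    fields["target"] = fields["protocol"] + ":" + fields["target"]
    return fields
```

Omitted proxy user, password, or port come back as None, and the kind field tells a plain proxy from a SOCKS proxy.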

Remote Files (with compression)

Remote File URLs may also be combined with archiving protocols in a similar manner to local File URLs.

Example:
zip:(ftp://johnsmith:mypassword@myserver/relativepath/myarchive.zip)#innerfolder/filename.txt

Wildcards may also be used in a similar way:

Example:
zip:(ftp://johnsmith:mypassword@myserver/relativepath/myarchive*.zip)#innerfolder??/filename?.txt

Note:

Remember that http and https protocols do not support wildcards in top level files or archives.

Console Input

The File URL for Console input is a single hyphen character: -

After the graph starts, the user types the input into the Console: field values separated by the field delimiters, Enter to end each record, and Ctrl+Z after the last record to finish the input.

Input Port Reading

CloverETL also supports reading incoming data through the input port of some Readers. Metadata connected to the input port must contain at least one field of string, byte, or cbyte data type. The user selects the field from which data should be read and parsed according to the output metadata. Three processing types can be selected in CloverETL:

  • discrete (the default value)
  • stream
  • source

File URL pattern is the following:

port:$0.fieldname:discrete|stream|source

Discrete processing type:

When the processing type is discrete, each record is parsed separately, according to the output metadata.

Example:
port:$0.customer:discrete

Note:
The colon and the word discrete can be omitted.

Example:
port:$0.customer
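The port File URL pattern and its default can be summed up in a few lines of Python. This is a sketch for checking such strings by hand, assuming the port number is a plain integer and field names consist of letters, digits, and underscores. The names PORT_URL and parse_port_url are hypothetical.

```python
import re

# port:$<port number>.<field name>[:discrete|stream|source]
PORT_URL = re.compile(
    r"^port:\$(?P<port>\d+)\.(?P<field>\w+)"
    r"(?::(?P<mode>discrete|stream|source))?$"
)


def parse_port_url(file_url):
    """Return (port number, field name, processing type) for a port File URL."""
    match = PORT_URL.match(file_url)
    if match is None:
        raise ValueError("not a port File URL: " + file_url)
    # A missing processing type means discrete, the default value.
    mode = match.group("mode") or "discrete"
    return int(match.group("port")), match.group("field"), mode
```

With this sketch, port:$0.customer and port:$0.customer:discrete give the same result.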

Stream processing type:

When the processing type is stream, all records are concatenated and parsed according to the output metadata. If the selected input field contains a null value, that null marks the end of a file (EOF) and separates groups of records: all records before the null are concatenated into one data source, and all records after it are concatenated into another.

Example:
port:$0.customer:stream
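The grouping rule of the stream processing type can be pictured with a small Python sketch in which None plays the role of the null value. It only models the concatenation described above and makes no claim about CloverETL's internals. The function name stream_sources is hypothetical, and how CloverETL treats two nulls in a row is not stated in this article, so the sketch simply yields an empty data source in that case.

```python
def stream_sources(field_values):
    """Concatenate field values into data sources; None ends the current source."""
    sources = []
    current = []
    for value in field_values:
        if value is None:
            # A null closes the current group of records.
            sources.append("".join(current))
            current = []
        else:
            current.append(value)
    # Records after the last null form the final data source.
    if current:
        sources.append("".join(current))
    return sources
```

Each returned string would then be parsed as a whole according to the output metadata, whereas the discrete processing type parses every field value on its own.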

Source processing type:

Example:
port:$0.file:source

When processing type is source, values of the selected field ($0.file) are valid URLs. The Reader to which such input is connected takes the file accessible with this URL and reads the contents. Metadata on the output must match the structure of the files specified with the help of these URLs.

Dictionary Entry Reading

Dictionary tab allows the selection of one of the graph dictionary entries. The processing type in the combo box should also be specified.

File URL pattern is:
dict:myentry:discrete|source

Discrete processing type:

Example:
dict:customer:discrete

Reads contents of dictionary entry whose name is customer.

Source processing type:

Example:
dict:file:source

When processing type is source, the value of the selected dictionary entry (file) is a valid URL. The reader with this File URL takes the contents of the file accessible with the help of this dictionary entry and reads the file contents. Metadata on the component’s output must match the structure of the file specified with the help of this URL.

Accessing Files with New File URL Dialog

The CloverETL Designer has a brand new File URL Dialog, which was introduced in version 2.9. The newly designed file dialog is friendly and intuitive to navigate, and it brings a lot of new features and improvements. The dialog is separated into several tabs to simplify navigation. They enable users to easily specify resources such as local files, remote files, or shared memory (dictionary). The new dialog is more comfortable to use and has a simplified, clear design, as you can see in the picture below. The dialog window adjusts itself according to the context.

Clover Server

In the new File dialog you can also find a new CloverETL Server tab, specially designed to work with files located on a CloverETL Server. It is only visible if you have opened the dialog from an existing CloverETL Server project. It looks very similar to the tab you work with on your local computer, but it lets you browse remote CloverETL sandboxes. The names of all sandboxes for which you have permissions are listed in the bookmarks, so you can easily access them.

File URLs

This tab handles all types of URLs, but it is mainly designed for browsing remote file systems via the http, https, ftp, ftps, and sftp protocols. It also provides a special dialog where you can specify advanced connection parameters such as the proxy server and HTTP properties.

Port / Dictionary

The Port and Dictionary tabs are specific to CloverETL. The Port tab is visible only if the component or graph element allows reading/writing data from/to the port. The dictionary is memory shared between parts of the graph. A dictionary entry is identified by its name and a processing type parameter. Both tabs help you specify the URLs in a visual way, so you don’t have to know the exact syntax of CloverETL’s URLs.

Extensibility

Thanks to the new modular dialog architecture, the dialog itself can be extended with additional tabs if needed.