Salesforce offers two APIs to access and manipulate your cloud data – SOAP and REST, the latter called “Bulk API”. Although you can use both in CloverETL, you'd probably use CloverETL to manipulate large portions of data. Therefore, the Bulk Salesforce API is more likely your preferred approach.
To see why, let’s take a look at what Salesforce documentation has to say about Bulk Salesforce API:
“Bulk API is based on REST principles and is optimized for loading or deleting large sets of data. You can use it to query, insert, update, upsert, or delete a large number of records asynchronously by submitting batches which are processed in the background by Salesforce.
The documentation continues with suitable use-cases:
SOAP API, in contrast, is optimized for real-time client applications that update small numbers of records at a time. Although SOAP API can also be used for processing large numbers of records, when the data sets contain hundreds of thousands of records, it becomes less practical. Bulk API is designed to make it simple to process data from a few thousand to millions of records.”
SOAP Salesforce API – A Bad Idea for Batch Processing
In my experience, the above excerpt proves to be more than true. Using SOAP calls to insert contacts into the database can be convenient for interactive applications; however, when you try to call insert for every record in a batch, it can yield only a few calls per second (usually around 10) – this obviously is not an option. The same holds for SOAP version of query() call. It's powerful, but designed only for highly specific queries returning maybe hundreds of results, an option great for searches. On top of the performance issues, there is a hard limit of 2,000 items returned per a call. What's more, in reality it would typically be much less, usually varying from 50 to 1,500 per call depending on record sizes. Although repeating the call with paging offset is possible, it’s an unnecessary hassle.
Bulk Salesforce API
On the other hand, Bulk API works on batches of either CSV or XML data and is extremely efficient for both inserts and queries. There are limits too, but it's much better suited for batch processing. Most notably, a query can return up to 15 files of 1GB each, totaling to a 15 GB result set. Refer to Bulk API limits page for further details including limitations of SOQL queries used in Bulk API.
To be able to use Bulk API, you first have to enable it in the Administrative Permissions section. You will find an “API Enabled” checkbox in the setup of System Administrator’s profile. For developer accounts, this comes already pre-selected.
Most operations in Bulk API share a pretty standard sequence of steps to get things done:
- Login (SOAP)
- Create a job (REST)
- Add a batch to the job (REST)
- Close the job (REST)
In CloverETL, you can perform these steps either with HTTPConnector (for REST calls) or with WebServiceClient (for SOAP calls). In both cases, responses you get are in XML and can be directly parsed using an adjacent XMLExtract component. You can see a sample in the graph below and a much heavier use in the second example where we fetch some data from a query.
Just make sure you properly set your Salesforce account information in salesforce.prm file.
Please note the sample graph only illustrates an “optimistic” use case; thus, it does not do full due diligence in error handling. In production, I would suggest parsing all the responses, then check and handle errors gracefully, e.g. in a jobflow.
Login has to be performed via SOAP API. In CloverETL, we use a WebServiceClient component for this. The request body of a SOAP request looks like this:
Change SF_LOGIN, SF_PASSWORD and SF_TOKEN in salesforce.prm to your valid credentials. As a result of the login operation, you will get a sessionID and a Server URL for your subsequent queries. As we will need only the first part of the URL, we use Reformat just after the login to extract it and pass it on.
2. Create a job
Once we’re logged in, it’s time for some Bulk API. First, we create a “job” using HTTPConnector. The request body specifies the operation, data type, and content type. Please note that SessionID additionally goes into the HTTP header of the request.
This call returns an XML containing jobID which we get using XMLExtract component. Create job response example:
<?xml version="1.0" encoding="UTF-8"?><jobInfo
3. Add a batch to the job
A batch is the actual CSV data you want to insert. The CSV file needs to be passed as a body of the HTTP request. This is pretty easy with HTTPConnector. In Input Mapping, you simply specify the “inputFileUrl” that leads to the file.
4. Close the job
The last step you need to do is finalize the job. Again, we’ll use HTTPConnector with the following XML as a request body:
<?xml version="1.0" encoding="UTF-8"?>
Fetching the results of a query is similar to the bulk insert. Here's a sequence of required actions:
- Create job
- Create bulk query
- Get results list
- Download file(s) with records
In this case, we create a bulk query, which is basically an HTTP call that sends a SOQL statement to be processed. In return, Salesforce sends back a queryID which is used to fetch results. As I said earlier, Salesforce might split big datasets into 1GB files, thus returning multiple result handles for a single query.
The last “Fetch Results” call is pretty interesting: it uses the ability of the HTTPConnector to store the response body directly into a file. In this case, the component might get invoked multiple times if the “Extract Result IDs” returns more than one result handle.
Bulk API in CloverETL - Some Advice
To close, here's some final advice that can help you use Salesforce Bulk API in CloverETL:
- Use the WebServiceClient component for SOAP API calls and the HTTPConnector component for Bulk API calls
- XMLExtract is best for parsing XML responses of API calls
- Use “Additional HTTP headers” attribute of the HTTPConnector component to specify session and content-type
- The Input/output mapping attribute of components allows you to construct URL from input fields