Manage CouchDB with CloverETL

Share this article:

NoSQL databases are fast and used for high volumes of data and documents – and they are today's rising stars. When a customer asked us for help to manage CouchDB, we decided it would be helpful to others if we shared some tips in a blog post.

Apache’s CouchDB is an open source NoSQL document-oriented database, an alternative to the well known MongoDB. It's free and pretty functional, and because CloverETL can read and write JSON and has an HTTPconnector, we can work with and manage it pretty easily.

About Apache CouchDB

A brief introduction from the Apache CouchDB documentation describes the interface method and the way it stores data:

The CouchDB API is the primary method of interfacing to a CouchDB instance. Requests are made using HTTP and requests are used to request information from the database, store new data, and perform views and formatting of the information stored within the documents.

For nearly all operations, the submitted data, and the returned data structure, is defined within a JavaScript Object Notation (JSON) object.

CloverETL can actually both read and write JSON, and then use CloverETL HTTP connectors to control the database. The CouchDB returns HTTP Status Codes as a response to any operation, which can be used for later processing in CloverETL. Read more about CouchDB API in the documentation: http://docs.couchdb.org/en/latest/api/basics.html#errors.

Demo Project

Download sample project here: CouchDB_project.zip

Manage CouchDB Databases

In this example, we first delete the database created for demo purposes so we can show how CloverETL can easily control the CouchDB database via HTTPConnector using standard request methods. More information about this is available in CloverETL HTTPConnector documentation: http://doc.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/httpconnector.html

As you can see in the picture below, we delete the database with the name white_house, using the request method DELETE. The response is saved in the output file delete_db.out.

Manage CouchDB

Right after, we create a new database for each different insert method using the request method PUT. This time, we get the output on an edge using standard CloverETL Output mapping.

Manage CouchDB

JSON File Creation in CloverETL

There are a couple of ways to insert JSON documents into CouchDB. The following example demonstrates these three different ones:

  • Single insert from CloverETL to the DB
  • Bulk insert directly from CloverETL to the DB
  • JSON file upload to the DB

The sample data used in the example is provided in a CSV file and sent to the JSONWriter component for generating the final JSON file. The JSONWriter component uses specific mapping.

Further information about the component can be found in CloverETL documentation: http://doc.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/jsonwriter.html

In the following steps, we’ll be using previously created JSON files in the format described below.

JSON file for Single Record Inserting Method

The “_id” field is unique ID field which can be provided on insert or is generated by CouchDB.

{
"_id":"1",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
}

JSON file for Bulk Insert Method

The bulk method requires a set of single documents to be wrapped in an array named “docs” as shown below:

More information about HTTP Bulk Document API is available here: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API

{
"docs":[
{
"_id":"1",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
},
{
"_id":"14208",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
}
]
}

JSON file Upload to the DB

The format of the JSON file used in the third method (uploading a JSON file into the DB) is the same as in bulk method, only saved in a local file.

Creating the JSON file

The following picture shows the part of the graph where the UniversalDataReader component reads the CSV file with data and copies it to three different edges each for the JSON file types described above. Two of them are processed further later, while the last one is saved to the local file.

Manage CouchDB

Inserting and Uploading JSON into CouchDB

For all of the methods we’ve mentioned, the Additional HTTP Headers property needs to be set to: Content-Type=application/json.

Read more about the HTTP headers: http://docs.couchdb.org/en/latest/api/basics.html#http-headers

The first used is actually the simplest one – inserting JSON documents one by one using POST request method. However, it’s also the slowest, due to multiple requests on the DB (for each single record).

Manage CouchDB

CouchDB provides bulk_docs API for inserting multiple documents using a single JSON docs array.

The bulk_docs API is called using URL suffix: _bulk_docs – i.e.: http://URL/DB_NAME/_bulk_docs

Read more about this here: http://docs.couchdb.org/en/latest/api/database/bulk-api.html?highlight=bulk#db-bulk-docs

BulkInsertCloverETL - Manage CouchDB

The third method is basically the same as the bulk method, only it uses the local file instead of data directly from the edge with defined Input file URL in CloverETL HTTPConnector.

InsertlocaljsonCloverETL - Manage CouchDB

Reading from CouchDB

Reading an Entire Database using Temp View

Reading from a database is performed using CloverETL HTTPConnector.

In the case of reading the whole database, API method _temp_view needs to be called using POST and with Request Content specified in CloverETL HTTPConnector component i.e.: { "map" : "function(doc) { emit(doc._id, doc); }"}

Read more about CouchDB views and mapping function: http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

readingwholeDBCloverETL-Manage CouchDB

Reading a Single Document

The CloverETL HTTPConnector reads a single record from the database using the GET method and a defined ID suffix in URL – i.e. http://URL/DB_NAME/ID

ReadingSingleDocCloverETL - Manage CouchDB

JSON Extract

After getting the JSON file from CouchDB, extract is desired for further data manipulation in CloverETL. For this, CloverETL has a special component – JSONExtract. Check out more on JSONExtract in the CloverETL documentation:
http://doc.cloveretl.com/documentation/UserGuide/topic/com.cloveretl.gui.docs/docs/jsonextract.html

Share this article:
Contact Us

Further questions? Contact us.

Forum

Talk to peers on our forum.

Want to keep in touch?

Follow our social media.

Topics

see all

Recent Posts