Category Archives: Others

Java Vulnerabilities: No Impact on CloverETL Products

The recent discoveries of vulnerabilities in Java have caused concern for many organizations and individuals alike. Since several customers have asked us about them, we’d like to reassure you that our products are not affected by these vulnerabilities.

The security holes are in the Java browser plug-in, which attackers can exploit to take over the visitor’s computer or silently install malware on it (more details can be found in this article: http://krebsonsecurity.com/2013/01/what-you-need-to-know-about-the-java-exploit/).

None of our products use Java in the frontend (the browser plug-in). Even the Server Console UI of our Server application is built on RichFaces, which uses only HTML and basic JavaScript in the browser.

However, we still recommend that you update Java to protect your other applications. Please also consider upgrading to our latest stable build, CloverETL version 3.3.1 (http://www.cloveretl.com/resources/releases/3-3-1); the Designer installer package for Windows comes with an updated Java bundle that contains the latest patches.

CloverETL Server – LDAP settings

Introduction

The purpose of this post is to explain the CloverETL Server LDAP configuration and to provide the guidance and how-tos you need to integrate CloverETL Server with LDAP. By following this guide, you will be able to centralize CloverETL Server user management in your LDAP/Active Directory.

LDAP is a powerful, standardized way of organizing information. That power comes with a few major trade-offs:

  • It’s highly complex for beginners
  • Vendor implementations differ from the standard
  • Error messages can be cryptic

Connecting to LDAP

Before configuring the CloverETL Server to work with LDAP, you should have a basic understanding of how LDAP works. If you’re familiar with it already, you can skip to the next chapter.

For basic information about LDAP, please read this Wikipedia article. You may want to read it right now, before continuing with this one.

You also need an LDAP server instance. It makes sense to first check that it works, for example by logging in to your OS or mail account. It can also help to visualize your LDAP structure so you get a better idea of what you’re doing. If your LDAP server provider does not have a good tool for that, you can try JXplorer; it’s free and simple to use. Talk to your system administrator to get a better understanding of your LDAP setup.

Distinguished Names (DN)

LDAP stores objects in its database. These objects are referred to by their Distinguished Names (DN). Essentially, a DN records the path to the object through the LDAP tree. A sample DN: cn=John Doe,ou=people,dc=MyCompany,dc=com

To reach such an object, you drill down through the DN in reverse order (from right to left), like this:

  • find an object with attribute “dc” set to “com” in the root of the LDAP directory
  • among its child nodes, find one with attribute “dc” set to “MyCompany”
  • among its child nodes, find one with attribute “ou” set to “people”
  • among its child nodes, find one with attribute “cn” set to “John Doe”; that is it!
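If you want to see the same drill-down order programmatically, the JDK class javax.naming.ldap.LdapName parses a DN and numbers its RDNs from the right, so iterating over them follows the steps above. This is a minimal sketch; the class name DnDrillDown is hypothetical and not part of CloverETL.

```java
import java.util.ArrayList;
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical helper: lists a DN's components in the order you drill down from the root. */
final class DnDrillDown {
    private DnDrillDown() {}

    static List<String> drillDownOrder(String dn) {
        LdapName name;
        try {
            name = new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid DN: " + dn, e);
        }
        List<String> steps = new ArrayList<>();
        // LdapName numbers RDNs from the right: index 0 is the RDN closest to the
        // directory root (dc=com), the last index is the object itself (cn=...).
        for (Rdn rdn : name.getRdns()) {
            steps.add(rdn.toString());
        }
        return steps;
    }
}
```

For cn=John Doe,ou=people,dc=MyCompany,dc=com it returns dc=com, dc=MyCompany, ou=people and cn=John Doe, in that order.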

Here is an example object with its attributes, shown as attribute: value pairs:

dn: cn=John Doe,ou=people,dc=MyCompany,dc=com
cn: John Doe
objectClass: organizationalPerson
sn: Doe
uid: doej
displayName: John Doe
givenName: John
mail: johndoe@mycompany.com

LDAP search filter

An LDAP search filter is an expression used to search the LDAP tree. You can run such an expression in JXplorer via the Search > Search dialog: fill in Start Searching From with the start node (called “base” in the Server settings) and fill in Text Filter. The following examples show the usage.

An example of user search:

An example of groups assigned to user search:

When a search succeeds, the nodes found appear in the Results tab on the left. Click a node to see the object (select the simple or text mode of the HTML View).

LDAP authentication workflow in the CloverETL Server

The following steps describe how CloverETL Server and LDAP work together:

  1. The user enters a username and password into the login form and submits it. The Server checks the user type: either local (managed by the Server) or LDAP (the type we are setting up). Assume the user type is LDAP.
  2. The Server tries to connect to the configured LDAP server.
  3. If the connection is successful, the Server tries to find the user by username (see below).
  4. If the user is found, the Server tries to list the groups assigned to this user.
  5. The Server tries to match the user’s LDAP groups to Server groups; i.e., all groups defined in the Server are intersected with the user’s LDAP groups, and all other groups are ignored. In this way, LDAP groups select the Server groups, which define the user’s permissions.
  6. The user might need to be a member of an allowed group (see security.ldap.allowed_ldap_groups below).
  7. The password is checked against LDAP.
  8. The user is logged in if the password is correct and the user has at least one group assigned.
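The decision order of the steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Directory interface and the LdapLoginFlow class are hypothetical stand-ins, and steps 1 and 2 (choosing the LDAP domain, connecting) are assumed to have succeeded already. It is not the Server's actual code.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical view of the LDAP server used by the sketch; not a CloverETL API. */
interface Directory {
    Optional<String> findUserDn(String username);          // step 3: user search
    Set<String> findGroupCodes(String userDn);             // step 4: group search, reduced to group codes
    boolean isInAllowedGroup(String userDn);               // step 6: allowed LDAP groups check
    boolean checkPassword(String userDn, String password); // step 7: verify the password against LDAP
}

/** Sketch of the order in which the workflow steps are evaluated. */
final class LdapLoginFlow {
    enum Result { USER_NOT_FOUND, NOT_IN_ALLOWED_GROUP, BAD_PASSWORD, NO_GROUPS, LOGGED_IN }

    private LdapLoginFlow() {}

    static Result login(Directory dir, Set<String> serverGroupCodes, String username, String password) {
        Optional<String> userDn = dir.findUserDn(username);                    // step 3
        if (!userDn.isPresent()) {
            return Result.USER_NOT_FOUND;
        }
        Set<String> groups = new HashSet<>(dir.findGroupCodes(userDn.get()));  // step 4
        groups.retainAll(serverGroupCodes); // step 5: keep only groups the Server defines
        if (!dir.isInAllowedGroup(userDn.get())) {                             // step 6
            return Result.NOT_IN_ALLOWED_GROUP;
        }
        if (!dir.checkPassword(userDn.get(), password)) {                      // step 7
            return Result.BAD_PASSWORD;
        }
        return groups.isEmpty() ? Result.NO_GROUPS : Result.LOGGED_IN;         // step 8
    }
}
```

Keeping the result as an enum makes it easy to see which step a failed login stopped at, which mirrors how you would read userAction.log when troubleshooting.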

CloverETL Server LDAP settings

Let’s take a look at what you need to do in the Server configuration to enable LDAP authentication. For details on Server settings, please refer to this page in the documentation. Remember to restart your application server after each configuration change so that the change takes effect.

First, locate the clover.xml configuration file. For example, on a Tomcat server, each setting mentioned below goes into the [tomcat]\conf\Catalina\localhost\clover.xml file. Refer to the documentation to find the file in your installation.

A sample LDAP configuration in clover.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Context path="/clover" crossContext="true">
<Manager pathname=""/>
<Parameter name="security.authentication.allowed_domains" value="clover,LDAP" override="false" />
<Parameter name="security.ldap.url" value="ldap://mail.mycompany.com:389" override="false" />
<Parameter name="security.ldap.user_search.base" value="ou=people,dc=MyCompany,dc=com" override="false" />
<Parameter name="security.ldap.user_search.filter" value="(uid=${username})" override="false" />
...
</Context>

In the XML above, you can see the Parameter tags. The following sections describe all LDAP-related parameters you can use in your configuration.

If you encounter errors in any of the steps below, check your logs. You can read more about the location of the Server logs here. The most useful files (for Tomcat) are:

  • [tomcat]\temp\cloverlogs\userAction.log - contains a log of user actions and their results (login succeeded/failed)
  • [tomcat]\temp\cloverlogs\all.log - contains more technical information

Enable LDAP login (workflow – step 1)

security.authentication.allowed_domains

The enabled authentication methods: either clover or LDAP, or both separated by a comma (i.e. clover,LDAP).

Test correct settings

You should now be able to create a user with the domain set to LDAP. Let’s create a new user, doej, for further testing.

Connection to LDAP server (workflow – step 2)

security.ldap.ctx_factory

The fully qualified name of the class that provides the JNDI context implementation (the initial context factory). Use “com.sun.jndi.ldap.LdapCtxFactory” to start with.

security.ldap.timeout

The timeout for accessing your LDAP server, in milliseconds. Use “5000” to start with.

security.ldap.records_limit

The maximum number of records returned by the server. Use “50” to start with.

security.ldap.url

The URL of your server, including the port, in the format ldap://hostname:port. The default LDAP port is 389, or 636 for SSL connections. For Microsoft Active Directory specifics, see the paragraph in the “Pitfalls” section. You can check your URL using JXplorer.
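A small sketch of how the default ports apply when the port is omitted from the URL. It assumes the ldaps:// scheme for LDAP over SSL; that is the scheme the JDK's LDAP provider understands, but the text above only gives ldap://hostname:port, so check whether your Server version accepts ldaps:// before relying on it. LdapUrls is a hypothetical helper.

```java
import java.net.URI;

/** Hypothetical helper: the port an LDAP URL will actually use. */
final class LdapUrls {
    private LdapUrls() {}

    static int effectivePort(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort(); // explicit port, e.g. 3268 for the AD global catalog
        }
        String scheme = uri.getScheme();
        if ("ldap".equalsIgnoreCase(scheme)) {
            return 389; // default plain LDAP port
        }
        if ("ldaps".equalsIgnoreCase(scheme)) {
            return 636; // default LDAP-over-SSL port (assumed ldaps:// scheme)
        }
        throw new IllegalArgumentException("Not an LDAP URL: " + url);
    }
}
```

Writing the port explicitly, as in the sample clover.xml, avoids any doubt about which default applies.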

security.ldap.userDN
security.ldap.password

If your LDAP server requires authentication, fill in the credentials of the user who will perform the queries. We recommend creating a dedicated user with the minimal rights sufficient for this purpose. You can test the credentials by logging in to your LDAP server via JXplorer. “userDN” stands for the user’s Distinguished Name.
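To see what these connection settings correspond to in plain JNDI (the API behind com.sun.jndi.ldap.LdapCtxFactory), here is a minimal sketch that builds a JNDI environment from the same values. The mapping is our assumption for illustration, in particular that security.ldap.timeout behaves like the JDK's com.sun.jndi.ldap.connect.timeout property; LdapEnv is a hypothetical helper, not a CloverETL class.

```java
import java.util.Hashtable;
import javax.naming.Context;

/** Hypothetical helper: maps Server-style LDAP settings onto a plain JNDI environment. */
final class LdapEnv {
    private LdapEnv() {}

    static Hashtable<String, String> build(String ctxFactory, String url, int timeoutMs,
                                           String userDn, String password) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, ctxFactory); // like security.ldap.ctx_factory
        env.put(Context.PROVIDER_URL, url);                    // like security.ldap.url
        // Assumption: treat security.ldap.timeout as the JDK LDAP provider's connect timeout (ms).
        env.put("com.sun.jndi.ldap.connect.timeout", Integer.toString(timeoutMs));
        if (userDn != null && password != null) {
            // Simple bind as the dedicated query user (like security.ldap.userDN / security.ldap.password).
            env.put(Context.SECURITY_AUTHENTICATION, "simple");
            env.put(Context.SECURITY_PRINCIPAL, userDn);
            env.put(Context.SECURITY_CREDENTIALS, password);
        }
        return env;
    }
}
```

Passing the resulting table to new javax.naming.directory.InitialDirContext(env) opens the connection; wrong credentials typically surface there as a javax.naming.AuthenticationException, which is roughly what testing the credentials in JXplorer tells you.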

Test correct settings

Try to log in to the CloverETL Server as the newly created user (‘doej’ in our example). If the settings are incorrect, “Login failed” appears with a message similar to one of these:

  • my.domainX.com:3893 (server was not found)
  • my.domain.com:38931; socket closed (cannot open connection to found server)
  • …or another connection-related message

Use JXplorer and experiment to get working settings.

User lookup (workflow – step 3)

security.ldap.user_search.base

The DN of the node where the search for user objects starts. Use the topmost node that contains all required users (as subnodes). For our example above, you would use “ou=people,dc=MyCompany,dc=com”.

security.ldap.user_search.scope

This setting specifies the behavior of your search:

  • SUBTREE – recursively search all descendant nodes of security.ldap.user_search.base
  • ONELEVEL – search only the immediate child nodes of security.ldap.user_search.base
  • OBJECT – the object is selected directly, i.e. only security.ldap.user_search.base node is checked
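In JNDI terms, the three scope values correspond to the constants of javax.naming.directory.SearchControls, and a records limit such as security.ldap.records_limit corresponds to a count limit. The sketch below shows that mapping; it is an assumption about how the settings translate, and LdapSearchControls is a hypothetical helper.

```java
import javax.naming.directory.SearchControls;

/** Hypothetical helper: translates scope and records-limit settings into JNDI SearchControls. */
final class LdapSearchControls {
    private LdapSearchControls() {}

    static SearchControls of(String scope, long recordsLimit) {
        SearchControls controls = new SearchControls();
        switch (scope) {
            case "SUBTREE":  controls.setSearchScope(SearchControls.SUBTREE_SCOPE);  break; // all descendants
            case "ONELEVEL": controls.setSearchScope(SearchControls.ONELEVEL_SCOPE); break; // direct children only
            case "OBJECT":   controls.setSearchScope(SearchControls.OBJECT_SCOPE);   break; // the base node itself
            default: throw new IllegalArgumentException("Unknown scope: " + scope);
        }
        controls.setCountLimit(recordsLimit); // like security.ldap.records_limit; 0 means "no limit" in JNDI
        return controls;
    }
}
```

The resulting controls would be passed to DirContext.search(base, filter, controls), with the base taken from security.ldap.user_search.base and the filter from security.ldap.user_search.filter.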

security.ldap.user_search.filter

The filter used to find the user object. The value can be something like (uid=${username}), where ${username} is replaced by the login typed into the login form and uid is the name of the LDAP attribute that must match the user’s login.
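The substitution of ${username} (and, in the group search, ${userDN}) can be sketched as a simple template expansion. This sketch also escapes the special filter characters *, (, ), \ and NUL as RFC 4515 requires, so a login such as doe*j cannot widen the search; whether the Server escapes values this way is not stated here, so treat that part as an assumption. LdapFilters is a hypothetical helper.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: expands ${name} placeholders in an LDAP filter template. */
final class LdapFilters {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    private LdapFilters() {}

    /** Escapes a value for use inside an LDAP search filter (RFC 4515). */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces every ${name} in one pass, so substituted values are never re-expanded. */
    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(escape(value)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, expanding (uid=${username}) with username doej gives (uid=doej), which is exactly the filter you would type into JXplorer when testing.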

Test correct settings

Again, you can test this filter in your LDAP tool. See the LDAP search filter section. Just replace “${username}” with the user login. For example, (uid=doej) should find our user. Do not forget to set the correct start point for your search (see security.ldap.user_search.base).

It should already be possible to log in to the Server, but the user will not have any groups assigned yet.

Lookup of groups assigned to the user (workflow – step 4)

security.ldap.groups_search.base

The DN of the node where the search for group objects starts. Use the topmost node that contains all required groups (as subnodes). For our example, this is ou=groups,dc=MyCompany,dc=com.

security.ldap.groups_search.filter

This filter should return all group objects that have the user found in step 3 as a member. For example, (&(objectClass=group)(member=${userDN})) returns all objects of class “group” whose attribute “member” is set to the user’s DN (${userDN} is replaced with the DN of the user object from step 3). You may need to change this query if your class or attribute names differ.

security.ldap.groups_search.scope

Same as security.ldap.user_search.scope.

Test correct settings

You can test this filter in your LDAP tool; see the LDAP search filter section. Just replace “${userDN}” with a user DN. For example, (&(objectClass=group)(member=cn=John Doe,ou=people,dc=MyCompany,dc=com)) should find the groups of our user. Do not forget to set the correct start point for your search (see security.ldap.groups_search.base).

Group binding (workflow – steps 5 and 6)

security.ldap.groups_search.attribute.group_code

Once the groups are found, the value of the attribute named by this setting is extracted from each group and matched against the “code” field of the Server groups. For example, if this setting is “cn” and a group’s cn is “my_test_group”, the Server group’s “code” field must be set to “my_test_group” for the two to match. The user becomes a member of all matched groups.
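Assuming the group objects from step 4 are available as JNDI Attributes, the extraction and matching described above might look like the following sketch. It matches codes with exact, case-sensitive string comparison, which is an assumption; GroupBinding is a hypothetical helper.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;

/** Hypothetical helper: binds LDAP groups to Server groups via a code attribute such as "cn". */
final class GroupBinding {
    private GroupBinding() {}

    static Set<String> matchGroups(List<Attributes> ldapGroups, String codeAttribute,
                                   Set<String> serverGroupCodes) {
        Set<String> matched = new LinkedHashSet<>();
        for (Attributes group : ldapGroups) {
            Attribute attr = group.get(codeAttribute); // e.g. the group's "cn"
            if (attr == null) {
                continue; // group has no such attribute: nothing to match
            }
            try {
                NamingEnumeration<?> values = attr.getAll();
                while (values.hasMore()) {
                    String code = String.valueOf(values.next());
                    if (serverGroupCodes.contains(code)) {
                        matched.add(code); // the user joins this Server group
                    }
                }
            } catch (NamingException e) {
                throw new IllegalStateException("Cannot read " + codeAttribute, e);
            }
        }
        return matched;
    }
}
```

LDAP groups without a matching Server group code are simply dropped, which corresponds to the intersection described in step 5 of the workflow.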

security.ldap.allowed_ldap_groups

This field may contain a semicolon-separated list of group DNs that the user must be a member of to be able to log in.

For example, cn=test1,ou=groups,dc=MyCompany,dc=com;cn=test2,ou=groups,dc=MyCompany,dc=com will allow access only to users with membership in “test1” and “test2”.

It can be set to _ANY_, which turns off this check and allows any LDAP user to log in.
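A minimal sketch of this check follows. It assumes the list is separated by semicolons, as in the example, and that membership in at least one listed group is enough, consistent with step 6 of the workflow (“an allowed group”); if your Server version requires all listed groups, the loop would need to change. DNs are compared with LdapName, so a difference such as CN vs cn does not matter. AllowedGroups is a hypothetical helper, and DNs containing an escaped semicolon are not handled.

```java
import java.util.Set;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;

/** Hypothetical helper mirroring security.ldap.allowed_ldap_groups (assumptions noted above). */
final class AllowedGroups {
    private AllowedGroups() {}

    static boolean isAllowed(String setting, Set<String> userGroupDns) {
        if ("_ANY_".equals(setting.trim())) {
            return true; // check disabled: any LDAP user may log in
        }
        for (String entry : setting.split(";")) {
            if (entry.trim().isEmpty()) {
                continue;
            }
            LdapName allowed = parse(entry.trim());
            for (String dn : userGroupDns) {
                // LdapName equality ignores attribute-type case and equivalent escaping.
                if (allowed.equals(parse(dn))) {
                    return true; // assumption: one matching group is enough
                }
            }
        }
        return false;
    }

    private static LdapName parse(String dn) {
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid DN: " + dn, e);
        }
    }
}
```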

Test correct settings

Now you should be able to log in to the CloverETL Server and be assigned the proper groups based on your LDAP groups.

Pitfalls

SSL access

When the CloverETL Server runs in an application server, the SSL setup is fully transparent to it and is managed at the application server level. Please follow the instructions for your application server.

Here are some useful tips:

  • Add the LDAP server’s certificate to the default Java truststore. See here (section “Importing Certificates”)
  • Create your own trust store and replace the Java default trust store via a system property (sample here)

You also need to set the system property com.sun.jndi.ldap.connect.pool.protocol=ssl wherever you normally set other system properties (e.g. in “[tomcat]/bin/catalina.sh”, add -Dcom.sun.jndi.ldap.connect.pool.protocol=ssl to “CATALINA_OPTS”). Note that this will not work if set as an environment variable. In the end, you should have:

  • -Djavax.net.ssl.trustStore=keystore.ks (if you have your own trust store)
  • -Djavax.net.ssl.trustStorePassword=MY_PASSWORD (if you have your own trust store)
  • -Dcom.sun.jndi.ldap.connect.pool.protocol=ssl
  • -Djavax.net.debug=all (recommended for the setup phase)

Microsoft Active Directory (AD) – Global Catalog and Referrals

If your Microsoft AD uses referrals, you may get a message like Unprocessed Continuation Reference(s); remaining name 'DC=ad,DC=mycompany,DC=org' in the CloverETL Server log. Referrals are a technique for linking information scattered across the LDAP directory into one node; see LDAP Referrals for details. For example, when users in your LDAP directory are placed in multiple locations, referrals let you virtually aggregate them into a single location. By default, Microsoft AD runs services on these ports:

  • 389 – the default LDAP port; you can use this if you don’t have referrals
  • 3268 – the global catalog port, which covers the whole forest and so can answer queries that would otherwise return referrals. See What Is the Global Catalog? for details.

So if you encounter the error mentioned above, it may help to change the Server setting security.ldap.url to use the global catalog, for example: security.ldap.url=ldap://my.domain:3268

Known issues

All versions of the CloverETL Server prior to 3.3.0-M3 contain the bug reported as issue CLS-735: when attribute values used in DNs contain a comma (“,”), the login fails. For example, if “cn” in the example above is changed to “John, Doe”, it will cause problems during the login process. This mainly affects DNs used in the “member” attribute of “group” objects.
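This post does not say how CLS-735 was implemented, but the sketch below illustrates why commas in values need care: in a DN, a comma inside a value is escaped with a backslash, and splitting such a DN naively on “,” produces one component too many. It uses the JDK's Rdn.escapeValue and LdapName; DnCommas is a hypothetical helper.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

/** Hypothetical helper showing how commas inside DN values behave. */
final class DnCommas {
    private DnCommas() {}

    /** Builds cn=<value>,<parent>, escaping commas and other special characters in the value. */
    static String personDn(String cn, String parentDn) {
        return "cn=" + Rdn.escapeValue(cn) + "," + parentDn;
    }

    /** Counts RDNs with a real DN parser, which honours escaped commas. */
    static int componentCount(String dn) {
        return parse(dn).size();
    }

    /** Returns the unescaped value of the leftmost RDN (the object itself). */
    static String leftmostValue(String dn) {
        LdapName name = parse(dn);
        return String.valueOf(name.getRdn(name.size() - 1).getValue());
    }

    /** Naive split on ',' for comparison only: it miscounts when a value contains an escaped comma. */
    static int naiveComponentCount(String dn) {
        return dn.split(",").length;
    }

    private static LdapName parse(String dn) {
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid DN: " + dn, e);
        }
    }
}
```

For “John, Doe” under ou=people,dc=MyCompany,dc=com, the escaped DN is cn=John\, Doe,ou=people,dc=MyCompany,dc=com: a DN parser sees 4 components, while a naive split sees 5.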

CloverETL Visions for 2012: Evolution and Revolution in Data Integration

Part one – celebrating 10 years

In 2012, CloverETL will celebrate its 10th anniversary as an open source project. It all started back in 2002. On October 3rd, 2002, version 0.1 was first announced on the Freshmeat (now Freecode) portal. That day, CloverETL’s official life began.

I don’t want to dwell on Clover’s history too much, though. Instead, I want to take this time to comment on the principles on which CloverETL was established and how these principles continue to shape its future.

Principle number 1: Elegant and robust architecture guarantees a stable foundation

CloverETL started more as a framework on which other projects could be based than as an end-user product with a “sexy” GUI. As a matter of fact, the real GUI was built in 2005, almost three years after the release of the first CloverETL engine, which is now present in every tool of the CloverETL family – the Designer, the Server and also CloverETL Profiler.

Although we are now at version 3.2, so far only one change has significantly broken backward compatibility: when we switched from Java 1.4 to Java 1.5 and changed some key interface definitions.

This principle gives peace of mind to the projects and software products that embed or otherwise deploy Clover, as they know there won’t be any sudden surprises with future versions. It also proves that the original architecture was robust and flexible enough from the outset to support all the later additions and improvements.

Principle number 2: Less is better

CloverETL is based on the idea of cooperating components, each specialized in a single function. However, each component is flexible enough to support the various “outer” conditions in which it works.

For example, our UniversalDataReader is meant for parsing text data. The data can come in variations like fixed-length, delimited, or combined; can be read locally or from remote locations; and can be available in plain form or compressed. All these variations are supported, which means that subtle changes, like data becoming available through a different protocol or suddenly being compressed, require only a slight reconfiguration of the reader. Contrast this with other vendors, whose hundreds of different components require architectural changes to a transformation (replacing one component with another) when a small shift in the input data happens (e.g. when moving from the DEV to the PROD environment), and you’ll notice the difference.

It also means that a programmer or analyst designing data transformations in Clover does not need to carry a dictionary of components; a short list covers all possible scenarios.

Principle number 3: Agility is sexy, but long term planning is wise

CloverETL is used in many applications by many customers. Some of them are large, global corporations that embed Clover in their products. Through our OEM program, we work with many customers who take a very agile approach to developing their applications. Some of them have release cycles as short as two weeks, in which they must not only develop and debug, but also release new features. Clover’s development team tries to keep up with this pace, but we still take our time to plan, architect, and develop new, fundamental features that extend CloverETL’s capabilities and help our customers do their jobs faster and more simply.

The reason we insist on thinking through every new feature request beyond simple tweaks is that a relatively small, quick change may sometimes break compatibility somewhere or prevent future extensions. Whenever our development team touches the core (engine), we make sure the change is properly evaluated from several points of view, including:

  • Backward compatibility – at least at the transformation graph level.
  • Performance – a slowdown of just a few percent on big data can mean extra kilowatt-hours of energy consumed by data-crunching servers.
  • Future extensibility – we hate deprecating APIs or components just because we can no longer enhance and improve them.

This principle is further supported by the fact that CloverETL continues to be developed by the same stable development team, year in, year out. Many team members have been around since 2005, when the commercial life of Clover began.

Part two – What will appear on the menu in 2012

In short, there will be evolution and, in certain areas, some revolution. We constantly face the dilemma of whether to break from the “past” and come up with something completely new and revolutionary (at least in our minds) or to continue improving the old faithful engine architecture laid out years ago.

As we weren’t able to choose one or the other, we decided to continue improving what works well (and should keep working well in the future) and to overhaul the parts that have had occasional hiccups with the modern data structures and formats brought to us by the cloud.

Evolution

Expanding CloverETL OEM program

As CloverETL attracts new OEM customers, we continue improving our OEM program by making it simpler to embed, modify, white-label, or otherwise enhance our technology stack. This includes better documentation, example projects, and extended training.

We are also investing in our support team, which has always strived to provide timely and accurate answers to all support requests submitted through various channels, from e-mail to the technology forum and hotline.

Our support staff consists of experienced consultants and programmers who have real-life experience with our technology; they aren’t just people a few manual pages ahead of a user seeking an answer.

GUI – continuous improvement of the user experience

We will continue our effort to make the Designer more and more user-friendly. Our motto is: CloverETL is built by professionals for professionals and, truly, professional DI experts or Java programmers usually give us high marks. Nonetheless, we want to make our technology accessible to the broadest possible audience seeking solutions to certain data needs.

Enhancing CloverETL Cluster – our BigData recipe

These days, BigData is usually mentioned together with Hadoop as the solution. As much as we like Hadoop for various reasons, we have our own recipe for processing BigData, and we think it’s better suited for classical data integration/ETL tasks. It is based on a split/transform/merge idea: big input data are partitioned and then processed in parallel on multiple nodes of the CloverETL Cluster. The advantage of this, as opposed to Hadoop, is that the transformation can be developed and debugged locally, then easily deployed onto CloverETL Cluster for fast execution. Even when it is executed in a cluster environment, all the debugging and monitoring options of our Designer are available. It is also worth mentioning that deploying CloverETL Cluster is much easier than setting up a Hadoop cluster.
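To illustrate the split/transform/merge idea itself, here is a minimal single-JVM sketch in Java: the input is partitioned, each partition is transformed in parallel on a thread pool, and the results are merged in the original order. It resembles CloverETL Cluster only conceptually (threads play the role of cluster nodes), and SplitTransformMerge is a hypothetical class, not part of our API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical single-JVM illustration of split/transform/merge. */
final class SplitTransformMerge {
    private SplitTransformMerge() {}

    static <I, O> List<O> run(List<I> input, int partitions, Function<I, O> transform) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            // Split: contiguous chunks, at most one per partition.
            int chunk = (input.size() + partitions - 1) / partitions;
            List<Future<List<O>>> parts = new ArrayList<>();
            for (int start = 0; start < input.size(); start += chunk) {
                List<I> slice = input.subList(start, Math.min(start + chunk, input.size()));
                // Transform: each partition is processed independently, in parallel.
                parts.add(pool.submit(() -> {
                    List<O> out = new ArrayList<>(slice.size());
                    for (I item : slice) {
                        out.add(transform.apply(item));
                    }
                    return out;
                }));
            }
            // Merge: concatenate the partition results in their original order.
            List<O> merged = new ArrayList<>(input.size());
            for (Future<List<O>> part : parts) {
                merged.addAll(part.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each partition's transformation is the same code, it can be debugged on a small local input and then run with more partitions, which is the property the paragraph above highlights.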

Our big enhancement of CloverETL Cluster in 2012 will be the merging of our technology with Hadoop – more precisely HDFS filesystem – which should combine the best from both worlds. HDFS provides some cool features, namely robustness and high performance, and we want to utilize its automated data partitioning to make it easier to grow (or shrink) the storage of data depending on actual needs.

Revolution

Rich data structures – trees, unstructured data, etc.

Perhaps it comes with age, but I can’t resist admiring those who devised COBOL and copybooks. In those times, every byte of storage counted and CPUs were slow, yet programmers were still able to process rich data structures. Then relational databases came and brought the idea of tables and normal forms. Today, we are back to rich structures, but this time we’ve stopped counting bytes or CPU cycles (which has a huge impact on the power consumption of servers, but that’s a different story). That is why XML, JSON, and other rich structures are becoming the norm today.

In order to support these structures and formats as first-class citizens, we decided to overhaul our metadata and record storage model and allow direct support of tree structures, multi-valued fields, and even loosely typed data organized in maps/properties collections.

This is a big adventure in itself, as every single piece of our technology platform will be affected and will have to be adapted. The effort will be huge, and the necessary regression testing of the whole platform will seem endless. Despite this, the prize is enticing: almost any type of data (and the cloud will be a bonanza for this) will be representable 1:1 in Clover. That will include XML, JSON, POJOs, and complex properties, and in the future, who knows what else!

—–

We have always claimed that CloverETL is future-proof. Therefore, in 2012, we will be improving our foundations so they withstand the next 10 years.

If what I’ve talked about above is of interest to you, then please stay tuned. We will be publishing more details on our new functionality as we implement it.

For now, I wish everyone a very successful 2012!

A Look Back: CloverETL and Data Integration in 2011

As 2011 comes to a close, we’d like to take the time to reflect on what this year has brought CloverETL, its users, and our customers.

Since CloverETL is, after all, a data integration platform, the world of integration is at our core. We’re constantly striving to challenge ourselves in new ways and improve how we approach data integration. This year was no different.

Enhancing Our Core – Two Upgrades of CloverETL

In the past six months, we released two upgraded versions of CloverETL. CloverETL 3.1, published in June, brought significant changes to the platform in several areas. With a deeper focus on connectivity and enhanced support for various data formats, CloverETL 3.1 helped users better process data with complex structures, emails, and Lotus documents, to name a few. The latest version, CloverETL 3.2, further enhanced the user experience and improved the processing of large data records.

Data Integration Meets Data Quality – CloverETL Profiler

This year was also a year for new products. With Clover, we’ve moved forward with an evolved sense of the data world. Because data integration, data quality, and other data disciplines are becoming more and more intertwined, we developed the CloverETL Profiler, a data profiling application. Released in beta back in October, the Profiler helps users make informed decisions on how to improve the quality of transformed data, which is particularly useful as a precursor to larger data integration projects. CloverETL also integrates more easily with the AddressDoctor solution to improve the quality of geographical information.

Strengthening CloverETL Presence in the US Market

In June, Javlin, the developer of CloverETL, opened up its new office in the Washington D.C. area, which became the headquarters of Javlin Inc., our US presence. Javlin Inc., with both a dedicated sales and customer service force, brings Clover to a whole new market of possibilities.

Last but not least, we are pleased to see that our OEM data integration offer will have a number of important implementations in the upcoming year. (But more on that later. Stay tuned.)

As we leave 2011, we can say that this past year was a whirlwind of hard work, exciting releases, and interesting customers and stories. We’re looking forward to another great year with CloverETL. Cheers to the New Year.

Speed-up Installation of Plugins in Eclipse

Installing plug-ins into Eclipse can in some situations take a long time. This can also affect users of CloverETL Designer if they choose the Online or Offline Eclipse Plugin Installation download type. This blog post describes a workaround for the slow install process.

The reason for the long installation time is that, by default, Eclipse contacts all available update sites to try to resolve the dependencies of the plugin being installed. There can be a large number of update sites, some of them may be slow or not responding, and the connection itself may be poor. To stop Eclipse from contacting all update sites, uncheck the “Contact all update sites during install to find required software” checkbox in the installation dialog:

Contact all update sites checkbox

This workaround can help not only when installing CloverETL Designer, but also when using Eclipse in general. However, it has a drawback: some dependencies of the plugin being installed might not be resolved. This can also happen when installing CloverETL Designer, because it depends on the GEF and RSE plugins, which are found in the main Eclipse update site; that site is not contacted when you use the described workaround. Eclipse will detect that some dependencies of CloverETL Designer are not met and will not proceed with the installation:

CloverETL dependencies not met

If some dependencies are not resolved, there are two options:

  • find and install the dependencies manually (GEF and RSE can be found in the main Eclipse update site)
  • accept the long installation time and re-enable the checkbox; Eclipse will then resolve the dependencies automatically

Hopefully this hint will help some users of CloverETL Designer or Eclipse with slow installation. However, it’s important to understand that the workaround is NOT mandatory, as the installation is quick in many cases; use it only if you run into issues.

2010 in Review

Attractions in 2010

These are the posts and pages that got the most views in 2010.

  1. Building DWH with CloverETL: Slowly Changing Dimension Type 2 (May 2010)
  2. Export from a database to Excel (March 2010)
  3. DataDirect’s OracleDB JDBC driver speed test (January 2010), 1 comment
  4. Integration of Clover with PHP (June 2010), 1 comment
  5. How to easily enrich data using CloverETL’s Auto-filling feature (June 2010)

Win iPad with CloverETL Community

CloverETL Community Edition has met with huge interest from the data integration community, confirming that the market needed top-quality, free-of-charge data integration software. During the first days after its release, CloverETL Community accounted for almost 70% of all downloads.

You can help us spread this fantastic news and win many interesting prizes, including an iPad and an Asus netbook. Participating is simple: you can use Twitter, Facebook, LinkedIn, your own blog, or our fancy poster to enter the contest.

Take part in the contest. Good luck!

Javlin Featured in a Prestigious Economic Magazine

Javlin and its CEO David Pavlis are featured in what many consider the best and most important economic magazine in the Czech Republic, EKONOM.

The two-page article was published in the prestigious magazine today and discusses how Javlin survived last year’s crisis and even managed to make a profit. David Pavlis, the CEO of Javlin, explains why Javlin is so successful and shares his tips on how to survive an economic crisis.

Tips to survive an economic crisis:

  1. Early monitoring and identification of the crisis signals
  2. Geographic risk diversification: have customers around the world.
  3. Industry risk diversification: have customers from various industries.
  4. Product portfolio risk diversification: have a product that is usable in all industries and by companies of every size.
  5. Start saving in the good times: have backup money to use in a crisis.
  6. Have a good vision and strategy: communicate it within the whole company.
  7. Invest in good people. It’s the only way to have a loyal and efficient team.

Javlin’s main product, CloverETL, is so universal that it is usable by any company in any industry. This surely helped us survive the crisis. Even though revenue fell last year from USD 2.6 million to USD 1.9 million, it was still a very successful year for Javlin, considering the severity of the crisis and the bankruptcy of many companies.

Javlin has shown stable profit for many years and grows constantly worldwide. With offices currently in Washington DC, Atlanta, Prague and Brno, Javlin plans to expand to Asia and South America as well. It is important to have local offices with local employees who understand the market and the business culture. Opening offices in the US and setting up Javlin’s own US subsidiary helped the business grow enormously. The new subsidiary, Javlin Inc., was immediately profitable from selling CloverETL licenses on the American continent.

David Pavlis also explains what helps a company succeed even in bad times. The key is a good, loyal, efficient team whose members identify with the product and the whole company. As encouragement, the management distributed some of the company’s shares among the best employees.

For those who understand Czech :-), here is the original article from EKONOM: http://ekonom.ihned.cz/c1-44868760-americke-mimikry-funguji

Internship at Javlin from an American Perspective

As the second intern at Javlin, I wanted to share my wonderful experience so far working as a developer here. I am a senior at the Georgia Institute of Technology. I have been to Europe many times on vacation, and two years ago I traveled throughout Europe with a study-abroad group. I have to admit, I had my doubts about how the summer internship would turn out, but it truly is shaping up to be amazing.

First, the company. Javlin is made up of great people and awesome developers. The development team has been very helpful; everyone speaks English quite well and is eager to talk Java. We go out to lunch together every day and explore the various restaurants around Prague.

Prague is an exciting city. I think the best way to describe it is as a perfect blend of Western and Eastern European culture. It has all the comforts and modern elements of countries such as Germany and Italy, while still having the prices and natural elements of the Slavic countries to the east. The streets are filled with people from all over the world, some enjoying the nightlife while others explore the extraordinary architecture and learn about the city’s rich history. I think it’s this atmosphere that makes my walk through the city center on my way to the office so enjoyable.

The programming has been a great experience as well. I was nervous at first, because when I arrived I knew little about data warehousing or ETL. After learning how CloverETL works, my first task was editing the program’s GUI. I was told the internship would be challenging, which I was hoping for, and it turned out to be just that. I dove into the sea of code that makes up the program, referencing multiple APIs and reading developer guides as I figured out how things worked. My supervisor was very helpful, explaining anything I had questions about, which was great for concepts that are hard to grasp from a manual. After completing this task, I was assigned another one that involves understanding several different database systems, which I am still working on. The internship has remained challenging and educational while still giving me the satisfaction of making a difference.

Looking ahead, my supervisor has told me about future assignments, which include working on the installer and the shell scripts for the installation process. I have already started reading about NSIS, the installer system Javlin uses, and am excited to experiment with it. The shell scripts will finally let me cross off a long-standing item on my to-do list: learning and using Linux. I truly feel this experience will more than prepare me for any challenging entry-level position I may take on when I graduate in December.

Data Quality at a Glance Conference

Javlin a.s., producer of CloverETL, took part in the Data Quality at a Glance Conference held on April 20th at PriceWaterhouseCoopers’ premises in Prague. The conference was organized by IDG, and Javlin served as its professional supervising partner.

Javlin and the other conference partners (PriceWaterhouseCoopers, SAS and Ataccama) each gave presentations on data quality topics.

The first presentation was given by Mr. Snytr. It discussed personal data and its quality from the point of view of the Office for the Protection of Personal Data.

Mr. Maly, senior manager at PriceWaterhouseCoopers, talked in his presentation about optimizing business processes. According to him, data quality is closely related to process quality. Poor data has an impact on strategic decision-making and can cause a loss of business opportunities and/or profit. Therefore, it is important to find the source of data errors and set up the right data processes. Mr. Maly emphasized that it is essential to cleanse data continuously.

Mr. Kyjonka from SAS gave a presentation titled One Version of Truth for All, or MDM in Business Life. He highlighted the importance of data integration, data warehousing and MDM for getting the right data on time. He presented three important parts of MDM: System of Record, MDM Hub and Integration infrastructure. He further showed that four different types of MDM solutions can be used for various purposes: Registry style, Transaction style, Hybrid style, and Consolidation style (ETL). Choosing the right style of MDM solution depends on the company’s budget and how much time it has. Quality cleansed data, technical infrastructure, a Data Governance program, and a willingness to share data are some of the important factors for MDM.

Mr. Matous, Javlin’s consultant, gave a presentation on data cleansing. According to him, data quality is the process of detecting, reporting and correcting invalid or missing values in data. Mr. Matous made it clear how important it is to do data audits and why to use a data quality scorecard. A data quality scorecard tracks the financial impact of poor data and estimates the return on investment in data quality activities. It helps managers decide whether or not to invest in data cleansing tools. Mr. Matous talked about several business benefits of data quality, including more efficient marketing campaigns, an early warning system, and increased credibility and reputation among customers. He also showed how data cleansing can be done using CloverETL.

Mr. Vojtek, VP of Engineering at Javlin, discussed a specific case study in which Javlin had carried out data quality improvements. He emphasized that data quality is essential for good business results, and he showed the kinds of pitfalls that can arise in data quality implementations at multinational companies.

The last presentation of the conference was given by Mr. Kyjonka from SAS, who discussed a case study titled The Clever Way to Cleanse Data. He argued that companies that think only 10% of their data is poor usually have about 40% bad data. In his particular case, 90% of the data had to be corrected during the data cleansing process.

Live from the Conference – @CloverETL on Twitter. Follow the tag #DQPWC.