Tom Breur wrote an article entitled “ETL is Dying” in which he summarizes causes for the demise of the Extraction, Transformation, and Loading (ETL) process. Two of his reasons are that (1) ETL packages are simply an abstraction of the architecture and could potentially be relegated to native code, and (2) ETL focuses on the wrong area of development, namely the software level rather than the actual source and target data models. He argues that because of ETL software dependencies, many quality-driven practices such as native code design, test-driven development, and alternative ways of thinking become constrained. To move forward, he states, ETL must give way to a different strategy, such as data warehouse automation or manual customization through hand coding.

It is easy to understand how Mr. Breur could arrive at his conclusions. The ETL tools that have developed over the last 20 years are mainly built around waterfall projects or smaller datasets and do not always provide the support needed for an agile or Big Data solution. But we do need to distinguish between ETL tools and the ETL concept. To this point, I argue that we will always need ETL as a concept and we will always need software to implement it as well, albeit in a different form.

It is true that ETL packages are an abstraction of today’s software. But there is a big difference between the tool and the concept. Concluding that “ETL is dying” is akin to concluding that “the telephone is dying”. Yes, the gigantic Yellow Pages phone book may no longer be a necessity in every household, and the traditional landline may be going away, but the concept of “telephone” itself has never died. It is just transitioning into new forms such as Voice over Internet Protocol. Let’s compare this with a technology that has truly given way to newer innovations: the horse-and-buggy (and even this is arguable, given that horse-drawn carriages are still used for recreation any day of the year). The horse-and-buggy was supplanted by the automobile at the beginning of the 20th century and is therefore no longer required to achieve the goal of moving from Point A to Point B. Is the “ETL is dying” argument more like the transformation of the telephone or more like the displacement of the 19th-century horse-and-buggy? I argue that ETL is still necessary, and increasingly so. It is the software itself that may be archaic. Put another way, don’t hate the concept, hate the tool. And as most software engineers will agree, a good toolset is one that receives frequent updates.

Mr. Breur’s second argument is that the focus on ETL coding constrains the more agile brand of thinking and automating that teams need. Again, he makes some good high-level deductions, but they hold only if the tools themselves are used in an out-of-the-box fashion. The big vendors are slow to upgrade their mainstay products for several reasons, not the least of which is the risk of breaking a codebase that might have been around for years. But to their credit, most vendors include methods to run scripts through web services or other API calls. It is quite realistic (and strongly suggested!) for ETL designers to take what normally would be one lengthy ETL flow and break it into multiple parts, each of which can then be tuned to support object-oriented best practices such as encapsulation, abstraction, and inheritance. Thus, by meshing traditional ETL tools with more modern approaches such as callouts and scheduling applications, teams can achieve the best of both worlds: a development approach with greater agility than either an ETL tool or manual code alone.
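To make the idea of splitting one long flow into small, object-oriented parts a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not any vendor's API: the names `Step`, `ExtractFromList`, `Transform`, `UppercaseName`, `LoadToList`, and `run_pipeline` are all hypothetical, and in-memory lists stand in for real source and target systems.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional

Row = Dict[str, object]


class Step(ABC):
    """Abstraction: every stage exposes the same run() contract (hypothetical)."""

    @abstractmethod
    def run(self, rows: List[Row]) -> List[Row]:
        ...


class ExtractFromList(Step):
    """Encapsulation: the source is hidden behind the step's interface."""

    def __init__(self, source: Iterable[Row]) -> None:
        self._source = list(source)

    def run(self, rows: List[Row]) -> List[Row]:
        # Ignore incoming rows and emit copies of the source rows.
        return [dict(r) for r in self._source]


class Transform(Step):
    """Base transform; subclasses override transform_row only."""

    def run(self, rows: List[Row]) -> List[Row]:
        return [self.transform_row(r) for r in rows]

    def transform_row(self, row: Row) -> Row:
        return row


class UppercaseName(Transform):
    """Inheritance: reuses Transform.run and changes one row-level rule."""

    def transform_row(self, row: Row) -> Row:
        return {**row, "name": str(row["name"]).upper()}


class LoadToList(Step):
    """Stand-in for a real target such as a warehouse table."""

    def __init__(self, target: List[Row]) -> None:
        self._target = target

    def run(self, rows: List[Row]) -> List[Row]:
        self._target.extend(rows)
        return rows


def run_pipeline(steps: Iterable[Step], rows: Optional[List[Row]] = None) -> List[Row]:
    """Run each step in order, feeding its output to the next step."""
    current: List[Row] = rows or []
    for step in steps:
        current = step.run(current)
    return current


# Usage: three small parts instead of one monolithic flow.
warehouse: List[Row] = []
run_pipeline([ExtractFromList([{"name": "ada"}]), UppercaseName(), LoadToList(warehouse)])
```

Because each part shares one small contract, any single step could in principle be swapped for a callout to an existing ETL tool or scheduled on its own, which is the kind of mix the paragraph above argues for.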

Mr. Breur does touch upon an important concept. The takeaway from this debate is that all developers, from those who build the ETL software to those who use it day to day, should not rest on their laurels or be complacent with the old ways of design and deployment. Progress requires us to strive for the most efficient ways of automating both routine and complex tasks. If taking some hits in our traditional ways of thinking is the key, then so be it. But this change does not require renovating our concept of ETL, at least not in its pure form. From that standpoint, ETL as a concept remains very much alive and kicking.

So what do you think? Should we continue to use ETL in its present format or are major modifications necessary? Are there any other technologies out there that have gone the way of the horse-and-buggy? Please provide your thoughts below.
