Data Warehouses and the Flying Car Dilemma

DZone's Guide to

Data Warehouses and the Flying Car Dilemma

It is a bad idea to modify a car to fly, and you can't replace a car with a jet. Learn how this is analogous to expecting legacy systems to handle modern data workloads.

Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

Traditional data warehouses and databases were built for workloads that manifested 20 years ago. They are sufficient for what they were built to do, but these systems are struggling to meet the demands of modern business with the volume, velocity, and user demand of data. IT departments are being challenged from both ends. On one side, companies want to analyze the deluge of data in real-time (or near-real-time). On the other side — on the consumption end — the need to analyze and get value out of data is increasing exponentially. A decade ago, companies had a handful of analysts who ran reports and a few dashboards. Today, enterprises have armies of data scientists, analysts, and savvy business users wanting to slice and dice the latest data.

The Flying Car Dilemma

Companies have invested millions of dollars in procuring and customizing legacy systems. In addition to the capital expense, these companies have spent years hiring and training resources to support and maintain these systems. Thus, IT is under tremendous pressure to leverage these investments to meet the evolving needs of the business.

However, these systems were not designed for modern data workloads that emphasize real-time insights to changing customer or machine conditions. Expecting legacy systems to do this is analogous to expecting a car to suddenly adapt and begin flying like a jet. Putting wings on the car and making modifications might work temporarily, but it's unlikely to work long term. In essence, it is a bad idea to modify a car to fly. This is the flying car dilemma.

Can You Replace a Car With a Jet?

This is the architecture of a traditional data warehouse. The source data starts with a transaction system, which is typically a database, such as Oracle or SQL Server. Periodically, you have to use ETL tools, such as Informatica, to batch load data from your transactional system into a data warehouse product, such as Teradata, Oracle, or Sybase IQ. Then, you run reports in the data warehousing environment. This worked fine for nightly and weekly analytics, but in today's world, this approach has the following challenges:

The solution to this problem is not "rip and replace." Replacing a legacy system is similar to trading your car in for a jet aircraft. First off, jets are not affordable. Secondly, even if you buy a jet aircraft, you need new infrastructure such as runways, mechanics, pilots, and to set up new processes for the jet to work.

Ripping and replacing the legacy system has similar challenges:

There is a reason why we still have mainframes in modern enterprises.

Augment Instead of Replace: The FedEx Drone Approach

The sensible solution to the flying car dilemma is to take the approach a company would take with shipping. For example, let's say a company is using FedEx Overnight for packages that need to be delivered the next day, and FedEx Ground for packages that need to be delivered in a week.

Taking this approach with a data warehouse would mean isolating workloads that contain real-time analytics, high ingest, and high concurrency (i.e. FedEx Overnight) from the workloads that do not have these requirements (i.e. FedEx Ground). Now, we put the Fedex Overnight workloads on a modern real-time data warehouse such as MemSQL and allow the FedEx Ground workloads to continue on a legacy system. Here, we are not using an expensive jet to replace a car; rather, we are using a drone that is inexpensive, easy to use, and — most of all — easy to adopt.

Using a product such as MemSQL has the following advantages:

Some of the technical advantages include:

Augmenting the current architecture is a pragmatic approach to solving the flying car problem. Over time, as the demand for real-time requirements increase, more and more workloads can be added to MemSQL (the drone slowly transitions into a jet). Since MemSQL converges transactional and analytical workloads, the architecture is simplified over time. As shown in the diagram below, this reduces, and sometimes eliminates, the need for costly ETL tools and complex processes.

So, if you have a flying car dilemma, take the FedEx approach: start with a drone and, over time ,transition into a jet.

Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.

DOWNLOAD
Topics:
data warehouse ,big data ,data analytics ,data science ,real-time data ,architecture ,legacy systems ,etl ,tutorial

Published at DZone with permission of Prashanth Ponnachath, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.