


Seven steps to a successful data lake implementation

Flooding a Hadoop cluster with data that isn't organized and managed properly can stymie analytics efforts. Take these steps to help make your data lake accessible and usable.

The concept of the data lake originated with big data's emergence as a core asset for companies and Hadoop's arrival as a platform for storing and managing it. However, blindly plunging into a Hadoop data lake implementation won't necessarily bring your organization into the big data age -- at least, not in a successful way.

That's particularly true in cases where data assets of all shapes and sizes are funneled into a Hadoop environment in an ungoverned manner. A haphazard approach of this sort leads to several challenges and problems that can severely hamper the use of a data lake to support big data analytics applications.

For example, you might not be able to document what data objects are stored in a data lake or their sources and provenance. That makes it difficult for data scientists and other analysts to find relevant data distributed across a Hadoop cluster, and for data managers to track who accesses particular data sets and determine what level of access privileges is needed on them.
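To make the documentation problem concrete, a data catalog at its simplest is just a registry of records describing each data set's location, source, provenance and ownership. The sketch below is illustrative only -- the schema, field names and paths are hypothetical assumptions, not anything prescribed by the article or by Hadoop itself:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CatalogEntry:
    """One record in a minimal data lake catalog (hypothetical schema)."""
    name: str          # logical name analysts search for
    hdfs_path: str     # where the files live in the cluster
    source: str        # originating system (provenance)
    ingested_on: date  # when the data landed in the lake
    owner: str         # who grants or reviews access
    tags: list = field(default_factory=list)  # for grouping similar data

catalog = {}

def register(entry: CatalogEntry) -> None:
    """Record a data set so it can be found and governed later."""
    catalog[entry.name] = entry

def find_by_tag(tag: str) -> list:
    """Return the names of all data sets carrying a given tag."""
    return [e.name for e in catalog.values() if tag in e.tags]

register(CatalogEntry("web_clicks", "/lake/raw/web/clicks",
                      "clickstream", date(2018, 2, 1),
                      "analytics-team", tags=["web", "events"]))
register(CatalogEntry("crm_accounts", "/lake/raw/crm/accounts",
                      "crm", date(2018, 1, 15),
                      "sales-ops", tags=["crm"]))

print(find_by_tag("web"))  # -> ['web_clicks']
```

In practice this role is filled by dedicated metadata tooling rather than hand-rolled code, but the principle is the same: every object that lands in the lake gets a findable, ownable record.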

Organizing data and "bucketing" similar data objects together to help ease access and analysis is also challenging if you don't have a well-managed process.
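One common way to impose that kind of organization is a predictable directory convention: zones that separate raw input from curated, analysis-ready data, with similar objects bucketed under a shared path and partitioned by date. The zone names and path layout below are hypothetical assumptions for illustration, not a standard:

```python
from datetime import date

def lake_path(zone: str, source: str, dataset: str, d: date) -> str:
    """Build a predictable HDFS path: /lake/zone/source/dataset/date.

    Separating a 'raw' zone (untouched input) from a 'curated' zone
    (cleaned, analysis-ready data) keeps similar objects bucketed
    together, and date partitions make pruning by time period easy.
    """
    allowed_zones = {"raw", "curated"}
    if zone not in allowed_zones:
        raise ValueError(f"unknown zone: {zone}")
    return (f"/lake/{zone}/{source}/{dataset}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}")

print(lake_path("raw", "crm", "accounts", date(2018, 2, 1)))
# -> /lake/raw/crm/accounts/year=2018/month=02/day=01
```

The payoff of a convention like this is that access controls and retention rules can be applied per zone or per source directory instead of object by object.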

None of these issues has to do with the physical architecture of the data lake or the underlying Hadoop environment. Rather, the biggest impediments to a successful data lake implementation stem from inadequate planning and oversight of data management.

This was last published in February 2018