What is Data Lake? | TapUp Digital Glossary

A data lake is an infrastructure for storing all kinds of data in one place, regardless of format.
Its biggest feature is that it keeps data in its original, raw state, without any processing or organizing.

It's often compared to a Data Warehouse, which is more like a neatly organized warehouse.
A data lake, by contrast, doesn't lock the information into a fixed format when storing it, which makes it flexible enough to support all kinds of analysis later on.

It's well suited for gathering large volumes of data generated in real time, like smartphone usage logs or factory sensor readings.
In the fields of Machine Learning and AI, it's often used as the foundation for handling the massive amounts of raw data needed as source material.

That said, if you just keep piling data in without any structure, it can turn into what's known as a "data swamp"—a mess where nobody can tell what's in there.
The trick to using a data lake well is to set clear rules for managing where things are and keep the contents organized.

Data Lake

In Simple Terms

Behind the Name

Take a Closer Look!