Lakehouse
A data lakehouse is a modern data architecture that combines the benefits of data lakes and data warehouses.
What was the origin of the Lakehouse concept?
The term was popularized in 2020 by Databricks, the company founded by the creators of Apache Spark. The aim was to overcome the limitations of data lakes and data warehouses and to create a unified platform that supports various data types and workloads. Data lakehouses address the weaknesses of data lakes (missing governance) and data warehouses (high costs, inflexibility) and are considered a future-oriented solution for central data storage and analysis.
What is the core concept of a lakehouse?
A data lakehouse stores structured, unstructured and semi-structured data cost-effectively in central storage. At the same time, it provides data management capabilities and enables structured queries.
What are the benefits of a lakehouse?
- Cost efficiency: By using cost-effective cloud object storage, data lakehouses are cheaper to operate than data warehouses.
- Streaming: A lakehouse supports real-time data streams and enables real-time analyses.
- Open file formats: Most data lakehouses build on open-source table formats such as Delta Lake, Apache Iceberg, and Apache Hudi.
- Lower data redundancy: Unified data storage minimizes data movement between different systems.
- Schema enforcement and data governance: Data lakehouses address typical data governance challenges of data lakes by enforcing defined schemas when data is written.
- Separate storage and processing: The architecture decouples storage and processing to ensure scalability for various workloads.
- Transaction support: ACID transactions (atomicity, consistency, isolation, durability) keep data consistent when multiple users read or write it at the same time, typically through SQL.
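The schema enforcement and transaction support points above can be illustrated with a small toy sketch. It is not how Delta Lake, Apache Iceberg, or Apache Hudi are implemented. The names `SCHEMA`, `validate`, `commit`, and `read_table` are hypothetical, and a local directory stands in for object storage. The idea shown: every write is checked against a schema, and each commit claims a numbered log file that only one writer can create.

```python
import json
import os
import tempfile

# Hypothetical table schema: column name -> required Python type.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a record does not match the table schema."""


def validate(record: dict) -> None:
    """Reject records with missing, extra, or wrongly typed columns."""
    if set(record) != set(SCHEMA):
        raise SchemaError(f"columns {sorted(record)} do not match {sorted(SCHEMA)}")
    for column, expected in SCHEMA.items():
        if not isinstance(record[column], expected):
            raise SchemaError(f"{column!r} must be of type {expected.__name__}")


def commit(log_dir: str, version: int, records: list) -> bool:
    """Try to publish `records` as commit number `version`.

    Returns False if another writer already claimed that version.
    Nothing is written if any record fails validation.
    """
    for record in records:
        validate(record)
    path = os.path.join(log_dir, f"{version:08d}.json")
    try:
        # Mode "x" fails if the file already exists, so only one writer
        # can claim a given version. A real format would also protect
        # readers from seeing a half-written file; this toy does not.
        with open(path, "x", encoding="utf-8") as handle:
            json.dump(records, handle)
    except FileExistsError:
        return False
    return True


def read_table(log_dir: str) -> list:
    """Replay all commits in version order to get the current rows."""
    rows = []
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name), encoding="utf-8") as handle:
            rows.extend(json.load(handle))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as log_dir:
        first = [{"order_id": 1, "customer": "Ada", "amount": 19.99}]
        print(commit(log_dir, 0, first))   # first writer claims version 0
        print(commit(log_dir, 0, first))   # second writer loses the race
        print(read_table(log_dir))
```

A writer that loses the race would normally re-read the log and retry with the next version number, which is the usual pattern for optimistic concurrency.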
Lakehouse architecture
- Storage layer: Here, all raw data is stored cost-effectively in object storage (independent of processing resources).
- Staging layer: This acts as a metadata hub and catalogs the stored data objects. It enables essential data management features such as schema enforcement, ACID properties, indexing, caching, and access control.
- Semantic layer: This layer sits at the top of the lakehouse architecture. Here, data is made available for user interactions through client applications and analysis tools, for example for experiments and business intelligence.
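As a rough illustration of how the three layers relate, the following sketch models each one as a small Python component. All names (`ObjectStore`, `Catalog`, `TableEntry`, `total_by`) are hypothetical, the object store is an in-memory dictionary, and the example makes no claim about how any vendor implements these layers.

```python
import json
from dataclasses import dataclass, field


class ObjectStore:
    """Storage layer: in-memory stand-in for cloud object storage."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


@dataclass
class TableEntry:
    """Metadata kept per table: its columns and the objects holding its rows."""
    columns: list
    object_keys: list = field(default_factory=list)


class Catalog:
    """Staging (metadata) layer: catalogs which stored objects form which table."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.tables = {}

    def create_table(self, name: str, columns: list) -> None:
        self.tables[name] = TableEntry(columns=list(columns))

    def append(self, name: str, rows: list) -> None:
        entry = self.tables[name]
        for row in rows:
            # Simple schema enforcement: column names must match exactly.
            if set(row) != set(entry.columns):
                raise ValueError(f"row {row} does not match columns {entry.columns}")
        key = f"{name}/part-{len(entry.object_keys):05d}.json"
        self.store.put(key, json.dumps(rows).encode("utf-8"))
        entry.object_keys.append(key)

    def scan(self, name: str):
        """Yield all rows of a table by reading its cataloged objects."""
        for key in self.tables[name].object_keys:
            yield from json.loads(self.store.get(key).decode("utf-8"))


def total_by(catalog: Catalog, table: str, group_column: str, value_column: str) -> dict:
    """Semantic layer: a BI-style aggregate exposed to client tools."""
    totals = {}
    for row in catalog.scan(table):
        group = row[group_column]
        totals[group] = totals.get(group, 0) + row[value_column]
    return totals


if __name__ == "__main__":
    catalog = Catalog(ObjectStore())
    catalog.create_table("sales", ["region", "amount"])
    catalog.append("sales", [{"region": "EU", "amount": 10}, {"region": "US", "amount": 5}])
    catalog.append("sales", [{"region": "EU", "amount": 7}])
    print(total_by(catalog, "sales", "region", "amount"))
```

The point of the separation is that client code only talks to the catalog and the query function, while the objects themselves stay in cheap storage that can be scaled independently of compute.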
Lakehouse providers
- Snowflake (with Apache Iceberg): https://www.snowflake.com/en/data-cloud/snowflake-for-data-lakehouse/
- Google BigLake: https://cloud.google.com/biglake?hl=en (more information: https://research.google/pubs/biglake-bigquerys-evolution-toward-a-multi-cloud-lakehouse/)