Clearing the Skies for Cloud Data Warehousing
I remember presenting in December 2021 on the Knowledge Area: Data Storage and Operations. The main reasons for doing the presentation were:
Understanding the latest developments and hype about the different approaches
Helping others understand the latest developments
It disturbs my contentment when people are shouting about the latest data management silver bullet, and I don't know what they are talking about.
When I saw the first slide from Barry, I was relieved that I was not the only person who felt the same way.
The three approaches in question, Data Lakehouse, Data Fabric and Data Mesh, are receiving much attention from vendors and thought leaders.
Three weeks ago, Bill Inmon spoke to us about the Data Lakehouse.
This week, Dr. Barry Devlin talked to us about all three approaches.
Barry did a much better job than I did, and I appreciated his constant alignment with the Data Warehouse (DW) Architecture he has been working on since 1988 with IBM. Barry continues developing and mapping his Architecture and will publish Version 2 of his book in January 2024.
I first spent time understanding his DW Architecture when I bought Business UnIntelligence. The aspect I appreciated the most was the integration and alignment of the information pillars:
Machine-Generated Data (MGD)
Process-Mediated Data (PMD)
Human-sourced Information (HSI)
I also appreciated the inclusion of the different kinds of data: Measures from Machines, Business Events from Systems and Business Messages from Humans.
The bridge across these three pillars is Unified Metadata, my current favourite topic.
Context-Setting Information (CSI) and Choreography assimilate data and information across the three pillars.
Using the above architecture, Barry explained the different patterns. I will attempt to summarise Barry's mapping.
Data Lakehouse Pattern
All the data across the three pillars is loaded into a data lake and then assimilated into a logical Data Warehouse. This pattern relies on technical Metadata to load and transform the data.
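As a rough illustration of this pattern, the sketch below lands raw records from the three pillars in one in-memory "lake" and then relies on technical metadata alone to shape them into a single warehouse-style table. The record layouts, the field names and the TECHNICAL_METADATA mapping are all hypothetical and invented for this example; they are not taken from Barry's presentation.

```python
# Hypothetical raw records, one per pillar; layouts are invented for illustration.
lake = [
    {"pillar": "MGD", "payload": {"sensor": "s1", "temp_c": "21.5"}},
    {"pillar": "PMD", "payload": {"order_id": "A7", "amount": "99.90"}},
    {"pillar": "HSI", "payload": {"author": "kim", "text": "Late delivery"}},
]

# Assumed technical metadata: for each pillar, which source field feeds which
# warehouse column and how the raw value is converted.
TECHNICAL_METADATA = {
    "MGD": {"key": ("sensor", str), "value": ("temp_c", float)},
    "PMD": {"key": ("order_id", str), "value": ("amount", float)},
    "HSI": {"key": ("author", str), "value": ("text", str)},
}


def to_warehouse_rows(lake_records, metadata):
    """Transform raw lake records into uniform rows, driven only by the metadata."""
    rows = []
    for record in lake_records:
        mapping = metadata[record["pillar"]]
        row = {"pillar": record["pillar"]}
        for column, (source_field, convert) in mapping.items():
            row[column] = convert(record["payload"][source_field])
        rows.append(row)
    return rows


warehouse = to_warehouse_rows(lake, TECHNICAL_METADATA)
```

In this toy version, adding a new source means adding a metadata entry rather than new transformation code, which is the sense in which the pattern depends on technical Metadata.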
Data Fabric Pattern
Advanced Analytics creates Active Metadata. Active Metadata assimilates the data into the Data Warehouse structures. The fabric consists of several layers that build the Active Metadata and automagically transform the data into data products.
Data Fabric is an AI-driven Logical Data Warehouse.
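To give a feel for metadata that is derived from the data rather than written by hand, here is a deliberately tiny sketch. Simple type inference stands in for the "Advanced Analytics" step, which in a real fabric would be far more sophisticated; the function names and sample records are assumptions made up for this example.

```python
def infer_type(values):
    """Guess a column type from observed values (a toy stand-in for analytics)."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def build_active_metadata(records):
    """Derive a column -> type mapping from the data itself."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return {name: infer_type(values) for name, values in columns.items()}


def apply_metadata(records, metadata):
    """Use the derived metadata to cast raw records into typed rows."""
    return [{name: metadata[name](value) for name, value in record.items()}
            for record in records]


# Hypothetical raw input: every value arrives as text.
raw = [
    {"device": "s1", "reading": "21.5", "count": "3"},
    {"device": "s2", "reading": "19", "count": "4"},
]
active_metadata = build_active_metadata(raw)
typed_rows = apply_metadata(raw, active_metadata)
```

Nobody declared that "reading" is a decimal number here; the metadata was inferred and then used to transform the data, which is the loop the Data Fabric pattern relies on.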
Data Mesh Pattern
Data Mesh pays much more attention to organisational structures, data products and domain-driven development. One of the core drivers is decentralisation and the desire to address bottlenecks in centralised systems. It follows the Microservices approach, which is built to scale to millions of concurrent users across all time zones.
Each domain aligns with elements of the business and its value proposition and value chain.
I like the alignment with the business, but I am concerned about the allocation and interaction between the domains.
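A minimal sketch of domain-owned data products may make both the appeal and the concern more concrete. The DataProduct and MeshCatalog classes, the domain names and the sample rows are hypothetical; a shared catalog is only one assumed way for domains to find each other's products.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data set published and owned by one business domain."""
    name: str
    domain: str
    rows: list = field(default_factory=list)


class MeshCatalog:
    """Minimal shared registry so domains can discover each other's products."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        self._products[(product.domain, product.name)] = product

    def get(self, domain, name):
        return self._products[(domain, name)]


catalog = MeshCatalog()
catalog.publish(DataProduct("orders", "sales",
                            [{"order_id": "A7", "customer": "c1"}]))
catalog.publish(DataProduct("shipments", "logistics",
                            [{"order_id": "A7", "status": "late"}]))

# Cross-domain interaction: sales enriches its orders with logistics data.
orders = catalog.get("sales", "orders").rows
shipments = {s["order_id"]: s["status"]
             for s in catalog.get("logistics", "shipments").rows}
order_status = [{**o, "status": shipments.get(o["order_id"])} for o in orders]
```

The join only works because both domains happen to agree on what "order_id" means, and that kind of agreement between domains is exactly where the interaction becomes hard.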
In Summary
The Data Lakehouse pattern is the most straightforward jump from where most organisations currently find themselves.
The Data Fabric sounds exciting but is dependent on the AI Silver Bullet. AI is undoubtedly achieving breakthroughs, but can it, without our help, integrate and unify metadata in a way we have failed to do ourselves?
Data Mesh has gone entirely decentralised, but it aligns with business value.
Someone positioned Data Fabric and Data Mesh this way: do we get Machines to harmonise our Metadata, or do we break it up and do it ourselves?
Gartner's hype cycle did not show much confidence that Data Mesh would survive, but then Gartner may be too close to Data Fabric.