Clearing the Skies for Cloud Data Warehousing

Figure 1. The different Cloud Data Platforms

I remember presenting in December 2021 on the Knowledge Area: Data Storage and Operations. The main reasons for doing the presentation were:

  1. Understanding the latest developments and hype about the different approaches

  2. Helping others understand the latest developments

It disturbs my contentment when people are shouting about the latest data management silver bullet, and I don't know what they are talking about.

When I saw Barry's first slide, I was relieved that I was not the only person who felt this way.

Three approaches (Data Lakehouse, Data Fabric and Data Mesh) are receiving much attention from vendors and thought leaders.

Three weeks ago, Bill Inmon spoke to us about the Data Lakehouse.

Figure 2. Progression from Data Warehouse to Data Lake to Data Lakehouse

This week, Dr. Barry Devlin talked to us about all three approaches.

Barry did a much better job than I did, and I appreciated his constant alignment with the Data Warehouse (DW) Architecture he has been working on since 1988 with IBM. Barry continues to develop and map his Architecture and will publish Version 2 of his book in January 2024.

I first spent time understanding his DW Architecture when I bought Business unIntelligence. The aspect I appreciated the most was the integration and alignment of the information pillars:

  1. Machine-Generated Data (MGD)

  2. Process-Mediated Data (PMD)

  3. Human-sourced Information (HSI)

I also appreciated, of course, the inclusion of different kinds of data: Measures from Machines, Business Events from Systems and Business Messages from Humans.

The bridge across these three pillars is Unified Metadata, my current favourite topic.

Context-Setting Information (CSI) and Choreography assimilate data and information across the three pillars.

Figure 3. Managing the different forms of Data to bring it all together

Using the above architecture, Barry explained the different patterns. I will attempt to summarise Barry's mapping for each pattern in turn.

Data Lakehouse Pattern

All the data across the three pillars is loaded into a data lake and then assimilated into a logical Data Warehouse. This pattern relies on technical Metadata to load and transform the data.

Figure 4. The Applied Data Lakehouse pattern
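The reliance on technical Metadata in the Data Lakehouse pattern can be illustrated with a minimal Python sketch. It is not taken from Barry's material: the `ColumnRule` class, the `ORDERS_RULES` list and the field names are all hypothetical, and the "data lake" is just a list of raw records. The point is only that hand-written mapping rules decide what is loaded and how it is transformed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ColumnRule:
    """One piece of technical metadata: how a raw field maps to a warehouse column."""
    source: str                    # field name in the raw lake record
    target: str                    # column name in the logical warehouse
    cast: Callable[[str], object]  # conversion applied while assimilating


# Hypothetical, hand-maintained technical metadata for one raw feed.
ORDERS_RULES = [
    ColumnRule("ord_id", "order_id", int),
    ColumnRule("amt", "amount", float),
    ColumnRule("cust", "customer", str.strip),
]


def assimilate(raw_records, rules):
    """Project raw lake records into warehouse-shaped rows; unmapped fields are dropped."""
    return [{r.target: r.cast(rec[r.source]) for r in rules} for rec in raw_records]


# The "data lake": everything is landed as-is, including fields nobody asked for.
raw_lake = [
    {"ord_id": "101", "amt": "19.90", "cust": " Acme ", "debug": "x"},
    {"ord_id": "102", "amt": "5.00", "cust": "Globex", "debug": "y"},
]

warehouse_rows = assimilate(raw_lake, ORDERS_RULES)
```

In this sketch the rules are written and maintained by people, which is what keeps the pattern a modest step from a conventional warehouse load.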

Data Fabric Pattern

Advanced Analytics creates Active Metadata. Active Metadata assimilates the data into the Data Warehouse structures. The fabric consists of several layers that build the Active Metadata and automagically transform the data into data products.

Data Fabric is an AI-driven Logical Data Warehouse.

Figure 5. The Applied Data Fabric Pattern
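To make the idea of Active Metadata a little more concrete, here is a rough Python sketch under heavy assumptions. A trivial type-guessing function stands in for the Advanced Analytics layer (a real fabric would use far richer techniques), and all names (`infer_type`, `build_active_metadata`, `apply_metadata`) are hypothetical. What it shows is the direction of flow: the metadata is derived from the observed data and then drives the transformation, rather than being written by hand first.

```python
def infer_type(values):
    """Guess a column type from sample values (a toy stand-in for the analytics layer)."""
    for caster, name in ((int, "integer"), (float, "decimal")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "text"


def build_active_metadata(records):
    """Derive column metadata by inspecting the data instead of declaring it up front."""
    columns = sorted({key for rec in records for key in rec})
    return {col: infer_type([rec[col] for rec in records if col in rec]) for col in columns}


CASTERS = {"integer": int, "decimal": float, "text": str}


def apply_metadata(records, metadata):
    """Use the derived metadata to transform raw values into typed rows."""
    return [{col: CASTERS[metadata[col]](val) for col, val in rec.items()} for rec in records]


observed = [
    {"sensor": "t-1", "reading": "21.5", "count": "3"},
    {"sensor": "t-2", "reading": "19", "count": "4"},
]

active_metadata = build_active_metadata(observed)
typed_rows = apply_metadata(observed, active_metadata)
```

If the shape of the observed data changes, re-running `build_active_metadata` yields new metadata without anyone editing a mapping, which is the property the fabric depends on.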

Data Mesh Pattern

Data Mesh pays much more attention to organisational structures, data products and domain-driven development. One of the core drivers is decentralisation and the desire to address bottlenecks in centralised systems. It follows the Microservices approach, which must scale to support millions of concurrent users across all time zones.

Each domain aligns with elements of the business and its value proposition and value chain.

I like the alignment with the business, but I am concerned about the allocation and interaction between the domains.

Figure 6. The Applied Data Mesh Pattern
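The notion of a domain-owned data product can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical `DataProduct` class with a published schema acting as its contract and a hypothetical `Catalog` through which other domains find it. No particular Data Mesh tooling is implied.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A data set owned by one domain and exposed under a published contract."""
    domain: str
    name: str
    schema: dict                       # column name -> type name (the contract)
    rows: list = field(default_factory=list)

    def publish(self, row):
        """Accept a row only if it carries every column the contract promises."""
        missing = set(self.schema) - set(row)
        if missing:
            raise ValueError(f"row violates contract, missing: {sorted(missing)}")
        self.rows.append(row)


class Catalog:
    """The only shared piece: a lookup that lets domains discover each other's products."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        self._products[(product.domain, product.name)] = product

    def find(self, domain, name):
        return self._products[(domain, name)]


# The sales domain owns and publishes its own product.
sales_orders = DataProduct("sales", "orders", {"order_id": "integer", "amount": "decimal"})
catalog = Catalog()
catalog.register(sales_orders)
sales_orders.publish({"order_id": 101, "amount": 19.9})

# Another domain consumes it through the catalog, never through the sales systems directly.
consumed = catalog.find("sales", "orders").rows
```

Even in this toy version, the interaction between domains rests entirely on the contract and the catalog, so that is where any coordination between domains has to be agreed.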

In Summary

The Data Lakehouse pattern is the most straightforward jump from where most organisations currently find themselves.

The Data Fabric sounds exciting but is dependent on the AI Silver Bullet. AI is undoubtedly achieving breakthroughs, but can it, without our help, integrate and unify metadata in a way we have failed to do ourselves?

Data Mesh has gone entirely decentralised, but it aligns with business value.

Someone positioned Data Fabric and Data Mesh this way: do we get Machines to harmonise our Metadata, or do we break it up and do it ourselves?

Gartner's Hype Cycle was not confident that Data Mesh would survive and marked it as obsolete before plateau; Gartner may be too close to Data Fabric.

Figure 7. Data Mesh - Obsolete before Plateau
