CDMP Specialist RMD (Reference & Master Data) – Golden Rules

Figure 1. Golden = Most Accurate & Reliable

It is no use dragging your organisation through the Reference and Master Data gauntlet if you do not radically improve the quality of your data and simplify data sharing and integration.

The critical success factor for Master Data Management is keeping your data GOLDEN! If you cannot achieve this level of reliability, you are selling empty promises and dreams.

Figure 2. Getting to GOLDEN requires Matching and Merging/Linking

There has been lots of talk about the Golden Record, but there is a lot more to address:

  1. Defining the Trust Rules for potential System of Record data sources.

  2. Understanding the Golden Data Quality Expectations for your Golden Values (attributes).

  3. Defining the appropriate set of Record Identifiers that are required to MATCH to the correct resource.

  4. Once you have matched the resource, you may need to MERGE multiple incoming records.

  5. Or you may LINK instead of merge, and then your Survivorship challenges are much more straightforward.

  6. If you aim to have a unique Golden Record for each instance, you must merge your incoming Golden Values correctly.

  7. Your Survivorship rules and approach determine the correct merge process.

These seven golden principles will help you with more than just Reference and Master Data Management. They will also help your Data Models, Data Quality monitoring, and Data Integration & Interoperability.

Figure 3. Building an RMD Business Case requires the right motivation

Assuming that your business case has provided evidence of the challenges facing the business, and that it has been accepted and approved by the Data Governance Steering Committee, the planning and design stages of Reference and Master Data can begin.

I enjoy this phase of evaluating the current level of trust in a data estate. It does depend on a complete Data Catalog, with the appropriate metadata describing the data attributes and the business context (processes, systems, people) of how the data goes through its lifecycle. A comprehensive current-state reconstruction will provide you with a great deal of background information about the data you need.

Figure 4. Data Quality and Trust start at SOURCE

With the evaluation complete, you can start applying the Trust Rules to determine which data sources will be appointed as Systems of Record (SORs). Only these SORs will be used to populate the System of Reference (Authoritative Source). Applying the Trust Rules will guide your Data Collection, Cleansing and Integration processes.

The data values will be collected and used according to the agreed Trust Rules.
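One way to picture Trust Rules is as a score per source and attribute, with the most trusted source above an agreed threshold appointed as the SOR for that attribute. The sketch below is only an illustration under that assumption: the source names (crm, billing, web_form), the scores and the 0.8 threshold are all hypothetical and do not come from the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRule:
    source: str      # hypothetical source system name
    attribute: str   # Master Data attribute the rule applies to
    trust: float     # assumed scale: 0.0 (untrusted) to 1.0 (fully trusted)


def appoint_sors(rules, threshold=0.8):
    """Return {attribute: source} for the most trusted source per attribute.

    Sources below the threshold are never appointed, so an attribute may
    end up without any System of Record.
    """
    best = {}
    for rule in rules:
        if rule.trust < threshold:
            continue
        current = best.get(rule.attribute)
        if current is None or rule.trust > current.trust:
            best[rule.attribute] = rule
    return {attribute: rule.source for attribute, rule in best.items()}


# Hypothetical trust evaluation results.
rules = [
    TrustRule("crm", "telephone", 0.9),
    TrustRule("billing", "telephone", 0.6),
    TrustRule("billing", "address", 0.95),
    TrustRule("crm", "address", 0.85),
    TrustRule("web_form", "name", 0.5),
]
sors = appoint_sors(rules)
```

In this sketch the name attribute has no source that meets the threshold, which is the signal that more cleansing or a better source is needed before it can be called GOLDEN.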

Figure 5. Master Data Field Types and possible Implementation Styles

We should apply the Trust Rules to the different field types of Master Data:

  1. Identifiers

    1. Critical for Entity Resolution or Matching

    2. These fields represent the possible set of Business Keys to be used in the Master Data Resource Data Model

  2. Core

    1. The set of fields that will be used to support Business Decision-Making

  3. All

    1. All Master Data fields that could be used for Insights and Analytics

    2. These fields represent the current context for Transaction (Event) Data

I like to do the Trust Evaluation in an iterative manner, starting with the Identifiers, then the Core fields, and moving to All when required. The All set will continue to grow and develop as we add more data sources to our estate.

Now that we understand the SORs, we can start collecting the Master Data and building the GOLDEN state that will be found in the System of Reference.

Figure 6. System Of... Terminology Clarification

The System of Reference is built according to the MDM Processing depicted below.

Figure 7. MDM Processing

During the Entity Resolution stage, we need to decide whether to merge or link. The Merge approach applies Survivorship rules to create the golden record. The Link approach connects the records from the different sources that relate to the same Entity Instance; it aims to recognise the connections between source records rather than consolidate them.
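The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not a product implementation: the record layout, source names and values are hypothetical, and the "first non-empty value wins" rule is only a placeholder for real Survivorship rules.

```python
def link(records):
    """LINK: leave every source record untouched and only record
    which (source, source_id) pairs belong to the same entity instance."""
    return [(r["source"], r["source_id"]) for r in records]


def merge(records, attributes):
    """MERGE: build one golden record from the matched source records.

    Placeholder survivorship: the first non-empty value wins.
    """
    golden = {}
    for attribute in attributes:
        golden[attribute] = next(
            (r[attribute] for r in records if r.get(attribute)), None
        )
    return golden


# Hypothetical matched records for one entity instance.
matched = [
    {"source": "crm", "source_id": 1, "name": "J. Smith", "telephone": ""},
    {"source": "billing", "source_id": 2, "name": "Jane Smith",
     "telephone": "+10000000000"},
]
linked = link(matched)
golden = merge(matched, ["name", "telephone"])
```

Linking only needs the keys, so no value ever has to win; merging has to pick a value for every attribute, which is why it depends on Survivorship.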

Figure 8. Match, Merge, Link, & Survive

I personally prefer the MERGE approach, as it provides a simplified, accurate view of our resources. It still keeps a history of the original records that were used to establish the golden record, so that a merge can be reversed.

Figure 8. Rules that can be applied to complete the MERGE

The MERGE requires a well-defined set of Survivorship rules to achieve the golden record in the quickest possible time. If the rules are not able to complete the MERGE, the data stewards will need to make the final decision about which values win.
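As a rough sketch of how Survivorship rules and stewardship can fit together, the rules can be applied in order until one value survives, and anything still undecided is escalated. The two rules used here (source priority, then most recent update), the source names and the sample values are assumptions made for illustration; they are not taken from the figure.

```python
from datetime import date


def by_source_priority(candidates, priority):
    """Keep only candidates from the highest-priority source present."""
    def rank(candidate):
        return priority.get(candidate["source"], len(priority))
    best = min(rank(c) for c in candidates)
    return [c for c in candidates if rank(c) == best]


def by_recency(candidates):
    """Keep only the most recently updated candidates."""
    latest = max(c["updated"] for c in candidates)
    return [c for c in candidates if c["updated"] == latest]


def survive(candidates, rules):
    """Return (winning_value, needs_steward) for one attribute."""
    remaining = [c for c in candidates if c["value"]]
    if not remaining:
        return None, False          # nothing supplied, nothing to decide
    for rule in rules:
        if len({c["value"] for c in remaining}) == 1:
            break                   # a single value already survives
        remaining = rule(remaining)
    values = {c["value"] for c in remaining}
    if len(values) == 1:
        return values.pop(), False
    return None, True               # rules exhausted: escalate to a steward


PRIORITY = {"crm": 0, "billing": 1}  # hypothetical ranking of sources
RULES = [lambda c: by_source_priority(c, PRIORITY), by_recency]

resolved = survive([
    {"value": "12 MAIN STREET", "source": "billing", "updated": date(2024, 1, 1)},
    {"value": "12 MAIN ROAD", "source": "crm", "updated": date(2023, 1, 1)},
], RULES)
escalated = survive([
    {"value": "+10000000001", "source": "web", "updated": date(2024, 1, 1)},
    {"value": "+10000000002", "source": "import", "updated": date(2024, 1, 1)},
], RULES)
```

The order of the rules is itself a design decision: here a more trusted source beats a more recent value, which may or may not suit your data.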

An example of applying the MDM process with MERGE is described below:

  1. Data Acquisition

Figure 9. Three different source records holding resource values located at the same address

  2. Data Cleansing

Figure 10. Address and Telephone attributes have been cleansed using Standard representations of the values
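A minimal sketch of this kind of standardisation is shown below. The abbreviation table, the sample values and the country code are hypothetical, and real address validation would normally rely on a postal reference data set rather than string rules.

```python
import re

# Hypothetical abbreviation table; a real one would come from a postal standard.
ABBREVIATIONS = {"ST": "STREET", "RD": "ROAD", "AVE": "AVENUE"}


def standardise_address(raw):
    """Upper-case, strip punctuation, collapse spaces, expand abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", raw).upper().split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def standardise_phone(raw, country_code):
    """Return digits only, prefixed with '+' and a country code.

    Numbers already written in international form keep their own code;
    otherwise leading zeros (the national trunk prefix) are dropped and
    the supplied country code is added (the enrichment step).
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    return "+" + country_code + digits.lstrip("0")


address = standardise_address("12 main st.")
phone = standardise_phone("(011) 555-0100", country_code="27")
```

Once every source uses the same representation, two records with the same real-world address or telephone number compare as equal, which is what the matching step relies on.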

  3. Candidate Matching

Figure 11. Determine the set of Candidate Records that could be MERGED

We now face a challenge: we cannot tell which Candidate option is correct, ABC or XYZ.

There is not enough information to know for sure. It may even be possible that all 3 records from the different sources represent the same resource. “john” or “jane” may have been entered incorrectly.

At this stage we need a Data Steward to resolve the collision.
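The collision can be reproduced with a small sketch. It assumes records are grouped by their cleansed address and that a bare initial is compatible with any given name starting with that letter; the record IDs (s1 to s3) and the address are hypothetical stand-ins for the values in the figures.

```python
from collections import defaultdict
from itertools import combinations


def parse(name):
    """Split a two-part name into (given name or initial, surname)."""
    parts = name.replace(".", " ").upper().split()
    return parts[0], parts[-1]


def compatible(a, b):
    """Surnames must agree; given names must agree or one is a matching initial."""
    given_a, surname_a = parse(a)
    given_b, surname_b = parse(b)
    if surname_a != surname_b:
        return False
    if given_a == given_b:
        return True
    return (len(given_a) == 1 or len(given_b) == 1) and given_a[0] == given_b[0]


def candidate_pairs(records):
    """Pair up records that share an address and have compatible names."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record["address"]].append(record)
    return [
        (a["id"], b["id"])
        for block in blocks.values()
        for a, b in combinations(block, 2)
        if compatible(a["name"], b["name"])
    ]


def collisions(pairs):
    """Records whose candidate partners do not match each other."""
    linked = defaultdict(set)
    for a, b in pairs:
        linked[a].add(b)
        linked[b].add(a)
    return sorted(
        record for record, partners in linked.items()
        if any(q not in linked[p] for p, q in combinations(sorted(partners), 2))
    )


records = [
    {"id": "s1", "name": "J. Smith", "address": "12 MAIN STREET"},
    {"id": "s2", "name": "Jane Smith", "address": "12 MAIN STREET"},
    {"id": "s3", "name": "John Smith", "address": "12 MAIN STREET"},
]
pairs = candidate_pairs(records)
needs_steward = collisions(pairs)
```

Here the record for J. Smith is a candidate for both Jane and John, who do not match each other, so the rules alone cannot decide and the record is routed to a Data Steward.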

  4. Entity Resolution

Figure 12. The Data Steward clarifies the Party ID of the two entities

After further investigation, the Data Steward resolves the collision by allocating two Party IDs and deciding that J. Smith is the same person as Jane Smith, based on the Telephone Number. The Data Steward also confirms that John Smith resides at the same location, but was unable to collect John’s Telephone Number.

      4.1 Golden Record Merge

Figure 13. Two Golden Records with MERGED Values

Note that we have added Party ID to maintain and represent the unique set of golden records.

The MERGE process has not lost track of the Cleansing and Stewardship decisions. We need to retain all the Source Records so that we can reverse the decision that Source ID: 234 is the same resource as the one identified by Source ID: 345.

This means that the original records from the different sources are not discarded, but they do not get published or involved in the reverse ETL (standardising the source systems).
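One simple way to keep a merge reversible is to store the source records separately from a cross-reference that says which Party ID each one currently belongs to. The class below is a hypothetical sketch of that idea; the class and method names are invented, and which name belongs to which Source ID is an assumption for illustration.

```python
class PartyRegistry:
    """Keeps every source record plus a cross-reference to its Party ID."""

    def __init__(self):
        self.source_records = {}  # source_id -> original record, never deleted
        self.party_of = {}        # source_id -> current Party ID

    def register(self, source_id, record, party_id):
        """Store a copy of the source record and attach it to a party."""
        self.source_records[source_id] = dict(record)
        self.party_of[source_id] = party_id

    def members(self, party_id):
        """Source IDs currently consolidated under a Party ID."""
        return sorted(s for s, p in self.party_of.items() if p == party_id)

    def unmerge(self, source_id, new_party_id):
        """Reverse a merge decision by moving one source record to another party."""
        if source_id not in self.source_records:
            raise KeyError(source_id)
        self.party_of[source_id] = new_party_id


registry = PartyRegistry()
registry.register(234, {"name": "J. Smith"}, party_id=2)      # assumed pairing
registry.register(345, {"name": "Jane Smith"}, party_id=2)    # assumed pairing
before = registry.members(2)
registry.unmerge(345, new_party_id=3)   # the steward's decision is reversed
after = registry.members(2)
```

Because only the cross-reference changes, the original values are still available to rebuild both golden records after the reversal.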

Figure 7. MDM Processing

In this example, we see the application of all the steps of MDM Processing:

  1. Data Model Management

    1. The Core Elements of the Party Model

  2. Data Acquisition

    1. Three different data sources

  3. Data Validation, Standardisation & Enrichment

    1. Validation: Address Validation and Correction

    2. Standardisation: Address Standard Format

    3. Enrichment: Country Code added to Telephone Number

  4. Entity Resolution

    1. MATCH: Candidate IDs – ABC, XYZ

  5. Data Stewardship

    1. MERGE: Party ID 2 reflects the consolidated version of J. Smith and Jane Smith
