CDMP Specialist RMD (Reference & Master Data) – Golden Rules
It is no use dragging your organisation through the Reference and Master Data gauntlet if you do not radically improve the quality of your data and simplify data sharing and integration.
The critical success factor for Master Data Management is keeping your data GOLDEN! If you cannot achieve this level of reliability, you are selling empty promises and dreams.
There has been lots of talk about the Golden Record, but there is a lot more to address:
Defining the Trust Rules for potential System of Record data sources.
Understanding the Golden Data Quality Expectations for your Golden Values (attributes)
Defining the appropriate set of Record Identifiers that are required to MATCH to the correct resource
Once you have matched the resource, you may need to MERGE multiple incoming records.
Or you may LINK instead of merge, and then your Survivorship challenges are much more straightforward.
If you aim to have a unique Golden Record for each instance, you must merge your incoming Golden Values correctly.
Your Survivorship rules and approach determine the correct merge process.
These seven golden principles will help you with more than just Reference and Master Data Management. They will also help your Data Models, Data Quality monitoring and Data Integration & Interoperability.
Assuming that your business case has provided evidence of the challenges facing the business and the business case has been accepted and approved by the Data Governance Steering Committee, the planning and design stages of Reference and Master Data can begin.
I enjoy this phase of evaluating the current level of trust in a data estate. It does depend on a complete Data Catalog with the appropriate metadata describing the data attributes and the business context (processes, systems, people) of how the data goes through its lifecycle. A comprehensive state reconstruction will provide you with a great deal of background information about the data you need.
With the evaluation complete, you can start applying the Trust Rules to determine which data sources will be appointed as Systems of Record (SORs). Only these SORs will be used to populate the System of Reference (Authoritative Source). Applying the Trust Rules will guide your Data Collection, Cleansing and Integration processes.
The data values will be collected and used according to the agreed Trust Rules.
We should apply the Trust Rules to the different field types of Master Data:
Identifiers
Critical for Entity Resolution or Matching
These fields represent the possible set of Business Keys to be used in the Master Data Resource Data Model
Core
The set of fields that will be used to support Business Decision-Making
All
All Master Data fields that could be used for Insights and Analytics
These fields represent the current context for Transaction (Event) Data
I like to do the Trust Evaluation in an iterative manner, starting with the Identifiers, then the Core fields, and moving to All when required. The All set will continue to grow and develop as we add more data sources to our estate.
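The iterative Trust Evaluation described above could be recorded as a small scoring table. This is only a minimal sketch: the source names (`crm`, `billing`, `web_signup`), the 0.0 to 1.0 scoring scale and the 0.8 threshold are all hypothetical assumptions, not values prescribed by any standard.

```python
from dataclasses import dataclass

# Field types from the text, in the order the evaluation is iterated.
FIELD_TYPES = ("identifiers", "core", "all")


@dataclass(frozen=True)
class TrustScore:
    source: str      # hypothetical source system name
    field_type: str  # one of FIELD_TYPES
    score: float     # assumed scale: 0.0 (no trust) to 1.0 (full trust)


def appoint_sors(scores, threshold=0.8):
    """Return {field_type: [sources]} for sources that meet the threshold.

    Sources are listed from most to least trusted. The threshold is an
    illustrative assumption.
    """
    sors = {field_type: [] for field_type in FIELD_TYPES}
    for entry in sorted(scores, key=lambda s: s.score, reverse=True):
        if entry.field_type not in sors:
            raise ValueError(f"unknown field type: {entry.field_type}")
        if entry.score >= threshold:
            sors[entry.field_type].append(entry.source)
    return sors


# Hypothetical evaluation results: Identifiers first, then Core.
scores = [
    TrustScore("crm", "identifiers", 0.95),
    TrustScore("billing", "identifiers", 0.85),
    TrustScore("web_signup", "identifiers", 0.40),
    TrustScore("crm", "core", 0.90),
    TrustScore("billing", "core", 0.60),
]
sors = appoint_sors(scores)
print(sors)
```

The "all" entry stays empty here because that field type has not been evaluated yet, which mirrors starting with the Identifiers and widening the scope only when required.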
Now that we understand the SORs we can start collecting the Master Data and building the GOLDEN state that will be found in the System of Reference.
The System of Reference is built according to the MDM Processing depicted below.
During the Entity Resolution Stage, we need to decide whether we will merge or link. The Merge approach applies Survivorship rules to create the golden record. The Link approach connects the records from the different sources that relate to the same Entity Instance. Linking aims to recognise the connections between different source records.
I personally prefer the MERGE approach as it provides a simplified, accurate view of our resources. It still keeps a history of the original records that were used to establish the golden record, so that a merge can be reversed.
The MERGE requires a well-defined set of Survivorship rules to achieve the golden record in the quickest possible time. If the rules are not able to complete the MERGE, the data stewards will need to make the final decision about which values win.
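To make the idea of Survivorship rules with a steward fallback concrete, here is a minimal sketch. It assumes one simple rule (the most trusted source wins, and a tie between equally trusted sources goes to a Data Steward); real rule sets are usually richer. The source names, ranks and record values are hypothetical.

```python
def survive(candidates, source_rank):
    """Pick the surviving value for one attribute.

    candidates: list of (source, value) pairs; empty values are ignored.
    source_rank: {source: rank}, lower rank means more trusted (assumed rule).
    Returns (value, needs_steward).
    """
    present = [(src, val) for src, val in candidates if val not in (None, "")]
    if not present:
        return None, False
    if len({val for _, val in present}) == 1:
        return present[0][1], False  # all sources agree
    unranked = float("inf")
    best = min(source_rank.get(src, unranked) for src, _ in present)
    winners = {val for src, val in present
               if source_rank.get(src, unranked) == best}
    if len(winners) == 1:
        return winners.pop(), False  # most trusted source wins
    return None, True                # rules cannot decide: escalate


def merge_records(records, source_rank):
    """Build a golden record and list the attributes left for a steward."""
    golden, unresolved = {}, []
    attributes = sorted({key for r in records for key in r if key != "source"})
    for attr in attributes:
        candidates = [(r["source"], r.get(attr)) for r in records]
        value, needs_steward = survive(candidates, source_rank)
        if needs_steward:
            unresolved.append(attr)
        else:
            golden[attr] = value
    return golden, unresolved


# Hypothetical records that have already been matched to the same resource.
records = [
    {"source": "crm", "name": "Jane Smith", "phone": None, "email": None},
    {"source": "billing", "name": "J. Smith", "phone": "+27115550100",
     "email": "j.smith@example.com"},
    {"source": "web", "name": "Jane Smith", "phone": None,
     "email": "jane.s@example.com"},
]
source_rank = {"crm": 1, "billing": 2, "web": 2}

golden, unresolved = merge_records(records, source_rank)
print(golden)      # name and phone survive automatically
print(unresolved)  # email needs a steward decision
```

In this sketch the two email values come from equally ranked sources, so the rules cannot complete the merge for that attribute and it is handed to the data stewards.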
An example of applying the MDM process with MERGE is described below:
1. Data Acquisition
2. Data Cleansing
3. Candidate Matching
We now face the challenge that we cannot tell which Candidate option is correct: ABC or XYZ.
There is not enough information to know for sure. It may even be possible that all 3 records from the different sources represent the same resource. “john” or “jane” may have been entered incorrectly.
At this stage we need a Data Steward to resolve the collision.
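A collision like this can be detected mechanically even though it cannot be resolved mechanically. The sketch below is an assumption-laden illustration: the match key (surname, first initial, address), the field names, the address value and the assignment of Candidate IDs ABC and XYZ to particular records are all hypothetical.

```python
def match_key(record):
    """Build a match key from identifier fields: surname, first initial, address."""
    first = record["first_name"].strip().rstrip(".").lower()
    return (
        record["surname"].strip().lower(),
        first[:1],
        record["address"].strip().lower(),
    )


def route(incoming, known):
    """Return (decision, candidate_ids) for one incoming record."""
    key = match_key(incoming)
    candidates = [k["candidate_id"] for k in known if match_key(k) == key]
    if not candidates:
        return "new_candidate", candidates
    if len(candidates) == 1:
        return "auto_match", candidates
    return "steward_review", candidates  # collision: more than one option


# Hypothetical cleansed records; the ID-to-name assignment is assumed.
known = [
    {"candidate_id": "ABC", "first_name": "John", "surname": "Smith",
     "address": "1 Main Road"},
    {"candidate_id": "XYZ", "first_name": "Jane", "surname": "Smith",
     "address": "1 Main Road"},
]
incoming = {"first_name": "J.", "surname": "Smith", "address": "1 Main Road"}

decision, candidates = route(incoming, known)
print(decision, candidates)
```

Because "J." matches both "John" and "Jane" on this key, the record is routed to a Data Steward rather than being matched automatically.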
4. Entity Resolution
After further investigation, the Data Steward resolves the collision by allocating two Party IDs and deciding that J. Smith is the same as Jane Smith because of the matching Telephone Number. The Data Steward also establishes that John Smith resides at the same location, but was unable to collect John’s Telephone Number.
4.1 Golden Record Merge
Note that we have added Party ID to maintain and represent the unique set of golden records.
The MERGE process has not lost track of the Cleansing and Stewardship Decisions. We need to retain all the Source Records so that we can reverse the decision that Source ID: 234 is the same resource as the resource identified in Source ID: 345.
This means that the original records from the different sources are not discarded, but they do not get published or involved in the reverse ETL (standardising the source systems).
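One way to keep a merge reversible is a cross-reference from Source IDs to Party IDs that sits alongside the untouched source records. The following is a minimal sketch under stated assumptions: the `PartyRegistry` class, its method names, the extra Source ID 123 and the names attached to each Source ID are hypothetical; only Source IDs 234 and 345 and Party ID 2 come from the example above.

```python
class PartyRegistry:
    """Keeps every source record and a source-to-party cross-reference."""

    def __init__(self):
        self._sources = {}        # source_id -> original record, never discarded
        self._xref = {}           # source_id -> party_id
        self._next_party_id = 1

    def _new_party_id(self):
        party_id = self._next_party_id
        self._next_party_id += 1
        return party_id

    def add_source(self, source_id, record):
        """Store a source record and give it its own Party ID."""
        self._sources[source_id] = dict(record)
        self._xref[source_id] = self._new_party_id()
        return self._xref[source_id]

    def merge(self, keep_source_id, merge_source_id):
        """Point every source of the merged party at the surviving party."""
        keep = self._xref[keep_source_id]
        lose = self._xref[merge_source_id]
        for source_id, party_id in list(self._xref.items()):
            if party_id == lose:
                self._xref[source_id] = keep
        return keep

    def unmerge(self, source_id):
        """Reverse a merge decision by giving the source a fresh Party ID."""
        self._xref[source_id] = self._new_party_id()
        return self._xref[source_id]

    def members(self, party_id):
        return sorted(s for s, p in self._xref.items() if p == party_id)

    def source_record(self, source_id):
        return self._sources[source_id]


registry = PartyRegistry()
registry.add_source(123, {"name": "John Smith"})   # hypothetical Source ID
registry.add_source(234, {"name": "J. Smith"})     # name assignment assumed
registry.add_source(345, {"name": "Jane Smith"})   # name assignment assumed

party_id = registry.merge(234, 345)
print(party_id, registry.members(party_id))

# The steward decision can still be reversed later.
new_party_id = registry.unmerge(345)
print(new_party_id, registry.members(party_id), registry.source_record(345))
```

Only the cross-reference changes during a merge or its reversal; the source records themselves are never modified, which is what makes the decision reversible.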
In this example, we see the application of all the steps of MDM Processing:
Data Model Management
The Core Elements of the Party Model
Data Acquisition
Three different data sources
Data Validation, Standardisation & Enrichment
Validation: Address Validation and Correction
Standardisation: Address Standard Format
Enrichment: Country Code added to Telephone Number
Entity Resolution
MATCH: Candidate IDs – ABC, XYZ
Data Stewardship
MERGE: Party ID 2 reflects the consolidated version of J. Smith and Jane Smith
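The Standardisation and Enrichment steps listed above (Address Standard Format, Country Code added to Telephone Number) could be sketched as follows. This is a simplified illustration: the function names, the "+27" country code, the sample values and the handling of a single leading trunk zero are assumptions, and a real implementation would rely on a proper address standard and country-specific numbering rules.

```python
import re


def standardise_address(address):
    """Collapse whitespace and apply title case.

    A stand-in for a real address standard, used only for illustration.
    """
    return " ".join(address.split()).title()


def add_country_code(phone, country_code):
    """Return the phone number in international form.

    Assumes a national number may start with one trunk-prefix zero, which
    is dropped when the country code is added (country-specific in reality).
    """
    digits = re.sub(r"[^\d+]", "", phone)  # keep digits and a plus sign
    if digits.startswith("+"):
        return digits                      # already has a country code
    if digits.startswith("00"):
        return "+" + digits[2:]            # international dialling prefix
    if digits.startswith("0"):
        digits = digits[1:]                # drop the national trunk prefix
    return country_code + digits


# Hypothetical source values.
address = standardise_address("  1  main   road ")
phone = add_country_code("011 555 0100", "+27")
print(address, phone)
```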