Working with Data in South Africa: Challenges and Opportunities with Daan Steenkamp

Executive Summary

‘Working with Data in South Africa: Challenges and Opportunities with Daan Steenkamp’ highlights the importance of data management, the value of intangible assets, and the significance of reliable and timely data in decision-making. Dr Daan Steenkamp focuses on South Africa's data ecosystem, how it compares with global best practices, and the challenges and opportunities in data management and open science. He emphasises the importance of data maturity and infrastructure, data sharing, governance, privacy concerns, and the use of alternative data for real-time understanding. The webinar also discusses metadata implementation in time series data, community-driven data collaboration, and basic data management.

Webinar Details

Title: Working with Data in South Africa: Challenges and Opportunities with Daan Steenkamp

Date: 22 April 2024

Presenter: Dr Daan Steenkamp

Meetup Group:  

Write-up Author: Howard Diesel

Contents

Data Management and Value of Intangible Assets

The Importance of Data in Emerging Markets

The Importance of Reliable and Timely Data in Decision-Making

South Africa's Data Ecosystem

Comparison of South Africa's Ecosystem to Global Best Practices

Open Data Policies and Availability in South Africa

Data Management and Open Data Policy

Data Prioritization and Governance in South Africa

Data Sensitivity and Confidentiality in South Africa

Challenges and Opportunities in Data Management and Open Science

The Democratization of Data in South Africa

Importance of Data Maturity and Infrastructure

Data Sharing and Infrastructure in South Africa

Data Governance and Privacy Concerns

Data Management Community Collaboration

Utilising Alternative Data for Real-Time Understanding

Basic Data Management

Data Management Challenges and Solutions

Implementing Metadata in Time Series Data

Community-Driven Data Collaboration

Data Management and Value of Intangible Assets

Dr Daan Steenkamp presents updated data on the value of intangible assets in various countries and regions, including Saudi Arabia, Central Africa, the US, and South Africa. The gap in intangible asset value between the US and South Africa is alarming and poses a challenge for data management professionals. Understanding the value of data as an asset is crucial for making strategic decisions and monetising data in both the public and private sectors. The lack of growth in South Africa's intangible assets relative to major economies is a major concern: it points to a failure to leverage technology and to the country falling further behind. The webinar also includes a discussion of the data ecosystem in South Africa, comparing it with both advanced economies and other developing nations.

The Importance of Data in Emerging Markets

Daan focuses on the data gaps and regulatory challenges faced by both the public and private sectors in emerging markets, with a specific emphasis on South Africa. Public domain data is essential for practitioners in the private sector as a source of information and benchmarking, while the regulatory environment plays a crucial role in creating trust and enabling data sharing. Along with Aiden, Daan represents CODERA and advocates for democratising data in South Africa by emphasising the value of public domain data in combination with proprietary information. Daan also mentions that his platform, EconData, brings together economic, financial, and socioeconomic data for South Africa, which is currently not easily accessible or usable.

Figure 1 The Importance of Data in Emerging Markets


The Importance of Reliable and Timely Data in Decision-Making

EconData is a platform that provides free access to public domain data, allowing researchers to conduct historical analysis and automate workflows. It is built on the SDMX information model, in line with international best practices for data management. It aims to eliminate duplicated work by providing easy-to-use, highly governed, and validated data for the community. The platform measures ROI at an institutional level, based on the specific outcomes that must be achieved by using public domain data and models to generate decision analytics. The reliable and timely data provided by EconData is crucial for evidence-based decision-making in the public sector and for informing business decisions in the private sector.

South Africa's Data Ecosystem

Access to reliable data is crucial for innovation and success in the private sector. Studies show that evidence-based policy and better data standards can positively impact the economy. South Africa's data ecosystem is assessed along three dimensions: availability, quality, and regulation. Data practitioners in the country face both challenges and opportunities on each of these. To address them, it is important to introduce the FAIR principles for data management and public policy. These principles state that data should be findable, accessible, interoperable, and reusable. By following these guidelines, individuals and organisations can ensure that data is used effectively and efficiently to drive innovation and value.
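
As a rough illustration of how the FAIR principles might be turned into a practical checklist, the sketch below tests a dataset's descriptive record for a few obvious gaps. It is a minimal sketch and not an official FAIR assessment: the `DatasetRecord` fields, the `MACHINE_READABLE` set, the `fair_gaps` function, and the example identifier and URL are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of formats treated as machine readable in this sketch.
MACHINE_READABLE = {"csv", "json", "xml"}


@dataclass
class DatasetRecord:
    """Hypothetical descriptive record for one published dataset."""
    identifier: str = ""   # persistent identifier (findable)
    title: str = ""        # human-readable title (findable)
    access_url: str = ""   # where the data can be retrieved (accessible)
    file_format: str = ""  # e.g. "csv" or "pdf" (interoperable)
    licence: str = ""      # terms of reuse (reusable)


def fair_gaps(record: DatasetRecord) -> list:
    """Return a list of plain-language gaps; an empty list means none found."""
    gaps = []
    if not (record.identifier and record.title):
        gaps.append("findable: missing identifier or title")
    if not record.access_url:
        gaps.append("accessible: no access URL")
    if record.file_format.lower() not in MACHINE_READABLE:
        gaps.append("interoperable: format is not machine readable")
    if not record.licence:
        gaps.append("reusable: no licence stated")
    return gaps


# Example: a dataset published as a PDF with no stated licence.
example = DatasetRecord(
    identifier="ZA-EXAMPLE-001",                # made-up identifier
    title="Example budget series",
    access_url="https://example.org/data.pdf",  # placeholder URL
    file_format="pdf",
)
for gap in fair_gaps(example):
    print(gap)
```

A checklist like this covers only the most mechanical aspects of FAIR; judgements about metadata quality or the suitability of a licence still need a person.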

Figure 2 South Africa's Data Ecosystem - Data Matters


Figure 3 FAIR Principles for data governance & sharing


Comparison of South Africa's Ecosystem to Global Best Practices

According to the Global Data Barometer survey, South Africa is recognised internationally for its budget transparency, detailed budget data availability, and frameworks for protecting citizens' data rights. The country's public finance data ranks high by global standards for its detail, timeliness, and support for governance in the public sector. The World Bank's cross-country socioeconomic data is used to benchmark data availability, with South Africa's ranking affected by its relative wealth compared to other countries.

Figure 4 Vital Statistics & Data Protection


Figure 5 Public Finance Data


Figure 6 Availability indicators


Open Data Policies and Availability in South Africa

South Africa has an extensive range of data series and concepts available but struggles with the timeliness of the data. Accessing timely data is crucial for making informed comparisons and decisions. However, South Africa ranks lower internationally regarding open data policies and initiatives in both the public and private sectors, which can limit data availability. This issue is particularly concerning as many public institutions with a mandate to make their information public do not do so, impacting accountability and decision-making in both sectors.

Figure 7 Weak on open data


Data Management and Open Data Policy

Several countries have established coordination mechanisms to ensure minimum standards and rules for data management in the public and private sectors. However, South Africa's civil service capabilities and data management performance are poor, with scores of zero in areas such as open data policy, data sharing governance, rules around accessibility coverage, and data access. Many emerging markets with similar or fewer resources score higher than South Africa, which indicates a lack of coordination and prioritisation. Senior policy institutions in South Africa should focus on this issue, particularly with respect to private intent data, to improve both public and private data use.

Figure 8 Frameworks and accessibility


Figure 9 Regulatory frameworks


Data Prioritization and Governance in South Africa

Daan raises the issue of data prioritisation in South Africa, particularly the country's ability to use data effectively and maintain it for future use. While South Africa's spending on data compares favourably with that of larger countries, much of its data is not machine readable. This hinders automation and the use of advanced computing methods, and it makes data harder to maintain over the long term, limiting the ability to revisit and analyse past data. Daan emphasises the need to prioritise data governance and to address the regulatory environment so that market players can use data while the rights of data subjects are safeguarded.

Figure 10 Data Prioritization and Governance in South Africa


Figure 11 Data Prioritization and Governance in South Africa continued


Figure 12 Data Prioritization and Governance in South Africa continued


Data Sensitivity and Confidentiality in South Africa

South Africa's regulatory environment for data use aligns with international best practices, but it scores poorly on data sensitivity and confidentiality, particularly on safeguards. Safeguards are necessary to protect data subjects and promote trust and governance, while an enabling environment for data use allows exploration and building. The focus on safeguards and an enabling environment is crucial for taking advantage of the fourth industrial revolution and promoting appropriate data sharing and use.

Figure 13 Safeguards and Enablers


Figure 14 Details of Safeguards and Enablers


Challenges and Opportunities in Data Management and Open Science

Access to public domain data in South Africa has been hindered by several factors, including public institutions not responding to access requests or citing confidentiality as a reason to withhold data. Public sector institutions also have limited capacity to create and manage statistics from their own data. However, enablers for private intent data, such as digital IDs, anonymised data, and portability rights, can allow data to be shared and cleaned in a responsible way. Open science is an important enabler of open data and is crucial for replicating results across domains. Despite this, South Africa scores low on regulatory issues related to public and private data, indicating a need to align with international best practices in data management.

Figure 15 Challenges and Opportunities in Data Management and Open Science


The Democratization of Data in South Africa

Daan notes that in South Africa there is a need for appropriate safeguards to protect consumers' identities and data, given recent large-scale identity hacks and data breaches. Gaps remain in data collection, and information on cross-border transactions is limited, but organisations like the World Bank and IMF are working to address these gaps and improve accessibility. The democratisation of data in South Africa involves security, governance, literacy, infrastructure, and culture, which are all essential for making data accessible and usable within appropriate rules and regulations. Security protocols and governance are crucial for balancing data usability and discoverability while maintaining adequate access controls and privacy protection measures.

Figure 16 The Democratization of Data in South Africa


Importance of Data Maturity and Infrastructure

Data governance is essential for ensuring appropriate and effective utilisation of data. Data maturity is crucial to data governance, which involves attaching metadata to data, understanding data concepts, identifying data owners, and classifying sensitive data. Best practice policies and technology can reduce the burden of data governance and enable responsible data use. Unfortunately, many jurisdictions, including South Africa, lack clear rules and responsibilities for data requests and access. Data stewards in the public sector must balance data governance, security, and access and keep up with technological advancements and best practices in data management. Infrastructure plays a vital role in data accessibility and usability, and technological advancements have enabled institutions to progress, but various considerations are still important. Data users must be able to discover available data, understand data concepts, and access data in different forms using various software. Overall, data maturity and infrastructure advancements are ongoing, with potential for further improvements.
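
The governance elements described above (metadata attached to data, an identified owner, a sensitivity classification, and discoverability) can be pictured as a small catalogue. The sketch below is illustrative only and assumes a simple three-level classification; the `Sensitivity` levels, the `CatalogueEntry` fields, and the `discover` and `can_access` helpers are hypothetical names, not part of any system mentioned in the webinar.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical three-level sensitivity classification."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class CatalogueEntry:
    name: str                 # dataset name shown to users
    concept: str              # plain-language definition of what is measured
    owner: str                # accountable data owner
    sensitivity: Sensitivity  # classification used for access decisions


def discover(catalogue, keyword):
    """Return names of entries whose name or concept mentions the keyword."""
    kw = keyword.lower()
    return [e.name for e in catalogue
            if kw in e.name.lower() or kw in e.concept.lower()]


def can_access(entry, clearance):
    """Allow access only when the user's clearance covers the classification."""
    return clearance >= entry.sensitivity


# Made-up entries for illustration.
catalogue = [
    CatalogueEntry("cpi_monthly", "Consumer price index, monthly",
                   "Statistics unit", Sensitivity.PUBLIC),
    CatalogueEntry("tax_records", "Individual tax assessments",
                   "Revenue unit", Sensitivity.CONFIDENTIAL),
]

print(discover(catalogue, "price"))
print(can_access(catalogue[1], Sensitivity.INTERNAL))
```

Keeping discovery separate from access reflects the balance described above: users can find out that a dataset exists and who owns it even when they are not cleared to read it.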

Data Sharing and Infrastructure in South Africa

The value of data is increasingly recognised in emerging markets, particularly in the public sector, where evidence-based policy is prioritised. Executive-level commitment to tracking information over time, bottom-up buy-in for data sharing, and open data sharing are crucial. However, true commitment to open data sharing is missing in South Africa. Legislative reform is in progress, with a cloud policy for government being established. It is important to recognise the rights of data citizens and mandate sharing by public institutions. National data standards and templates for data sharing need to be established. Celebrating successful use cases can build momentum and demonstrate the value of appropriate data sharing. As many international use cases show, universities can play a key role in data sharing and infrastructure development.

Figure 17 Opportunities


Figure 18 Opportunities details


Data Governance and Privacy Concerns

Daan notes a growing emphasis on creating a coordination mechanism between major public sector institutions, such as the Reserve Bank, National Treasury, the statistical agency, and the Information Regulator, so that they can work together on guidelines, data agreements, and standardised protocols across the industry. However, debate continues over whether statistical agencies should make more detailed surveys available to enable regional views of data, such as GDP differences between cities like Cape Town and Durban. Spatialising information raises privacy concerns and challenges around data provision and the identification of providers. Technology is making it easier to combine data from the public and private domains, helping researchers and corporations to understand conditions and make good business decisions. Metadata is becoming as important as the data itself, and sharing metadata is critical for understanding data properly. Lastly, banks have been forced to share data openly, reflecting a shift in attitudes towards data sharing and privacy.

Data Management Community Collaboration

A call for private sector involvement in sharing processes for the country's benefit is highlighted. Challenges in the public and private sectors regarding data management are discussed, with a proposal to form work parties to tackle these challenges collaboratively. Andrew suggests establishing work groups to create a data-sharing hub with relevant metadata. Examples of successful data initiatives, including spatial tax data and partnerships with other countries, are also presented.

Utilising Alternative Data for Real-Time Understanding

Collaboration between the private and public sectors in sourcing and pooling data can lead to more accurate forecasting. Aiden notes that aggregating and anonymising sensitive data is challenging and requires managerial-level expertise in data engineering. Additionally, when data management is not executed or owned by government institutions themselves, the result can be outsourced projects with limited ownership.
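
One common way to reduce disclosure risk when pooling sensitive records is to publish only group-level aggregates and suppress groups that contain too few records. The sketch below shows that idea; it is an assumption about how such aggregation might be done, not the approach described in the webinar. The `aggregate_with_suppression` function, the field names, and the threshold of 5 are hypothetical, and small-cell suppression on its own does not guarantee anonymity.

```python
from collections import defaultdict


def aggregate_with_suppression(records, group_key, value_key, min_count=5):
    """Average value_key per group, suppressing groups with too few records.

    min_count is an illustrative threshold, not a regulatory standard.
    Suppressed groups are reported as None so readers can see they exist.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    result = {}
    for group, values in groups.items():
        if len(values) < min_count:
            result[group] = None  # too few contributors to publish safely
        else:
            result[group] = sum(values) / len(values)
    return result


# Made-up transaction records: six from one region, two from another.
records = (
    [{"region": "A", "spend": 100.0} for _ in range(6)]
    + [{"region": "B", "spend": 250.0} for _ in range(2)]
)
print(aggregate_with_suppression(records, "region", "spend"))
```

Reporting suppressed groups as `None` instead of dropping them keeps the coverage of the data visible while withholding values that could identify individual contributors.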

Basic Data Management

Basic data management is crucial for the success of organisations and governments, especially before adopting advanced technologies like AI and machine learning. A lack of ownership is a common problem in organisations and often results in the absence of a proper data glossary. Having consultants do data management as a one-time task is risky, because processes, standards, and data quality must be maintained continuously. Basic data management involves technological, cultural, and strategic challenges, and in-house capabilities are necessary. The availability of technologies makes data management easier, but best practices, appropriate metadata, and classification remain essential for success. Despite this, there seems to be a lack of appreciation and awareness of the importance of basic data management, which leads to problems. The government's 4IR strategy promises job opportunities and economic growth, but basic data management should be a priority if it is to succeed.

Data Management Challenges and Solutions

Data management poses several challenges, such as deciding between centralising data and using a federated system to achieve business outcomes. A cross-departmental and strategic focus is essential to prioritise data management and its budget. Howard notes that Istat, Italy's national statistics institute, has developed a system that includes a centralised data catalogue and real-time data quality assessment. Istat's DQ framework provides real-time responses and allows users to request the highest quality data in specific dimensions or concepts. With limited resources, the SDMX program in Mexico has successfully implemented data governance, validation, and quality assurance. Brazil has adopted similar initiatives that have had a significant impact on society and the data community. Although initially focused on getting data into the system, SDMX has shifted its focus to metadata and definitions to improve the speed at which data can be managed.

Implementing Metadata in Time Series Data

Implementing metadata in time series data can be achieved strategically, even with limited resources. It is essential to collaborate with domain experts to capture metadata accurately. While creating metadata structures takes time, automation through software can reduce the day-to-day data management burden. Once software is in place, automating ingestion and validation processes can streamline metadata management. Using executable environments like SDMX can simplify the integration of metadata and data for a more efficient data management process.
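
To make the idea of automated ingestion and validation concrete, the sketch below attaches a small metadata record to a time series and checks both before the data is accepted. It is a minimal sketch under stated assumptions and does not use any SDMX library: the `SeriesMetadata` fields, the `validate_series` function, and the period formats (loosely modelled on codes such as `2024-Q1`) are hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMetadata:
    """Hypothetical metadata attached to one time series."""
    series_id: str
    description: str  # agreed with a domain expert
    unit: str         # e.g. "index" or "ZAR millions"
    frequency: str    # "A" annual, "Q" quarterly, "M" monthly
    source: str


# Assumed period formats per frequency code.
PERIOD_PATTERNS = {
    "A": re.compile(r"^\d{4}$"),
    "Q": re.compile(r"^\d{4}-Q[1-4]$"),
    "M": re.compile(r"^\d{4}-(0[1-9]|1[0-2])$"),
}


def validate_series(meta, observations):
    """Return a list of problems found; an empty list means the series passes."""
    errors = []
    for name in ("series_id", "description", "unit", "source"):
        if not getattr(meta, name).strip():
            errors.append(f"missing metadata field: {name}")

    pattern = PERIOD_PATTERNS.get(meta.frequency)
    if pattern is None:
        errors.append(f"unknown frequency: {meta.frequency}")
        return errors

    for period, value in observations.items():
        if not pattern.match(period):
            errors.append(f"period {period} does not match frequency {meta.frequency}")
        if not isinstance(value, (int, float)):
            errors.append(f"non-numeric value at {period}")
    return errors


# Made-up quarterly series with one badly formatted period.
meta = SeriesMetadata("EX_GDP", "Example GDP series", "ZAR millions", "Q", "Example agency")
observations = {"2023-Q4": 101.5, "2024-Q1": 102.3, "2024-05": 103.0}
print(validate_series(meta, observations))
```

Once checks like these run automatically at ingestion, domain experts only need to be consulted when the metadata itself changes, which is where the day-to-day saving comes from.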

Community-Driven Data Collaboration

A community-driven platform, EconData, has been developed to add more public domain data and improve accessibility. The platform emphasises collaborative effort, working with others to enhance the data available. Acknowledgement and appreciation are expressed to Daan and Aiden, who put the project together. A commitment has been made to spend more time understanding and improving the platform for the benefit of the community.

If you want to receive the recording, kindly contact Debbie (social@modelwaresystems.com)

