Exploring the different roles LLM can play in Semantic Modelling & Knowledge Graphs.

Executive Summary

This webinar delves into Large Language Models (LLMs) and Knowledge Graph development, emphasising the significance of Semantic Data Modelling and the evolution of Knowledge Graphs in technology. Panos Alexopoulos explores the applications of Knowledge Graphs in data integration and machine learning, focusing on their use in natural language processing and information retrieval. He also discusses the challenges of vagueness in human thinking and its impact on data science, alongside potential improvements to Semantic Modelling in knowledge representation. Lastly, Panos examines the efficacy of LLMs as knowledge developers, addresses the intersection of knowledge engineering and Artificial Intelligence, and concludes with insights into upcoming courses and concepts in this dynamic field.

Webinar Details

Title: Exploring the different roles LLM can play in Semantic Modelling & Knowledge Graphs.

Date: 07 October 2024

Presenter: Panos Alexopoulos

Meetup Group: INs & OUTs of Data Modelling

Write-up Author: Howard Diesel

Contents

Executive Summary

Webinar Details

Role of Large Language Models (LLMs) in Knowledge Graph Development

Understanding Semantic Data Modelling

Understanding the Concept and Importance of Knowledge Graphs

The Evolution and Impact of Knowledge Graphs in Technology

The Role and Applications of Knowledge Graphs in Data Integration

The Use and Enhancement of Knowledge Graphs in Machine Learning Applications

Meaning Accuracy, Meaning Explicitness, and Agreement in Knowledge Graphs

Building Knowledge Graphs and Ontology

The Challenge of Vagueness in Human Thinking and Its Impact on Data Science

Building and Maintaining Knowledge Graphs in Changing Domains

Value and Application of Ontologies

Ambiguity and Vagueness in Data Modelling and Knowledge Graph Development

LLMs and Their Application in Natural Language Processing

Knowledge Graph for Information Retrieval

The Differences between Knowledge Graphs and Learning Structures in Machine Learning

Deductive and Inductive Reasoning in Knowledge Graphs and Ontology

Natural Language Question & Answering and Knowledge Graph Development

LLMs as Knowledge Providers for Knowledge Graph Development

Ambiguity in Knowledge Source Networks

Knowledge Modelling and Semantic Web Languages

Limitations and Potential Improvements of Semantic Modelling in Knowledge Representation

Meta-modelling and Knowledge Graph Modelling

Techniques of Information Extraction with Machine Learning Models

Relation Extraction in Machine Learning Models

Understanding the Efficacy and Usage of LLMs as Knowledge Developers

Knowledge Engineering and Artificial Intelligence: A Discussion on Upcoming Courses and Concepts

Closing Discussion on Artificial Intelligence

Role of Large Language Models (LLMs) in Knowledge Graph Development

Panos Alexopoulos opens the webinar by sharing that he is a data and AI practitioner and educator. He mentions that the presentation focuses on the interplay between Knowledge Graphs, Semantic Models, and conceptual models with Large Language Models (LLMs). With over 15 years of experience as an ontologist, Panos currently heads the ontology team at Textkernel, a Dutch company specialising in software for analysing and matching people's profiles with job vacancies. His team is responsible for developing and maintaining a large Knowledge Graph that supports advanced machine-learning techniques and algorithms for the company's services. In addition to his professional work, Panos has been delivering courses for professionals in the field of data semantics and AI since 2018, and he authored the book Semantic Modeling for Data, which offers practical advice for Semantic Modelling, aiming to address common pitfalls and push the limits of semantic representation of data.

About the Speaker Panos Alexopoulos

Figure 1 About the Speaker Panos Alexopoulos

Understanding Semantic Data Modelling

Semantic Data Modelling is an umbrella concept that involves developing descriptions and representations of data to accurately convey its meaning in a universally understood way. It aims to bridge gaps in understanding across teams, individuals, and organisations by defining terms and creating artefacts such as taxonomies, thesauri, ontologies, vocabularies, and Knowledge Graphs. The practice encompasses various methods for describing and representing data, including entity relationship models, to facilitate effective communication and data interpretation among humans and systems.

Semantic Data Modelling

Figure 2 Semantic Data Modelling

Understanding the Concept and Importance of Knowledge Graphs

Knowledge Graphs have been around for a long time despite the recent hype. They are essentially a rebranding of concepts such as semantic networks and knowledge bases from the 80s and 90s. In essence, Knowledge Graphs are interconnected entities described in an entity and concept-centric way rather than through traditional tables or data formats. The crucial aspect often overlooked is the need for semantic shareability, ensuring that both humans and systems can understand and share descriptions of data and domains. Google's Knowledge Graph, seen in search results as knowledge cards, is a prominent example. It presents structured information about entities, their relations, and attributes, demonstrating the practical application of Knowledge Graphs.
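The entity-centric view described above can be sketched as subject–predicate–object triples. The following is a minimal illustration in plain Python (the entities and relation names are hypothetical examples, not drawn from any real knowledge graph):

```python
# A minimal sketch of a knowledge graph as subject-predicate-object triples.
# All entity and relation names below are illustrative only.
triples = {
    ("Leonardo da Vinci", "type", "Person"),
    ("Leonardo da Vinci", "painted", "Mona Lisa"),
    ("Mona Lisa", "type", "Painting"),
    ("Mona Lisa", "locatedIn", "Louvre"),
}

def objects(subject, predicate):
    """Return all objects linked to a subject via a given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects("Leonardo da Vinci", "painted"))  # {'Mona Lisa'}
```

A real system would use an RDF store and a query language such as SPARQL, but the underlying idea is the same: data is navigated by following typed relations between named entities rather than by joining tables.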

Figure 3 Knowledge Graph Definition

Example of Knowledge Graphs

Figure 4 Example of Knowledge Graphs

Bloomberg Knowledge Graph

Figure 5 Bloomberg Knowledge Graph

The Evolution and Impact of Knowledge Graphs in Technology

In addition to Google, numerous organisations are now developing Knowledge Graphs, including private, public, and governmental entities. One notable example is Bloomberg, a financial data service company that aims to provide valuable data to investors and financial professionals. Over the years, they have built a comprehensive Knowledge Graph containing information about companies, industries, people, geographical locations, products, and financial instruments. This Knowledge Graph consists of concepts, relations, and more, and while the term itself is not new, it has gained renewed attention. Although previously considered a high-tech concept, it is now viewed as a practical approach with its own advantages and disadvantages when handling data.

Gartner Hype Cycle for Artificial Intelligence

Figure 6 Gartner Hype Cycle for Artificial Intelligence

The Role and Applications of Knowledge Graphs in Data Integration

The use of Knowledge Graphs serves three main high-level purposes. Firstly, they provide a semantic layer to integrate heterogeneous data within an organisation, enabling uniform access. This integration process, facilitated by Knowledge Graphs, aims to create a common understanding of the data and can take considerable time due to its complexity. Additionally, Knowledge Graphs can be utilised as a virtual semantic layer to access data across different sources, transforming it into a logical model. Once the data is integrated using Knowledge Graphs, it allows for more accurate and valuable insights through data analytics, data science algorithms, and question-answering capabilities.

Uses of Knowledge Graphs

Figure 7 Uses of Knowledge Graphs

Knowledge Graph Dimensions

Figure 8 Knowledge Graph Dimensions

The Use and Enhancement of Knowledge Graphs in Machine Learning Applications

Utilising Knowledge Graphs alongside machine learning applications is crucial for capturing domain-specific knowledge that may not be present in the training data or captured by the algorithms. Knowledge Graphs provide a top-down approach to impart domain knowledge to machine learning systems, combining encyclopaedic and declarative knowledge with inductive learning. They are capable of addressing the black box problem in machine learning by offering explanations for decisions and facilitating easier troubleshooting. Furthermore, Knowledge Graphs enable the control of machine learning system outputs by enforcing explicit constraints and ensuring consistency with the Knowledge Graph. This approach helps mitigate issues such as model hallucinations and can enhance system performance.
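The idea of constraining model outputs against the graph can be sketched as a simple validation step. This is an illustrative example only, with hypothetical entity names; a production system would query the actual Knowledge Graph rather than a hard-coded set:

```python
# Hypothetical sketch: validating model output against a knowledge graph's
# known entities, to filter out hallucinated values.
known_skills = {"Python", "SQL", "Kubernetes"}  # entities assumed to come from the graph

def validate(extracted):
    """Keep only values that the knowledge graph actually contains."""
    return [s for s in extracted if s in known_skills]

print(validate(["Python", "HyperCloudFoo", "SQL"]))  # ['Python', 'SQL']
```

The key design point is that the graph acts as an explicit, inspectable allow-list: any value the model invents that has no counterpart in the graph is rejected before it reaches downstream systems.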

Meaning Accuracy, Meaning Explicitness, and Agreement in Knowledge Graphs

The crucial dimensions of a Knowledge Graph go beyond the technical aspects: they concern meaning accuracy, explicitness, and agreement. Accuracy refers to the correctness of each entity and relation in the Knowledge Graph, avoiding incorrect information in the spirit of "garbage in, garbage out". Explicitness involves making artefacts understandable for both machines and human users, emphasising the importance of meaningful names and descriptions. Lastly, agreement pertains to how widely accepted the meanings of entities and relations are among the users and systems utilising the graph.

Challenges of achieving Accuracy, Explicitness and Agreement

Figure 9 Challenges of achieving Accuracy, Explicitness and Agreement

Building Knowledge Graphs and Ontology

Panos discusses a problematic agreement encountered in the field of recruitment and professions when considering the reuse of a Knowledge Graph created by the European Commission. The issue arises from the Commission's classification of various professions as equivalent when they are not, leading to disagreements over definitions of roles such as data scientist and data analyst. Panos then emphasises the importance of defining the scope of intended agreement when building a Knowledge Graph and highlights the challenges posed by linguistic and semantic phenomena, such as ambiguity and multiple expressions for the same concept in human language and communication. He stresses the need for Knowledge Graphs to explicitly capture the ambiguity in a domain and encompass all possible meanings of important concepts.

Can I use an LLM instead of a Knowledge Graph?

Figure 10 “Can I use an LLM instead of a Knowledge Graph?”

The Challenge of Vagueness in Human Thinking and Its Impact on Data Science

Vagueness is a prevalent issue in human thinking, characterised by the lack of unique truth criteria for concepts and predicates. For instance, defining a "tall person" is challenging due to the absence of a universal threshold for height. This vagueness leads to disagreements and hinders the ability to reach agreements, especially when discussing job roles like that of a data scientist, where the essential skills and responsibilities can vary widely. Conceptual modellers, ontologists, and data specialists often grapple with this issue in their everyday work, striving to create clear and precise definitions and classifications.

Building and Maintaining Knowledge Graphs in Changing Domains

The issue of semantic change is a significant challenge in Knowledge Management, particularly in domains with high volatility. New concepts and ideas emerge frequently, leading to a shift in the meaning of existing terms over time. This dynamic nature of knowledge necessitates continuous maintenance of Knowledge Graphs to ensure their accuracy and relevance. Compounding this challenge is the presence of suboptimal development practices, where different teams may employ varying techniques and methodologies, resulting in disparate artefacts. This diversity in approaches poses difficulties in aligning and merging knowledge, emphasising the need for reconciliation and standardisation efforts.

Value and Application of Ontologies

During a conference in 2019, a person expressed strong criticism of a particular ontology's structure and development. The individual labelled the ontology as "useless", which raised concerns about the lack of standardised approaches in ontology development. Panos explains that while identifying problems in ontologies is common, the key consideration is how these issues impact the end application. He highlights the trade-off between completeness and precision in Knowledge Graphs, emphasising the challenge of achieving both at scale. Panos relates this to his team's evaluation of the European Commission's project, which revealed that it did not align with their specific needs, while acknowledging that it could still be valuable for others.

Ambiguity and Vagueness in Data Modelling and Knowledge Graph Development

An attendee shares their challenges of addressing "enemies" such as ambiguity and vagueness in their environment. Panos emphasises the importance of not eliminating these challenges but rather managing and handling them. He highlights the significance of detecting vagueness when creating a model. He discusses the people-centric nature of data modelling, stating that it is not just an engineering challenge but a people challenge. Panos also mentions the difficulty of scaling semantics and raises the question of whether LLMs could replace the need to develop Knowledge Graphs.

LLMs and Their Application in Natural Language Processing

The Large Language Model (LLM) is a powerful machine learning model based on Transformers architecture, trained on massive internet text data to generate human-like text and understand human language. LLMs are used for tasks like text generation, classification, sentiment analysis, and summarisation. They are popular due to their simplicity of use, as users can input natural language prompts to get output. However, there is ongoing discussion about the effectiveness of prompt engineering and the scientific basis of LLM functionality. Despite these concerns, LLMs are currently valued for their practical applications rather than their scientific underpinnings.

LLMs Definition

Figure 11 LLMs Definition

LLM Prompting

Figure 12 LLM Prompting

LLMs are Bad at Knowledge Providing

Figure 13 LLMs are Bad at Knowledge Providing

Knowledge Graph for Information Retrieval

The limitations of using LLMs instead of Knowledge Graphs are evident, given LLMs' lack of proficiency in providing accurate knowledge. LLMs were not designed for information retrieval or database querying but for generating probabilistic text, leading to issues such as hallucination. As an example, Panos asked ChatGPT to provide a list of books about data engineering published by O'Reilly or another publisher. The LLM inaccurately listed books, including one with an incorrect title and author. This highlights the unreliability of LLMs as sources of accurate information.

LLMs vs. Knowledge Graphs

Figure 14 LLMs vs. Knowledge Graphs

Can LLMs help with the Development of a Knowledge Graph

Figure 15 “Can LLMs help with the Development of a Knowledge Graph?”

The Differences between Knowledge Graphs and Learning Structures in Machine Learning

The differences between Knowledge Graphs and Large Language Models (LLMs) lie in their strengths and weaknesses. Knowledge Graphs contain structural, explicit knowledge represented by symbols and descriptions, while LLMs consist of numerical weights in a neural network. Interacting with a Knowledge Graph yields knowledge grounded in its content, whereas querying an LLM may produce made-up responses. Inaccuracies in a Knowledge Graph stem from its content and its connections to the ontology, while LLM inaccuracies can also arise from the inference process. In short, issues in a Knowledge Graph are content-related, whereas LLM issues concern both content and inference.

Deductive and Inductive Reasoning in Knowledge Graphs and Ontology

In Knowledge Graphs and ontologies, deductive reasoning is the primary form of logic. It operates on the principle that if a premise is true and the rule is correct, the conclusion is also true. This is illustrated in the classic example: 'All humans are mortal; Socrates is a human; therefore, Socrates is mortal.' However, in machine learning models, reasoning is not always deductive; it can also be inductive or abductive, leading to answers with varying levels of confidence. While Knowledge Graphs offer transparency and interpretability, they are not well-suited for understanding language, as they are optimised for conceptual knowledge and facts.
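The Socrates example above can be sketched as a tiny forward-chaining deduction over triples. This is a deliberately simplified illustration, not how a real reasoner (e.g. an OWL reasoner) is implemented, and the rule format is invented for this sketch:

```python
# Illustrative sketch of deductive (forward-chaining) inference over triples.
# The rule format here is hypothetical: (condition pattern, conclusion pattern),
# where "?x" stands for any subject.
facts = {("Socrates", "type", "Human")}
rules = [
    # "All humans are mortal": if ?x type Human, then ?x type Mortal.
    ((("?x", "type", "Human")), (("?x", "type", "Mortal"))),
]

def infer(facts, rules):
    """Repeatedly apply rules until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (_, p, o), (_, cp, co) in rules:
            for fs, fp, fo in list(derived):
                if fp == p and fo == o:  # the condition pattern matches
                    new_fact = (fs, cp, co)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print(("Socrates", "type", "Mortal") in infer(facts, rules))  # True
```

Because the rule and the premise are both asserted as true, the conclusion is guaranteed, which is exactly the transparency that distinguishes deductive inference from the probabilistic, confidence-weighted answers of a machine learning model.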

Natural Language Question & Answering and Knowledge Graph Development

LLMs are transforming the way we engage with data. A significant application is natural language question answering, enabling users to ask questions in everyday language rather than crafting complex SQL queries. LLMs excel at capturing linguistic patterns, enhancing the interpretation of natural language text. While LLMs cannot entirely replace Knowledge Graphs, they can streamline and scale Knowledge Graph development, a task whose difficulty traditionally depends on domain and scope.

Three Potential Roles for LLMs

Figure 16 Three Potential Roles for LLMs

LLMs as Knowledge Providers for Knowledge Graph Development

Panos discusses three main roles an LLM can play in developing a Knowledge Graph. The first role involves using an LLM as a direct factual and domain knowledge source. The second role is knowledge modelling or Semantic Modelling, which entails transforming requirements about knowledge representation into formal representations using natural language. The third role is knowledge mining, where an LLM is used to extract information from text and add it to the Knowledge Graph. Panos also addresses the challenges with using LLMs, such as the risk of unreliable or hallucinated facts due to overfitting, bias in training data, and conflicting training data.

Practise using LLMs as Knowledge Provider

Figure 17 Practise using LLMs as Knowledge Provider

LLMs might Hallucinate Information into Existence

Figure 18 LLMs might Hallucinate Information into Existence

The Hallucination Problem

Figure 19 The Hallucination Problem

Ambiguity in Knowledge Source Networks

Panos discusses the issue of ambiguity in an LLM, particularly in answering questions with multiple possible answers. He devises an experiment in which he provides additional context and clarifications to help the language model understand and handle ambiguity better. He finds that the LLM struggles to detect and address ambiguity effectively without explicit guidance, and thus recommends cross-referencing with other reliable sources to verify the accuracy of the information the LLM provides.

LLMs and Ambiguity Experiment One

Figure 20 LLMs and Ambiguity Experiment One

LLMs and Ambiguity Experiment Two

Figure 21 LLMs and Ambiguity Experiment Two

LLMs and Ambiguity Experiment Three

Figure 22 LLMs and Ambiguity Experiment Three

LLMs and Ambiguity Experiment Four

Figure 23 LLMs and Ambiguity Experiment Four

LLMs are unable to Detect Ambiguity without the Adequate Context

Figure 24 LLMs are unable to Detect Ambiguity without the Adequate Context

The challenge of Accessing an LLM’s Factual and Domain Knowledge

Figure 25 The challenge of Accessing an LLM’s Factual and Domain Knowledge

When using an LLM have reliable information at hand to double-check

Figure 26 When using an LLM have reliable information at hand to double-check

Knowledge Modelling and Semantic Web Languages

Panos then discusses an evaluation of an LLM and its ability to transform competency questions into a Knowledge Graph. He highlights the LLM's success in modelling basic questions about movie directors and actors but points out its failure in handling a more complex example involving different types of clients for a company. The evaluation emphasises the importance of accurate naming and proper subclassing in knowledge modelling. Additionally, the discussion briefly touches on the significance of glossaries and taxonomies in defining information for LLMs and the role of learning in this context.

LLMs as Knowledge Modellers

Figure 27 LLMs as Knowledge Modellers

Demonstration of Transformation into OWL

Figure 28 Demonstration of Transformation into OWL

LLMs struggle with Semantics

Figure 29 LLMs struggle with Semantics

Results of LLMs and Formal Semantics

Figure 30 Results of LLMs and Formal Semantics

Limitations and Potential Improvements of Semantic Modelling in Knowledge Representation

The challenges of using LLMs to build an ontology that meets specific requirements are discussed. Panos highlights issues with the LLM's ability to generate accurate formal representations from given prompts, particularly in capturing definitions and creating correct relationships between concepts. Additionally, he points out discrepancies in translating hierarchical taxonomies into formal Semantic Models, emphasising the LLM's limitations in differentiating between classes and individual entities. The examples illustrate the limits of the LLM's efficacy in knowledge representation and Semantic Modelling, revealing areas where it struggles to produce accurate and reliable results.

Creating a Location Hierarchy in SKOS

Figure 31 Creating a Location Hierarchy in SKOS

Creating a Location Hierarchy in OWL

Figure 32 Creating a Location Hierarchy in OWL
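The location hierarchy examples can be sketched in plain Python as SKOS-style broader links. This is an illustrative toy, not real RDF: the place names are hypothetical, and an actual SKOS model would use `skos:broader` triples in a store such as rdflib:

```python
# Hypothetical sketch of a SKOS-style location hierarchy, modelled as
# (narrower, broader) pairs instead of real skos:broader RDF triples.
broader = {
    ("Amsterdam", "Netherlands"),
    ("Netherlands", "Europe"),
}

def broader_transitive(concept):
    """All ancestors reachable via broader links (cf. skos:broaderTransitive)."""
    result, frontier = set(), {concept}
    while frontier:
        step = {b for n, b in broader for f in frontier if n == f}
        step -= result
        result |= step
        frontier = step
    return result

print(broader_transitive("Amsterdam"))  # Netherlands and Europe
```

The design choice SKOS makes here is part of what trips LLMs up in these examples: `skos:broader` relates two concepts as individuals, whereas an OWL subclass hierarchy relates classes, and conflating the two produces formally wrong models even when the generated syntax looks plausible.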

Meta-modelling and Knowledge Graph Modelling

LLMs suffer from limitations when trying to understand the formal semantics of meta-modelling. Panos highlights that while an LLM can assist in Knowledge Graph modelling, it may not fully comprehend the semantics of the modelling language. He points out that the LLM has encountered various modelling examples in its training data, some good and some bad, and that there are conceptual problems with ontologies and Knowledge Graphs concerning classes and individuals. He underscores the importance of keeping these limitations in mind when utilising an LLM for modelling tasks.

Another challenge for the LLM

Figure 33 Another challenge for the LLM

LLM struggle with Generating OWL

Figure 34 LLM struggle with Generating OWL

Techniques of Information Extraction with Machine Learning Models

Three main approaches may be employed when utilising LLMs for information extraction. The first technique, zero-shot learning, involves prompting the LLM without examples and leveraging its existing reasoning capabilities. The second approach, few-shot learning, entails providing the LLM with both good and bad examples of a task to enhance its understanding. The third and most advanced approach involves building a custom dataset with positive and negative examples for the extraction task and fine-tuning the LLM. However, it's important to note that the LLM may combine prior knowledge with information from the text, leading to potential discrepancies in the extracted data. Therefore, carefully considering the application and scenario is essential when employing LLM for information extraction.
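The difference between the first two approaches comes down to how the prompt is constructed. The sketch below shows this with hypothetical prompt wording and made-up examples; it only builds the prompt strings and does not call any actual LLM API:

```python
# Illustrative sketch of zero-shot vs. few-shot prompt construction for
# entity extraction. The task wording and examples are hypothetical.

def zero_shot_prompt(text):
    """Zero-shot: state the task and rely on the model's prior capabilities."""
    return f"Extract all company names from the text below.\n\nText: {text}\nCompanies:"

def few_shot_prompt(text, examples):
    """Few-shot: prepend worked examples of the task before the real input."""
    shots = "\n\n".join(f"Text: {t}\nCompanies: {c}" for t, c in examples)
    return (
        "Extract all company names from the text below.\n\n"
        f"{shots}\n\nText: {text}\nCompanies:"
    )

examples = [("Textkernel is based in Amsterdam.", "Textkernel")]
prompt = few_shot_prompt("Bloomberg provides financial data.", examples)
print(prompt)
```

The third approach, fine-tuning, goes further: instead of putting the examples in the prompt at inference time, a curated dataset of positive and negative extractions is used to adjust the model's weights for the specific task.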

Approaches to Prompting LLMs

Figure 35 Approaches to Prompting LLMs

Zero-shot Entity Extraction

Figure 36 Zero-shot Entity Extraction

Zero-shot Entity Extraction Two

Figure 37 Zero-shot Entity Extraction Two

Zero-Shot Entity Extraction Three

Figure 38 Zero-Shot Entity Extraction Three

Relation Extraction in Machine Learning Models

Panos has also experimented with relation extraction and found that machine learning models often struggle with handling linguistic phenomena such as negation and uncertainty. He has tested the model's ability to understand uncertain statements and found that it performed well only when provided with clear and detailed instructional examples. The conclusion was that while the machine learning model can be a competent knowledge miner, it requires specific and detailed input to extract relations and accurately handle semantic phenomena. Panos thus emphasises the need for proper evaluation data to assess the model's performance across different relations and semantic phenomena.

LLMs Combine Factual Knowledge they have with Information in Ingested Text

Figure 39 LLMs Combine Factual Knowledge they have with Information in Ingested Text

Zero-Shot Relation Extraction

Figure 40 Zero-Shot Relation Extraction

Few-Shot Extraction

Figure 41 Few-Shot Extraction

LLM as a Knowledge Miner with Clear and Detailed Instructions

Figure 42 LLM as a Knowledge Miner with Clear and Detailed Instructions

Understanding the Efficacy and Usage of LLMs as Knowledge Developers

When using LLMs for knowledge development, it's important to consider their limitations. LLMs can provide structured examples, but they lack a deep understanding of semantics and may not be reliable knowledge providers. Panos recommends the use of LLMs in conjunction with other knowledge sources and the practice of carefully evaluating their output. Additionally, LLMs can be valuable for accelerating the modelling process, but they require thorough inspection and fine-tuning for specific tasks. While LLMs can offer clear examples, traditional approaches may be more efficient for certain information extraction tasks. Therefore, it is essential to use LLMs judiciously and supplement their output with human expertise in knowledge engineering and Semantic Modelling.

LLMs as Knowledge Graph Developers

Figure 43 LLMs as Knowledge Graph Developers

Knowledge Engineering and Artificial Intelligence: A Discussion on Upcoming Courses and Concepts

Panos shares two upcoming courses. The first, "Knowledge Graphs & Large Language Models Bootcamp," is free for existing members of the O'Reilly platform. The second, "Ontology Engineering Strategies and Solutions," will take place in November and delves into conceptual modelling challenges. Additionally, Panos addresses the relationship between LLMs and Artificial Intelligence (AI), explaining that LLMs fall under generative AI and are trained on natural language.

Knowledge Graphs & Large Language Models Bootcamp with Panos Alexopoulos

Figure 44 Knowledge Graphs & Large Language Models Bootcamp with Panos Alexopoulos

Ontology Engineering Strategies & Solutions with Panos Alexopoulos

Figure 45 Ontology Engineering Strategies & Solutions with Panos Alexopoulos

Closing Slide and Contact Details

Figure 46 Closing Slide and Contact Details

Closing Discussion on Artificial Intelligence

Panos and attendees then discuss various topics related to Artificial Intelligence (AI), including its various techniques and applications. Panos shares that AI is a broad term that encompasses different techniques such as Knowledge Graphs, machine learning, and reinforcement learning. An attendee touches on this, shares their challenges with trusting information, and expresses interest in training courses related to AI.

If you want to receive the recording, kindly contact Debbie (social@modelwaresystems.com)

Don’t forget to join our exciting LinkedIn and Meetup data communities not to miss out!
