Exploring the different roles LLMs can play in Semantic Modelling & Knowledge Graphs.
Executive Summary
This webinar delves into Large Language Models (LLMs) and Knowledge Graph development, emphasising the significance of Semantic Data Modelling and the evolution of Knowledge Graphs in technology. Panos Alexopoulos explores the applications of Knowledge Graphs in data integration and machine learning, focusing on their use in natural language processing and information retrieval. Additionally, he discusses the challenges of vagueness in human thinking and its impact on data science, alongside the potential improvements of Semantic Modelling in knowledge representation. Lastly, Panos discusses the efficacy and usage of LLMs as knowledge developers while addressing the intersection of knowledge engineering and Artificial Intelligence, concluding with insights into upcoming courses and concepts in this dynamic field.
Webinar Details
Title: Exploring the different roles LLMs can play in Semantic Modelling & Knowledge Graphs.
Date: 07 October 2024
Presenter: Panos Alexopoulos
Meetup Group: INs & OUTs of Data Modelling
Write-up Author: Howard Diesel
Contents
Executive Summary
Webinar Details
Role of Large Language Models (LLMs) in Knowledge Graph Development
Understanding Semantic Data Modelling
Understanding the Concept and Importance of Knowledge Graphs
The Evolution and Impact of Knowledge Graphs in Technology
The Role and Applications of Knowledge Graphs in Data Integration
The Use and Enhancement of Knowledge Graphs in Machine Learning Applications
Meaning Accuracy, Meaning Explicitness, and Agreement in Knowledge Graphs
Building Knowledge Graphs and Ontology
The Challenge of Vagueness in Human Thinking and Its Impact on Data Science
Building and Maintaining Knowledge Graphs in Changing Domains
Value and Application of Ontologies
Ambiguity and Vagueness in Data Modelling and Knowledge Graph Development
LLMs and Their Application in Natural Language Processing
Knowledge Graph for Information Retrieval
The Differences between Knowledge Graphs and Learning Structures in Machine Learning
Deductive and Inductive Reasoning in Knowledge Graphs and Ontology
Natural Language Question & Answering and Knowledge Graph Development
LLM as Knowledge Providers for Knowledge Graph Development
Ambiguity in Knowledge Source Networks
Knowledge Modelling and Semantic Web Languages
Limitations and Potential Improvements of Semantic Modelling in Knowledge Representation
Meta-modelling and Knowledge Graph Modelling
Techniques of Information Extraction with Machine Learning Models
Relation Extraction in Machine Learning Models
Understanding the Efficacy and Usage of LLMs as Knowledge Developers
Knowledge Engineering and Artificial Intelligence: A Discussion on Upcoming Courses and Concepts
Closing Discussion on Artificial Intelligence
Role of Large Language Models (LLMs) in Knowledge Graph Development
Panos Alexopoulos opens the webinar and shares that he is a data and AI practitioner and educator. He mentions that the presentation focuses on the interplay between Knowledge Graphs, Semantic Models, and conceptual models with Large Language Models (LLMs). With over 15 years of experience as an ontologist, Panos currently heads the ontology team at Textkernel, a Dutch company specialising in software for analysing and matching people's profiles with job vacancies. His team is responsible for developing and maintaining a large Knowledge Graph that supports advanced machine-learning techniques and algorithms for the company's services. In addition to his professional work, Panos has been delivering courses for professionals in the field of data semantics and AI since 2018 and has authored a book, Semantic Modeling for Data, which offers practical advice for Semantic Modelling, aiming to address common pitfalls and push the limits of the semantic representation of data.
Understanding Semantic Data Modelling
Semantic Data Modelling is an umbrella concept that involves developing descriptions and representations of data to accurately convey its meaning in a universally understood way. It aims to bridge gaps in understanding across teams, individuals, and organisations by defining terms and creating artefacts such as taxonomies, thesauri, ontologies, vocabularies, and Knowledge Graphs. The practice encompasses various methods for describing and representing data, including entity relationship models, to facilitate effective communication and data interpretation among humans and systems.
Understanding the Concept and Importance of Knowledge Graphs
Knowledge Graphs have been around for a long time despite the recent hype. They are essentially a rebranding of concepts such as semantic networks and knowledge bases from the 80s and 90s. In essence, Knowledge Graphs are interconnected entities described in an entity and concept-centric way rather than through traditional tables or data formats. The crucial aspect often overlooked is the need for semantic shareability, ensuring that both humans and systems can understand and share descriptions of data and domains. Google's Knowledge Graph, seen in search results as knowledge cards, is a prominent example. It presents structured information about entities, their relations, and attributes, demonstrating the practical application of Knowledge Graphs.
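The entity-centric, interconnected structure described above can be sketched with plain subject-predicate-object triples. The following is a minimal illustration in Python; the entities and relations are illustrative examples, not drawn from any real Knowledge Graph:

```python
# A minimal sketch of a Knowledge Graph as subject-predicate-object triples.
# Entity and relation names are illustrative assumptions.
triples = [
    ("Douglas_Adams", "type", "Author"),
    ("Douglas_Adams", "wrote", "The_Hitchhikers_Guide_to_the_Galaxy"),
    ("Douglas_Adams", "born_in", "Cambridge"),
    ("Cambridge", "type", "City"),
]

def describe(entity):
    """Collect everything the graph says about an entity,
    akin to the 'knowledge card' shown in search results."""
    return [(p, o) for s, p, o in triples if s == entity]

print(describe("Douglas_Adams"))
```

Real systems use standards such as RDF and dedicated graph stores, but the underlying idea is the same: data is described around entities and their relations rather than in tables.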
The Evolution and Impact of Knowledge Graphs in Technology
In addition to Google, numerous organisations are now developing Knowledge Graphs, including private, public, and governmental entities. One notable example is Bloomberg, a financial data service company that aims to provide valuable data to investors and financial professionals. Over the years, they have built a comprehensive Knowledge Graph containing information about companies, industries, people, geographical locations, products, and financial instruments. This Knowledge Graph consists of concepts, relations, and more, and while the term itself is not new, it has gained renewed attention. Although previously considered a high-tech concept, it is now viewed as a practical approach with its own advantages and disadvantages when handling data.
The Role and Applications of Knowledge Graphs in Data Integration
The use of Knowledge Graphs serves three main high-level purposes. Firstly, they provide a semantic layer to integrate heterogeneous data within an organisation, enabling uniform access. This integration process, facilitated by Knowledge Graphs, aims to create a common understanding of the data and can take considerable time due to its complexity. Additionally, Knowledge Graphs can be utilised as a virtual semantic layer to access data across different sources, transforming it into a logical model. Once the data is integrated using Knowledge Graphs, it allows for more accurate and valuable insights through data analytics, data science algorithms, and question-answering capabilities.
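The idea of a semantic layer that maps heterogeneous sources onto one shared vocabulary for uniform access can be sketched as follows; the record structures, field names, and mappings are hypothetical:

```python
# A minimal sketch of a Knowledge Graph as a semantic layer:
# two heterogeneous source records are mapped onto a shared vocabulary
# so both can be queried uniformly. All field names are hypothetical.
crm_record = {"cust_name": "Acme Ltd", "cust_country": "NL"}
billing_record = {"client": "Acme Ltd", "country_code": "NL"}

def to_triples(record, mapping, entity_key):
    """Translate one source record into triples using a field-to-predicate mapping."""
    entity = record[entity_key]
    return [(entity, pred, record[field]) for field, pred in mapping.items()]

graph = []
graph += to_triples(crm_record, {"cust_country": "located_in"}, "cust_name")
graph += to_triples(billing_record, {"country_code": "located_in"}, "client")

# Both sources now answer the same question through one logical model:
locations = {o for s, p, o in graph if s == "Acme Ltd" and p == "located_in"}
```

The point of the sketch is that, once the mapping exists, consumers query the shared predicates (here `located_in`) without knowing which source the data came from.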
The Use and Enhancement of Knowledge Graphs in Machine Learning Applications
Utilising Knowledge Graphs alongside machine learning applications is crucial for capturing domain-specific knowledge that may not be present in the training data or captured by the algorithms. Knowledge Graphs provide a top-down approach to impart domain knowledge to machine learning systems, combining encyclopaedic and declarative knowledge with inductive learning. They are capable of addressing the black box problem in machine learning by offering explanations for decisions and facilitating easier troubleshooting. Furthermore, Knowledge Graphs enable the control of machine learning system outputs by enforcing explicit constraints and ensuring consistency with the Knowledge Graph. This approach helps mitigate issues such as model hallucinations and can enhance system performance.
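Enforcing explicit constraints on model output, as described above, can be sketched as a validation step against the graph's vocabulary. This is a minimal illustration, assuming a set of known skill entities; it is not any specific production mechanism:

```python
# A hedged sketch of using a Knowledge Graph to constrain ML/LLM output:
# a generated label is only accepted if it exists in the graph, which helps
# mitigate hallucinated values. The vocabulary below is illustrative.
KNOWN_SKILLS = {"Python", "SQL", "Machine Learning"}  # entities from the graph

def validate_output(predicted_skills):
    """Split model output into graph-consistent and unknown (possibly hallucinated) labels."""
    accepted = [s for s in predicted_skills if s in KNOWN_SKILLS]
    rejected = [s for s in predicted_skills if s not in KNOWN_SKILLS]
    return accepted, rejected

accepted, rejected = validate_output(["Python", "Quantum Basket Weaving"])
```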
Meaning Accuracy, Meaning Explicitness, and Agreement in Knowledge Graphs
The crucial dimensions of a Knowledge Graph go beyond its technical aspects: Meaning Accuracy, Explicitness, and Agreement. Accuracy refers to the correctness of each entity and relation in the Knowledge Graph, avoiding incorrect information in the spirit of "garbage in, garbage out". Explicitness involves making artefacts understandable for both machines and human users, emphasising the importance of meaningful names and descriptions. Lastly, Agreement pertains to how widely accepted the meanings of entities and relations are among the users and systems utilising the graph.
Building Knowledge Graphs and Ontology
Panos discusses a problem of agreement encountered in the field of recruitment and professions when considering the reuse of a Knowledge Graph created by the European Commission. The issue arises from the Commission's classification of various professions as equivalent when they are not, leading to disagreements over definitions of roles such as data scientist and data analyst. Panos then emphasises the importance of defining the scope of intended agreement when building a Knowledge Graph and highlights the challenges posed by linguistic and semantic phenomena, such as ambiguity and multiple expressions for the same concept in human language and communication. He stresses the need for Knowledge Graphs to explicitly capture the ambiguity in a domain and encompass all possible meanings of important concepts.
The Challenge of Vagueness in Human Thinking and Its Impact on Data Science
Vagueness is a prevalent issue in human thinking, characterised by the lack of unique truth criteria for concepts and predicates. For instance, defining a "tall person" is challenging due to the absence of a universal threshold for height. This vagueness leads to disagreements and hinders the ability to reach agreements, especially when discussing job roles like that of a data scientist, where the essential skills and responsibilities can vary widely. Conceptual modellers, ontologists, and data specialists often grapple with this issue in their everyday work, striving to create clear and precise definitions and classifications.
Building and Maintaining Knowledge Graphs in Changing Domains
The issue of semantic change is a significant challenge in Knowledge Management, particularly in domains with high volatility. New concepts and ideas emerge frequently, leading to a shift in the meaning of existing terms over time. This dynamic nature of knowledge necessitates continuous maintenance of Knowledge Graphs to ensure their accuracy and relevance. Compounding this challenge is the presence of suboptimal development practices, where different teams may employ varying techniques and methodologies, resulting in disparate artefacts. This diversity in approaches poses difficulties in aligning and merging knowledge, emphasising the need for reconciliation and standardisation efforts.
Value and Application of Ontologies
During a conference in 2019, a person expressed strong criticism of a particular ontology's structure and development. The individual labelled the ontology as "useless", which raised concerns about the lack of standardised approaches in ontology development. Panos explains that while identifying problems in ontologies is common, the key consideration is how these issues impact the end application. He highlights the trade-off between completeness and precision in Knowledge Graphs, emphasising the challenge of achieving both at scale. Additionally, Panos relates this to evaluating the European Commission's project, which revealed that it didn't align with their specific needs but acknowledged that it could still be valuable for others.
Ambiguity and Vagueness in Data Modelling and Knowledge Graph Development
An attendee shares the challenges of addressing "enemies" such as ambiguity and vagueness in their environment. Panos emphasises the importance of not eliminating these challenges but rather managing and handling them, and highlights the significance of detecting vagueness when creating a model. He discusses the people-centric nature of data modelling, stating that it is not just an engineering challenge but also a people challenge. Panos also mentions the difficulty of scaling semantics and raises the question of whether LLMs could replace the need to develop Knowledge Graphs.
LLMs and Their Application in Natural Language Processing
A Large Language Model (LLM) is a powerful machine learning model based on the Transformer architecture, trained on massive amounts of internet text to generate human-like text and understand human language. LLMs are used for tasks like text generation, classification, sentiment analysis, and summarisation. They are popular due to their simplicity of use, as users can input natural language prompts to get output. However, there is ongoing discussion about the effectiveness of prompt engineering and the scientific basis of LLM functionality. Despite these concerns, LLMs are currently valued for their practical applications rather than their scientific underpinnings.
Knowledge Graph for Information Retrieval
The limitations of using LLMs instead of Knowledge Graphs are evident due to LLMs' lack of proficiency in providing accurate knowledge. LLMs were not designed for information retrieval or database querying but rather for generating probabilistic text, leading to issues such as hallucination. As an example, Panos asked ChatGPT to provide a list of books about data engineering published by O'Reilly or another publisher. The LLM listed books inaccurately, including one with incorrect title and author information, highlighting the unreliability of LLMs as sources of accurate information.
The Differences between Knowledge Graphs and Learning Structures in Machine Learning
The differences between Knowledge Graphs and Large Language Models (LLMs) lie in their strengths and weaknesses. Knowledge Graphs contain structural, explicit knowledge represented by symbols and descriptions, while LLMs consist of numerical weights in a neural network. Interacting with a Knowledge Graph provides clear knowledge based on its content, whereas querying an LLM may result in made-up responses. Inaccuracies in a Knowledge Graph stem from the content and its connections to the ontology, while LLM inaccuracies arise from the inference process. In conclusion, issues in a Knowledge Graph are content-related, whereas LLM issues are connected to both content and inference.
Deductive and Inductive Reasoning in Knowledge Graphs and Ontology
In Knowledge Graphs and ontologies, deductive reasoning is the primary form of logic. It operates on the principle that if a premise is true and the rule is correct, the conclusion is also true. This is illustrated in the classic example: 'All humans are mortal; Socrates is a human; therefore, Socrates is mortal.' However, in machine learning models, reasoning is not always deductive; it can also be inductive or abductive, leading to answers with varying levels of confidence. While Knowledge Graphs offer transparency and interpretability, they are not well-suited for understanding language, as they are optimised for conceptual knowledge and facts.
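The classic syllogism above can be illustrated as a tiny forward-chaining step over triples. This is a toy sketch of deductive rule application, not a real reasoner:

```python
# Toy illustration of deduction: rule "all humans are mortal"
# applied to the fact "Socrates is a human".
facts = {("Socrates", "is_a", "Human")}
# One rule: if ?x is_a Human, then ?x is_a Mortal.
rules = [(("?x", "is_a", "Human"), ("?x", "is_a", "Mortal"))]

def apply_rules(facts, rules):
    """Derive new facts by matching each rule's premise against known facts."""
    derived = set(facts)
    for (_, pp, po), (_, cp, co) in rules:
        for (s, p, o) in facts:
            if p == pp and o == po:  # premise matches, with ?x bound to s
                derived.add((s, cp, co))
    return derived

# The conclusion ("Socrates", "is_a", "Mortal") is derived deductively:
# it is guaranteed true whenever the premise and rule are true.
result = apply_rules(facts, rules)
```

An inductive or abductive system, by contrast, would return such a conclusion only with some degree of confidence rather than as a guaranteed entailment.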
Natural Language Question & Answering and Knowledge Graph Development
LLMs are transforming the way we engage with data. A significant application of LLMs is in natural language question answering, enabling users to ask questions in everyday language rather than crafting complex SQL queries. LLMs excel at capturing linguistic patterns, enhancing the interpretation of natural language text. While LLMs cannot entirely replace Knowledge Graphs, they can streamline and scale the development of Knowledge Graphs, a traditionally challenging task based on domain and scope.
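The natural language question-answering pattern described above typically works by asking the LLM to translate a user question into a database query. The following is a hedged sketch; the schema, question, and the `call_llm` stub are all hypothetical placeholders, not a specific product's API:

```python
# Sketch of natural-language question answering over a database:
# the LLM is prompted to translate a user question into SQL.
SCHEMA = "movies(title TEXT, director TEXT, year INTEGER)"

def build_prompt(question):
    """Construct a prompt asking the LLM to produce SQL for the given schema."""
    return (
        f"Given the table {SCHEMA}, write a SQL query that answers:\n"
        f"{question}\nReturn only the SQL."
    )

def call_llm(prompt):
    # Placeholder for a real, provider-specific LLM API call.
    return "SELECT title FROM movies WHERE director = 'Ridley Scott';"

prompt = build_prompt("Which movies did Ridley Scott direct?")
sql = call_llm(prompt)
```

The generated SQL is then executed against the actual database, so the answer itself comes from the data rather than from the model's memory.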
LLM as Knowledge Providers for Knowledge Graph Development
Panos discusses three main roles an LLM can play in developing a Knowledge Graph. The first role involves using an LLM as a direct factual and domain knowledge source. The second role is knowledge modelling or Semantic Modelling, which entails transforming requirements about knowledge representation into formal representations using natural language. The third role is knowledge mining, where an LLM is used to extract information from text and add it to the Knowledge Graph. Panos also addresses the challenges with using LLMs, such as the risk of unreliable or hallucinated facts due to overfitting, bias in training data, and conflicting training data.
Ambiguity in Knowledge Source Networks
Panos discusses the issue of ambiguity in an LLM, particularly when answering questions with multiple possible answers. He describes an experiment in which he provides additional context and clarifications to help the language model detect and handle ambiguity. He finds that the LLM struggles to detect and address ambiguity effectively without explicit guidance, and thus recommends cross-referencing the information it provides against other reliable sources.
Knowledge Modelling and Semantic Web Languages
Panos then discusses an evaluation of an LLM and its ability to transform competency questions into a Knowledge Graph. He highlights the LLM's success in modelling basic questions about movie directors and actors but points out its failure in handling a more complex example involving different types of clients for a company. The evaluation emphasises the importance of accurate naming and proper subclassing in knowledge modelling. Additionally, the discussion briefly touches on the significance of glossaries and taxonomies in defining information for LLMs and the role of learning in this context.
Limitations and Potential Improvements of Semantic Modelling in Knowledge Representation
The challenges of using LLMs to build an ontology that meets specific requirements are discussed. Panos highlights issues with the LLM's ability to generate accurate formal representations based on given prompts, particularly in capturing definitions and creating correct relationships between concepts. Additionally, he points out discrepancies in translating hierarchical taxonomies into formal Semantic Models, emphasising the LLM's limitations in differentiating between classes and individual entities. The example reflects the LLM's efficacy in knowledge representation and Semantic Modelling, revealing areas where it struggles to produce accurate and reliable results.
Meta-modelling and Knowledge Graph Modelling
LLMs suffer from limitations when trying to understand the formal semantics of meta-modelling. Panos highlights that while an LLM can assist in Knowledge Graph modelling, it may not fully comprehend the semantics of the modelling language. He also points out that the LLM has encountered varied examples of modelling in its training data, some good and some bad, and that ontologies and Knowledge Graphs have conceptual problems of their own relating to classes and individuals. Panos refers to the LLM's role in identifying minors and underscores the importance of keeping these limitations in mind when utilising LLMs for modelling tasks.
Techniques of Information Extraction with Machine Learning Models
Three main approaches may be employed when utilising LLMs for information extraction. The first technique, zero-shot learning, involves prompting the LLM without examples and leveraging its existing reasoning capabilities. The second approach, few-shot learning, entails providing the LLM with both good and bad examples of a task to enhance its understanding. The third and most advanced approach involves building a custom dataset with positive and negative examples for the extraction task and fine-tuning the LLM. However, it's important to note that the LLM may combine prior knowledge with information from the text, leading to potential discrepancies in the extracted data. Therefore, carefully considering the application and scenario is essential when employing LLMs for information extraction.
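The difference between the zero-shot and few-shot styles can be sketched as prompt construction. The instruction text and the examples below are illustrative assumptions, not taken from the webinar:

```python
# Sketch of zero-shot vs few-shot prompting for information extraction.
INSTRUCTION = "Extract the skill mentions from the sentence as a JSON list."

# Few-shot demonstrations, including a negative example where
# nothing should be extracted.
FEW_SHOT_EXAMPLES = [
    ("Experienced in Python and SQL.", '["Python", "SQL"]'),
    ("No programming required.", "[]"),
]

def zero_shot_prompt(sentence):
    """Instruction only: relies entirely on the model's prior capabilities."""
    return f"{INSTRUCTION}\nSentence: {sentence}\nAnswer:"

def few_shot_prompt(sentence):
    """Instruction plus worked examples to anchor the expected behaviour."""
    demos = "\n".join(
        f"Sentence: {s}\nAnswer: {a}" for s, a in FEW_SHOT_EXAMPLES
    )
    return f"{INSTRUCTION}\n{demos}\nSentence: {sentence}\nAnswer:"
```

The third approach, fine-tuning, goes further: instead of placing examples in the prompt, a larger labelled dataset of such positive and negative pairs is used to update the model's weights for the specific extraction task.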
Relation Extraction in Machine Learning Models
Panos has also experimented with relation extraction and found that machine learning models often struggle with handling linguistic phenomena such as negation and uncertainty. He has tested the model's ability to understand uncertain statements and found that it performed well only when provided with clear and detailed instructional examples. The conclusion was that while the machine learning model can be a competent knowledge miner, it requires specific and detailed input to extract relations and accurately handle semantic phenomena. Panos thus emphasises the need for proper evaluation data to assess the model's performance across different relations and semantic phenomena.
Understanding the Efficacy and Usage of LLMs as Knowledge Developers
When using LLMs for knowledge development, it's important to consider their limitations. LLMs can provide structured examples, but they lack a deep understanding of semantics and may not be reliable knowledge providers. Panos recommends the use of LLMs in conjunction with other knowledge sources and the practice of carefully evaluating their output. Additionally, LLMs can be valuable for accelerating the modelling process, but they require thorough inspection and fine-tuning for specific tasks. While LLMs can offer clear examples, traditional approaches may be more efficient for certain information extraction tasks. Therefore, it is essential to use LLMs judiciously and supplement their output with human expertise in knowledge engineering and Semantic Modelling.
Knowledge Engineering and Artificial Intelligence: A Discussion on Upcoming Courses and Concepts
Panos shares two upcoming courses. The first, "Knowledge Graphs & Large Language Models Bootcamp," is free for existing members of the O'Reilly platform. The second, "Ontology Engineering Strategies and Solutions," takes place in November and delves into conceptual modelling challenges. Additionally, Panos addresses the relationship between LLMs and Artificial Intelligence (AI), explaining that LLMs fall under generative AI and are trained on natural language.
Closing Discussion on Artificial Intelligence
Panos and attendees then discuss various topics related to Artificial Intelligence (AI), including its techniques and applications. Panos explains that AI is a broad term encompassing different techniques such as Knowledge Graphs, machine learning, and reinforcement learning. An attendee picks up on this, shares their difficulty in trusting information, and expresses interest in training courses related to AI.
If you want to receive the recording, kindly contact Debbie (social@modelwaresystems.com)
Don’t forget to join our exciting LinkedIn and Meetup data communities not to miss out!