The key to making Large Language Models work is becoming clearer.

"GenAI with too low a temperature lacks creative spark... Too high a temperature and it will strongly hallucinate" -- Neo4j’s Jim Webber discusses new ways of delivering GenAI value.

With the momentous first year of ChatGPT in the rear-view mirror, it’s clear that Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs) are remarkable innovations. However, these models are not yet dependable enough for broad enterprise adoption beyond limited applications.

For one, there is the problem of AI hallucinations. For all its diverse applications, GenAI is ultimately a sophisticated autocomplete, like the predictive text in a messaging or word processing app. Trained on many millions of documents, it has no trouble generating an impressive-sounding answer for any question a user asks, whether or not that answer is correct.

That generative quality is useful for creative work. It can also be useful for public, low-criticality applications of AI, such as getting help with a term paper or blog post. But when an LLM works against an organization’s proprietary data or a curated third-party dataset, it needs to meet a high bar for accuracy and correctness.

In technical terms, its “temperature” needs to be low. Unfortunately, a GenAI with too low a temperature lacks creative spark, rendering it useless. Too high a temperature and it will hallucinate strongly. The question is how to secure the middle ground between these two extremes, where a creative AI can be grounded in reality so that it is both inspiring and correct.
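To make the trade-off concrete, here is a minimal sketch using the OpenAI Python SDK; any LLM API with a temperature parameter behaves the same way, and the model name and prompt are purely illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, temperature: float) -> str:
    """Send the same prompt at a given sampling temperature."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # 0 = near-deterministic, higher = more random
    )
    return response.choices[0].message.content

prompt = "Suggest a tagline for a biodiversity knowledge graph."
print(ask(prompt, temperature=0.0))  # safe but flat
print(ask(prompt, temperature=1.5))  # creative, but more likely to hallucinate
```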

An increasingly standard practice is to treat the LLM as a closed box and interact with it through modern frameworks such as LangChain. This delivers good results and removes many of the challenges above because it introduces a fact-checking layer around the LLM, one that blocks poor answers before they reach the user. With this architecture in place, we can build practical business applications on top of the AI without needing the expertise to own and fine-tune LLMs of our own.

This approach is known as Retrieval-Augmented Generation, or RAG. RAG grounds large language models in the most accurate, up-to-date information available in the underlying knowledge graph, which improves the model’s responses. RAG applications typically retrieve supplementary text to provide context to the model, and metadata, such as timestamps, geolocations, references, and product IDs, can be used to enrich the prompt or the response.
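As a sketch of that flow, here is a minimal RAG loop in Python. The retrieve function stands in for whatever search backend you use (a vector index, a knowledge graph, or both), and ask is the temperature-controlled call from the earlier sketch; the function names, fields, and hard-coded fact are all hypothetical:

```python
def retrieve(question: str, k: int = 3) -> list[dict]:
    # Stand-in for a real vector or graph search backend; returns a
    # hard-coded fact here so the sketch runs end to end.
    corpus = [
        {"text": "Basecamp Research's biodiversity knowledge graph "
                 "has over 4 billion relationships.",
         "source": "basecamp-kg", "timestamp": "2023-11-01"},
    ]
    return corpus[:k]

def answer_with_rag(question: str) -> str:
    docs = retrieve(question)
    # Fold the retrieved facts and their metadata into the prompt so the
    # model is grounded in curated data rather than free-associating.
    context = "\n".join(
        f"- {d['text']} (source: {d['source']}, updated: {d['timestamp']})"
        for d in docs
    )
    prompt = (
        "Answer the question using ONLY the facts below.\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )
    return ask(prompt, temperature=0.2)  # low temperature: we want fidelity
```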

In the field, we’re seeing interesting applications of this trend of using advanced technology to help the fact-checking layer work better. Microsoft, for example, has created ‘GraphRAG’. In parallel, there is the rapid adoption of vector search. Until recently the purview of a small set of specialized databases, vector search is now found in a wide range of data platforms, from Postgres to Neo4j.

Vectors are emerging as the bedrock of GenAI. They allow complex documents to be arranged in a “vector space” and let users search for “nearby” content in that space. Vector search can also act as a bridge into other data stores. For example, with a graph database, you can resolve a vector from ChatGPT to a node in your graph and then search that node’s neighborhood to find curated facts from your data with which to enrich or correct the response.
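Here is a hedged sketch of that pattern against a Neo4j 5.x vector index, using the official Python driver. The index name, relationship type, connection details, and the assumption that the question embedding has already been computed are all illustrative:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687",
                              auth=("neo4j", "password"))

# Find the nodes nearest to the question embedding via the vector index,
# then expand one hop to collect curated, related facts. 'doc_embeddings'
# is a hypothetical index created earlier with CREATE VECTOR INDEX, and
# :RELATED_TO is an illustrative relationship type.
QUERY = """
CALL db.index.vector.queryNodes('doc_embeddings', 5, $embedding)
YIELD node, score
OPTIONAL MATCH (node)-[:RELATED_TO]->(fact)
RETURN node.text AS match, score, collect(fact.text) AS nearby_facts
"""

def ground_response(question_embedding: list[float]) -> list[dict]:
    with driver.session() as session:
        return session.run(QUERY, embedding=question_embedding).data()
```

The nearby facts returned here are exactly the kind of curated context that can be folded into the RAG prompt from the previous sketch.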

Vectors, RAG, and knowledge graphs: the magic trio?

But RAG and vector search are proving most useful where they combine to query an entire knowledge graph of hard facts that grounds the AI. Knowledge graphs enhance RAG by providing more context and structure than metadata alone, and they blend vector-based and graph-based semantic search, which can lead to more precise and informative results.

So what’s emerging as the best way of enabling effective governance of LLMs, through better fact-checking and answer explanations, is the widespread adoption of knowledge graphs. Knowledge graphs are an easy means of codifying the facts and modelling the world within which the LLM operates.

By using a combination of vector search, RAG, and knowledge graph interfaces, we can synthesise the rich, contextual understanding a human has of a concept with the more foundational “understanding” a computer (an LLM) can achieve.

At the University of Washington in the US, AI researcher Professor Yejin Choi is exploring an application of knowledge graphs to aid AI.

Choi, who was recently interviewed by Bill Gates, is working with her team to build a machine-authored knowledge base that helps a target LLM better distinguish between good and bad insights. 

This centres on an AI ‘critic’ program that probes the logical reasoning of an LLM to build a knowledge graph consisting only of good reasoning and facts. For example, when asked how long it would take to dry five shirts if one shirt takes an hour, common sense tells us the time should still be one hour regardless of quantity, because the shirts dry in parallel. However, an LLM may attempt a convoluted mathematical formula instead.

To address this, Choi's critic contains 6.5 million distillations of symbolic knowledge that encode common sense rules and understanding. By filtering the LLM's outputs through this large knowledge base of common sense, the critic can construct a new knowledge graph containing only high-quality, logically sound information. 

With such results, we can get closer to AI that relates things and concepts in the same way the human brain does, and we could really benefit from that. We have a myriad of technologies in the world of data that can summarise numbers and calculate aggregates: maths, essentially. What graphs are skilled at is answering the big questions: what’s important in all my data? What’s unusual? And, importantly, given the patterns in the data, they can forecast what’s going to happen next.

It is crucial to recognize that organizations globally are already harnessing the power of knowledge graphs to unlock value from their data assets. A prime example is Basecamp Research, a UK-based biotechnology company dedicated to ethically translating nature’s solutions into commercial applications. To do so, it has built a knowledge graph of the Earth’s natural biodiversity containing over 4 billion relationships. Basecamp is now using this knowledge graph to train LLMs to design proteins, with a ChatGPT-style model for enzyme sequence generation.

I may not be an AI or LLM expert, but based on Choi’s inspirational work and use cases like these, I can venture a prediction about what happens next. As we move into 2024, we will see more software and database developments that fuse LLMs, knowledge graphs, vector search, and RAG, making the second year of GenAI much more useful to business than the first.

See also: PDFs, RAG, and LlamaParse: Generative AI's "Swiss Army Knife" adds a welcome new toolkit