Are AI models doomed to always hallucinate?

Large language models (LLMs), such as OpenAI’s ChatGPT, all share the same problem: they fabricate information. Their errors range from the benign, like claiming the Golden Gate Bridge was transported across Egypt in 2016, to the severe and potentially harmful.

For instance, an Australian mayor considered suing OpenAI after ChatGPT falsely claimed he had been involved in a bribery scandal. Researchers have also shown that LLMs can be manipulated into distributing malicious code to unsuspecting software developers. And the models routinely give incorrect mental health and medical advice, such as suggesting that drinking wine can prevent cancer.

This propensity to invent “facts” is referred to as hallucination and arises from the way today’s LLMs, as well as all generative AI models, are developed and trained.

Generative AI models have no real intelligence; they are statistical systems that predict words, images, speech, music, or other data. Trained on vast datasets, usually sourced from the public web, they learn how likely data is to occur based on patterns and surrounding context.

For instance, given a typical email ending in the fragment “Looking forward…,” an LLM might complete it with “…to hearing back,” following the pattern of the countless emails it has been trained on. That doesn’t mean the LLM is actually looking forward to anything.
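
Under the hood, the model simply assigns probabilities to possible next tokens and picks among the likeliest. The sketch below shows this with the small, openly available GPT-2 model via Hugging Face’s transformers library; the choice of model and library is only for illustration, not the proprietary systems discussed above.

```python
# A minimal sketch of next-token prediction. GPT-2 and the transformers
# library are used purely for illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Looking forward", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary for the very next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
# The model ranks likely continuations (e.g. " to"); nothing in this
# computation involves actually anticipating a reply.
```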

The current training framework involves concealing, or “masking,” previous words for context and having the model predict which words should replace the concealed ones. The approach is conceptually similar to the predictive text feature on a smartphone keyboard.
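
The masking idea is easiest to see in a masked language model such as BERT, used below purely as an illustration via the transformers fill-mask pipeline; it is a stand-in, not the exact training setup of any particular chatbot.

```python
# A small illustration of the "masking" idea, using BERT via the
# transformers fill-mask pipeline (an illustrative stand-in only).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill("Looking forward to [MASK] back from you."):
    print(f"{candidate['token_str']:>10}  {candidate['score']:.3f}")
# The model proposes statistically plausible replacements for the hidden
# word ("hearing", "getting", ...), much like predictive text suggestions.
```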

While this probability-based approach generally works well on a large scale, it is not foolproof. LLMs can generate grammatically correct but nonsensical text, propagate inaccuracies present in their training data, or merge conflicting information sources, including fictional ones.

These inaccuracies are not driven by malice on the part of LLMs. They lack the capacity to distinguish between true and false information and instead associate certain words or phrases with concepts, even if those associations are incorrect.

The term “hallucinations” in this context relates to the LLM’s inability to estimate the uncertainty of its predictions. LLMs are typically trained to produce an output regardless of how different the input is from their training data. They lack the capability to determine if they can reliably answer a query or make an accurate prediction.
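
One way to see this is that a language model’s output layer is just a softmax over its vocabulary: whatever the input, it produces a probability distribution and something comes out on top. The toy sketch below uses made-up dimensions and random scores simply to show that there is no built-in “I don’t know” output.

```python
# Toy sketch: a language model's output head is a softmax over the
# vocabulary. Dimensions and scores here are made up for illustration.
import torch

vocab_size = 50_000
logits = torch.randn(vocab_size)        # stand-in for the model's raw scores

probs = torch.softmax(logits, dim=-1)   # always a valid probability distribution
print(probs.sum().item())               # ~1.0, regardless of the input
print(probs.argmax().item())            # some token always comes out on top

# Nothing in this computation expresses "I can't answer this reliably":
# even for a prompt far from the training data, the model still ranks
# tokens and emits one.
```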

Whether hallucination can be solved is a complicated question, and the answer depends on what “solved” means. Some experts believe that LLMs will always hallucinate to some degree, but that the problem can be reduced depending on how a model is trained and deployed.

One approach is to engineer a question-answering system for high accuracy by curating a high-quality knowledge base and connecting it to an LLM, so that answers are grounded in that curated material. The quality of the knowledge base has a significant impact on the quality of the answers the LLM produces.
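
A common way to wire a knowledge base to an LLM is retrieval-augmented generation: retrieve the most relevant passages and place them in the prompt. The sketch below is a bare-bones outline under that assumption; embed and ask_llm are hypothetical helpers, not any particular vendor’s API.

```python
# Bare-bones retrieval-augmented answering over a curated knowledge base.
# `embed` and `ask_llm` are hypothetical helpers, not a specific product's API.
from typing import Callable

def answer_with_knowledge_base(
    question: str,
    documents: list[str],
    embed: Callable[[str], list[float]],   # text -> vector
    ask_llm: Callable[[str], str],         # prompt -> completion
    top_k: int = 3,
) -> str:
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    # Rank knowledge-base documents by similarity to the question.
    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n\n".join(ranked[:top_k])

    # Ask the model to answer only from the retrieved context.
    prompt = (
        "Answer using only the context below. If the context does not "
        "contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return ask_llm(prompt)
```

Grounding the prompt this way narrows what the model can plausibly claim, though the result still depends on the model following the instruction and on the knowledge base itself being correct.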

Reinforcement learning from human feedback (RLHF) is another technique used to mitigate hallucinations in LLMs. This method involves training the LLM, gathering human feedback to create a reward model, and fine-tuning the LLM using reinforcement learning based on this reward model. However, RLHF is not without its limitations.
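
At the heart of RLHF is a reward model trained on human preference comparisons, which the LLM is then fine-tuned against. The snippet below sketches only the pairwise preference loss commonly used for that reward model, written in plain PyTorch; it is an outline of the idea, not a full RLHF pipeline.

```python
# Sketch of the reward-model step in RLHF: learn to score responses so that
# answers humans preferred receive higher reward than the ones they rejected.
import torch
import torch.nn.functional as F

def reward_model_loss(
    reward_chosen: torch.Tensor,    # scores for human-preferred responses
    reward_rejected: torch.Tensor,  # scores for rejected responses
) -> torch.Tensor:
    # Pairwise (Bradley-Terry style) loss: push preferred scores above rejected ones.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Example with made-up scores from a hypothetical reward model:
chosen = torch.tensor([1.2, 0.7, 2.0])
rejected = torch.tensor([0.3, 0.9, 1.1])
print(reward_model_loss(chosen, rejected))

# The LLM is then fine-tuned with reinforcement learning (commonly PPO) to
# maximize this learned reward, which discourages some fabricated answers
# but does not eliminate them.
```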

Some argue that hallucination may not necessarily be a problem, as it can spur creativity by acting as a “co-creative partner,” offering outputs that, while not entirely factual, contain useful elements for exploration. This can be valuable in creative and artistic tasks.

Ultimately, LLMs should be approached with a degree of skepticism, and the focus should be on maximizing their utility while acknowledging their imperfections, rather than expecting flawless performance.
