During a short PTO, I found myself reflecting on AI hallucinations, a topic that kept popping up in discussions with my team over the last couple of months while we were building several support use cases. It's easy to assume that everyone working on LLMs, AI applications, or integration layers has already covered this ground, but sometimes going back to basics is the most effective way to spark new ideas.
Here's a quick breakdown of what AI hallucinations are and how we might think about tackling them:
Types of Hallucinations
So what are the common categories of hallucination?
Factual Hallucinations
The system generates objectively incorrect or fabricated facts. This often stems from gaps or errors in the training data, or, in a RAG (Retrieval-Augmented Generation) system, from poor retrieval quality and irrelevant or missing documents: the model simply fills in the blanks.
Output examples: generating an API method that doesn't exist in the library, or citing a paper that doesn't exist.
Contextual Hallucinations
The response is inconsistent with the context of the prompt, even if the facts themselves might be correct in another context. The model misinterprets the prompt or lacks proper guidance.
Mitigation: structured prompt engineering, combined with RAG, can add clarity.
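As a rough sketch (the function name, prompt wording, and example strings are my own illustration, not a standard template), a grounded prompt that keeps the model inside the provided context might look like this:

# A minimal sketch of a grounded prompt: the retrieved context plus explicit
# instructions reduce the chance that the model answers outside the given scope.
# The wording and names here are illustrative only.
def build_grounded_prompt(question: str, retrieved_context: str) -> str:
    return (
        "You are a support assistant. Answer ONLY using the context below.\n"
        "If the answer is not in the context, reply exactly: 'I don't know.'\n\n"
        f"Context:\n{retrieved_context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

print(build_grounded_prompt(
    "How do I reset my password?",
    "Passwords can be reset from Settings > Security.",
))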
Logical Hallucinations
The output is logically flawed. Common patterns include:
- Self-contradictory reasoning
- Invalid reasoning (All apples are fruits; some fruits are bananas; therefore, all apples are bananas.)
- Math errors (5+7=11)
- Code that is syntactically valid but logically wrong, as in the snippet below:
def is_even(n):
    # Logically flawed: returns True for odd numbers (see the note below)
    if n % 2 == 1:
        return True
    else:
        return False
There's no syntax error; however, although the function is named is_even, it returns True when n % 2 == 1 (when n is odd) and False when n is even, which is the opposite of the intended behavior.
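Exercising the function with a couple of known inputs exposes the inversion immediately; this is the kind of lightweight check that catches logical hallucinations in generated code:

# Calling the is_even function defined above with known inputs.
print(is_even(4))  # expected True, actually prints False
print(is_even(7))  # expected False, actually prints True
# A simple unit test such as `assert is_even(4)` would fail and flag the bug.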
Multimodal Hallucinations
Multimodal refers to situations where the model works with more than a single input/output type (text, audio, video, image). This type of hallucination emerges when the system generates fabricated or incorrect information across the different modalities, for example an image caption that describes objects that aren't in the picture.
Measuring Hallucinations
There's no silver bullet yet, but common practices include:
Human Evaluation/Labeling
Assessing the Hallucination Rate (HR), defined as the number of hallucinatory outputs divided by the total number of AI outputs.
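For illustration, assuming we have human labels marking each output as hallucinated or not (the record fields below are hypothetical), HR is a simple ratio:

# Hallucination Rate (HR) = hallucinatory outputs / total outputs.
# The labels below stand in for hypothetical human annotations.
labels = [
    {"output_id": 1, "is_hallucination": False},
    {"output_id": 2, "is_hallucination": True},
    {"output_id": 3, "is_hallucination": False},
    {"output_id": 4, "is_hallucination": False},
]

hallucination_rate = sum(l["is_hallucination"] for l in labels) / len(labels)
print(f"HR = {hallucination_rate:.2f}")  # 0.25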
Factual Accuracy Metrics
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
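As a sketch of how these apply to text, one common approach (an assumption on my part, not the only way to do it) is to judge each generated claim against a reference:

# A claim supported by the reference is a true positive (TP), an unsupported
# claim is a false positive (FP), and a reference fact the model omitted is a
# false negative (FN). The counts below are hypothetical, from a manual review.
tp, fp, fn = 8, 2, 3

precision = tp / (tp + fp)  # share of generated claims that are actually correct
recall = tp / (tp + fn)     # share of reference facts the model actually covered
print(f"Precision = {precision:.2f}, Recall = {recall:.2f}")  # 0.80, 0.73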
BERTScore
Measures semantic similarity between generated and reference texts (good for catching irrelevant content, but doesn't assess factual truth or logic).
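A minimal sketch using the bert-score package (assuming it is installed; the example sentences are made up):

# Semantic similarity between a generated answer and a reference answer.
# Requires: pip install bert-score (downloads a scoring model on first use).
from bert_score import score

candidates = ["The API supports batch requests of up to 100 items."]
references = ["Batch requests are limited to 100 items per call."]

P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1 = {F1.mean().item():.3f}")  # high similarity says nothing about truth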
Taxonomy of Hallucination Mitigation Techniques
In the grand scheme of things, we need to focus on:
Enhancing Training Data Quality
Comprehensive datasets, data cleansing, bias removal & continuous data improvements/updates
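As a toy illustration of the data-cleansing idea (the record structure is hypothetical), even basic filtering and deduplication goes a long way:

# A toy data-cleansing pass: drop empty answers and exact duplicates.
raw_records = [
    {"question": "How do I reset my password?", "answer": "Go to Settings > Security."},
    {"question": "How do I reset my password?", "answer": "Go to Settings > Security."},  # duplicate
    {"question": "What is the refund window?", "answer": ""},  # empty, drop it
]

seen = set()
clean_records = []
for rec in raw_records:
    key = (rec["question"], rec["answer"])
    if rec["answer"].strip() and key not in seen:
        seen.add(key)
        clean_records.append(rec)

print(len(clean_records))  # 1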
Validation
Human review, feedback loops & real-time similarity scores (e.g., BERTScore)
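A sketch of the feedback-loop idea: score each answer against its source documents and route low-similarity answers to human review. The scoring function and the 0.85 threshold below are placeholders, not recommended values:

# Route answers with low similarity to their source documents to human review.
# similarity_fn is a placeholder for any scorer (e.g. BERTScore F1).
def needs_human_review(answer: str, source: str, similarity_fn, threshold: float = 0.85) -> bool:
    return similarity_fn(answer, source) < threshold

# Dummy scorer that always returns 0.6, so this answer gets flagged.
flagged = needs_human_review(
    "The limit is 500 items.",
    "Batch requests are limited to 100 items.",
    lambda answer, source: 0.6,
)
print(flagged)  # True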
Refined Prompt Engineering
Clear/specific prompts plus specific desired output format
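For the output-format part, here's a sketch of format instructions that make answers machine-checkable; the JSON fields are made up purely for illustration:

# Asking for a fixed, machine-checkable output format makes off-script answers
# easier to detect and reject. The schema below is illustrative only.
FORMAT_INSTRUCTIONS = """
Respond with JSON only, using exactly these fields:
{
  "answer": "<short answer, or null if unknown>",
  "sources": ["<ids of the context passages you used>"],
  "confidence": "<low | medium | high>"
}
Do not add any text outside the JSON object.
"""

prompt = "Question: What is the maximum batch size?\n" + FORMAT_INSTRUCTIONS
print(prompt)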
Using RAG
Retrieve relevant data and augment the LLM's knowledge with additional, often private or real-time data that was not part of the model's original training data.
The RAG process, in short: embed the user query, retrieve the most relevant documents from a knowledge base, add them to the prompt as context, and have the LLM generate an answer grounded in that context.
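A minimal sketch of that loop; the retrieve() and call_llm() functions are placeholders standing in for a real vector-store lookup and a real model API call, not part of any specific library:

# A bare-bones RAG loop: retrieve, augment, generate.
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Toy relevance score: count words shared between query and document.
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:top_k]

def call_llm(prompt: str) -> str:
    return f"[model answer grounded in a prompt of {len(prompt)} chars]"  # placeholder

def rag_answer(query: str, documents: list[str]) -> str:
    context = "\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)

docs = ["Batch requests are limited to 100 items.", "Passwords reset from Settings > Security."]
print(rag_answer("What is the batch limit?", docs))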
For a deeper dive, the academic paper below presents a structured taxonomy of hallucination mitigation techniques, categorizing approaches along key dimensions and offering a solid foundation for both researchers and practitioners working to improve model reliability.
Source: Lee, C., Han, X., Wu, Y., Lee, K., Cheng, M., Yang, Y., & Tan, C. (2024)