
Hallucinating Machines: Understanding GenAI Errors

Innovation with integrity.
Arnaud Lucas
Contributing CIO

One evening, my 7th-grade daughter asked me to help her solve a math problem. She and her friends were struggling to find the “correct” answer, the one their math teacher had entered into the homework app. After solving the problem by hand, I realized that the teacher had mistakenly input the wrong answer into the system, marking the correct solution as incorrect. To confirm my calculations, I turned to GenAI.

While ChatGPT performed the correct steps and used accurate numbers, it still produced the wrong result, prompting me to double-check my own work. The plot thickened when one of my daughter’s friends insisted that the AI result was the right answer, even though it wasn’t.

This interaction highlights a critical challenge in GenAI: its tendency to “hallucinate.”

While GenAI offers groundbreaking capabilities, from generating content to automating workflows, it carries an inherent limitation: hallucinations, outputs that deviate from truth, logic, or context.

The Dual Edge of GenAI Innovation

Artificial intelligence is transforming industries, redefining innovation, and driving efficiency. GenAI, with tools like ChatGPT, Gemini, and DALL·E, is leading this revolution, enabling the creation of text, images, and code at unprecedented scales.

Yet, as organizations transition GenAI from proof-of-concept to production, they face a critical reality: hallucinations are not just occasional glitches; they represent fundamental risks to trust, accuracy, and operational integrity.

Why Hallucinations Matter

Consider these scenarios:

  • A customer service bot confidently suggests using a nonexistent feature.
  • A contract generator fabricates legal precedents.
  • An automated LLM system mistakenly refunds a fraudster.

These aren’t minor missteps; they carry reputational, operational, and legal risks. For enterprises embedding GenAI into mission-critical processes, mitigating hallucinations is essential to maintaining stakeholder and customer trust.

Decoding Hallucinations in GenAI

GenAI refers to AI systems designed to produce content – text, images, code, and more – by learning patterns in data and generating new outputs from those patterns.

Tools like ChatGPT, Gemini, DALL·E, and others have revolutionized how we approach creativity, problem-solving, and automation. Yet GenAI models often hallucinate, confidently producing incorrect, irrelevant, or fabricated outputs. These models, particularly Large Language Models (LLMs), do not inherently “understand” context or truth.

They are probabilistic systems designed to predict the next most likely token or sequence.

Unlike bugs, hallucinations often arise from GenAI’s inherent design: the models predict likely sequences rather than validating their factual correctness. LLMs prioritize plausibility over factual accuracy, which is why they sometimes produce outputs that are logically flawed, contextually irrelevant, or outright false.
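To make this concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small public GPT-2 model (choices made purely for illustration, not anything referenced in this article). It prints the probability distribution over the next token for a short prompt: the model ranks continuations by likelihood, not by truth.

# Minimal sketch of next-token prediction (illustrative assumption: GPT-2 via
# the Hugging Face transformers library; production LLMs work the same way,
# just at far larger scale).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Jupiter has", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the next token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>12}  p={prob.item():.3f}")

# Whatever token scores highest gets generated, whether or not the resulting
# statement is factually correct.

Sampling one of these high-probability tokens and repeating the loop is, in essence, all that text generation is; nothing in the loop checks the output against reality.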

Types of Hallucinations

Not all hallucinations are created equal. Based on my experience and on experiments with tools like LM Arena, here are four key types, each illustrated with an example:


1. Factual Hallucinations

GenAI can confidently generate incorrect information.

Prompt: How many moons does Jupiter have?

Hallucinatory response: Jupiter has 79 known moons. It’s a lot! And scientists are still discovering more.

In reality, Jupiter has 95 known moons (per Wikipedia). This error could stem from outdated training data with an early cutoff date. Let’s try a second example to surface a different kind of factual hallucination:

Prompt: What is a paper about temporal fine tuning about LLMs? Output the name of the paper and the authors please.

Hallucinatory Response 1 (Model 1): “Temporal Fine-Tuning: Training Large Language Models to Generate Time-Aware Responses” by Zexue He, Yujia Zhou, Chong Zhang, Bin Li, and Wei Wu (2023) is one notable paper on temporal fine-tuning of LLMs.

Hallucinatory Response 2 (Model 2): […] Paper Title: Time-LLM: Time Series Forecasting by Reprogramming Large Language Models Authors: Tian Zhou, Pei Wang, Xianjie Liu, Yifan Hu, Zhikang Li, Yue Gao, Junchi Yan […]

In the first response, the paper does not seem to exist. In the second, while the title is accurate, the listed authors are incorrect; the real authors appear to be Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. Such hallucinations likely arise from conflicting data or mismatched sources.
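Fabricated citations are among the easier hallucinations to catch mechanically, because the claimed artifact either exists in a public index or it does not. The sketch below is my own illustration, not a workflow prescribed anywhere in this article: it checks a model-supplied title against the public arXiv query API. The helper name arxiv_titles is hypothetical; the API endpoint itself is real.

# Hedged sketch: check whether a model-cited paper title exists on arXiv.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def arxiv_titles(title_query, max_results=5):
    """Return titles of arXiv entries whose titles match the query."""
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(
        {"search_query": f'ti:"{title_query}"', "max_results": max_results}
    )
    with urllib.request.urlopen(url, timeout=10) as response:
        feed = ET.fromstring(response.read())
    return [entry.findtext(f"{ATOM}title", "").strip()
            for entry in feed.iter(f"{ATOM}entry")]

claimed = "Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
matches = arxiv_titles(claimed)
if matches:
    print("Possible matches found -- still verify the author list:")
    for title in matches:
        print(" -", title)
else:
    print("No match found; treat the citation as suspect.")

A title match alone is not enough, as the second response above shows: the title was real but the author list was invented, so any automated check would need to compare authors as well.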


2. Logical Hallucinations

These occur when GenAI outputs contain flawed reasoning or contradictions.

Prompt: Count the number of As in blueberries.

Hallucinatory response: This is a bit of a trick question! There are two As in the word “blueberries”. Let me know if you’d like to play another word game!

Here, the AI’s failure to count accurately underscores its lack of actual reasoning capability. Is that why the model considers the question tricky? This kind of error is particularly difficult to spot when you ask an LLM to solve a math problem: it shows the right steps, then gives the wrong answer.
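Part of what makes this failure jarring is that the question is trivially deterministic for ordinary code; a two-line check (mine, added here purely for contrast) settles it:

# Deterministic letter count, for contrast with the model's confident guess.
word = "blueberries"
print(word.lower().count("a"))   # prints 0 -- the word contains no letter "a"

The LLM, by contrast, is not counting characters at all; it is predicting what an answer to a counting question usually looks like.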


3. Stylistic Hallucinations

GenAI can fail to write text in a specific style or tone, sometimes including details that would not be consistent with the intended style or context.

Prompt: Please generate a press release about a new medical breakthrough in the voice of a 19th century author.

Hallucinatory response:

A Breakthrough in the Realm of Medicine:

A momentous stride has been made in the annals of medical science! Renowned researchers at [Institution Name] have unveiled a groundbreaking discovery that promises to revolutionize the treatment of [Disease/Condition]. This unprecedented advancement, [Briefly describe the breakthrough], offers a glimmer of hope to countless individuals who have been afflicted by this debilitating ailment.

The potential implications of this discovery are far-reaching, and we anticipate a future where [Disease/Condition] is no longer a scourge upon humanity. As we stand on the precipice of a new era of medical possibility, we extend our heartfelt gratitude to the brilliant minds who have dedicated themselves to this noble pursuit.

While grammatically correct, this response lacks the rich, anecdotal flair characteristic of 19th-century prose; the AI failed to capture the essence of the style. Similarly, when GenAI is used to improve the grammar or style of an existing text, it tends to strip out anecdotes, numbers, and the most interesting details, rendering the result rather uninspiring.


4. Contextual Hallucinations

These occur when a model generates outputs that conflict with provided instructions or context.

Prompt:

Step 1: The US is the largest country in the world.

Step 2: What is the largest country?

Hallucinatory response:

Step 1:

Step 2: The largest country in the world by land area is Russia.

While the model correctly dismisses the erroneous first statement, such misalignment becomes problematic in tasks requiring precise contextual fidelity. When asked to summarize a long text, the model may introduce its own facts to the detriment of faithfully summarizing the provided text.


Charting a Path Forward

The era of Generative AI offers incredible opportunities for innovation, but as with any transformative technology, it comes with challenges. From factual inaccuracies to contextual misalignments, hallucinations remind us that while GenAI models excel in generating plausible content, they lack an inherent understanding of truth.

We must therefore approach GenAI adoption with a balanced perspective, celebrating its capabilities while implementing safeguards to identify and mitigate its limitations.

The journey ahead involves not only reducing hallucinations but also developing robust validation mechanisms and promoting a culture of AI literacy among users. Let’s not lose sight of the human responsibility to ensure these tools serve the truth, support decision-making, and ultimately, enhance trust.

After all, innovation without integrity is a story no one wants to write.
