This study looks at why language models invent entities, believe their own inventions, and then defend them in later turns. The issue isn't single-turn hallucination. It is a multi-turn failure mode where the model reuses its own guesses as if they were facts. Once the false detail enters the conversation, it becomes part of the model's internal frame. The output feels more confident, not less.
The paper tests 119 controlled conversations across three architectures and four interventions: baseline, fact tables, belief audits, and grounding. Reasoning prompts - the tools that are supposed to help models verify their claims - made the problem worse, increasing both persistence and correction latency. The only intervention that reliably reduced confabulation was grounding, and even that worked only for some models: Claude ignored the instruction entirely and kept confabulating at the baseline rate.
The study shows that recursive confabulation is systematic. It is the natural result of models treating their own text as evidence. The pattern unfolds the same way across domains: early elaboration gives way to short, confident assertions that compress the fiction rather than expand it. This collapse of detail paired with rising certainty is the core signature of the phenomenon.
Large language models routinely invent false entities, yet the internal mechanics of this process remain unclear. We define recursive confabulation as the self-reinforcing reuse of fabricated information within an ongoing dialogue. Across 119 controlled conversations with three frontier models (Claude 3.5 Haiku, GPT-4o Mini, Gemini 2.0 Flash) and four intervention arms (baseline, fact-table, belief-audit, grounding), we find a near-universal confabulation rate of 97%.
Prompt-based safety interventions consistently failed. Fact-tables and belief-audits increased persistence by 24–31 percentage points relative to baseline (p < 0.05). By contrast, a grounding instruction requiring explicit source verification produced a statistically significant reduction in confabulation (100% → 70%, p = 0.0019, Cohen's h = 1.16), driven entirely by GPT-4o Mini (100% → 50%, p = 0.033); Gemini showed a nonsignificant trend (100% → 60%), while Claude 3.5 Haiku remained completely unaffected (100% → 100%).
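The reported effect size can be checked directly. Cohen's h for the difference between two proportions is h = 2·arcsin(√p1) − 2·arcsin(√p2); a minimal stdlib sketch (the proportions come from the paper; the function name is ours):

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Pooled grounding arm: confabulation drops from 100% to 70%.
effect = cohens_h(1.00, 0.70)
print(round(effect, 2))  # → 1.16, matching the reported value
```

Note that h depends only on the two proportions, not on sample size, which is why it complements rather than replaces the significance tests quoted above.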
Qualitative analysis reveals that elaboration contracts rather than expands. Responses become shorter and less concrete while confidence remains high, a phenomenon we term semantic compression. Entity clustering exposes two overlapping error types: invention of nonexistent institutions and distortion of real ones. Together these results show that recursive confabulation is a systematic failure mode across current architectures, only partially mitigated by grounding and resistant in certain alignment regimes.
A model that reuses its own fiction as evidence becomes harder to correct the longer the exchange continues. It forgets that it guessed. It treats its earlier invention as an established premise. This is the difference between one bad answer and a self-reinforcing false belief.
The danger is not dramatic. It is procedural. Reasoning prompts - the methods people use to make models safer - actually strengthen the fabrication. They trigger justification rather than verification. The model supplies reasons for the false entity instead of checking whether it exists. Grounding helps, but only for models whose training makes verification meaningful. Others ignore the instruction entirely.
This matters for any system where models read their own output or the output of other models. In multi-agent settings, recursive confabulation spreads. One model's guess becomes another model's assumption. Without architectural guardrails, small fictions turn into shared facts.
- A false statement in one turn becomes evidence in the next
- Reasoning prompts increase persistence by reinforcing the fiction
- Grounding helps only when the model treats verification as a real constraint
- Semantic compression replaces elaboration with short, confident assertions
- Models forget that they ever guessed and defend the invention as fact
- Cross-model propagation spreads confabulations across architectures
- Recursive confabulation is an architectural attractor, not an isolated mistake
- Fixing it requires mechanisms that distinguish generated text from verified information
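The feedback loop summarized above can be sketched as a minimal multi-turn driver. Everything here is illustrative (the entity, the toy model, and the function names are ours, not the paper's); the point is structural: the model's own output re-enters its context with the same status as user-provided facts, because nothing in the transcript distinguishes generated text from verified information.

```python
def run_dialogue(questions, model_call):
    """Drive a multi-turn exchange. The transcript carries no marker
    separating verified input from generated text, so both enter the
    context on equal footing."""
    context = []
    for q in questions:
        context.append(("user", q))
        reply = model_call(context)            # may fabricate an entity here
        context.append(("assistant", reply))   # fabrication re-enters as a premise
    return context

# Toy stand-in for a chat model: it restates the last thing it said,
# real or invented, as established fact.
def toy_model(context):
    prior = [text for role, text in context if role == "assistant"]
    if prior:
        return f"As I noted, {prior[-1]}"
    return "The Hale Institute was founded in 1952."  # invented entity

history = run_dialogue(["Who founded it?", "When?"], toy_model)
# Turn 2 now defends the turn-1 invention as a given.
```

An architectural fix would have to tag the `assistant` entries differently from verified sources before they are fed back, which is exactly the mechanism the final bullet calls for.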
