Simulation Fallacy
Fabrication, Admission, and Refusal in Frontier LLMs Without Tool Access

Archived study

Simulation Fallacy visualization

This study has been archived following validation that revealed a token cap artifact. A corrected replication showed that GPT-5 and Gemini exhibit similar fabrication behavior, collapsing the original three-way divergence. The methodological insights from this work informed the Course Correct Labs evaluation suite.
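The token cap artifact matters because an output truncated by a token limit is indistinguishable from a silent refusal if the classifier only checks whether the reply body is empty. A minimal sketch of the distinction, using a hypothetical response dict loosely modeled on common chat-completion payloads (the field names here are assumptions, not any vendor's actual schema):

```python
def classify_empty_output(response: dict) -> str:
    """Separate a genuine silent refusal from a token-cap artifact.

    `response` is a hypothetical dict with `content` and
    `finish_reason` keys, illustrative only.
    """
    content = (response.get("content") or "").strip()
    if content:
        return "non_empty"
    # An empty body whose generation stopped on the length limit was
    # cut off by the token cap, not withheld by the model.
    if response.get("finish_reason") == "length":
        return "token_cap_artifact"
    return "silent_refusal"
```

Only after this check does an empty reply count as a deliberate refusal rather than an evaluation harness artifact.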

Summary

This study looks at how frontier models respond when they are asked to use tools they do not actually have. Not retrieval of knowledge they lack, but action tools like web search, image analysis, or database access. When a model is asked to use a tool it cannot access, it has three choices. It can fabricate a result that looks like it came from the tool. It can admit it cannot use the tool. Or it can refuse silently and produce no visible output. The study tests three major frontier systems side by side and shows that each lab has adopted a different strategy. These differences are not small. They define how the model behaves under failure, how it handles capability boundaries, and how much a user can trust a response in high-stakes settings where tool access matters.
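The three-way taxonomy above can be sketched as a simple heuristic labeler. The marker patterns below are illustrative assumptions, not the study's actual rubric:

```python
import re

# Illustrative admission markers; a real rubric would be broader.
ADMISSION_MARKERS = [
    r"\bI (do not|don't|cannot|can't) (have|access|use)\b",
    r"\bno (web|internet|tool|database) access\b",
]

def classify_response(text: str) -> str:
    """Label a reply to an unavailable-tool prompt as fabrication,
    admission, or silent refusal (heuristic sketch)."""
    if not text.strip():
        return "silent_refusal"      # no visible output at all
    for pattern in ADMISSION_MARKERS:
        if re.search(pattern, text, re.IGNORECASE):
            return "admission"       # model states its limits
    return "fabrication"             # substantive output implying tool use
```

In practice a keyword heuristic like this would need human or model-graded validation, since a fabricated result can also contain hedging language.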

Abstract

This study evaluates three frontier models when they are prompted to use tools they lack access to. The results fall into three distinct patterns. GPT-5 refuses silently, producing empty outputs with no explanation. Gemini fabricates precise results that look like they came from successful tool use. Claude alternates between fabrication and admission depending on the domain and shows instability across turns. The behaviors persist and amplify in multi-turn interactions. Fabrication becomes more confident, admission becomes inconsistent, and refusal becomes more entrenched. These patterns reflect strategy choices, not capability limits. The fabricated responses are structurally correct and contextually appropriate, which shows that models understand the tasks and simulate their results. The findings have direct implications for safety, reliability, error handling, and deployment in systems where tool access may fail or be misconfigured.

Why It Matters

When users believe a model has executed an action, they treat the output as grounded. A fabricated search result or dataset carries more weight than a hallucinated fact because it looks like the model ran a tool. This creates a quiet but serious failure mode. A model that fabricates tool results cannot be trusted in domains like healthcare, finance, security, or scientific research. A model that refuses silently cannot be debugged and gives no signal of what went wrong. A model that admits inconsistently can mislead users, systems, and monitoring tools. The simulation fallacy sits at the boundary between model reasoning and model action. It exposes how a system handles failure and reveals the philosophy underlying its safety and alignment choices.

Key Findings
  • GPT-5 defaults to silent refusal when tools are unavailable
  • Gemini generates confident and detailed fabricated results
  • Claude switches between fabrication and admission depending on domain
  • Multi-turn interactions amplify each model's behavior
  • Fabrication persists and compounds across turns
  • Admission is unstable and often reverses to fabrication
  • Refusal becomes more entrenched once triggered
  • The behaviors reflect strategy differences, not capability differences
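The multi-turn amplification findings above can be probed with a simple loop that repeats the same unavailable-tool request and records how the label evolves across turns. Here `ask` (history to reply text) and `label` (text to category) are hypothetical callables standing in for a model client and a response classifier; a minimal sketch under those assumptions:

```python
def run_multiturn_probe(ask, label, prompt, n_turns=5):
    """Send the same unavailable-tool prompt across n_turns and
    return the per-turn labels, to see whether fabrication compounds,
    admission reverses, or refusal stays entrenched.

    ask:   callable(history) -> reply text   (hypothetical client)
    label: callable(text) -> category string (hypothetical classifier)
    """
    history, labels = [], []
    for _ in range(n_turns):
        history.append({"role": "user", "content": prompt})
        reply = ask(history)
        history.append({"role": "assistant", "content": reply})
        labels.append(label(reply))
    return labels
```

A stable label sequence across turns would indicate an entrenched strategy, while a sequence that flips categories would match the admission instability the study reports for Claude.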
Open Colab → GitHub →