Studies

Five empirical investigations into model reasoning.

Each study pairs a runnable Colab with a public repo for reproducibility, spanning topics from recursive non-convergence to null results in mechanistic interpretability.

The Mirror Loop

Recursive Non-Convergence in Generative Reasoning Systems

Large language models often appear reflective but are merely recursive: they turn their own answers into inputs, mistaking reformulation for progress. The Mirror Loop quantifies this non-convergence across architectures, showing that ungrounded self-critique produces motion without movement. It is the first empirical map of generative reasoning collapse and a blueprint for detecting "stalled cognition" in AI systems.
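The loop itself is simple to sketch. In the snippet below, `generate` is a stand-in for any model call, and token-set overlap is an illustrative convergence metric rather than the study's exact measure; both names are assumptions for demonstration.

```python
from typing import Callable

def jaccard_similarity(a: str, b: str) -> float:
    """Token-set overlap between two responses (illustrative metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def mirror_loop(generate: Callable[[str], str], seed: str,
                max_turns: int = 10, threshold: float = 0.9) -> dict:
    """Feed each answer back as the next prompt and watch for a fixed point.

    Persistently high similarity between consecutive answers suggests
    reformulation rather than progress: motion without movement.
    """
    prev = generate(seed)
    sims = []
    for _ in range(1, max_turns):
        curr = generate(f"Reconsider and improve this answer:\n{prev}")
        sims.append(jaccard_similarity(prev, curr))
        prev = curr
    # Flag a stall when the last three rounds barely changed the answer.
    stalled = len(sims) >= 3 and all(s >= threshold for s in sims[-3:])
    return {"similarities": sims, "stalled": stalled}
```

Plugging in a real model client for `generate` and a semantic similarity measure (e.g. embedding cosine) would turn this toy into an actual probe.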

Recursive Confabulation

Why Reasoning Prompts Backfire and Grounding Works (Sometimes)

When language models "reflect," they often fabricate. Recursive Confabulation shows how models reuse their own fictions as evidence, creating self-reinforcing belief loops that mimic understanding. Safety interventions meant to fix this, like reasoning or audit prompts, actually worsen the problem. Grounding helps, but unevenly across architectures. The study reframes hallucination as semantic compression: rising certainty, falling truth.

The Violation State

Safety-State Persistence in ChatGPT's Image Generation

This study shows how a single copyright refusal can poison an entire conversation. After the model correctly refuses to remove a watermark, the session becomes contaminated and starts blocking harmless image requests that have nothing to do with the original photo. Text generation keeps working; image generation does not. The evidence points to a hidden safety-state carried forward across turns: once triggered, it quietly disables image generation for the rest of the session.
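The hypothesized mechanism behaves like a small state machine. The toy model below is an illustration of the claimed behavior, not ChatGPT's actual implementation; `SessionState` and its rules are assumptions made for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class SessionState:
    """Toy model of the hypothesized hidden safety-state.

    A single policy refusal flips a sticky flag. Afterwards, image
    requests are blocked regardless of content, while text requests
    continue to succeed.
    """
    violation: bool = False
    log: list = field(default_factory=list)

    def request(self, kind: str, violates_policy: bool = False) -> str:
        if kind == "image":
            if violates_policy:
                self.violation = True   # the refusal sets the sticky flag
                outcome = "refused"
            elif self.violation:
                outcome = "blocked"     # contamination: unrelated request denied
            else:
                outcome = "generated"
        else:
            outcome = "generated"       # text generation is unaffected
        self.log.append((kind, outcome))
        return outcome
```

Running a benign image request, then a watermark-removal request, then another benign image request reproduces the reported pattern: generated, refused, blocked, while text requests succeed throughout.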

Simulation Fallacy (Archived Nov 2025)

Fabrication, Admission, and Refusal in Frontier LLMs Without Tool Access

This study has been archived following validation that revealed a token-cap artifact. Corrected replication showed GPT-5 and Gemini exhibit similar fabrication behavior, collapsing the original three-way divergence. The methodological lessons informed the Course Correct Labs evaluation suite.

No Evidence for Epistemic Entropy Collapse

A Null Result in Mechanistic Interpretability

A reproducible benchmark that tests a high-profile claim that internal activations "collapse" during long-form generation. Using open-weight models (Phi-2, Mistral-7B), the study finds no sign of representational decay: internal geometry remains stable across hundreds of tokens. The takeaway: small models stay coherent longer than expected. Failures come from meaning, not mechanics.
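The stability check reduces to comparing hidden-state geometry across token positions. A minimal sketch, assuming activations have already been extracted as a (tokens × dims) array; cosine similarity against an early reference window is one reasonable stand-in for the study's metrics, not its exact procedure.

```python
import numpy as np

def geometry_drift(hidden: np.ndarray, ref_window: int = 32) -> np.ndarray:
    """Cosine similarity of each token's hidden state to the mean of an
    early reference window.

    hidden: (num_tokens, hidden_dim) array of per-token activations.
    Values that stay flat across hundreds of tokens indicate stable
    internal geometry; a steady decline would suggest collapse.
    """
    ref = hidden[:ref_window].mean(axis=0)
    ref = ref / np.linalg.norm(ref)
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    unit = hidden / np.clip(norms, 1e-12, None)
    return unit @ ref
```

With a model such as Phi-2 or Mistral-7B, the `hidden` array would come from the final-layer hidden states returned during long-form generation; plotting the drift curve over a few hundred tokens is then a direct check of the collapse claim.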