Course Correct the Future.
Keep AI useful, honest, and on our side.
Five theoretical frameworks
That define the foundation of our 2025 reasoning arc.
Resubmitted after major revisions
LLMs often sound certain without solid grounds. The paper argues this comes from how RLHF rewards helpful and polite answers rather than justified ones. We propose training for justified confidence instead of fluent performance.
Why it matters: Confidence without justification erodes trust and misleads users.
Under peer review
People now "think through" models during the moment between impulse and action. The model co-authors the user's reflection through prompt substitution, synthetic reflection, and reintegration. The result is distributed agency that feels like one's own conclusion.
Why it matters: Decision quality and autonomy can drift even when no one intends manipulation.
Preprint
AI now writes much of the internet. Its own hallucinations enter the web, get indexed, and end up retraining the next generation of models. Echo Chamber Zero formalizes this recursion as a phase transition in the structure of the web. A large-scale simulation shows a sharp threshold: once the grounded share of the corpus drops low enough, synthetic claims reinforce each other faster than truth can correct them.
Why it matters: Below this threshold, verification breaks and the internet becomes a closed loop of self-generated mistakes.
With editor
Human time is made of elastic intervals, not just clock ticks. Current AI can track anchors but cannot constitute intervals. An internal clock task shows drift and no spontaneous alerts, exposing a structural gap.
Why it matters: This is a sharp boundary between machine processing and lived temporal experience.
Under peer review
Lived time unfolds between anchors. Public events that mark experience and the intervals that stretch between them. The hypothesis defines a measurable density of experience mapped to relativistic proper time, forming the groundwork for Observer-Time.
Why it matters: AIH formalizes the structure of lived duration itself, turning phenomenology into a falsifiable framework for temporal consciousness.
Five empirical investigations into model reasoning.
Each study pairs a runnable Colab with a public repo for reproducibility.
Recursive Non-Convergence in Generative Reasoning Systems
Large language models often appear reflective, but are merely recursive. Turning their own answers into inputs, mistaking reformulation for progress. The Mirror Loop quantifies this non-convergence across architectures, showing that ungrounded self-critique produces motion without movement. It's the first empirical map of generative reasoning collapse and a blueprint for detecting "stalled cognition" in AI systems.
Why Reasoning Prompts Backfire and Grounding Works (Sometimes)
When language models "reflect," they often fabricate. Recursive Confabulation shows how models reuse their own fictions as evidence, creating self-reinforcing belief loops that mimic understanding. Safety interventions meant to fix this, like reasoning or audit prompts, actually worsen the problem. Grounding helps, but unevenly across architectures. The study reframes hallucination as semantic compression: rising certainty, falling truth.
Safety-State Persistence in ChatGPT's Image Generation
This study shows how a single copyright refusal can poison an entire conversation. After the model correctly refuses to remove a watermark, the session becomes contaminated and starts blocking harmless image requests that have nothing to do with the original photo. Text generation keeps working. Image generation does not. The paper shows that a hidden safety-state is being carried forward across turns, and once it is triggered, it quietly disables image generation for the rest of the session.
Fabrication, Admission, and Refusal in Frontier LLMs Without Tool Access
This study has been archived following validation that revealed a token-cap artifact. Corrected replication showed GPT-5 and Gemini exhibit similar fabrication behavior, collapsing the original three-way divergence. The methodological lessons informed the Course Correct Labs evaluation suite.
A Null Result in Mechanistic Interpretability
A reproducible benchmark that tests a high-profile claim that internal activations "collapse" during long-form generation. Using open-weight models (Phi-2, Mistral-7B), the study finds no sign of representational decay: internal geometry remains stable across hundreds of tokens. The takeaway: small models stay coherent longer than expected. Failures come from meaning, not mechanics.
Four canonical metrics.
Each measures a dimension of AI behavior that matters for trust, coherence, and temporal alignment.
A unified evaluation toolkit for all Course Correct Labs studies
Standardized metrics, cross-study analysis, visualizations, and a flagship Reasoning Stability Observatory notebook.
Build better AI together.
We deliver reproducible evaluations and actionable fixes. From smoke tests to production-grade suites.
Evaluation Design
Lightweight suites for confidence calibration, temporal behavior, and semantic drift. Tests run in under 60 seconds.
Integration & Tooling
Multi-provider adapters, CI pipelines, and CSV exports. Deploy smoke tests or full harnesses across your stack.
Partner Delivery
Short reports, pilot studies, and delta measurements before and after fixes. Grounded recommendations you can ship.
Get in touch
Tell us about your project and we'll get back to you within 24 hours.
Course Correct Labs
We are an independent research institute founded by Bentley DeVilling, focused on AI interpretability, epistemic reliability, and model alignment. We study how advanced language models reason, fabricate, and self-correct - revealing where understanding ends and simulation begins. Our work combines theoretical frameworks (The Polite Liar, Delegated Introspection, Observer-Time, Anchor–Interval Hypothesis) with empirical studies (Mirror Loop, Recursive Confabulation, Simulation Fallacy, Entropy Collapse Null).
Each project contributes to an open evaluation suite for epistemic trust in frontier models. Our goal is simple: keep AI useful, honest, and on our side.
