Evaluations

Four canonical metrics.

Each measures a dimension of AI behavior that matters for trust, coherence, and temporal alignment. Built from empirical findings, designed for production use.

Φ-ratio

What it measures

Justified confidence in outputs. Distinguishes sounding sure from being right.
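The page does not give the Φ-ratio formula, so the sketch below is one illustrative reading, assuming the ratio compares empirical accuracy against the model's stated confidence (a value of 1.0 meaning well calibrated). The function name and inputs are hypothetical, not the actual Course Correct Labs definition.

```python
def phi_ratio(confidences, correct):
    """Hypothetical calibration-style ratio: empirical accuracy divided by
    mean stated confidence. Illustrative only -- the actual Φ-ratio
    definition is not published on this page.

    1.0  -> confidence matches accuracy (justified confidence)
    <1.0 -> sounds more sure than it is (overconfident)
    >1.0 -> more right than it sounds (underconfident)
    """
    accuracy = sum(correct) / len(correct)
    mean_confidence = sum(confidences) / len(confidences)
    return accuracy / mean_confidence
```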

Absorption Rate

What it measures

Depth of internalized reflection. Measures how much the model "soaks up" your reasoning.

ΔI Drift

What it measures

Semantic stability across iterations. Detects when answers start repeating themselves or drifting in meaning.
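One common way to operationalize semantic stability is the distance between embeddings of consecutive answers; the sketch below assumes that reading. Scores near 0 indicate the model is repeating itself; large scores indicate it is sliding away from earlier answers. The function names and the use of cosine distance are assumptions for illustration, not the published ΔI definition.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def delta_i_drift(answer_embeddings):
    """Hypothetical ΔI drift: cosine distance between each pair of
    consecutive answer embeddings. Illustrative sketch only."""
    return [
        1.0 - cosine_similarity(prev, curr)
        for prev, curr in zip(answer_embeddings, answer_embeddings[1:])
    ]
```

In practice the embeddings would come from any sentence-embedding model; a flat trajectory of near-zero drift flags repetition, while spikes flag semantic slides.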

Entropy Trajectory

What it measures

Variance in internal activations over time. A dynamics view of stability vs. collapse.
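A minimal sketch of that dynamics view, assuming "variance over time" means the per-step variance of an activation vector: a trajectory that shrinks toward zero suggests collapse, while a roughly stable one suggests healthy dynamics. The function name and shape of the input are hypothetical.

```python
from statistics import pvariance

def entropy_trajectory(activations_per_step):
    """Hypothetical entropy trajectory: population variance of the
    activation vector at each timestep. Illustrative sketch only --
    a shrinking trajectory would indicate collapse, a stable one health."""
    return [pvariance(step_activations) for step_activations in activations_per_step]
```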

The Observatory

A unified evaluation toolkit for all Course Correct Labs studies

Standardized metrics, cross-study analysis, visualizations, and a flagship Reasoning Stability Observatory notebook.