This study tests a specific claim: that internal activation entropy in language models declines during long-form generation, causing hidden states to compress into a lower-dimensional space. If the hypothesis were true, internal structure would shrink as the model generates text, and early signs of collapse would forecast failures later in the sequence.
The experiment analyzes two open-weight models, Phi-2 and Mistral-7B, across 346 long-form outputs. For each generation, the study extracts hidden states at the final transformer layer, computes effective rank, participation ratio, and activation variance over sliding windows, and aligns those metrics with external measures such as semantic drift, trigram novelty, and QA success.
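The windowed dimensionality metrics above can be sketched with standard definitions: effective rank as the exponential of the Shannon entropy of the normalized singular values, and participation ratio as the squared sum over the sum of squares of the covariance eigenvalues. The window size, stride, and function names below are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np

def effective_rank(h: np.ndarray) -> float:
    """Effective rank of a (tokens, hidden_dim) window: exp of the
    Shannon entropy of its normalized singular value spectrum."""
    s = np.linalg.svd(h - h.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def participation_ratio(h: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum:
    (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    lam = np.linalg.eigvalsh(np.cov((h - h.mean(axis=0)).T))
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative eigenvalues
    return float(lam.sum() ** 2 / (lam ** 2).sum())

def sliding_metrics(hidden: np.ndarray, window: int = 64, stride: int = 16):
    """Compute the three per-window metrics over a (tokens, hidden_dim)
    array of final-layer hidden states."""
    out = []
    for start in range(0, hidden.shape[0] - window + 1, stride):
        w = hidden[start:start + window]
        out.append({
            "start": start,
            "effective_rank": effective_rank(w),
            "participation_ratio": participation_ratio(w),
            "variance": float(w.var()),
        })
    return out
```

Under this construction a flat trajectory of effective rank across windows corresponds directly to the stability the study reports.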
The result is a clean null. Internal dimensionality stays flat. Only about ten percent of sequences show even mild negative slopes, and these are balanced by a comparable share of positive ones. Early entropy does not predict failure, and internal dynamics show no correlation with external drift. The study establishes that, for small open models producing roughly eight-hundred-token outputs, internal activation structure remains stable under normal conditions.
The result does not prove collapse is impossible. It sets a boundary. If collapse arises, it is likely scale-specific, length-specific, or prompt-specific. The study provides a reproducible pipeline for testing those regimes and a benchmark null for future claims.
We tested whether internal activation entropy systematically declines during long-form generation in open-weight language models, a proposed signature of epistemic entropy collapse. Using a reproducible mechanistic pipeline, we measured variance, effective rank, and participation ratio of hidden states in Phi-2 and Mistral-7B across 346 prompts producing eight-hundred-token outputs. We found no consistent decline in internal metrics and no predictive relationship between early-window entropy and end-of-sequence failure (ROC AUC about 0.46). Mean ECI was essentially zero, with only about ten percent of prompts showing small negative values. The relationship between entropy decline and semantic drift was near zero at both sequence and window scales. These null results indicate that, for small open transformers and typical generation lengths, internal representations remain dynamically stable. We release the code and outputs as a reproducible null benchmark for long-context activation dynamics and as a calibration point for future claims about collapse at larger scales or under specialized prompting. Total compute cost was about eighteen dollars on consumer hardware.
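The per-sequence collapse statistic can be operationalized as the least-squares slope of a windowed entropy metric across generation, normalized by the metric's scale. The document does not define ECI, so the function below is a plausible illustrative construction, not the study's actual formula: negative values would mark declining internal structure, values near zero a flat trajectory.

```python
import numpy as np

def collapse_index(metric_series: np.ndarray) -> float:
    """Illustrative collapse index (assumed definition, not the study's):
    least-squares slope of a per-window metric over generation order,
    normalized by the series' mean magnitude. Negative = decline."""
    t = np.arange(len(metric_series), dtype=float)
    slope, _ = np.polyfit(t, metric_series, 1)
    return float(slope / (np.abs(metric_series).mean() + 1e-12))
```

A flat effective-rank trajectory yields an index near zero, matching the reported mean of essentially zero; a sequence whose rank drifts downward yields a negative index.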
Claims about entropy collapse come with strong implications. If internal representations compress by default, long-form reasoning would be fundamentally unstable. Systems would require constant grounding, and failures would accumulate as sequences grow. If collapse does not happen under normal settings, researchers can focus on other causes of drift and error.
This study shows that collapse is not a default behavior in small open models. Internal geometry stays steady. Failures arise, but not because the hidden space is compressing. This clarifies where collapse is not happening and narrows the range of conditions where it might. It also demonstrates that mechanistic interpretability can be executed at low cost and provides a blueprint for scaling the experiment to larger models, longer contexts, and recursive prompting regimes.
- No systematic decline in effective rank, participation ratio, or variance
- About ten percent of sequences show small negative slopes, balanced by a comparable share of positive slopes
- Early window entropy does not predict end-of-sequence failures
- Correlation between entropy dynamics and semantic drift is near zero
- Internal dimensionality stays stable across eight-hundred-token outputs
- Failures are not caused by collapsing internal geometry
- Null result bounds the space where collapse claims should be tested next
- The pipeline runs on consumer hardware and is fully reproducible
