Llama-3-8B isolates task knowledge in 68-dimensional activation subspace
New arXiv preprint reports that transformer in-context learning compresses task-relevant information into low-dimensional concept subspaces, recovering 78.8% of the clean–corrupted accuracy gap by patching just 68 of Llama-3-8B's 4096 residual-stream dimensions.
In-context learning works by compressing task information into low-dimensional concept subspaces within transformer activations, according to a preprint released May 20. The researchers tested this hypothesis on Llama-3-8B using multi-relation prompts derived from CounterFact. Patching a subspace of 68 to 73 dimensions of the model's 4096-dimensional residual stream restored 78.8% of the accuracy gap between clean and corrupted runs, while patching the complementary dimensions restored none of it. At the low end, 68 dimensions is just 1.66% of the residual stream, yet that slice accounts for most of the accuracy that patching can recover.
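Concretely, patching a subspace means taking an activation from the corrupted run and overwriting only its component inside a k-dimensional subspace with the clean run's component; the complementary patch overwrites everything else. The minimal NumPy sketch below assumes the subspace is already available as an orthonormal basis. How the preprint identifies that basis is not covered here, so a random basis and random activations serve as placeholders. The function names are hypothetical, and `gap_restored` is assumed to be the usual normalization behind a figure like "78.8% of the clean–corrupted gap", applied here to made-up accuracies.

```python
import numpy as np


def project(basis: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Orthogonal projection of v onto span(basis); basis is (d, k) with orthonormal columns."""
    return basis @ (basis.T @ v)


def patch_subspace(clean: np.ndarray, corrupted: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Overwrite the corrupted activation's in-subspace component with the clean run's."""
    return corrupted + project(basis, clean - corrupted)


def patch_complement(clean: np.ndarray, corrupted: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Overwrite everything outside the subspace, keeping the corrupted in-subspace component."""
    diff = clean - corrupted
    return corrupted + (diff - project(basis, diff))


def gap_restored(acc_patched: float, acc_clean: float, acc_corrupted: float) -> float:
    """Fraction of the clean-corrupted accuracy gap recovered by a patch (1.0 = fully clean)."""
    return (acc_patched - acc_corrupted) / (acc_clean - acc_corrupted)


# Toy demo: random vectors stand in for real residual-stream activations, and a
# random orthonormal basis stands in for the concept subspace (placeholder only).
d, k = 4096, 68
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))  # reduced QR: (d, k), orthonormal columns
clean = rng.standard_normal(d)
corrupted = rng.standard_normal(d)

patched_in = patch_subspace(clean, corrupted, basis)
patched_out = patch_complement(clean, corrupted, basis)

print(f"subspace share of residual stream: {k / d:.2%}")  # → 1.66%
print(f"gap restored (hypothetical accuracies): {gap_restored(0.5, 0.75, 0.25):.1%}")  # → 50.0%
```

By construction the two patches are complementary: their changes to the corrupted activation sum to the full clean-minus-corrupted difference. Repeating the subspace patch with a fresh random basis of the same rank has the shape of a matched-rank random control.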
The theoretical framework decomposes in-context prediction into two parts: regression on concept coordinates and leakage from off-subspace directions. Under block-diagonal or near-block-diagonal covariance assumptions, the leading estimation and nuisance-sensitivity terms scale with the dimension of the concept subspace rather than the ambient dimension. In intervention experiments, concept swaps redirected predictions toward the injected relations, whereas random and cross-task controls of matched rank had negligible effect. Additional experiments on Qwen2.5-7B and on a controlled cross-lingual rule task reproduced the same qualitative pattern, suggesting the effect is not specific to a single model family or task format.
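The claim that estimation error scales with the concept dimension has a familiar linear-regression analogue. The toy below is an illustration under stated assumptions, not the preprint's model: inputs are isotropic Gaussian (identity covariance, trivially block-diagonal), the task weights lie entirely in a random k-dimensional subspace, and all sizes are arbitrary. It compares ordinary least squares on the k concept coordinates with least squares on all d ambient coordinates. For this design the classical expected parameter error is σ²p/(n−p−1), where p is the number of regressors.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, sigma = 200, 8, 300, 0.5  # ambient dim, concept dim, demonstrations, label noise (toy values)

# Orthonormal basis U for a hypothetical concept subspace; the true weights live inside it.
U, _ = np.linalg.qr(rng.standard_normal((d, k)))
w_true = U @ rng.standard_normal(k)

# Demonstrations: isotropic Gaussian inputs with noisy linear labels.
X = rng.standard_normal((n, d))
y = X @ w_true + sigma * rng.standard_normal(n)

# Estimator 1: regress on the k concept coordinates only, then map back to R^d.
a_hat, *_ = np.linalg.lstsq(X @ U, y, rcond=None)
w_sub = U @ a_hat

# Estimator 2: regress on all d ambient coordinates.
w_full, *_ = np.linalg.lstsq(X, y, rcond=None)

err_sub = float(np.sum((w_sub - w_true) ** 2))
err_full = float(np.sum((w_full - w_true) ** 2))


def expected_err(p: int) -> float:
    """Classical OLS result for Gaussian design: E||w_hat - w||^2 = sigma^2 * p / (n - p - 1)."""
    return sigma**2 * p / (n - p - 1)


print(f"concept coords (k={k}): {err_sub:.4f}  expected ~{expected_err(k):.4f}")
print(f"ambient coords (d={d}): {err_full:.4f}  expected ~{expected_err(d):.4f}")
```

Here n is set above d so the ambient least-squares problem is well-posed. The point is only the ratio of the two errors, which tracks k versus d; the toy says nothing about the off-subspace leakage term.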
The work connects two lines of research: high-level Bayesian accounts of in-context learning, which explain how demonstrations induce predictors, and low-level mechanistic analyses that identify compact activation directions steering prompted behavior. The 68-dimensional subspace gives interpretability researchers a concrete target for steering and controlling open-weight models.
