Manifold steering reveals neural concepts live on curved surfaces, not lines
Researchers at Goodfire AI and Stanford challenge the Linear Representation Hypothesis with manifold steering, a geometry-aware intervention method that navigates curved neural representations instead of flat vector space, tested on LLaMA 3.1 8B and 70B.
A preprint posted to arXiv on May 12, 2025 introduces manifold steering, a technique that navigates the curved geometric structures inside neural networks rather than treating internal activations as flat Euclidean space. The paper, authored by researchers at Goodfire AI, Stanford, and collaborators, argues that concepts inside foundation models lie on nonlinear manifolds rather than straight lines, and that moving along these intrinsic curves produces smoother, more coherent behavior than traditional linear interventions. The team fit splines to both internal activations and external output distributions in LLaMA 3.1 8B and 70B, revealing a bidirectional isometry: the geometry of hidden states mirrors the geometry of model outputs.
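The preprint's exact fitting procedure is not detailed here, but the core idea is easy to sketch. Below is a minimal illustration using SciPy's parametric splines: a cloud of synthetic "activations" varies along a single concept, a smoothing spline is fit through the cloud, and the spline parameter u serves as the intrinsic manifold coordinate. The three-dimensional projection, noise level, and variable names are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Synthetic stand-in for hidden states: 200 activations that vary along
# a single concept, projected to 3 dimensions. The curve is deliberately
# nonlinear in every coordinate.
rng = np.random.default_rng(0)
t_true = np.sort(rng.uniform(0, 1, 200))
activations = np.stack([
    np.cos(np.pi * t_true),
    np.sin(np.pi * t_true),
    t_true ** 2,
]) + 0.02 * rng.normal(size=(3, 200))

# Fit a smoothing spline through the activation cloud. splprep returns
# the spline representation (tck) plus a parameter value u for each
# sample; u acts as the intrinsic manifold coordinate.
tck, u = splprep(activations, s=0.5)

# Evaluate the fitted curve at evenly spaced manifold coordinates.
coords = np.linspace(0, 1, 50)
curve = np.stack(splev(coords, tck))  # shape (3, 50)
print(curve.shape)
```

Fitting the same kind of curve to output distributions and comparing the two parameterizations is, in spirit, how an isometry between hidden-state geometry and output geometry could be checked.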
Steering along manifold coordinates avoids the "teleportation" problem that plagues linear probing and vector arithmetic: sudden jumps through unnatural intermediate states. The work directly challenges the Linear Representation Hypothesis, which holds that concepts are encoded as straight directions in activation space. Recent papers from Anthropic on counting geometry have hinted at non-Euclidean structure, but this study provides a practical intervention framework and open-source code. The authors demonstrate that respecting intrinsic manifold coordinates prevents diversity collapse and keeps output trajectories natural during steering; the toy sketches below illustrate the contrast. Code is available on GitHub under the Goodfire AI CausalAB repository. The method requires no retraining and works as a post-hoc intervention layer, making it directly applicable to existing open-weight models.
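To see why linear edits "teleport", consider a toy concept manifold: activations lying on a unit semicircle. Interpolating activations linearly cuts across the chord and leaves the manifold, while interpolating the spline's intrinsic coordinate keeps every intermediate state on the curve. This is a self-contained sketch of the geometric argument, not the paper's implementation.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Toy concept manifold: activations on a unit semicircle in 2-D.
theta = np.linspace(0, np.pi, 100)
arc = np.stack([np.cos(theta), np.sin(theta)])
tck, u = splprep(arc, s=0)

x = np.array(splev(0.2, tck)).ravel()       # current activation, u = 0.2
target = np.array(splev(0.8, tck)).ravel()  # desired concept value, u = 0.8

# Linear steering: add a straight difference vector. The midpoint of the
# edit sits on the chord, well off the manifold ("teleportation").
linear_mid = x + 0.5 * (target - x)

# Manifold steering: interpolate the intrinsic coordinate instead, so
# every intermediate state stays on the fitted curve.
manifold_mid = np.array(splev(0.5, tck)).ravel()

# Distance from the unit circle (0 means on-manifold).
print(abs(np.linalg.norm(linear_mid) - 1))    # ~0.41, far off the arc
print(abs(np.linalg.norm(manifold_mid) - 1))  # ~0, still on the arc
```

The printed distances make the contrast concrete: the linear edit passes through a point roughly 0.4 units off the manifold, exactly the kind of unnatural intermediate state the authors argue causes incoherent outputs.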
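Because the method is post-hoc, it can attach to an existing model as a forward hook that edits activations at inference time. The sketch below uses a toy linear layer standing in for a transformer block and a hypothetical nearest-point steering rule to show roughly how such an intervention layer plugs in without touching any weights; none of the names or design choices here come from the paper, and its actual coordinate assignment may differ.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.interpolate import splprep, splev

# Stand-in layer for one block of an open-weight model; a real target
# would be a residual-stream layer in something like LLaMA 3.1 8B.
hidden = 8
block = nn.Linear(hidden, hidden)

# Pretend a concept spline was already fitted over hidden states (here
# a random smooth curve in activation space, purely illustrative).
rng = np.random.default_rng(1)
pts = np.cumsum(rng.normal(size=(hidden, 32)), axis=1)
tck, _ = splprep(pts, s=1.0)

def manifold_shift(h: torch.Tensor, delta: float = 0.1) -> torch.Tensor:
    # Hypothetical steering rule: snap each activation to its nearest
    # manifold coordinate, advance that coordinate by delta, and apply
    # the resulting on-manifold displacement.
    grid = np.linspace(0, 1, 256)
    curve = torch.tensor(np.stack(splev(grid, tck)).T, dtype=h.dtype)
    idx = torch.cdist(h, curve).argmin(dim=-1)       # nearest u per row
    u_new = np.clip(grid[idx.numpy()] + delta, 0, 1)
    moved = torch.tensor(np.stack(splev(u_new, tck)).T, dtype=h.dtype)
    return h + (moved - curve[idx])

# Post-hoc intervention: a forward hook edits activations at inference
# time, so no weights change and no retraining is needed.
block.register_forward_hook(lambda mod, args, out: manifold_shift(out))

with torch.no_grad():
    print(block(torch.randn(4, hidden)).shape)  # torch.Size([4, 8])
```

Attaching the edit as a hook rather than modifying weights is what makes this style of intervention drop-in for existing open-weight checkpoints.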
