Latent Thought Flow cuts reasoning token overhead by internalizing CoT in hidden space
New framework models internal reasoning as continuous trajectories in representation space rather than discrete chain-of-thought tokens, slashing inference costs while preserving accuracy on math and logic tasks.
Latent Thought Flow (LTF), presented in an arXiv preprint this week by Xiandong Zou, Jing Huang, Jianshu Li, and Pan Zhou, models internal reasoning as continuous, variable-length trajectories in representation space rather than as discrete token chains. Instead of generating explicit chain-of-thought (CoT) steps, each of which incurs tokenization, detokenization, and cache overhead, LTF keeps intermediate reasoning in the model's hidden states. The method is trained with a continuous generative flow network (GFlowNet) loss that combines entropy-weighted subtrajectory balance with reference-prior regularization, so the model learns a diverse posterior distribution over latent reasoning paths rather than collapsing to a single deterministic trajectory.
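For readers unfamiliar with subtrajectory balance, the sketch below computes the standard SubTB(λ) objective from the GFlowNet literature for a single trajectory with discrete steps. It is an illustration under stated assumptions, not the authors' loss: the preprint's entropy weighting, continuous-space formulation, and reference-prior term are not reproduced here, and the function name `subtb_loss` and its parameters are hypothetical.

```python
from itertools import accumulate


def subtb_loss(log_flow, log_pf, log_pb, lam=0.9):
    """Standard SubTB(lambda) loss for one trajectory s_0 .. s_n.

    Assumed inputs (hypothetical names, not taken from the paper):
      log_flow: n + 1 log state-flow values; the last entry plays the
                role of the log reward of the terminal state.
      log_pf:   n forward log-probabilities, log P_F(s_{k+1} | s_k).
      log_pb:   n backward log-probabilities, log P_B(s_k | s_{k+1}).
      lam:      geometric weight applied per sub-trajectory length.
    """
    n = len(log_pf)
    if n == 0 or len(log_flow) != n + 1 or len(log_pb) != n:
        raise ValueError("need n >= 1 steps, n + 1 flows, n backward terms")

    # prefix[k] = sum over the first k steps of (log P_F - log P_B).
    prefix = [0.0] + list(accumulate(f - b for f, b in zip(log_pf, log_pb)))

    total, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Balance residual for the sub-trajectory s_i .. s_j.
            residual = log_flow[i] - log_flow[j] + prefix[j] - prefix[i]
            weight = lam ** (j - i)
            total += weight * residual ** 2
            weight_sum += weight
    return total / weight_sum
```

The loss is zero exactly when every sub-trajectory satisfies the flow-balance condition. Averaging over all sub-trajectories, rather than only the full path, gives each intermediate step its own learning signal.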
The approach addresses what the authors call the "language space bottleneck," where CoT's token-by-token generation burns compute on serialization overhead. Because intermediate thoughts are never tokenized, LTF can vary the length of its latent trajectory at test time, using longer trajectories for harder problems and shorter ones for simpler problems, while the paper reports that accuracy holds on mathematical and logical benchmarks. The GFlowNet formulation preserves diversity across reasoning paths, in contrast to standard reinforcement learning methods, which often converge to a single high-reward trajectory. No code or model weights have been released yet. The paper positions LTF as a step toward more hardware-efficient reasoning models that exploit available compute without the serialization cost of explicit token chains.




