Latent Thought Flow cuts reasoning token overhead by internalizing CoT in hidden space

New framework models internal reasoning as continuous trajectories in representation space rather than discrete chain-of-thought tokens, slashing inference costs while preserving accuracy on math and logic tasks.

By Alex Sokoloff · June 20, 2026

Latent Thought Flow (LTF), presented in an arXiv preprint this week by Xiandong Zou, Jing Huang, Jianshu Li, and Pan Zhou, models internal reasoning as continuous, variable-length trajectories in representation space rather than as discrete token chains. Instead of generating explicit chain-of-thought (CoT) steps, each of which incurs tokenization, detokenization, and cache overhead, LTF keeps intermediate reasoning in the model's hidden states. The method is trained with a continuous generative flow network (GFlowNet) loss that combines entropy-weighted subtrajectory balance with reference-prior regularization, so it learns a diverse posterior distribution over latent reasoning paths instead of collapsing to a single deterministic trajectory.
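For readers unfamiliar with subtrajectory balance, the following is a minimal sketch of the standard SubTB(λ) objective from the GFlowNet literature, not the paper's loss. The preprint's entropy-based weighting and reference-prior regularizer are not specified here, so this sketch substitutes a plain geometric weight `lam`; the function name and argument layout are assumptions made for illustration.

```python
def subtb_loss(log_flows, log_pf, log_pb, lam=0.9):
    """Standard subtrajectory balance loss for one trajectory s_0 -> ... -> s_n.

    log_flows: n+1 log state-flow estimates, log F(s_0) ... log F(s_n)
               (in a GFlowNet, log F(s_n) is tied to the log-reward)
    log_pf:    n forward log-probs,  log P_F(s_{k+1} | s_k)
    log_pb:    n backward log-probs, log P_B(s_k | s_{k+1})
    lam:       geometric weight on subtrajectory length. ASSUMPTION: this
               stands in for the paper's entropy weighting, which is not
               reproduced here.
    """
    n = len(log_pf)
    if n < 1 or len(log_flows) != n + 1 or len(log_pb) != n:
        raise ValueError("need n >= 1 transitions and n + 1 flow values")

    total, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Flow into the subtrajectory s_i..s_j should equal flow out of it.
            residual = (log_flows[i] + sum(log_pf[i:j])
                        - log_flows[j] - sum(log_pb[i:j]))
            w = lam ** (j - i)
            total += w * residual ** 2
            weight_sum += w
    # Weighted mean of squared residuals over all subtrajectories.
    return total / weight_sum


# A trajectory whose flows are perfectly balanced has zero loss.
balanced = subtb_loss([0.0, -1.0, -2.0], [-1.0, -1.0], [0.0, 0.0])
# Perturbing one intermediate flow estimate produces a positive loss.
unbalanced = subtb_loss([0.0, 0.0, -2.0], [-1.0, -1.0], [0.0, 0.0], lam=0.5)
```

Because the loss penalizes imbalance on every subtrajectory rather than only on the full path, intermediate states receive a training signal even when the reward arrives only at the end, which is one reason the objective is commonly used for variable-length trajectories.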

The approach targets what the authors call the "language space bottleneck": CoT's token-by-token generation spends compute on serialization. By skipping tokenization for intermediate thoughts, LTF can adapt compute at test time, using longer latent trajectories for harder problems and shorter ones for easier problems, while maintaining accuracy on mathematical and logical benchmarks. The GFlowNet formulation preserves diversity across reasoning paths, in contrast to standard reinforcement learning methods, which often converge to a single high-reward trajectory. No code or model weights have been released yet. The paper positions LTF as a step toward more hardware-efficient reasoning models that use available compute without paying the token cost of explicit, human-readable reasoning chains.
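The idea of variable-length latent reasoning can be illustrated with a toy loop. This is a hypothetical sketch only: the article does not describe how LTF decides when a trajectory ends, so the halting rule, `latent_rollout`, `step_fn`, and `stop_fn` below are all assumptions, and the "hidden state" is a plain list of floats rather than a real model activation.

```python
def latent_rollout(h, step_fn, stop_fn, threshold=0.5, max_steps=32):
    """Update a hidden state in place of emitting tokens, until a stop
    score reaches `threshold` or `max_steps` is hit.

    HYPOTHETICAL halting rule for illustration; not taken from the paper.
    Returns the final state and the number of latent steps used.
    """
    steps = 0
    while steps < max_steps and stop_fn(h) < threshold:
        h = step_fn(h)  # one latent "thought" step, no tokenization
        steps += 1
    return h, steps


def halve(h):
    # Toy dynamics: each step shrinks the state toward zero.
    return [0.5 * x for x in h]


def stop_score(h):
    # Toy stop head: score rises toward 1 as the state shrinks.
    return 1.0 / (1.0 + max(abs(x) for x in h))


# A state that starts near the stopping region needs few steps...
_, easy_steps = latent_rollout([1.0], halve, stop_score, threshold=0.6)
# ...while one that starts far away uses more latent compute.
_, hard_steps = latent_rollout([16.0], halve, stop_score, threshold=0.6)
```

In this toy setup the "harder" input takes more latent steps than the "easier" one, and `max_steps` bounds the worst-case cost.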
