MELT breaks memory scaling in recurrent language models with shared KV cache
Memory-Efficient Looped Transformer shares a single KV cache per layer across reasoning loops, breaking the linear memory-depth scaling that limits recurrent language models.

Recurrent language models that reason by looping over embeddings—without generating intermediate tokens—hit a memory wall when they scale up reasoning depth. Each iteration adds to a growing key-value cache, forcing memory consumption to climb linearly with the number of reasoning steps. A new preprint proposes Memory-Efficient Looped Transformer (MELT), an architecture that holds memory constant regardless of reasoning depth.
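To make the scaling concrete, here is a back-of-the-envelope comparison in Python. The model dimensions are illustrative only, not from the paper: a standard looped model pays for one KV cache per reasoning loop, while a shared cache is paid for once.

```python
def kv_cache_bytes(layers, seq_len, heads, head_dim, dtype_bytes=2):
    """Size of one full KV cache: keys + values across all layers."""
    return layers * 2 * seq_len * heads * head_dim * dtype_bytes

# Illustrative dimensions only (not taken from the paper).
per_loop = kv_cache_bytes(layers=24, seq_len=1024, heads=16, head_dim=64)

for loops in (1, 4, 16):
    appended = loops * per_loop   # standard looped model: one cache per loop
    shared = per_loop             # shared cache: constant in the loop count
    print(f"{loops:2d} loops: appended {appended / 2**20:6.0f} MiB | "
          f"shared {shared / 2**20:6.0f} MiB")
```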
MELT maintains a single KV cache per layer, shared across all reasoning iterations. A learnable gating mechanism updates that cache in place rather than appending a fresh cache for each loop. The result: memory consumption stays flat as reasoning depth increases, while standard looped models such as Ouro grow linearly. According to the paper, MELT models fine-tuned from pretrained Ouro checkpoints outperform comparably sized standard LLMs on reasoning tasks, with a memory footprint that matches those non-looped baselines and is dramatically smaller than Ouro's.
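The paper's exact gating formulation isn't given in this summary; the sketch below shows one plausible version in PyTorch, where a sigmoid gate computed from the current hidden state blends each loop's new keys and values into a fixed-size shared cache instead of appending to it. The class and parameter names are placeholders, not the authors' code.

```python
import torch
import torch.nn as nn

class GatedSharedKVCache(nn.Module):
    """Sketch of a gated, shared KV cache for one attention layer.

    The cache tensors keep the same shape on every reasoning loop,
    so memory does not grow with reasoning depth.
    """

    def __init__(self, d_model: int, d_kv: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_kv)  # learnable gate

    def forward(self, cache_k, cache_v, new_k, new_v, hidden):
        # hidden: [batch, seq, d_model]; caches and new K/V: [batch, seq, d_kv]
        g = torch.sigmoid(self.gate_proj(hidden))   # per-position update gate
        cache_k = g * new_k + (1.0 - g) * cache_k   # overwrite instead of append
        cache_v = g * new_v + (1.0 - g) * cache_v
        return cache_k, cache_v

# One cache per layer, reused on every reasoning loop:
layer_cache = GatedSharedKVCache(d_model=512, d_kv=64)
b, s = 2, 16
k = v = torch.zeros(b, s, 64)
for _ in range(4):                                  # four reasoning loops
    h = torch.randn(b, s, 512)
    new_k, new_v = torch.randn(b, s, 64), torch.randn(b, s, 64)
    k, v = layer_cache(k, v, new_k, new_v, h)       # shape stays (b, s, 64)
```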
The training procedure combines two post-training phases: an interpolated transition from a LoopLM starting model, followed by attention-aligned distillation. Both phases use chunk-wise training to stabilize learning under the shared-cache design. The authors emphasize that MELT needs only lightweight post-training on top of an existing looped checkpoint, making it a practical retrofit for recurrent architectures already in use.
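Neither phase is spelled out in this summary, so the following is only a rough sketch of the two training-side ideas it mentions: an interpolation coefficient annealed from the original LoopLM behavior toward the fully shared cache, and chunk-wise iteration over a sequence so each step only touches one chunk of cache updates. The function names and the linear schedule are assumptions, not the paper's recipe.

```python
def cache_mix_schedule(step: int, transition_steps: int) -> float:
    """Hypothetical interpolation coefficient for the first post-training phase:
    0.0 keeps the original LoopLM's per-loop caches, 1.0 relies entirely on
    the shared gated cache. A linear ramp is assumed; the paper may differ."""
    return min(1.0, max(0.0, step / max(1, transition_steps)))


def iter_chunks(token_ids, chunk_size=256):
    """Chunk-wise training helper: yield fixed-size slices of a sequence so the
    shared cache is updated, and gradients truncated, one chunk at a time."""
    for start in range(0, len(token_ids), chunk_size):
        yield token_ids[start:start + chunk_size]


# Example: ramp over the first 1,000 steps, then stay fully shared.
assert cache_mix_schedule(0, 1000) == 0.0
assert cache_mix_schedule(1500, 1000) == 1.0
```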
No code or weights have been released yet.