Mem-π adaptive memory model outperforms retrieval systems by over 30% on web navigation

Researchers introduce a framework that trains a separate model to generate task-specific guidance for LLM agents only when needed, outperforming static retrieval-based memory across web navigation, tool use, and embodied tasks.

May 18, 2026

A dedicated memory model that learns when to generate guidance and when to stay silent is now outperforming traditional retrieval-based systems in multi-step agent tasks, according to a new preprint released this week.

Mem-π, introduced by researchers at Mila and ServiceNow, replaces the similarity-search memory banks used in most LLM agents with a separate language or vision-language model that produces context-specific guidance on demand. The system trains this guidance model using a decision-content decoupled reinforcement learning objective, enabling it to decide both whether to produce guidance and what that guidance should contain. When generation would not improve performance, the model abstains entirely.
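The generate-or-abstain behavior at inference time can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the `AgentState` fields, the `GuidanceModel` interface, and the hand-written rule in `toy_guidance_model` stand in for a trained model, and the reinforcement learning objective itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentState:
    """Hypothetical snapshot of what the agent currently sees."""
    task: str
    observation: str


# Hypothetical interface: map the live state to a hint, or None to abstain.
GuidanceModel = Callable[[AgentState], Optional[str]]


def toy_guidance_model(state: AgentState) -> Optional[str]:
    """Stand-in for a trained guidance model (a fixed rule, not a learned policy)."""
    if "search results" in state.observation.lower():
        return "Filter by the attributes named in the task before opening a product page."
    return None  # abstain: no guidance is added to the prompt


def build_agent_prompt(state: AgentState, guide: GuidanceModel) -> str:
    """Assemble the agent prompt, adding a guidance line only when one is produced."""
    parts = [f"Task: {state.task}", f"Observation: {state.observation}"]
    guidance = guide(state)
    if guidance is not None:
        parts.insert(1, f"Guidance: {guidance}")
    return "\n".join(parts)


if __name__ == "__main__":
    browsing = AgentState("Buy red running shoes", "Search results: 20 items listed")
    checkout = AgentState("Buy red running shoes", "Checkout page: confirm order")
    print(build_agent_prompt(browsing, toy_guidance_model))
    print(build_agent_prompt(checkout, toy_guidance_model))
```

In this sketch the agent's prompt is unchanged whenever the guidance model abstains, so the downstream policy sees extra text only when the guidance model chooses to emit it.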

Existing memory-augmented agents typically retrieve static entries from episodic stores or skill libraries based on embedding similarity. These retrieved entries often misalign with the current context because they were written for different scenarios. Mem-π sidesteps this problem by conditioning generation on the live agent state and training the model to produce concise, task-relevant guidance only when needed. Because the guidance model's parameters remain separate from the agent's weights, it can specialize without retraining the downstream policy.
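For contrast, the retrieval baseline described above can be sketched as a nearest-neighbor lookup. This is a toy illustration under stated assumptions: the bag-of-words `embed` function replaces a real embedding model, and the memory entries and query are invented for the example.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (an assumption; real systems use learned embeddings)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, memory: List[str]) -> Tuple[float, str]:
    """Return the (score, entry) pair for the most similar stored entry."""
    q = embed(query)
    scored = [(cosine(q, embed(entry)), entry) for entry in memory]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Hypothetical memory entries written for earlier, different situations.
    memory = [
        "click the add to cart button on the product page",
        "use the search bar to find a product by name",
    ]
    score, entry = retrieve("compare shipping options on the checkout page", memory)
    print(f"{score:.2f} -> {entry}")
```

The lookup always returns its closest match, even when that entry shares only surface words with the current situation. That is the misalignment the article describes, and plain top-1 retrieval has no way to abstain.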

The authors tested Mem-π across web navigation (WebShop, WebArena), terminal-based tool use (InterCode), and text-based embodied environments (ALFWorld, ScienceWorld). On web navigation tasks, Mem-π achieved over 30 percent relative improvement compared to retrieval-based memory and prior RL-optimized baselines. The framework showed consistent gains across all three task categories, suggesting the adaptive generation approach generalizes beyond a single domain.

The preprint was authored by Xiaoqiang Wang, Chao Wang, Hadi Nekoei, Christopher Pal, Alexandre Lacoste, and Spandana Gella.
