EvolveMem autonomously tunes LLM agent memory retrieval mid-deployment
A new preprint describes a memory architecture that rewrites its own scoring functions and fusion strategies through iterative diagnosis loops, posting a 25.7% relative gain over fixed-configuration baselines on multi-session benchmarks.

Most long-term memory systems for LLM agents let stored content grow while the machinery that retrieves it—scoring functions, fusion rules, answer-generation policies—stays locked at launch. A team from UC Irvine, Zhejiang University, and UC Santa Cruz now argues that real adaptivity demands that both layers evolve in tandem.
EvolveMem, detailed in a preprint released May 15, treats the entire retrieval configuration as a structured action space that an LLM-powered diagnosis module can rewrite on the fly. Each evolution round reads per-question failure logs, identifies root causes, and proposes targeted adjustments; a guarded meta-analyzer applies changes with automatic rollback-on-regression and explore-on-stagnation logic. The authors call this closed loop an AutoResearch process—the system runs iterative research cycles on its own architecture without human intervention.
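To make the loop concrete, here is a minimal Python sketch of one plausible reading of that cycle: the retrieval configuration as a discrete action space, a diagnosis step that proposes a targeted change, and a guarded update rule that rolls back regressions and explores when progress stalls. The action-space dimensions, the `diagnose` and `evaluate` stubs, and the hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical action space: retrieval dimensions and their allowed values.
# Names and options are illustrative, not EvolveMem's real configuration.
ACTION_SPACE = {
    "scoring":       ["bm25", "dense", "hybrid"],
    "fusion":        ["sum", "max", "reciprocal_rank"],
    "answer_policy": ["extractive", "generative"],
    "top_k":         [3, 5, 10],
}

@dataclass
class EvolutionState:
    config: dict
    score: float = 0.0
    stagnant_rounds: int = 0
    history: list = field(default_factory=list)

def diagnose(failure_logs, config):
    """Stand-in for the LLM diagnosis module: read per-question failures,
    pick a root-cause dimension, and propose a new value for it."""
    dim = random.choice(list(ACTION_SPACE))           # placeholder root-cause pick
    proposal = dict(config)
    proposal[dim] = random.choice(ACTION_SPACE[dim])  # placeholder adjustment
    return proposal

def evaluate(config, benchmark):
    """Stand-in for a benchmark run; returns a score and per-question failures."""
    score = random.random()                           # placeholder metric
    failures = [q for q in benchmark if random.random() > score]
    return score, failures

def evolve(benchmark, rounds=10, patience=3):
    # Start from a minimal baseline: the first option along every dimension.
    state = EvolutionState(config={d: v[0] for d, v in ACTION_SPACE.items()})
    state.score, failures = evaluate(state.config, benchmark)

    for _ in range(rounds):
        proposal = diagnose(failures, state.config)
        new_score, new_failures = evaluate(proposal, benchmark)

        if new_score < state.score:
            # Rollback-on-regression: discard the proposal, keep the old config.
            state.stagnant_rounds += 1
        else:
            state.config, state.score, failures = proposal, new_score, new_failures
            state.stagnant_rounds = 0

        if state.stagnant_rounds >= patience:
            # Explore-on-stagnation: jump to a random config to escape a plateau.
            state.config = {d: random.choice(v) for d, v in ACTION_SPACE.items()}
            state.score, failures = evaluate(state.config, benchmark)
            state.stagnant_rounds = 0

        state.history.append((dict(state.config), state.score))
    return state

if __name__ == "__main__":
    final = evolve(benchmark=[f"q{i}" for i in range(20)])
    print(final.config, round(final.score, 3))
```

In the real system the random stubs would be replaced by an LLM reading failure logs and by actual benchmark evaluations; the point of the sketch is the control flow of the guarded, self-directed search.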
Starting from a minimal baseline, the process converges autonomously and discovers effective retrieval strategies, including configuration dimensions absent from the original action space. On the LoCoMo benchmark, EvolveMem beats the strongest fixed-config baseline by 25.7 percent relative and the minimal baseline by 78.0 percent relative. On MemBench it posts an 18.9 percent relative gain over the top baseline. Evolved configurations also carry over between benchmarks with positive rather than catastrophic transfer, suggesting the self-evolution captures universal retrieval principles rather than dataset-specific tricks.
The preprint is available on arXiv (2605.13941) and code ships at github.com/aiming-lab/SimpleMem. Authors are Jiaqi Liu, Xinyu Ye, Peng Xia, Zeyu Zheng, Cihang Xie, and Mingyu Ding.