RoboMemArena benchmark evaluates robot memory across 26 tasks with 1,000-step trajectories
Researchers introduce RoboMemArena, a large-scale robotic memory benchmark pairing simulated and real-world tasks to test how robots recall past observations and actions.

RoboMemArena is a robotic memory benchmark that measures how robots recall past observations and actions to complete long-horizon tasks in partially observable environments. The benchmark spans 26 tasks with average trajectory lengths exceeding 1,000 steps per task, and 68.9 percent of subtasks require memory to complete. A vision-language model designs and composes subtasks, generates full trajectories through atomic functions, and provides multimodal annotations including subtask instructions and native keyframe labels. Paired real-world memory tasks support physical evaluation beyond simulation.
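The annotation pipeline described above pairs each trajectory with subtask instructions and keyframe labels. A minimal sketch of what such an annotated record might look like follows; all class and field names here are illustrative assumptions, not RoboMemArena's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical data model for an annotated trajectory (names are assumptions).
@dataclass
class Subtask:
    instruction: str          # natural-language subtask instruction from the VLM
    keyframes: list           # step indices labeled as keyframes
    requires_memory: bool     # whether completing this subtask needs past observations

@dataclass
class Trajectory:
    task_name: str
    subtasks: list = field(default_factory=list)

def memory_fraction(traj: Trajectory) -> float:
    """Fraction of subtasks that require memory (the paper reports 68.9%)."""
    if not traj.subtasks:
        return 0.0
    return sum(s.requires_memory for s in traj.subtasks) / len(traj.subtasks)
```

A benchmark-wide statistic like the reported 68.9 percent would then be an average of `memory_fraction` over all 26 tasks.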

The researchers also introduce PrediMem, a dual-system vision-language-action model in which a high-level VLM planner manages a memory bank with recent and keyframe buffers, and a predictive coding head improves sensitivity to task dynamics. Experiments show PrediMem outperforms all baselines and yields insights into memory management, model architecture, and scaling behavior for complex memory systems. The work addresses gaps in existing robotic memory evaluations, which lack multimodal annotations for memory formation, offer limited task coverage and structural complexity, and remain restricted to simulation.
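The dual-buffer memory bank can be sketched as a bounded rolling window of recent steps plus a long-term store of keyframes. This is a minimal illustration under assumed buffer sizes and a caller-supplied keyframe flag, not PrediMem's actual implementation:

```python
from collections import deque

class MemoryBank:
    """Sketch of a recent-plus-keyframe memory bank (sizes are assumptions)."""

    def __init__(self, recent_capacity: int = 32):
        # Bounded buffer: only the latest `recent_capacity` steps are kept.
        self.recent = deque(maxlen=recent_capacity)
        # Unbounded store of observations flagged as keyframes.
        self.keyframes = []

    def observe(self, step_idx: int, obs, is_keyframe: bool = False):
        self.recent.append((step_idx, obs))
        if is_keyframe:
            self.keyframes.append((step_idx, obs))

    def context(self):
        # Context handed to the high-level planner: long-term keyframes
        # followed by the rolling window of recent steps.
        return self.keyframes + list(self.recent)
```

With 1,000-step trajectories, the design choice this illustrates is that the planner never conditions on the full history: the recent buffer stays fixed-size while keyframes preserve the sparse events memory tasks depend on.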