Survey maps world models across robot learning, navigation, and autonomous driving
Researchers published a comprehensive review examining how predictive world models support policy learning, planning, and simulation in robotics, connecting recent foundation-model advances to embodied AI applications.

A team led by Bohan Hou and Gen Li released a survey paper this week that systematically reviews world models in robot learning, covering everything from policy coupling to video generation and autonomous driving. The paper addresses what the authors describe as a fragmented literature spread across architectures, functional roles, and application domains.
World models are predictive representations of how environments evolve under actions. They've become central to robot learning because they support policy training, planning, simulation, evaluation, and data generation. The survey examines how these models integrate with robot policies, how they function as learned simulators for reinforcement learning, and how robotic video world models have progressed from imagination-based generation to controllable, structured, and foundation-scale formulations.
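To make the definition concrete, here is a minimal sketch of a world model used for planning. It is an illustration only, not a method from the survey: `world_model` is a hand-written stand-in for a learned dynamics model on a one-dimensional toy state, and `rollout`, `plan`, and `run_episode` are hypothetical names chosen for this example. The planner is simple random shooting: it samples candidate action sequences, scores them entirely in imagination, and executes the first action of the best one.

```python
import random


def world_model(state, action):
    """Hypothetical stand-in for a learned dynamics model: predict the next state."""
    return state + action


def rollout(state, actions):
    """Imagine a trajectory by applying the model repeatedly (no real environment)."""
    states = []
    for action in actions:
        state = world_model(state, action)
        states.append(state)
    return states


def plan(state, goal, horizon=5, candidates=200, seed=0):
    """Random-shooting planner: sample action sequences, score them in imagination,
    and return the first action of the lowest-cost sequence together with its cost."""
    rng = random.Random(seed)
    best_cost, best_actions = float("inf"), None
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        # Cost: summed distance to the goal over the imagined trajectory.
        cost = sum(abs(s - goal) for s in rollout(state, actions))
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions[0], best_cost


def run_episode(state, goal, steps=10):
    """Replan at every step and apply only the first planned action.
    Assumption: the toy 'real' environment behaves exactly like the model."""
    for step in range(steps):
        action, _ = plan(state, goal, seed=step)
        state = world_model(state, action)
    return state


if __name__ == "__main__":
    print(run_episode(state=0.0, goal=3.0))
```

In a real system the model would be learned from data and would only approximate the environment, which is where the trade-offs between sample efficiency, compute, and generalization arise.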
What stands out
- 01 Functional coupling patterns: The paper maps how world models connect to robot policies, distinguishing between model-based reinforcement learning, planning-based control, and imagination-augmented training. Each coupling pattern trades off sample efficiency, computational cost, and generalization differently.
- 02 Learned simulators for RL: World models now serve as drop-in replacements for physics engines in reinforcement learning pipelines. The survey reviews evaluation protocols that measure how well these learned simulators support policy optimization compared to ground-truth environments.
- 03 Video world model progression: Robotic video generation has evolved from open-loop imagination (predict what happens next) to controllable generation (condition on actions), structured representations (object-centric, compositional), and foundation-scale models trained on internet video. The paper traces this arc and connects it to recent large-scale video diffusion work.
- 04