Training-free wrapper lifts VLA success rates by 28.8 points in moving-target tasks
Pace-and-Path Correction is a closed-form inference operator that fixes temporal blindness in vision-language-action models without retraining, boosting performance in dynamic environments by up to 28.8 percentage points.

Pace-and-Path Correction is a training-free inference-time operator that wraps any chunked-action vision-language-action (VLA) model to fix its temporal blindness. The technique addresses a structural flaw in most VLA models: they are trained on single-frame observations and fail when objects or targets move during execution, a failure that persists even after fine-tuning on dynamic data.
The method solves a single quadratic cost function at inference time, decomposing the solution into two orthogonal channels. The pace channel compresses action execution along the planned direction, while the path channel applies a spatial offset perpendicular to that direction. Together they absorb perceived motion within the model's action-chunk window, letting the VLA adapt to moving targets without seeing multiple frames during training.
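The two-channel decomposition can be sketched in code. This is a minimal illustration, not the paper's actual operator: it assumes 2-D waypoints, hand-codes the parallel/perpendicular split rather than solving the quadratic cost in closed form, and the linear ramp on the path offset is an assumption for illustration.

```python
import math

def pace_and_path_correction(chunk, displacement):
    """Illustrative sketch: split a perceived target displacement into a
    pace (parallel) and a path (perpendicular) correction over a planned
    action chunk. `chunk` is a list of 2-D waypoints from the frozen VLA
    policy; `displacement` is the target's observed motion since planning.
    """
    start, end = chunk[0], chunk[-1]
    # Planned direction: unit vector from the chunk's start to its end.
    dx, dy = end[0] - start[0], end[1] - start[1]
    norm = math.hypot(dx, dy) or 1.0
    ux, uy = dx / norm, dy / norm

    # Pace channel: the displacement component along the planned direction,
    # absorbed by uniformly stretching or compressing progress on the path.
    parallel = displacement[0] * ux + displacement[1] * uy
    scale = (norm + parallel) / norm

    # Path channel: the perpendicular remainder, applied as a lateral
    # offset that ramps up linearly across the chunk (an assumption here).
    px = displacement[0] - parallel * ux
    py = displacement[1] - parallel * uy

    n = len(chunk) - 1
    corrected = []
    for i, (x, y) in enumerate(chunk):
        t = i / n if n else 1.0  # fraction of the chunk executed
        along = ((x - start[0]) * ux + (y - start[1]) * uy) * (scale - 1.0)
        corrected.append((x + along * ux + t * px,
                          y + along * uy + t * py))
    return corrected
```

For a straight chunk [(0, 0), (1, 0), (2, 0)] and a target displacement of (1, 1), the corrected final waypoint lands at (3.0, 1.0): the pace channel stretches motion along x while the path channel shifts the endpoint laterally in y, so the chunk terminates on the moved target.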
Tested on MoveBench, a diagnostic benchmark designed to isolate target motion as the sole variable, Pace-and-Path Correction beat existing training-free wrappers and dynamic-adaptation methods. It lifted success rates by 28.8 percentage points over baseline VLA models in dynamic-only environments and by 25.9 points in mixed static-dynamic scenarios.