RAVEN training framework cuts distribution gap in real-time video diffusion
New preprint introduces RAVEN, a training framework that exposes video diffusion models to their own imperfect outputs during training, plus CM-GRPO, an RL technique that tunes consistency sampling without auxiliary flow processes.

RAVEN is a training framework from researchers Yanzuo Lu, Ronglai Zuo, and Jiankang Deng that addresses a core problem in real-time autoregressive video generation: the mismatch between clean training data and the self-generated history a model must extrapolate from at inference time. Released May 15, 2026, the preprint demonstrates quality gains over recent causal video distillation baselines across semantic and dynamic benchmarks.
Autoregressive video diffusion models generate future frames by conditioning on previously synthesized chunks, enabling streaming output. Distilling these models from high-fidelity bidirectional teachers produces few-step generators, but long-horizon quality degrades when the model encounters its own imperfect outputs during inference—a distribution it never trained on. RAVEN repacks each training rollout into an interleaved sequence of clean historical endpoints and noisy denoising states, forcing the model to attend to the same kind of history it will see in production. Downstream chunk losses then directly supervise the representations that future predictions depend on.
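The interleaving idea can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: the function name `interleave_rollout`, the noise model, and the chunk shapes are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def interleave_rollout(clean_chunks, noise_level=0.5):
    """Repack a self-generated rollout into an interleaved training sequence.

    Toy sketch (assumed, not the paper's code): each chunk contributes its
    clean historical endpoint plus a noisy denoising state, so the history
    the model attends to during training resembles what it will see when
    conditioning on its own imperfect outputs at inference time.
    """
    sequence = []
    for chunk in clean_chunks:
        noisy = chunk + noise_level * rng.standard_normal(chunk.shape)
        sequence.append(("clean_endpoint", chunk))
        sequence.append(("noisy_state", noisy))
    return sequence

# toy rollout: 3 chunks, each 4 latent frames with 8 channels
rollout = [rng.standard_normal((4, 8)) for _ in range(3)]
seq = interleave_rollout(rollout)
print([tag for tag, _ in seq])
```

Losses computed on later chunks in such a sequence then backpropagate through the noisy historical states, which is the mechanism the paper credits for closing the train/inference distribution gap.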
The paper also introduces Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online reinforcement learning directly to that kernel. This avoids the Euler-Maruyama auxiliary process used in prior flow-model RL methods, simplifying the optimization pipeline.
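To make the CM-GRPO idea concrete, here is a minimal sketch of what optimizing a Gaussian transition kernel with group-relative advantages might look like. Everything here is an assumption for illustration: the function names, the isotropic-Gaussian kernel with fixed `sigma`, and the toy rewards are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_log_prob(x, mean, sigma):
    # Log density of an isotropic Gaussian: the consistency sampling step
    # is treated as a conditional transition x ~ N(mean, sigma^2 I).
    d = x.size
    return (-0.5 * np.sum((x - mean) ** 2) / sigma**2
            - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))

def cm_grpo_loss(samples, means, rewards, sigma=0.1):
    """Hypothetical GRPO-style objective on a Gaussian sampling kernel.

    Advantages are computed relative to the group (mean-centered, std-
    normalized), then used to weight the transition log-probabilities,
    so no auxiliary Euler-Maruyama process is needed for the log-prob.
    """
    rewards = np.asarray(rewards, dtype=float)
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    logps = np.array([gaussian_log_prob(x, m, sigma)
                      for x, m in zip(samples, means)])
    return -np.mean(adv * logps)  # minimize = reward-weighted max-likelihood

# toy group of 4 sampled transitions for the same prompt
means = [rng.standard_normal(16) for _ in range(4)]
samples = [m + 0.1 * rng.standard_normal(16) for m in means]
rewards = [0.2, 0.8, 0.5, 0.9]
loss = cm_grpo_loss(samples, means, rewards)
print(float(loss))
```

The design point the paper emphasizes is that because the consistency step itself is the Gaussian kernel, the policy log-probability is available in closed form, which is what removes the auxiliary flow process used by prior flow-model RL methods.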
What stands out
- Training-time test rollouts. RAVEN runs self-extrapolation during training and interleaves clean reference frames with noisy denoising states, aligning the model's attention with the inference-time distribution it will actually face.
- Consistency-model GRPO. CM-GRPO reformulates consistency sampling as a conditional Gaussian transition and applies online RL directly, avoiding auxiliary flow processes.
- Benchmark gains. RAVEN outperforms recent causal video distillation baselines on quality, semantic coherence, and dynamic degree metrics. Combining RAVEN with CM-GRPO yields further improvements.