Distillation works better on wrong answers than right ones, new gradient analysis shows
Researchers introduce a training-free diagnostic framework that measures on-policy distillation signal quality at the per-token level, revealing that teacher guidance aligns better with ideal gradients on incorrect rollouts than on correct ones.

A team led by Mohammadreza Armandpour has published a diagnostic framework that measures the quality of on-policy distillation signals without running expensive training loops. Posted to arXiv on May 12, 2026, the paper introduces a per-token, per-question analysis that compares any distillation gradient to an "ideal" gradient—the parameter update that would maximally increase a student model's probability of success on a given reasoning task. The framework uses a targeted-rollout algorithm to estimate this ideal gradient efficiently, even across long chains of intermediate reasoning steps.
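The paper's targeted-rollout algorithm is not spelled out here, but the general idea of estimating such an "ideal" gradient from rollouts can be sketched with a standard REINFORCE-style score-function estimator: average the gradient of each rollout's log-probability, weighted by whether that rollout reached a correct answer. The toy below is a minimal sketch under that assumption; the linear "student," the fixed prefix embedding, the placeholder correctness check, and the baseline choice are all illustrative and are not the authors' actual algorithm.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a student policy: a linear head over a 4-token vocabulary.
torch.manual_seed(0)
student = torch.nn.Linear(8, 4)
prefix = torch.randn(8)  # fixed "question" embedding (illustrative)

def rollout_logprob(tokens):
    """Sum of student log-probs along one sampled rollout."""
    logp = torch.zeros(())
    for t in tokens:
        logp = logp + F.log_softmax(student(prefix), dim=-1)[t]
    return logp

def sample_rollout(length=3):
    """Sample a short token sequence from the student's own distribution."""
    with torch.no_grad():
        probs = F.softmax(student(prefix), dim=-1)
        return [torch.multinomial(probs, 1).item() for _ in range(length)]

def estimate_ideal_gradient(is_correct, n=64):
    """REINFORCE-style estimate of the update direction that raises the
    student's probability of success: E[(R - b) * grad log p(rollout)]."""
    rollouts = [sample_rollout() for _ in range(n)]
    baseline = sum(is_correct(r) for r in rollouts) / n  # variance reduction
    grads = [torch.zeros_like(p) for p in student.parameters()]
    for tokens in rollouts:
        advantage = is_correct(tokens) - baseline
        if advantage == 0.0:
            continue  # rollout matches the baseline; contributes nothing
        student.zero_grad()
        rollout_logprob(tokens).backward()
        for g, p in zip(grads, student.parameters()):
            g += advantage * p.grad / n
    return grads

# "Correct" here just means the rollout ends in token 2 (a placeholder task).
ideal = estimate_ideal_gradient(lambda r: float(r[-1] == 2))
```

In this construction the estimate sharpens as more rollouts succeed or fail informatively, which is consistent with the paper's emphasis on per-question rather than aggregate diagnostics.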
The core finding challenges conventional wisdom: distillation guidance shows substantially higher alignment with the ideal gradient when the student model produces incorrect rollouts than when it already answers correctly. On correct rollouts, the teacher's signal becomes noisy and offers little value, since the student is already performing well. The researchers tested this across multiple self-distillation setups and external teacher models, finding no universal configuration that works across all tasks and model capacities. Instead, the optimal distillation context depends jointly on the student's size and the specific reasoning problem.
The gradient alignment score, defined as the cosine similarity between the ideal gradient and a given distillation gradient, serves as the framework's quantitative measure. The paper argues that aggregate training metrics obscure token-level dynamics, and that practitioners should run per-task, per-token diagnostics before committing to a distillation strategy. The six-author team includes Fatih Ilhan, David Harrison, Ajay Jaiswal, Duc N. M. Hoang, and Fartash Faghri.
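Concretely, the score reduces to a cosine between two flattened parameter-space vectors. A minimal sketch, reusing the toy `estimate_ideal_gradient` above and assuming the distillation gradient has been collected into the same per-parameter shapes:

```python
import torch

def alignment_score(ideal_grads, distill_grads):
    """Gradient alignment score: cosine similarity between the ideal
    gradient and a candidate distillation gradient, each flattened
    and concatenated into a single parameter-space vector."""
    a = torch.cat([g.flatten() for g in ideal_grads])
    b = torch.cat([g.flatten() for g in distill_grads])
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()
```

A per-task diagnostic in this style would bucket a student's rollouts by correctness and compare mean alignment scores across the two buckets, which is the comparison behind the paper's headline finding: scores near 1 mean the teacher's guidance points where the ideal update points, while scores near 0 mean the signal is mostly noise.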