LEAD dynamically adjusts reasoning length per problem without sacrificing accuracy
New reinforcement learning technique replaces static reward weighting with online adaptive mechanisms, outperforming existing efficient-reasoning methods on five mathematical benchmarks.

LEAD (Length-Efficient Adaptive and Dynamic reasoning) is a reinforcement learning method that addresses verbosity in large reasoning models like OpenAI o1 and DeepSeek-R1. Researchers from multiple institutions propose replacing static reward weights with online adaptive mechanisms that dynamically calibrate the trade-off between correctness and efficiency during training, producing shorter Chain-of-Thought outputs without degrading accuracy on mathematical reasoning tasks.
The core innovation is a Potential-Scaled Instability metric that recalibrates the correctness-efficiency balance at each training step. Complementing this online calibration, LEAD replaces the usual global length constraint with an adaptive target length estimated for each problem online from the model's own correct rollouts. A symmetric efficiency reward then penalizes both overthinking (exceeding the target) and over-compression (cutting steps the problem actually requires).
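As a rough illustration of these two reward components, here is a minimal Python sketch. The median-based target statistic, the exponential penalty shape, and every name (`adaptive_target_length`, `symmetric_efficiency_reward`, `scale`) are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def adaptive_target_length(correct_lengths, quantile=0.5):
    """Per-problem target length taken from the lengths of the model's own
    correct rollouts in the current batch (median by default, an assumption).

    Returns None when no rollout is correct yet, in which case a training
    loop would fall back to the correctness reward alone.
    """
    if len(correct_lengths) == 0:
        return None
    return float(np.quantile(correct_lengths, quantile))

def symmetric_efficiency_reward(length, target, scale=0.5):
    """Reward in (0, 1] that peaks when `length` hits `target` and decays
    for deviations in either direction, penalizing both overthinking
    (length > target) and over-compression (length < target)."""
    relative_dev = abs(length - target) / max(target, 1.0)
    return float(np.exp(-relative_dev / scale))

# Example: a problem whose correct rollouts took 180-420 tokens.
target = adaptive_target_length([180, 260, 300, 420])  # -> 280.0
print(symmetric_efficiency_reward(280, target))   # ~1.0: on budget
print(symmetric_efficiency_reward(900, target))   # low: overthinking
print(symmetric_efficiency_reward(60, target))    # low: over-compressed
```

The symmetric shape is the key design point: unlike a pure length penalty, the reward falls off on both sides of the per-problem target, so the model is not pushed toward the shortest possible output.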
What stands out
- Highest accuracy-efficiency score among RL-trained methods: LEAD outperformed existing efficient-reasoning approaches across five mathematical reasoning benchmarks while producing substantially shorter outputs than the base model.
- Per-problem dynamic budgeting: Instead of a global length constraint, the method estimates an adaptive target length for each problem online, avoiding the accuracy-compression compromise that static methods force.
- Online calibration throughout training: The Potential-Scaled Instability metric directs optimization capacity to the most informative learning signal at each step, adjusting the correctness-efficiency trade-off as training progresses (a sketch of one possible calibration follows this list).
- Symmetric penalty structure: The method penalizes both overthinking and over-compression, preventing the model from either inflating reasoning unnecessarily or cutting steps that problems require.
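As a rough sketch of the online calibration described in the third point above, the code below assumes the Potential-Scaled Instability metric combines accuracy headroom ("potential") with the variance of recent batch accuracy ("instability") to scale the efficiency term. The paper's exact formula is not reproduced here; all function names and the combining rule are hypothetical.

```python
import numpy as np

def potential(batch_accuracy):
    """Headroom proxy: how far current accuracy is from perfect."""
    return 1.0 - batch_accuracy

def instability(accuracy_history, window=8):
    """Instability proxy: variance of accuracy over recent training steps."""
    recent = accuracy_history[-window:]
    return float(np.var(recent)) if len(recent) > 1 else 0.0

def efficiency_weight(batch_accuracy, accuracy_history, base_weight=0.5):
    """Shrink the efficiency term while correctness is still unstable or has
    large headroom, so optimization capacity follows the more informative
    signal; let it grow back as accuracy stabilizes."""
    psi = potential(batch_accuracy) * (1.0 + instability(accuracy_history))
    return base_weight / (1.0 + psi)

def step_reward(is_correct, eff_reward, weight):
    # Correctness remains the dominant term; efficiency is modulated online.
    return float(is_correct) + weight * eff_reward
```

Under this assumed scheme, early training (low, noisy accuracy) yields a small efficiency weight, so the model first learns to solve problems; as accuracy stabilizes, the weight rises and length pressure takes effect.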