AdaSR splits reasoning into streaming and deliberation phases for real-time LLMs

New adaptive streaming reasoning framework from EIT-NLP lets models reason while reading continuous input streams, optimizing computation allocation across streaming and deliberation phases.

ByAlex Sokoloff·June 15, 2026

AdaSR splits reasoning into streaming and deliberation phases for real-time LLMs

AdaSR is an adaptive streaming reasoning framework that enables large language models to reason while reading continuous input streams — audio, video, or any sequential data — rather than waiting for complete input before thinking. Researchers from EIT-NLP introduce Hierarchical Relative Policy Optimization (HRPO), a training method that splits policy optimization into streaming reasoning and deep reasoning phases, assigning fine-grained advantages to each stage instead of distributing a single sequence-level reward uniformly across all tokens. HRPO combines format enforcement, accuracy preservation, and adaptive thinking rewards to maintain valid reasoning protocols, preserve task performance, and encourage latency-aware computation allocation. The approach moves beyond supervised imitation of pre-constructed trajectories, which previous streaming reasoning methods relied on, giving models flexibility to decide when to think and how much computation to allocate at different stages.

Experiments show AdaSR achieves a better balance among reasoning accuracy, computational efficiency, and streaming latency compared to supervised fine-tuning baselines. The hierarchical advantage assignment in HRPO outperforms uniform reward distribution across reasoning tokens, particularly in dynamic scenarios where information arrives continuously and models must reason under partial observations. Code is available at https://github.com/EIT-NLP/StreamingLLM/tree/main/AdaSR.

ZenCreator

AdaSR splits reasoning into streaming and deliberation phases for real-time LLMs

More in Releases

PAJAMA distills LLM judges into programs, cuts eval cost by 100×

Molt: NVIDIA's PyTorch framework cuts agentic RL iteration cost

Hypernetworks outscale LoRA for train-time knowledge injection in LLMs

Staleness-Adaptive Trust Region cuts asynchronous RL performance loss to 3% at 8× policy lag

Distilled RL transfers knowledge across model families without unconditional imitation