AdaSR splits reasoning into streaming and deliberation phases for real-time LLMs
New adaptive streaming reasoning framework from EIT-NLP lets models reason while reading continuous input streams, optimizing computation allocation across streaming and deliberation phases.

AdaSR is an adaptive streaming reasoning framework that enables large language models to reason while reading continuous input streams (audio, video, or any sequential data) rather than waiting for the complete input before thinking. Researchers from EIT-NLP introduce Hierarchical Relative Policy Optimization (HRPO), a training method that splits policy optimization into a streaming reasoning phase and a deep reasoning phase and assigns fine-grained advantages to each, instead of spreading a single sequence-level reward uniformly across all tokens. HRPO combines format enforcement, accuracy preservation, and adaptive thinking rewards, which respectively keep the reasoning protocol valid, preserve task performance, and encourage latency-aware allocation of computation. The approach moves beyond the supervised imitation of pre-constructed trajectories that previous streaming reasoning methods relied on, giving models the flexibility to decide when to think and how much computation to allocate at each stage.
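To make the contrast between uniform and per-phase credit assignment concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary above does not give HRPO's formulas, so the sketch assumes a GRPO-style group normalization (reward minus group mean, divided by group standard deviation) applied separately to each phase. The function names, the "stream"/"deep" token tags, and the toy reward values are all hypothetical, and the three reward terms are collapsed into a single number per phase.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-6):
    """Assumed GRPO-style normalization within a group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def uniform_advantages(seq_rewards, phases):
    """Baseline: one sequence-level advantage copied onto every token."""
    adv = group_relative(seq_rewards)
    return [[a] * len(tags) for a, tags in zip(adv, phases)]


def hierarchical_advantages(stream_rewards, deep_rewards, phases):
    """Per-phase credit: each token receives the advantage of its own phase."""
    a_stream = group_relative(stream_rewards)
    a_deep = group_relative(deep_rewards)
    return [
        [a_s if tag == "stream" else a_d for tag in tags]
        for a_s, a_d, tags in zip(a_stream, a_deep, phases)
    ]


# Toy group of two rollouts; each token is tagged with the phase it belongs to.
phases = [
    ["stream", "stream", "deep"],
    ["stream", "deep", "deep", "deep"],
]
stream_rewards = [1.0, 0.0]  # hypothetical per-phase rewards
deep_rewards = [0.0, 1.0]
seq_rewards = [s + d for s, d in zip(stream_rewards, deep_rewards)]

uniform = uniform_advantages(seq_rewards, phases)
hier = hierarchical_advantages(stream_rewards, deep_rewards, phases)

print("uniform:     ", uniform)
print("hierarchical:", hier)
```

In this toy group both rollouts earn the same total reward, so the uniform baseline gives every token an advantage of zero and provides no learning signal. The per-phase version still separates the rollout that did well while streaming from the one that did well while deliberating, which is the kind of fine-grained signal the paragraph above describes.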
Experiments show that AdaSR balances reasoning accuracy, computational efficiency, and streaming latency better than supervised fine-tuning baselines. The hierarchical advantage assignment in HRPO outperforms uniform reward distribution across reasoning tokens, particularly in dynamic scenarios where information arrives continuously and models must reason under partial observations. Code is available at https://github.com/EIT-NLP/StreamingLLM/tree/main/AdaSR.





