Multi-Stream LLMs let agents read, write, and reason in parallel
New instruction-tuning approach splits language model computation into concurrent streams, letting agents process inputs, generate outputs, and reason at the same time.

Researchers Guinan Su, Yanwu Yang, Xueyan Li, and Jonas Geiping propose Multi-Stream LLMs, an instruction-tuning architecture that replaces the sequential bottleneck in current language models with parallel computation. Published on arXiv this week, the work splits model computation into multiple concurrent streams—one for each role in agent workflows—so models can simultaneously read from multiple input streams and generate tokens across multiple output streams within a single forward pass.
Today's chat models, including those powering autonomous coding and computer-use agents, operate on a single sequential stream. They exchange messages one at a time with users, tools, and themselves via chain-of-thought reasoning. That architecture forces hard trade-offs: the model cannot act while reading new information, cannot react to inputs while writing, and cannot think while doing either. The Multi-Stream approach removes those constraints by instruction-tuning models to handle concurrent streams, each causally dependent on earlier timesteps but processed in parallel.
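The paper does not publish implementation details, but the causal structure it describes can be sketched as an attention mask: lay out tokens stream-major within each timestep, and let every stream's token attend to all streams at strictly earlier timesteps. The layout and mask below are an illustrative guess at that structure, not the authors' code.

```python
import numpy as np

def multi_stream_causal_mask(num_streams: int, num_steps: int) -> np.ndarray:
    """Boolean attention mask for tokens laid out stream-major per timestep:
    position = timestep * num_streams + stream_index.
    A query token may attend to every stream's tokens at strictly earlier
    timesteps, plus its own position (self-attention)."""
    total = num_streams * num_steps
    mask = np.zeros((total, total), dtype=bool)
    for q in range(total):
        t_q = q // num_streams  # query token's timestep
        for k in range(total):
            t_k = k // num_streams  # key token's timestep
            mask[q, k] = (t_k < t_q) or (k == q)
    return mask

mask = multi_stream_causal_mask(num_streams=3, num_steps=4)
```

Under this layout, all streams at a given timestep share the same visible context, so one forward pass can score the next token for every stream at once; whether the paper uses exactly this masking scheme is not stated in the preprint.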
Efficiency and security gains
The paper argues the shift from sequential to parallel streams delivers three benefits: efficiency through parallelization, better security by separating concerns across streams, and improved monitorability, since each stream can be inspected independently. The authors frame this as a data-driven change to instruction-tuning rather than a fundamental model architecture overhaul. Every forward pass now reads and writes across multiple streams simultaneously, unblocking the agent to think, read, and act at the same time.
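To make the "read and write in the same step" idea concrete, here is a toy decode loop under assumed mechanics: stream names (`tool_output`, `reasoning`, `reply`), the `Stream` structure, and the `decode` stub are all invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Stream:
    """One named stream in an agent workflow. Names used below are
    illustrative, not taken from the preprint."""
    name: str
    tokens: list = field(default_factory=list)
    writable: bool = False  # True for model-generated output streams

def step(streams, decode):
    """One forward pass in this sketch: read the latest token of every
    non-empty stream, then append one new token to each writable stream."""
    context = [(s.name, s.tokens[-1]) for s in streams if s.tokens]
    for s in streams:
        if s.writable:
            s.tokens.append(decode(s.name, context))

# Toy run: one read-only input stream plus two concurrently written streams.
streams = [
    Stream("tool_output", tokens=["<obs>"]),  # externally fed, read-only
    Stream("reasoning", writable=True),       # hidden thinking stream
    Stream("reply", writable=True),           # user-facing output stream
]
toy_decode = lambda name, ctx: f"{name}:{len(ctx)}"  # stand-in for the model
for _ in range(2):
    step(streams, toy_decode)
```

The point of the sketch is the monitorability claim: because each stream keeps its own token list, the reasoning stream can be inspected on its own without parsing it out of an interleaved transcript.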
The preprint contains no code, weights, or benchmark comparisons. Real-world performance data and implementation details remain unpublished, positioning this as a conceptual contribution to agent design rather than a ready-to-deploy system.