EVOCHAMBER evolves agent teams at test time, reaching 63.9% on competition math
A training-free framework lets agent teams self-organize, specialize, and route knowledge asymmetrically during inference, with four to five stable specialists emerging spontaneously from identical initializations.

EVOCHAMBER, a training-free framework from researchers at Microsoft, University of Virginia, and Penn State, evolves multi-agent systems during inference across three scales—individual agents, teams, and populations—without updating model weights. Running on Qwen3-8B, the system reached 63.9% accuracy on competition math, 75.7% on coding tasks, and 87.1% on multi-domain reasoning, a 32% relative improvement over the best baseline on math.
The core mechanism is CODREAM (Collaborative Dreaming), a post-task reflection protocol triggered when a team fails or agents disagree. Rather than broadcasting insights symmetrically to all agents—which erases specialization—CODREAM routes knowledge asymmetrically from stronger agents to weaker ones on the failed niche, preserving the division of labor that makes collaboration valuable. Team-level operators assemble niche-conditioned teams and select collaboration structures on the fly. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure.
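The asymmetric routing idea can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual implementation: the `Agent` class, `niche_scores`, and `codream_route` are hypothetical names, and the real protocol operates on LLM reflections rather than strings. The key property shown is that after a failure on a niche, the insight flows one way, from the agent strongest on that niche to the weaker ones, so the source agent's specialized memory is never diluted.

```python
# Hedged sketch of CODREAM-style asymmetric knowledge routing.
# All names here are illustrative assumptions, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    niche_scores: dict = field(default_factory=dict)  # niche -> rolling accuracy
    memory: list = field(default_factory=list)        # insights received so far

def codream_route(agents, niche, insight):
    """Route `insight` from the strongest agent on `niche` to weaker agents only."""
    ranked = sorted(agents, key=lambda a: a.niche_scores.get(niche, 0.0), reverse=True)
    source, receivers = ranked[0], ranked[1:]
    for agent in receivers:
        # Asymmetric transfer: only weaker agents update their memory;
        # the source agent is left untouched, preserving its specialization.
        agent.memory.append((niche, source.name, insight))
    return source, receivers

a = Agent("A", {"geometry": 0.8})
b = Agent("B", {"geometry": 0.4})
c = Agent("C", {"geometry": 0.6})
src, dst = codream_route([a, b, c], "geometry", "decompose into similar triangles")
print(src.name, [d.name for d in dst])  # → A ['C', 'B']
```

A symmetric broadcast would instead append the insight to every agent's memory, including the source's; over many tasks that pushes all memories toward the same content, which is exactly the specialization-erasing behavior the protocol avoids.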
Starting from several identically initialized Qwen3-8B agents, four to five stable niche specialists spontaneously emerged over the test stream, a structural signature the authors argue no single-agent learner can produce. Ablation studies confirm asymmetric cross-agent transfer as the primary performance driver. The code is available on GitHub. Open questions remain about whether the same lifecycle operators generalize to larger base models and whether emergent specialization persists when task distribution shifts mid-stream.