FlowCompile compiler cuts structured LLM workflow latency by up to 6.4× through compile-time optimization
A new preprint from MIT and collaborators introduces FlowCompile, a compiler that profiles and composes sub-agent configurations before deployment, delivering up to 6.4× speedup over routing-based baselines without retraining.

FlowCompile, a structured LLM workflow compiler from MIT researchers Junyan Li, Zhang-Wei Hong, Maohao Shen, Yang Zhang, and Chuang Gan, treats multi-agent systems as compilation targets rather than routing problems. Published this week on arXiv (2605.13647), the system profiles individual sub-agents under diverse model and reasoning-budget configurations at compile time, then uses a structure-aware proxy to estimate workflow-level accuracy and latency. The result is a reusable set of configurations spanning the accuracy-latency Pareto frontier, selected in a single pass without online adaptation or per-query routing overhead.
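The Pareto-frontier selection step can be illustrated with a short sketch. This is not FlowCompile's actual code or API; the `WorkflowConfig` structure and config names below are hypothetical, and the frontier is extracted with a standard single-sweep dominance check over estimated (accuracy, latency) pairs.

```python
# Hypothetical sketch: extracting the accuracy-latency Pareto frontier from
# estimated workflow configurations. Names and structures are illustrative,
# not FlowCompile's actual interface.
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowConfig:
    name: str        # e.g. which model / reasoning budget each sub-agent uses
    accuracy: float  # estimated workflow-level accuracy (higher is better)
    latency: float   # estimated end-to-end latency in seconds (lower is better)

def pareto_frontier(configs: list[WorkflowConfig]) -> list[WorkflowConfig]:
    """Keep configs not dominated by any other, i.e. no other config is
    simultaneously at least as fast and strictly more accurate."""
    frontier: list[WorkflowConfig] = []
    best_acc = float("-inf")
    # Sweep in order of increasing latency (ties broken by higher accuracy);
    # a config survives only if it beats every faster config on accuracy.
    for c in sorted(configs, key=lambda c: (c.latency, -c.accuracy)):
        if c.accuracy > best_acc:
            frontier.append(c)
            best_acc = c.accuracy
    return frontier

configs = [
    WorkflowConfig("all-small", 0.71, 1.2),
    WorkflowConfig("mixed", 0.83, 2.5),
    WorkflowConfig("all-large", 0.90, 7.8),
    WorkflowConfig("wasteful", 0.80, 7.8),  # dominated by "mixed": slower and less accurate
]
print([c.name for c in pareto_frontier(configs)])
```

Because the frontier is computed once at compile time over estimates, the deployed system only stores the surviving configurations, which is what makes the artifact reusable across different accuracy-latency preferences.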
Structured LLM workflows—where specialized sub-agents execute according to a predefined graph—have become a standard abstraction for complex tasks, but optimizing them is combinatorially hard. Existing cost-aware methods typically train a router to pick configurations at inference time for each query, which ties optimization to a fixed accuracy-latency objective and requires retraining when deployment constraints shift. FlowCompile sidesteps this by decomposing the workflow, profiling each sub-agent independently, and composing those measurements into workflow-level estimates. The compiler then identifies a diverse set of high-quality configurations that can be reused across deployments, supporting downstream selection or routing without additional training.
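The preprint does not spell out the structure-aware proxy in this summary, but the composition idea can be sketched under simplifying assumptions: a linear chain of sub-agents, latencies that add under sequential execution, and success rates that multiply as if sub-agent errors were independent. All function and variable names below are hypothetical.

```python
# Hypothetical sketch of composing compile-time sub-agent profiles into
# workflow-level estimates. Assumes a linear chain and independent
# sub-agent error rates; FlowCompile's actual proxy is structure-aware
# and may be more sophisticated.

def compose_chain(profile: dict[str, dict[str, tuple[float, float]]],
                  chain: list[str],
                  choice: dict[str, str]) -> tuple[float, float]:
    """profile[agent][config] = (accuracy, latency) measured at compile time.
    `chain` is the execution order; `choice` picks one configuration per
    sub-agent. Returns an (accuracy, latency) estimate for the workflow."""
    acc, lat = 1.0, 0.0
    for agent in chain:
        a, l = profile[agent][choice[agent]]
        acc *= a  # independence assumption: per-agent success rates compound
        lat += l  # sequential execution: latencies add
    return acc, lat

# Compile-time profiles for two sub-agents under two configurations each
# (illustrative numbers, not from the paper).
profile = {
    "planner": {"small": (0.90, 0.4), "large": (0.97, 1.6)},
    "solver":  {"small": (0.80, 0.8), "large": (0.95, 3.0)},
}
acc, lat = compose_chain(profile, ["planner", "solver"],
                         {"planner": "small", "solver": "large"})
print(acc, lat)
```

The payoff of this decomposition is combinatorial: with per-agent profiles in hand, every joint configuration can be scored by cheap composition rather than by running the full workflow end to end for each candidate.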
Experiments across multiple workflows and benchmarks show FlowCompile consistently outperforms heuristically optimized baselines and routing-based systems, with speedups reaching 6.4× in some cases. The compiled configuration set is deployment-agnostic—runtime preferences can shift without recompilation, and the artifact can feed into downstream routers or selection logic. The preprint does not specify whether the compiler itself is open-sourced or what the profiling overhead is for large workflows, which will be key to adoption in production multi-agent systems. The next step is likely a reference implementation and benchmarks on real-world agentic frameworks like LangGraph or AutoGPT, where workflow graphs are already standard but optimization remains ad hoc.