Autodata framework trains 4B models to outperform 397B giants on code and law
Meta researchers released Autodata, an agentic framework that generates high-quality synthetic training data through closed-loop evaluation and evolutionary prompt optimization.

Autodata is a framework from Meta that turns large language models into autonomous data scientists capable of generating high-quality synthetic training data. The preprint, authored by Ilia Kulikov, Chenxi Whitehouse, Tianhao Wu, and colleagues, replaces static prompt templates and simple filtering pipelines with a closed loop: generation, solver-based evaluation, error analysis, and instruction refinement. An outer evolutionary cycle automatically optimizes the prompts that drive the agents themselves.
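The four stages of the inner loop can be sketched as a minimal skeleton. This is an illustration under stated assumptions, not the authors' implementation: the names `LoopState`, `run_closed_loop`, and the `toy_*` functions are hypothetical, and the toy functions stand in for the LLM and solver calls a real system would make.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    instruction: str                                        # current generation instruction
    accepted: List[str] = field(default_factory=list)       # examples that passed evaluation
    feedback_log: List[str] = field(default_factory=list)   # error-analysis notes


def run_closed_loop(
    instruction: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str], bool],
    analyze: Callable[[str], str],
    refine: Callable[[str, str], str],
    rounds: int = 3,
) -> LoopState:
    """Run generation, evaluation, error analysis, and refinement for a fixed number of rounds."""
    state = LoopState(instruction=instruction)
    for _ in range(rounds):
        example = generate(state.instruction)        # 1. generation
        if evaluate(example):                        # 2. solver-based evaluation
            state.accepted.append(example)
            continue
        note = analyze(example)                      # 3. error analysis
        state.feedback_log.append(note)
        state.instruction = refine(state.instruction, note)  # 4. instruction refinement
    return state


# Toy stand-ins (hypothetical): a real system would call an LLM and a solver here.
def toy_generate(instruction: str) -> str:
    return f"task[{instruction}]"


def toy_evaluate(example: str) -> bool:
    return "with tests" in example


def toy_analyze(example: str) -> str:
    return "missing tests"


def toy_refine(instruction: str, note: str) -> str:
    return instruction + " with tests" if note == "missing tests" else instruction


state = run_closed_loop(
    "write a function", toy_generate, toy_evaluate, toy_analyze, toy_refine
)
print(state.instruction, len(state.accepted))
```

In this toy run the first example fails, the instruction is refined once, and the remaining rounds produce accepted examples. Passing the four stages in as callables keeps the loop itself independent of any particular model or solver.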
The research addresses a bottleneck that has become acute as frontier commercial models approach human-level performance on standard benchmarks: the limited supply of high-quality human-generated training data. Standard synthetic generation often produces examples that are either trivial or impossibly hard. Autodata systematically converts test-time compute into structured curricula that stay in the "zone of proximal development," meaning tasks just beyond what the learner can already solve, with the stated aim of improving alignment and token-level reasoning efficiency.
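One way to picture the "zone of proximal development" idea is a filter that keeps only tasks a solver passes some of the time. The sketch below is an assumption for illustration: the pass-rate criterion, the 0.2 and 0.8 thresholds, and the function names are hypothetical and are not taken from the preprint.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of solver attempts that succeeded (0.0 when there are no attempts)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def in_learning_zone(outcomes: List[bool], low: float = 0.2, high: float = 0.8) -> bool:
    """True when the task is neither almost always solved nor almost never solved."""
    return low <= pass_rate(outcomes) <= high


def filter_curriculum(
    attempts: Dict[str, List[bool]], low: float = 0.2, high: float = 0.8
) -> List[str]:
    """Keep task names whose pass rate falls inside the assumed [low, high] band."""
    return [
        task for task, outcomes in attempts.items()
        if in_learning_zone(outcomes, low, high)
    ]


# Hypothetical solver outcomes for three generated tasks, four attempts each.
attempts = {
    "trivial": [True, True, True, True],
    "impossible": [False, False, False, False],
    "useful": [True, False, True, False],
}
kept = filter_curriculum(attempts)
print(kept)
```

Here only the task with a mixed record survives; the always-solved and never-solved tasks are dropped as uninformative.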
What stands out
- 01. Small models outperform giants after fine-tuning. Qwen3.5-4B fine-tuned on Autodata-generated synthetic data for code and legal reasoning outperformed Qwen3.5-397B used without that fine-tuning, erasing a roughly 100× parameter gap (397B vs. 4B) with curriculum-driven synthetic data.
- 02. Evolutionary meta-cycle optimizes prompts. The outer loop automatically evolves the agent prompts themselves, not just the generated examples. This meta-optimization layer distinguishes Autodata from earlier synthetic-data pipelines.
- 03. Solver-based evaluation closes the loop. Instead of relying on human labels or model self-evaluation, Autodata uses task-specific solvers (code interpreters, formal verifiers) to score outputs and feed error signals back into the generation cycle.
- 04. Token efficiency gains are real. Fine-tuned models not only solve harder problems but also reason more concisely, cutting redundant chain-of-thought steps and reducing inference cost per query.
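The outer evolutionary cycle over prompts can be illustrated with a small mutate-score-select loop. This is a generic evolutionary sketch, not the procedure from the preprint: `evolve_prompts`, `toy_mutate`, `toy_fitness`, and the clause list are hypothetical, and the toy fitness function stands in for a score that a real system would derive from solver-based evaluation of the data each prompt produces.

```python
import random
from typing import Callable, List


def evolve_prompts(
    population: List[str],
    mutate: Callable[[str, random.Random], str],
    fitness: Callable[[str], float],
    generations: int = 5,
    survivors: int = 2,
    seed: int = 0,
) -> List[str]:
    """Mutate every prompt each generation and keep the top-scoring ones.

    Parents compete with their children, so the best score never decreases.
    """
    rng = random.Random(seed)
    pool = list(population)
    for _ in range(generations):
        children = [mutate(prompt, rng) for prompt in pool]
        pool = sorted(pool + children, key=fitness, reverse=True)[:survivors]
    return pool


# Toy stand-ins (hypothetical): a real mutation would ask an LLM to rewrite the prompt.
CLAUSES = ["Include unit tests.", "State edge cases.", "Vary difficulty."]


def toy_mutate(prompt: str, rng: random.Random) -> str:
    return prompt + " " + rng.choice(CLAUSES)


def toy_fitness(prompt: str) -> float:
    # Counts how many distinct clauses the prompt contains.
    return float(sum(clause in prompt for clause in CLAUSES))


seed_prompt = "Write a coding task."
final_pool = evolve_prompts([seed_prompt], toy_mutate, toy_fitness)
best = final_pool[0]
print(best, toy_fitness(best))
```

Keeping parents in the selection pool is a simple form of elitism: a bad mutation can never displace a better existing prompt.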




