MOCHA breaks through on all six agent-skill benchmarks, including four where existing optimizers stalled
A new multi-objective prompt optimizer uses Chebyshev scalarization and exponential annealing to balance task performance against platform constraints, improving agent-skill correctness by up to 14.9 percent over the strongest baseline, including on benchmarks where existing methods made no progress.

MOCHA (Multi-Objective Chebyshev Annealing) is a prompt optimization framework that treats LLM agent skills as multi-objective artifacts. The preprint, posted to Hugging Face Papers on May 21, addresses a problem existing optimizers ignore. Agent skills, the structured natural-language specifications that govern an agent's reasoning, retrieval, and responses, must maximize task performance while also satisfying hard platform constraints such as description-field truncation, instruction compaction, and context-window limits. Current optimizers either ignore these trade-offs entirely or collapse them into a single weighted sum, and a weighted sum cannot select Pareto-optimal variants that lie in non-convex regions of the objective space.
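Why a weighted sum misses non-convex regions can be seen with a toy example. The sketch below is illustrative only: the three candidate skills, their scores, and the helper names are hypothetical, and both objectives are assumed to be maximized and scaled to [0, 1]. It compares linear (weighted-sum) selection with weighted Chebyshev selection against an ideal point of (1, 1).

```python
from typing import Dict, Tuple

Scores = Tuple[float, float]

# Hypothetical skill variants scored on two maximized objectives in [0, 1]:
# (task correctness, constraint satisfaction). "balanced" is Pareto-optimal
# but lies below the straight line joining the two extremes (a non-convex dip).
CANDIDATES: Dict[str, Scores] = {
    "accurate_but_long": (1.0, 0.0),
    "compact_but_weak": (0.0, 1.0),
    "balanced": (0.4, 0.4),
}


def weighted_sum(scores: Scores, weights: Scores) -> float:
    """Linear scalarization; higher is better."""
    return sum(w * f for w, f in zip(weights, scores))


def chebyshev(scores: Scores, weights: Scores, ideal: Scores) -> float:
    """Weighted Chebyshev distance to the ideal point; lower is better."""
    return max(w * (z - f) for w, f, z in zip(weights, scores, ideal))


def best_by_weighted_sum(weights: Scores) -> str:
    """Candidate with the highest weighted sum."""
    return max(CANDIDATES, key=lambda name: weighted_sum(CANDIDATES[name], weights))


def best_by_chebyshev(weights: Scores, ideal: Scores = (1.0, 1.0)) -> str:
    """Candidate closest to the ideal point under the weighted Chebyshev norm."""
    return min(CANDIDATES, key=lambda name: chebyshev(CANDIDATES[name], weights, ideal))


if __name__ == "__main__":
    # Sweep the weight on correctness from 0 to 1 and compare the two rules.
    for i in range(11):
        w = (i / 10, 1 - i / 10)
        print(w, best_by_weighted_sum(w), best_by_chebyshev(w))
```

Across the sweep, the weighted sum only ever returns one of the two extremes, because "balanced" scores 0.4 while the better extreme always scores at least 0.5. The Chebyshev rule returns "balanced" for weights near (0.5, 0.5), since it measures the worst-case shortfall from the ideal point rather than a linear average.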
MOCHA replaces single-objective selection with Chebyshev scalarization, which can reach any point on the Pareto front, including points in non-convex regions, and pairs it with an exponential annealing schedule that shifts the search from exploration toward exploitation over the course of the optimization run.
In experiments across six agent tasks (FEVER fact verification, TheoremQA mathematical reasoning, and four others), existing optimizers failed to improve the seed skill on four of the six benchmarks, even after 1,000 rollouts. MOCHA improved on every task, achieving a 7.5 percent relative improvement in mean correctness over the strongest baseline, with gains reaching 14.9 percent on FEVER and 10.4 percent on TheoremQA. It also discovered twice as many Pareto-optimal skill variants as any competing approach. Because MOCHA shares the same multi-objective mutation operator and per-objective textual feedback as the baselines, these gains suggest that its selection and annealing strategy, rather than how candidates are generated, accounts for the difference.
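A minimal sketch of how Chebyshev selection and exponential annealing could fit together is shown below. The article does not describe the mechanics, so several choices here are assumptions: annealing is modeled as a temperature on Boltzmann (softmax) sampling over Chebyshev distances, a fresh weight vector is drawn uniformly from the simplex at each step, and the ideal point is the per-objective best in the current population. The function names, `t_start`, and `t_end` are hypothetical. The `pareto_front` helper illustrates what counting "Pareto-optimal skill variants" means: keeping every candidate that no other candidate dominates.

```python
import math
import random
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def temperature(step: int, total_steps: int, t_start: float = 1.0, t_end: float = 0.01) -> float:
    """Exponential annealing: decay geometrically from t_start to t_end over the run."""
    if total_steps <= 1:
        return t_end
    frac = step / (total_steps - 1)
    return t_start * (t_end / t_start) ** frac


def chebyshev(scores: Scores, weights: Sequence[float], ideal: Sequence[float]) -> float:
    """Weighted Chebyshev distance to the ideal point (lower is better)."""
    return max(w * (z - f) for w, f, z in zip(weights, scores, ideal))


def random_weights(k: int, rng: random.Random) -> List[float]:
    """Sample a weight vector uniformly from the probability simplex."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def select_parent(population: List[Scores], step: int, total_steps: int, rng: random.Random) -> int:
    """Boltzmann selection over Chebyshev distances at the annealed temperature.

    High temperature early gives near-uniform sampling (exploration);
    low temperature late almost always picks the closest candidate (exploitation).
    """
    k = len(population[0])
    # Assumption: the ideal point is the per-objective best in the population.
    ideal = [max(s[i] for s in population) for i in range(k)]
    weights = random_weights(k, rng)
    temp = temperature(step, total_steps)
    dists = [chebyshev(s, weights, ideal) for s in population]
    best = min(dists)
    # Shift by the best distance so the largest factor is exp(0) = 1.
    factors = [math.exp(-(d - best) / temp) for d in dists]
    return rng.choices(range(len(population)), weights=factors, k=1)[0]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: List[Scores]) -> List[int]:
    """Indices of candidates not dominated by any other candidate."""
    return [i for i, s in enumerate(population) if not any(dominates(o, s) for o in population)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical (correctness, constraint-satisfaction) scores for four skill variants.
    population = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.1, 0.1)]
    print("Pareto-optimal indices:", pareto_front(population))
    for step in (0, 5, 9):
        picks = [select_parent(population, step, 10, rng) for _ in range(1000)]
        print(f"step {step}: T={temperature(step, 10):.3f}, dominated picked {picks.count(3)} times")
```

Running the demo, the dominated variant at index 3 can still be sampled at the start of the run, when the temperature is 1.0, but is effectively never chosen by the final step, when the temperature reaches 0.01. Resampling the weights each step is one simple way to spread selection pressure across the whole front instead of a single trade-off.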