SkillsVote: Lifecycle Governance Lifts GPT-5.2 by 7.9 Points on Terminal-Bench 2.0

A new preprint introduces lifecycle governance for LLM agent skills: it profiles a million-artifact corpus and filters updates to prevent context pollution, lifting GPT-5.2 by up to 7.9 points on Terminal-Bench 2.0.

May 16, 2026

A team spanning multiple research groups has released SkillsVote, a framework that treats agent experience as a governed library of reusable skills rather than raw execution logs. The preprint, posted to Hugging Face Papers on May 19, addresses a core problem in long-horizon LLM agents: trajectories accumulate noise, redundant artifacts pile up in open skill repositories, and indiscriminate updates to the context pool degrade future performance.

SkillsVote couples executable scripts with non-executable procedural guidance, profiles a million-scale open-source skill corpus for environment requirements and quality, then synthesizes verification tasks. Before execution, the system performs agentic library search over a structured skill index to surface relevant instructional context. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill use versus agent exploration versus environment factors, and admits only successful reusable discoveries through evidence-gated updates.
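The post-execution gate described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Cause` labels, the `Subtask` fields, the `reusable` flag, and the `admit_candidates` function are all hypothetical names chosen for illustration, and the rule that only agent-discovered subtasks are eligible is an assumed reading of "successful reusable discoveries."

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    """Hypothetical attribution labels for a subtask outcome."""
    SKILL_USE = "skill_use"        # credited to an existing library skill
    EXPLORATION = "exploration"    # credited to the agent's own discovery
    ENVIRONMENT = "environment"    # credited to environment factors


@dataclass(frozen=True)
class Subtask:
    description: str
    succeeded: bool
    cause: Cause
    reusable: bool  # assumed flag: does the pattern generalize beyond this run?


def admit_candidates(subtasks: list[Subtask]) -> list[Subtask]:
    """Keep only successful, reusable subtasks attributed to agent exploration."""
    return [
        s for s in subtasks
        if s.succeeded and s.reusable and s.cause is Cause.EXPLORATION
    ]


# Illustrative trajectory, already decomposed into attributed subtasks.
trajectory = [
    Subtask("applied existing bisect skill", True, Cause.SKILL_USE, True),
    Subtask("found flag to rebuild cache", True, Cause.EXPLORATION, True),
    Subtask("test passed after flaky retry", True, Cause.ENVIRONMENT, False),
    Subtask("attempted manual patch", False, Cause.EXPLORATION, True),
]

admitted = admit_candidates(trajectory)
for s in admitted:
    print(s.description)
```

Under these assumptions, only the cache-rebuild discovery is admitted. The success credited to an existing skill adds nothing new to the library, the environment-driven success is not reusable, and the failed attempt carries no supporting evidence.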

In evaluation, offline evolution improves GPT-5.2 by up to 7.9 percentage points on Terminal-Bench 2.0, while online evolution adds 2.6 points on SWE-Bench Pro. The framework ingests a million open-source agent artifacts and classifies them for environment sensitivity, quality, and verifiability, filtering out noise before it pollutes downstream context. By decomposing trajectories, SkillsVote separates the gains that came from using an existing skill from those that came from the agent's own reasoning and those that came from environment quirks or result signals, so only genuinely reusable patterns are preserved. The broader claim is that when a system controls exposure, credit assignment, and preservation rules, external skill libraries can improve frozen agents without touching the weights.
