SkillClaw framework lets LLM agents evolve shared skill libraries from collective logs
Researchers propose a system in which agents pool execution logs and an autonomous evolver iteratively refines or creates procedures in a centralized skill base, replacing static prompt engineering.
SkillClaw is a framework from AMAP-ML that moves LLM agents away from fixed, hand-coded skills toward self-improving ecosystems. The system gathers execution logs from multiple agent instances, then uses an autonomous "agentic evolver" to iteratively refine or generate new procedures in a shared central library. The arXiv preprint and GitHub repository detail the architecture and validation runs.
Today's agents suffer from fragmented learning: different instances trip over the same edge cases repeatedly, with no mechanism to share fixes. SkillClaw formalizes a loop: collect logs, reason about failures, propose skill updates, validate them empirically, and commit improvements back to the shared pool. The result is monotonic accumulation of procedural intelligence without manual prompt engineering.
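The collect-reason-propose-validate-commit loop can be sketched in a few lines. This is a minimal illustration, not the actual SkillClaw implementation: the class, the trace schema, and the `propose`/`validate` callables are all assumptions standing in for the paper's LLM-driven evolver and simulation layer.

```python
from dataclasses import dataclass, field

@dataclass
class SkillLibrary:
    """Shared central store of procedures, keyed by skill name (illustrative)."""
    skills: dict = field(default_factory=dict)

    def commit(self, name: str, procedure: str) -> None:
        self.skills[name] = procedure

def mine_failures(logs: list[dict]) -> dict:
    """Group failed execution traces by recurring error signature."""
    patterns: dict = {}
    for trace in logs:
        if not trace["success"]:
            patterns.setdefault(trace["error"], []).append(trace)
    return patterns

def evolve(library: SkillLibrary, logs: list[dict], propose, validate) -> SkillLibrary:
    """One evolver pass: propose a patch per failure cluster, commit only
    the patches that pass empirical validation against historical tasks."""
    for error, traces in mine_failures(logs).items():
        name, patch = propose(error, traces)   # in the paper: multi-turn LLM reasoning
        if validate(patch, traces):            # in the paper: simulation-layer replay
            library.commit(name, patch)        # monotonic accumulation
    return library
```

Because only validated patches are committed, repeated passes can only grow or refine the library, which is the "monotonic accumulation" property the paper claims.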
What stands out
- Collective feedback at scale. Every agent instance contributes execution traces to a central log store. The evolver mines those traces for recurring failure patterns and proposes targeted skill patches or entirely new procedures.
- Autonomous validation. Proposed skill changes run through a simulation layer that replays historical tasks. Only updates that pass empirical checks land in the production skill library, preventing regressions.
- Token cost is the tradeoff. Regular simulation sweeps and multi-turn evolver reasoning drive token consumption significantly higher than static-skill baselines. Practitioners will need to budget for continuous background inference.
- Privacy exposure across users. Pooling logs from multiple users means the evolver sees cross-user data. The paper flags the need for strict filters to ensure private context doesn't leak into shared skill code.
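The validation gate in the second point can be sketched as a replay check: a candidate skill update is accepted only if it does not regress pass rate on historical tasks. Function names, the task schema, and the `margin` parameter are illustrative assumptions, not the paper's actual interface.

```python
def replay(skill, tasks: list[dict]) -> float:
    """Fraction of replayed historical tasks the skill solves (illustrative)."""
    results = [skill(task) == task["expected"] for task in tasks]
    return sum(results) / len(results)

def gate(candidate, incumbent, tasks: list[dict], margin: float = 0.0) -> bool:
    """Accept the candidate skill only if its replay pass rate matches or
    beats the incumbent's by at least `margin` -- the anti-regression check."""
    return replay(candidate, tasks) >= replay(incumbent, tasks) + margin
```

A non-zero `margin` would make the gate stricter, trading evolution speed for stability; the paper's exact acceptance criterion is not specified here.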