SkillOpt boosts GPT-4.5 reasoning by 23.5 points through text-space instruction optimization

Microsoft researchers released SkillOpt, a text-space optimizer that treats agent skills as trainable external state and compiles domain adaptations into human-readable markdown instruction files.

ByAlex Sokoloff·May 29, 2026

SkillOpt boosts GPT-4.5 reasoning by 23.5 points through text-space instruction optimization

SkillOpt, a text-space optimizer from Microsoft Research, treats AI agent skills written in natural language as trainable external state. Instead of manual prompt engineering or chaotic auto-generation, the system structures behavior updates using deep-learning-inspired controls: text-space learning rates implemented as edit budgets, strict validation filters, buffers for rejected edits, and slow meta-updates at the epoch level. The result is stable, reproducible offline optimization for both frozen frontier models and smaller local LLMs.

On reasoning, table-processing, and agent-control benchmarks, SkillOpt lifted GPT-4.5 accuracy by an average of 23.5 percentage points with zero inference-time latency penalty and no extra model calls during inference. The gains come entirely from better instruction files, not from additional compute at runtime. The approach compiles complex domain adaptations into ordinary markdown files that humans can read and audit.

Because optimized skills live in markdown rather than model weights, practitioners can transfer high-quality instruction sets from powerful models to lighter local LLMs like Qwen at no cost. The markdown skill files are compact, version-controllable, and require no fine-tuning of weights. This portability matters for teams running uncensored or domain-specific agents on local hardware—skills optimized once on a frontier model can drop directly into a self-hosted stack.

The system borrows vocabulary from gradient descent but operates entirely in text space. Edit budgets act as learning rates, controlling how aggressively the optimizer rewrites agent instructions. Validation filters reject updates that degrade performance on held-out examples. A buffer tracks rejected edits so the optimizer can learn from failures across epochs. The meta-update layer adjusts strategy parameters slowly, mimicking the outer loop of hyperparameter tuning in neural network training.

The arXiv preprint and code repository were published on May 28, 2026. No pre-trained model weights are distributed—SkillOpt outputs are plain-text instruction files that plug into any compatible agent framework.

ZenCreator

SkillOpt boosts GPT-4.5 reasoning by 23.5 points through text-space instruction optimization

More in Research

Apple accuses OpenAI of soliciting hardware prototypes in job interviews

Lightweight proxy models cut LLM post-training costs while enabling cross-model signal reuse

Colibri runs 744B GLM-5.2 on 25GB RAM by streaming experts from disk

Anthropic extends Fable 5 preview a second week, bumps rate limits 50%

Soofi S 30B activates 3B parameters per token, tops European AI baselines