Claude Opus 4.8 catches 4× more code bugs, leads on browser-agent benchmarks

Anthropic released Claude Opus 4.8 this week with improved code reliability, stronger agentic performance, and new effort-control and dynamic-workflow features for Claude Code.

ByAlex Sokoloff·May 29, 2026

Claude Opus 4.8 catches 4× more code bugs, leads on browser-agent benchmarks

Anthropic released Claude Opus 4.8 on May 28, an upgrade to its flagship model class that catches its own code bugs four times more often and scores 84 percent on Online-Mind2Web, a benchmark for browser-agent tasks where it now leads GPT-5.5. On the Legal Agent Benchmark, Opus 4.8 is the first model to cross 10 percent under the all-pass standard, a strict measure requiring every step in a multi-stage legal workflow to succeed. Anthropic also reported gains in alignment metrics around user autonomy and acting in the user's interest, though it did not publish the underlying evaluation data.

Three interface changes ship with the release. An effort-control slider now appears next to the model picker in claude.ai and Cowork; high effort (the default) triggers extended thinking, while low effort prioritizes speed and token efficiency. Extra and max tiers are available for heavy tasks, with max appearing as "xhigh" inside Claude Code. Dynamic workflows in Claude Code (research preview, Enterprise / Team / Max tiers only) allow the model to spin up hundreds of parallel sub-agents in a single session, verify their output, and report back only when the entire plan completes—Anthropic positions this for codebase migrations spanning hundreds of thousands of lines, from kickoff to merge, with test suites as the acceptance gate. The Messages API now accepts system-role entries inside the messages array, letting developers update agent instructions mid-task without invalidating the prompt cache.

Pricing remains at Opus 4.7 rates, but fast mode now costs one-third what it did on prior models while running 2.5× faster. Token limits in Claude Code rose alongside per-tier effort budgets, though Anthropic did not publish the new caps. The company noted that low-effort Opus 4.8 sometimes outperforms max-effort Opus 4.7 on the same prompt, suggesting the base model improved enough to offset the reduced thinking budget. Anthropic also promised a Mythos release "in the coming weeks" for subscribers, though it did not specify whether Mythos is a model, a feature tier, or a research preview—the next question is whether dynamic workflows will expand to Pro subscribers or remain gated behind Enterprise pricing.

ZenCreator

Claude Opus 4.8 catches 4× more code bugs, leads on browser-agent benchmarks

More in Platform

Apple accuses OpenAI of soliciting hardware prototypes in job interviews

Lightweight proxy models cut LLM post-training costs while enabling cross-model signal reuse

Colibri runs 744B GLM-5.2 on 25GB RAM by streaming experts from disk

Anthropic extends Fable 5 preview a second week, bumps rate limits 50%

Soofi S 30B activates 3B parameters per token, tops European AI baselines