Claude Opus 4.8 catches 4× more code bugs, leads on browser-agent benchmarks
Anthropic released Claude Opus 4.8 this week with improved code reliability, stronger agentic performance, and new effort-control and dynamic-workflow features for Claude Code.

Anthropic released Claude Opus 4.8 on May 28, an upgrade to its flagship model class that catches its own code bugs four times more often and scores 84 percent on Online-Mind2Web, a benchmark for browser-agent tasks where it now leads GPT-5.5. On the Legal Agent Benchmark, Opus 4.8 is the first model to cross 10 percent under the all-pass standard, a strict measure requiring every step in a multi-stage legal workflow to succeed. Anthropic also reported gains in alignment metrics around user autonomy and acting in the user's interest, though it did not publish the underlying evaluation data.
Three interface changes ship with the release. An effort-control slider now appears next to the model picker in claude.ai and Cowork; high effort (the default) triggers extended thinking, while low effort prioritizes speed and token efficiency. Extra and max tiers are available for heavy tasks, with max appearing as "xhigh" inside Claude Code. Dynamic workflows in Claude Code (research preview, Enterprise / Team / Max tiers only) allow the model to spin up hundreds of parallel sub-agents in a single session, verify their output, and report back only when the entire plan completes—Anthropic positions this for codebase migrations spanning hundreds of thousands of lines, from kickoff to merge, with test suites as the acceptance gate. The Messages API now accepts system-role entries inside the messages array, letting developers update agent instructions mid-task without invalidating the prompt cache.
Pricing remains at Opus 4.7 rates, but fast mode now costs one-third what it did on prior models while running 2.5× faster. Token limits in Claude Code rose alongside per-tier effort budgets, though Anthropic did not publish the new caps. The company noted that low-effort Opus 4.8 sometimes outperforms max-effort Opus 4.7 on the same prompt, suggesting the base model improved enough to offset the reduced thinking budget. Anthropic also promised a Mythos release "in the coming weeks" for subscribers, though it did not specify whether Mythos is a model, a feature tier, or a research preview—the next question is whether dynamic workflows will expand to Pro subscribers or remain gated behind Enterprise pricing.

