Claude Code swarms, Gemini 3.5 Flash triples price, Cerebras hits 1,000 tokens/sec
Anthropic shipped multi-agent orchestration and goal-mode in Claude Code, Google released Gemini 3.5 Flash at triple its predecessor's price, and Cerebras demoed Kimi K2.6 at 1,000 tokens/sec for enterprise customers.

Anthropic rolled out multi-agent orchestration and a persistent goal-mode in Claude Code this month, letting developers run coordinated agent swarms that keep working until the task is complete. The company also announced up to $200 in free API credits for Pro subscribers who use third-party tools built on the Agent SDK. In addition, Anthropic leased SpaceX's Colossus datacenter, and token limits for Claude subscribers doubled following the deal.
Google shipped Gemini 3.5 Flash with native agentic capabilities; according to internal demos, the model can write an operating system in twelve hours. It is noticeably smarter than its predecessor but costs three times as much. Google also folded the Veo video line into a new unified Gemini Omni model that generates video natively.
Cursor released Composer 2.5, built on K2.5 weights and the first model trained in SpaceX datacenters. Its fast tier now costs twice as much, matching Sonnet pricing. Cerebras, fresh off its IPO, demoed Kimi K2.6, a trillion-parameter model, generating 1,000 tokens per second; the service is available only to enterprise customers for now. OpenAI fixed a cache bug that was burning through Codex rate limits and teased a /slow mode for large, non-urgent jobs.
Detailed tests in vLLM found that TurboQuant KV-cache quantization works for local inference but cuts server throughput by 70 percent because of dequantization overhead, making it a non-starter for production inference clusters. Meanwhile, renting older A100 GPUs now costs more than it did two years ago, and H100 availability is near zero. Both are signs that hardware scarcity is reshaping pricing and partnership deals across the industry.
What stands out
- 01 Multi-agent orchestration is production-ready. Claude Code's swarm mode and goal-persistence put coordinated agent workflows in the hands of Pro subscribers, not just research labs.
- 02 Inference speed is the new benchmark. Cerebras hitting 1,000 tokens/sec on a trillion-parameter model signals that throughput, not just intelligence, is becoming the competitive edge for enterprise deployments.

