Cerebras runs trillion-parameter Kimi K2.6 at 1,000 tokens per second
Cerebras deployed Kimi K2.6, a trillion-parameter model, at 1,000 tokens per second for enterprise clients — the first time a model that size has reached that speed on the company's hardware.

Cerebras deployed Kimi K2.6, a trillion-parameter model from Moonshot AI, at 1,000 tokens per second this week — the first time a model that size has run at that speed on the company's CS-3 wafer-scale chips. The deployment is currently available only to enterprise customers.
Kimi K2.6 is Moonshot AI's flagship long-context model, released in March 2026 with a 2.6-million-token context window and support for 40 languages. Running it at 1,000 tokens per second is well above typical inference speeds for models this large. Standard H100 GPU clusters serving trillion-parameter checkpoints typically deliver 50–150 tokens per second per user, which would make Cerebras' claimed speed roughly 7–20× faster than conventional deployments. The WSE-3 processor inside the CS-3 packs 900,000 cores onto a single silicon wafer the size of a dinner plate, a design intended to remove the memory-bandwidth bottlenecks that slow multi-GPU inference clusters.
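The speedup range quoted above is simple division, and a few lines of Python make the arithmetic easy to check. This is only an illustrative sketch: the function name `speedup_range` is hypothetical, and the inputs are the figures cited in this article, not independent benchmark results.

```python
def speedup_range(target_tps, baseline_low_tps, baseline_high_tps):
    """Return (min_speedup, max_speedup) of a target rate over a baseline range.

    All rates are tokens per second per user and must be positive.
    """
    if min(target_tps, baseline_low_tps, baseline_high_tps) <= 0:
        raise ValueError("rates must be positive")
    # The smallest speedup comes from the fastest baseline, and vice versa.
    return target_tps / baseline_high_tps, target_tps / baseline_low_tps


# Figures as cited in the article: 1,000 tok/s vs. 50-150 tok/s on H100 clusters.
low, high = speedup_range(1000, 50, 150)
print(f"{low:.1f}x to {high:.1f}x")  # 6.7x to 20.0x
```

The lower bound works out to about 6.7×, which the article rounds to 7×.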
The company went public last week in a $5.5 billion IPO that valued Cerebras at $60 billion. The Sunnyvale chip maker's largest previous model offering was GLM 4.7, a 358-billion-parameter model that is just over a third the size of Kimi K2.6.