Trillion-parameter Kimi K2.5 runs at 4 tokens/sec on recycled Intel Optane memory
A local inference builder demonstrates that discontinued Intel Optane Persistent Memory can run frontier-class models efficiently: 768GB of secondhand PMem paired with a 12GB GPU achieves 4 tokens per second on Kimi K2.5's trillion-parameter mixture-of-experts architecture.

A local inference enthusiast has successfully run Kimi K2.5, a trillion-parameter mixture-of-experts model, at roughly 4 tokens per second on a build centered on Intel Optane Persistent Memory. The system uses 768GB of secondhand Optane PMem, a discontinued DIMM-form-factor memory that sits between DRAM and SSD in speed, configured in Memory Mode so that the standard DRAM acts as a transparent cache in front of it. The builder sourced the Optane sticks on the secondhand market for significantly less than equivalent DRAM capacity would cost.
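The article doesn't give the builder's exact provisioning steps, but on Optane PMem systems, Memory Mode is normally configured with Intel's ipmctl utility. A minimal sketch, assuming a stock ipmctl install and a system with all PMem modules to be dedicated to Memory Mode:

```shell
# Provision 100% of the Optane PMem capacity as Memory Mode,
# in which the DRAM acts as a transparent cache in front of the PMem.
# The goal takes effect after a reboot.
sudo ipmctl create -goal MemoryMode=100

# After rebooting, verify how capacity was allocated.
sudo ipmctl show -memoryresources
```

In Memory Mode the OS simply sees one large pool of system memory, so llama.cpp needs no PMem-specific changes; the tiering is handled by the memory controller.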
The build runs Kimi K2.5's Unsloth Q2_K_XL quant via hybrid GPU/CPU inference in llama.cpp. The model's attention weights, dense layer, shared expert, and routing components fit on a 12GB RTX 3060 via llama.cpp's "override-tensor" flag, while the sparse experts, which make up the bulk of the trillion-parameter weight set, live on PMem and DRAM and are processed on demand. The system pairs the Optane PMem with an Intel Xeon Gold 6246 CPU and a TYAN S5630GMRE-CGN motherboard, both compatible with the discontinued memory standard.

Generation at 4 tokens per second on a frontier-class model is a practical win for memory tiering on a constrained hardware budget, even though Intel has discontinued the Optane line and the community continues to explore SSD offloading and broader memory-tiering alternatives.
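The article doesn't include the exact command line, but a split like the one described is typically expressed in llama.cpp with an --override-tensor pattern that pins the routed-expert tensors to CPU while everything else is offloaded to the GPU. A sketch under those assumptions (the model filename, context size, and prompt below are illustrative, not taken from the build):

```shell
# Nominally offload all layers to the GPU, then override the routed-expert
# tensors (names containing "exps", the bulk of the weights) back to CPU,
# where they are served from the DRAM-cached Optane PMem pool.
# Attention, the dense layer, shared expert, and router stay on the 12GB GPU.
./llama-cli \
  -m Kimi-K2.5-Q2_K_XL.gguf \
  --n-gpu-layers 99 \
  --override-tensor "exps=CPU" \
  -c 8192 \
  -p "Hello"
```

Because only a few experts are activated per token, each decode step touches a small slice of the CPU-side weights, which is what makes serving them from a slower memory tier viable at all.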