M5 MacBook Pro doubles DGX Spark token throughput in three-day hardware benchmark
A three-day parallel benchmark run compares Apple's M5 MacBook Pro against Nvidia's DGX Spark, AMD's Strix Halo, and a workstation RTX 6000, showing memory bandwidth as the primary driver of local AI inference speed, with the maxed-out M5 pulling ahead of the Spark at its price point.

Over three days, the tester ran Apple's M5 MacBook Pro, Nvidia's DGX Spark, AMD's Strix Halo, and a workstation RTX 6000 in parallel through the same standardized local AI tests. The maxed-out M5 delivered roughly double the token throughput of the DGX Spark, in line with the gap in unified memory bandwidth: 600 GB/s for the M5 versus 256 GB/s for the Spark, a ratio of about 2.3 to 1. The RTX 6000, with 1,800 GB/s of VRAM bandwidth, posted the highest raw tokens per second, consistent with the same bandwidth-to-performance trend.
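The bandwidth argument can be checked with simple arithmetic. The Python sketch below is an illustration, not the tester's method. It assumes a purely memory-bound decode step in which every generated token streams the full set of model weights once, so bandwidth divided by model size gives an upper bound on tokens per second. The bandwidth figures are the ones quoted above; the function names and the 40 GB model footprint are hypothetical, and real throughput will fall below these ceilings.

```python
# Memory bandwidth in GB/s, as quoted in the article.
BANDWIDTH_GBPS = {
    "M5 MacBook Pro": 600,
    "DGX Spark": 256,
    "RTX 6000": 1800,
}


def decode_ceiling_tps(bandwidth_gbps: float, model_gb: float) -> float:
    """Upper bound on decode tokens/s, assuming each token reads all weights once."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gbps / model_gb


def bandwidth_ratio(a: str, b: str) -> float:
    """Ratio of system a's quoted bandwidth to system b's."""
    return BANDWIDTH_GBPS[a] / BANDWIDTH_GBPS[b]


if __name__ == "__main__":
    MODEL_GB = 40.0  # hypothetical model footprint, NOT a value from the benchmark
    for name, bw in BANDWIDTH_GBPS.items():
        ceiling = decode_ceiling_tps(bw, MODEL_GB)
        print(f"{name}: at most {ceiling:.1f} tok/s")
    ratio = bandwidth_ratio("M5 MacBook Pro", "DGX Spark")
    print(f"M5 / Spark bandwidth ratio: {ratio:.2f}")
```

Under this assumption the ranking depends only on bandwidth, so the ratios between systems stay the same whatever model size is plugged in.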
Thermal performance varied significantly across platforms. The M5 MacBook Pro held steady in the low 80s Celsius under sustained load, but its fans ran at gaming-laptop volume and were audibly loud at full capacity. The AMD Strix Halo system (an EVO X2 chassis) hit thermal limits during extended runs and throttled. The M5's aluminum unibody dissipated heat more consistently than the Spark's compact enclosure. The Spark shares the M5's 128 GB unified memory ceiling but has less than half the bandwidth. Despite the noise, the MacBook stayed more thermally stable under sustained load than either competing integrated system.
Price-per-token comparisons favor the M5 when ecosystem lock-in isn't a concern: Apple's silicon undercuts the Spark's enterprise positioning while outperforming it on memory-bound inference tasks. Raw token-per-second data, power draw logs, and thermal profiles for each platform are available in a public repository. The tester notes that the RTX 6000 is not identical to the consumer RTX 5090 but shares enough of its architecture that the numbers may guide buyers choosing between a discrete-GPU workstation and an integrated SoC laptop.
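Readers who want to run their own price comparison against the published data could use something like the Python sketch below. It uses purchase price divided by measured tokens per second as a simple stand-in for price per token. That metric is an assumption made here, not necessarily the tester's, and it ignores power cost and depreciation. The system names and numbers in the example are placeholders, not benchmark results or real prices.

```python
from __future__ import annotations


def dollars_per_tps(price_usd: float, tokens_per_second: float) -> float:
    """Purchase price per unit of sustained throughput (lower is better)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return price_usd / tokens_per_second


def rank_by_value(systems: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort systems by dollars per token/s, best value first.

    `systems` maps a name to (price_usd, tokens_per_second).
    """
    scored = [(name, dollars_per_tps(price, tps)) for name, (price, tps) in systems.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder inputs only; substitute real prices and measured throughput.
    placeholder = {
        "system_a": (1000.0, 50.0),
        "system_b": (1500.0, 60.0),
    }
    for name, score in rank_by_value(placeholder):
        print(f"{name}: ${score:.2f} per token/s")
```

Because the score is a plain ratio, a pricier system can still rank first if its throughput advantage is larger than its price premium.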
The next round of tests will swap in MLX on the Mac and alternative hosting backends on Strix Halo to measure how software stack choices shift the rankings. Watch for updates on quantization formats, batch sizes, and whether the M5's lead holds on multimodal models that stress both compute and memory subsystems.