Nvidia RTX Spark Superchip brings 120B local LLMs to Windows PCs

Nvidia and Microsoft announced the RTX Spark Superchip platform, an Arm-based unified-memory design that ships with 128 GB RAM to run 120-billion-parameter models in FP4 locally on laptops and desktops.

By Alex Sokoloff · June 2, 2026

Nvidia and Microsoft this week positioned their new PC platform as a clean break from the past ("PC 2," in the companies' words), built around on-device AI assistants that tap into open Windows applications without sending data to the cloud.

The RTX Spark Superchip combines GPU, CPU, and unified memory on Arm architecture, mirroring the design Apple has shipped in M-series MacBooks since 2020. Laptops based on the platform will ship with 128 GB of memory, enough to run 120-billion-parameter language models quantized to FP4 precision. The chip's energy efficiency comes from the Arm instruction set and the shared-memory design, which eliminates the PCIe bottleneck between a discrete GPU and system RAM.
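The memory claim can be checked with back-of-the-envelope arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and it counts raw weight storage alone. Real FP4 formats typically store extra scaling factors, and the KV cache, activations, and the operating system also need memory, so actual usage would be higher.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    quantization scale factors or runtime overhead counted.
    """
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 120e9      # 120 billion parameters, as cited in the article
SYSTEM_RAM_GB = 128  # memory the article says the laptops ship with

fp4_gb = weight_footprint_gb(PARAMS, 4)    # 4 bits per parameter
fp16_gb = weight_footprint_gb(PARAMS, 16)  # 16 bits per parameter, for comparison

print(f"FP4 weights:  {fp4_gb:.0f} GB of {SYSTEM_RAM_GB} GB")
print(f"FP16 weights: {fp16_gb:.0f} GB of {SYSTEM_RAM_GB} GB")
```

Under these assumptions the FP4 weights occupy about 60 GB, leaving roughly half of the 128 GB for everything else, while the same model at 16-bit precision would need about 240 GB and would not fit.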

The pitch centers on local AI assistants — tools like OpenClaw and Hermes that read screen content and automate workflows — running entirely on the device. Microsoft is promising granular privacy controls for which applications the assistant can access, a response to the backlash that followed earlier attempts at always-on screen recording. Desktop PCs will follow the laptop rollout, though no ship dates were announced.

One open question is how OpenAI and Anthropic fit into the ecosystem. Industry observers have speculated that the vendors could distribute encrypted local weights that require a subscription to unlock, letting users toggle between on-device and cloud models without changing the interface. That would preserve the revenue model while meeting the privacy and latency demands that make local inference attractive.
