Live Music Diffusion Models run in real time on consumer laptops
A new preprint shows how audio diffusion models can be repurposed for real-time music generation through block-wise KV caching, matching autoregressive efficiency while enabling live performance on gaming hardware.
Live Music Diffusion Models (LMDMs) adapt audio diffusion models for interactive, streaming music generation, according to a preprint released May 22. The work addresses a longstanding bottleneck: discrete autoregressive models dominate state-of-the-art music generation, but they demand industrial-scale compute for both training and inference. Audio diffusion models have broader open-source support, yet were previously considered unsuitable for real-time use because of their bidirectional, non-streaming architecture.
The authors, including Zachary Novack, Stephen Brade, and Hugo Flores García, identify critical inefficiencies in the standard block-wise outpainting diffusion pipeline that make it computationally worse than autoregressive alternatives during inference. LMDMs solve this by modifying the generative diffusion process to introduce block-wise KV caching, a technique borrowed from transformer inference optimization. The result is inference complexity that not only matches but improves on that of discrete Live Music Models (LMMs), the prior autoregressive benchmark.
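To make the caching idea concrete, here is a minimal Python sketch of block-wise KV caching in general. It is not the paper's implementation. Every name in it (`attend`, `BlockKVCache`, `run_demo`) is hypothetical, the block length, dimension, and step count are arbitrary, and random vectors stand in for the query, key, and value projections a real model would compute. The assumption illustrated is that the active block is refined over several denoising steps while attending to frozen keys and values from finished blocks, so earlier audio is never re-encoded.

```python
import math
import random


def attend(queries, keys, values):
    """Single-head softmax attention of each query over all given keys/values."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return out


class BlockKVCache:
    """Hypothetical cache holding keys/values of blocks that are already finished."""

    def __init__(self):
        self.keys, self.values = [], []

    def attend_block(self, q, k, v):
        # The active block sees the cached context plus itself (bidirectional inside the block).
        return attend(q, self.keys + k, self.values + v)

    def commit(self, k, v):
        # Freeze a finished block so later blocks can reuse it without recomputation.
        self.keys.extend(k)
        self.values.extend(v)


def random_block(rng, length, dim):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(length)]


def run_demo(num_blocks=3, block_len=4, dim=8, denoise_steps=2, seed=0):
    rng = random.Random(seed)
    cache = BlockKVCache()
    final_qkv, cached_outputs = [], []
    for _ in range(num_blocks):
        for _ in range(denoise_steps):
            # Assumption: random vectors replace projections of the current noisy latents.
            q, k, v = (random_block(rng, block_len, dim) for _ in range(3))
            out = cache.attend_block(q, k, v)
        cache.commit(k, v)  # only the final-step keys/values are kept
        final_qkv.append((q, k, v))
        cached_outputs.append(out)

    # Reference: rebuild the full history from scratch for every block.
    reference_outputs = []
    for b, (q, _, _) in enumerate(final_qkv):
        keys = [row for _, k, _ in final_qkv[: b + 1] for row in k]
        values = [row for _, _, v in final_qkv[: b + 1] for row in v]
        reference_outputs.append(attend(q, keys, values))
    return cached_outputs, reference_outputs


if __name__ == "__main__":
    cached, reference = run_demo()
    same = all(
        math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
        for cb, rb in zip(cached, reference)
        for ca, ra in zip(cb, rb)
        for a, b in zip(ca, ra)
    )
    print("cached attention matches full recomputation:", same)
```

In this toy setup the cached path and the from-scratch reference produce the same attention outputs. The difference is cost: with the cache, each denoising step only has to produce keys and values for the active block, whereas the reference rebuilds them for the entire history each time.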
Key findings
- 01 Consumer hardware viability. The paper demonstrates LMDMs running locally on a consumer gaming laptop, generating music in real time during a live artist-AI collaboration. The model acts as a "generative delay" that transforms a musician's improvisation on the fly for variable timbral effects.
- 02 Post-training alignment without RL. LMDMs introduce ARC-Forcing, a novel paradigm that enables stable post-training alignment and reduces error accumulation. Unlike prior methods, this approach requires no explicit reinforcement learning or reward models.
- 03 Multiple creative domains. The authors demonstrate LMDMs working across text-conditioned generation, sketch-based music synthesis, and live jamming scenarios. Each application runs with the same underlying architecture.


