ELF diffusion model generates text in continuous embedding space with fewer sampling steps
Embedded Language Flows (ELF), a new diffusion model from MIT and Harvard researchers, generates text by working in continuous embedding space until the final step, beating existing discrete and continuous diffusion language models on quality and speed.

Embedded Language Flows (ELF) is a diffusion-based language model that operates in continuous embedding space rather than on discrete tokens. Published May 12, 2026, the paper by Keya Hu, Linlu Qiu, Yiyang Lu, Hanhong Zhao, Tianhong Li, and Yoon Kim starts from the observation that most diffusion language models today work at the token level, treating words or subwords as discrete units. ELF instead uses continuous-time Flow Matching to generate embeddings in a smooth vector space, mapping them to discrete tokens only at the final step via a shared-weight network. Staying continuous until decoding makes it straightforward to port classifier-free guidance and other image-diffusion techniques that assume continuous data but have been difficult to apply to token-level language models.
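To make the mechanism concrete, here is a minimal toy sketch of the general recipe described above: Euler-integrating a flow in embedding space, combining conditional and unconditional velocity predictions via classifier-free guidance, and decoding to tokens only at the end. This is not the paper's architecture; the velocity field, embedding table, and nearest-neighbor decoder below are all hypothetical stand-ins (ELF uses a learned shared-weight network to decode).

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, SEQ = 8, 4, 5
# Hypothetical token embedding table: token id -> embedding vector.
embed_table = rng.normal(size=(VOCAB, DIM))

def toy_velocity(x, t, target):
    # Stand-in for a learned velocity field v(x, t, condition):
    # here, a straight-line flow pushing x toward target embeddings.
    return target - x

def sample(cond_target, uncond_target, steps=20, guidance=2.0):
    # Start from Gaussian noise in embedding space and integrate the flow.
    x = rng.normal(size=(SEQ, DIM))
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = toy_velocity(x, t, cond_target)
        v_uncond = toy_velocity(x, t, uncond_target)
        # Classifier-free guidance: extrapolate from the unconditional
        # velocity toward the conditional one.
        v = v_uncond + guidance * (v_cond - v_uncond)
        x = x + dt * v  # Euler step along the flow
    return x

def decode(x):
    # Map continuous embeddings to discrete tokens only at the final step
    # (nearest neighbor here, for illustration).
    dists = ((x[:, None, :] - embed_table[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

target_tokens = rng.integers(0, VOCAB, size=SEQ)
cond = embed_table[target_tokens]   # "conditioning" pulls toward these
uncond = np.zeros((SEQ, DIM))       # unconditional target: origin
tokens = decode(sample(cond, uncond))
print(tokens)
```

Because every intermediate state is a continuous vector, the guidance step is just vector arithmetic on velocities, which is exactly the operation that is awkward to define over discrete token distributions.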
The authors report that ELF substantially outperforms leading discrete and continuous diffusion language models on generation quality while requiring fewer sampling steps. For practitioners running language models locally, fewer steps means faster inference, and continuous embeddings open the door to guidance techniques from the image domain. The preprint is available on arXiv (2605.10938) and HuggingFace Papers.