Sumi 7B uniform diffusion model trains from scratch on 1.5 trillion tokens | UncensoredHub

Sumi 7B uniform diffusion model trains from scratch on 1.5 trillion tokens

Researchers released Sumi, a 7B-parameter uniform diffusion language model trained from scratch on 1.5 trillion tokens, filling a gap in open research on non-autoregressive architectures.

By Alex Sokoloff · June 18, 2026

Sumi is a 7-billion-parameter uniform diffusion language model trained from scratch on 1.5 trillion tokens. The model, whose name means "ink" in Japanese, is the first uniform diffusion language model (UDLM) pretrained at both large parameter scale and large token budget, according to a preprint released this week.

Uniform diffusion language models allow any token in a sequence to be updated at any step during generation, which makes generation more flexible than in autoregressive models that produce tokens strictly left to right. Autoregressive models and masked diffusion models already have capable open implementations at scale; uniform diffusion has had none until now.
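To make the "any token, any step" idea concrete, here is a minimal toy sketch of a uniform-noise corruption step and a matching reverse step. It is not Sumi's code or training recipe: the function names, the tiny vocabulary, and the oracle-style `toy_reverse_step` (which peeks at the clean target instead of running a learned denoiser) are all assumptions made for illustration.

```python
import random


def corrupt_uniform(tokens, keep_prob, vocab_size, rng):
    """Toy forward (noising) step of a uniform diffusion process.

    Each position is kept with probability keep_prob; otherwise it is
    replaced by a token drawn uniformly from the vocabulary. There is no
    special [MASK] symbol, so a noised position looks like any other token.
    """
    return [
        tok if rng.random() < keep_prob else rng.randrange(vocab_size)
        for tok in tokens
    ]


def toy_reverse_step(tokens, target, fix_prob, rng):
    """Hypothetical stand-in for a learned denoiser.

    Every position, in no particular order, may be rewritten at this step
    with probability fix_prob. A real model would predict the replacement
    from context; `target` is used here only to keep the example runnable.
    """
    return [
        cur if rng.random() >= fix_prob else goal
        for cur, goal in zip(tokens, target)
    ]


rng = random.Random(0)
clean = [3, 1, 4, 1, 5, 9, 2, 6]

# Noise the sequence, then run a few reverse steps over all positions.
state = corrupt_uniform(clean, keep_prob=0.5, vocab_size=10, rng=rng)
for _ in range(4):
    state = toy_reverse_step(state, clean, fix_prob=0.5, rng=rng)
print(state)
```

Because corrupted positions are not marked, the reverse step cannot limit itself to "masked" slots the way a masked diffusion model can. It has to be free to revise every position, which is the flexibility described above.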

What stands out

01. 7B parameters, 1.5T tokens. Sumi was trained from scratch on a token budget comparable to recent open autoregressive models, providing a clean reference point for studying scaling behavior in uniform diffusion.
02. Competitive on knowledge, reasoning, and coding. The model performs on par with autoregressive models trained at similar token budgets on knowledge-intensive, reasoning, and coding benchmarks.
03. Weaker on commonsense tasks. Sumi underperforms on commonsense benchmarks, which the authors attribute to an education-heavy data mixture that may not reflect everyday reasoning patterns.
04. Fully open release. The team released model weights, training checkpoints, and the complete training recipe, including a detailed specification of the data mixture over publicly available corpora.
05. Research catalyst. The release is intended to enable the community to study native uniform diffusion at scale and investigate aspects of the architecture that remain poorly understood, including generation dynamics, controllability, and trade-offs against established paradigms.
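For a rough sense of scale, the headline numbers can be turned into a tokens-per-parameter ratio and a back-of-the-envelope compute estimate. The sketch below assumes the nominal figures (exactly 7e9 parameters and 1.5e12 tokens) and the common 6 × parameters × tokens rule of thumb for dense transformer training. Neither the exact parameter count nor any compute figure comes from the preprint, so treat the result as an order-of-magnitude estimate only.

```python
# Nominal figures from the article; the exact parameter count is an assumption.
params = 7e9      # "7B" parameters
tokens = 1.5e12   # "1.5T" training tokens

# Training tokens seen per model parameter.
tokens_per_param = tokens / params

# Rule-of-thumb training compute for dense transformers (an assumption here,
# not a figure reported by the authors): about 6 FLOPs per parameter per token.
approx_train_flops = 6 * params * tokens

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"approx. training FLOPs: {approx_train_flops:.2e}")
```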
