Sber's GFusion-10B generates text 72% faster with discrete diffusion
Sber's AI team open-sourced GFusion-10B-A1.8B, a discrete diffusion variant of GigaChat that delivers 72% faster generation with minimal quality loss.
GFusion-10B-A1.8B, Sber's discrete diffusion language model, replaces the standard autoregressive sampling loop in GigaChat with a diffusion-based approach. The team reports 72% faster generation than the autoregressive baseline, with benchmark scores staying within a few points of the original model. The work was led by an intern on Sber's pretraining team and released under an open license this week.
Discrete diffusion models generate text by iteratively refining a noisy token sequence rather than predicting one token at a time left-to-right. The approach has been explored in academic papers for years but rarely deployed at scale in production-grade open-weight releases. GFusion applies the technique to a 10-billion-parameter backbone with a 1.8-billion-parameter active subset, making it one of the first open discrete diffusion models in the 10B class to ship with full weights and inference code. The speed improvement comes from the diffusion sampling path, which can generate multiple tokens in parallel during each denoising step. Traditional autoregressive models must wait for each token before computing the next, creating a serial bottleneck that discrete diffusion sidesteps.
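To make the serial-versus-parallel contrast concrete, here is a minimal toy sketch in Python. It is not GFusion's sampler: the `toy_denoiser` function, the `<mask>` token, and the confidence-based unmasking rule are all assumptions chosen for illustration, and the "model" simply reads its answers from a known target sentence. The only thing the sketch shows is how the number of model calls shrinks when several positions are committed per denoising step.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token for a not-yet-decoded slot


def toy_denoiser(tokens, target, rng):
    """Stand-in for a neural network: one call scores every masked slot at once.

    Returns {position: (predicted_token, confidence)}. A real model would
    compute these from learned weights; here the prediction is read from
    `target` and the confidence is random, purely for illustration.
    """
    return {i: (target[i], rng.random())
            for i, tok in enumerate(tokens) if tok == MASK}


def diffusion_decode(target, tokens_per_step, seed=0):
    """Start fully masked and unmask up to `tokens_per_step` slots per call."""
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be at least 1")
    rng = random.Random(seed)
    tokens = [MASK] * len(target)
    calls = 0
    while MASK in tokens:
        preds = toy_denoiser(tokens, target, rng)
        calls += 1
        # Commit the most confident predictions; the rest stay masked.
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)
        for i in best[:tokens_per_step]:
            tokens[i] = preds[i][0]
    return tokens, calls


def autoregressive_decode(target):
    """Baseline: one model call per token, strictly left to right."""
    tokens, calls = [], 0
    for tok in target:
        tokens.append(tok)
        calls += 1
    return tokens, calls


target = "discrete diffusion fills in several tokens per step".split()
ar_tokens, ar_calls = autoregressive_decode(target)
df_tokens, df_calls = diffusion_decode(target, tokens_per_step=4)
print(ar_calls, df_calls)
```

For this eight-token sentence the autoregressive baseline makes eight calls, while the toy diffusion loop finishes in two. Call counts do not translate directly into wall-clock speedups such as the reported 72%, since each denoising step can cost more than a single autoregressive step and real samplers trade steps against quality.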
Weights are available on Hugging Face under ai-sage/GFusion-10B-A1.8B. A pull request adding SGLang support is in review; once merged, it would bring the model into the same inference stack that many open-weight practitioners use to run Llama, Qwen, and other popular architectures. The model runs on consumer GPUs and supports the same context length as the base GigaChat checkpoint.




