
Alibaba DAR doubles early training speed in diffusion transformers via adaptive routing | UncensoredHub



Alibaba researchers propose Diffusion-Adaptive Routing (DAR), a timestep-dependent layer-merging technique that replaces residual connections in diffusion transformers, preserving high-frequency detail during distillation and accelerating early training by 2× when paired with REPA.

By Alex Sokoloff · May 27, 2026


Alibaba researchers have published a preprint introducing Diffusion-Adaptive Routing (DAR), a replacement for residual connections in diffusion transformers. The technique routes layer outputs and denoising steps based on the current timestep, aiming to preserve high-frequency detail when distilling large text-to-image models.

According to the arXiv preprint (2605.20708), DAR doubles training speed in the early phase of training when combined with REPA (representation alignment), a training technique that aligns a diffusion transformer's intermediate features with those of a pretrained visual encoder. The approach dynamically adjusts how layers are combined as the denoising process progresses, rather than using fixed residual paths throughout.
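The article does not reproduce the preprint's exact formulation, so the following is only a minimal sketch of the general idea of timestep-dependent layer merging, not DAR itself. It contrasts a fixed residual connection, which always computes x + f(x), with a hypothetical merge (1 - g(t))·x + g(t)·f(x), where the gate g(t) depends on the noise level. The sigmoid gate, its `sharpness` and `midpoint` parameters, the convention that t = 1 means pure noise, and all function names are assumptions for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def fixed_residual(x: Vector, block: Block) -> Vector:
    """Standard residual connection: x + f(x), the same at every timestep."""
    fx = block(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def timestep_gate(t: float, sharpness: float = 8.0, midpoint: float = 0.5) -> float:
    """Hypothetical sigmoid gate in (0, 1) that increases with noise level t.

    Assumes t in [0, 1], with t = 1 meaning pure noise (an illustrative convention).
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (t - midpoint)))


def timestep_routed(x: Vector, block: Block, t: float) -> Vector:
    """Hypothetical timestep-dependent merge: (1 - g(t)) * x + g(t) * f(x).

    Noisy steps (large t) weight the block output more heavily; nearly clean
    steps (small t) keep the result closer to the block's input.
    """
    g = timestep_gate(t)
    fx = block(x)
    return [(1.0 - g) * xi + g * fi for xi, fi in zip(x, fx)]


if __name__ == "__main__":
    def toy_block(v: Vector) -> Vector:
        # Stand-in for a transformer block: simply doubles its input.
        return [2.0 * vi for vi in v]

    x = [1.0, -1.0]
    print("fixed residual:", fixed_residual(x, toy_block))
    for t in (0.1, 0.5, 0.9):
        print(f"t={t}: routed ->", timestep_routed(x, toy_block, t))
```

At t = 0.5 the gate is exactly 0.5, so the toy block yields [1.5, -1.5], while the fixed residual returns [3.0, -3.0] regardless of t. A real implementation would operate on tensors and would presumably derive the gating from a learned timestep embedding; that detail is also an assumption here.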

What stands out

01 Timestep-dependent routing. DAR adapts its layer-merging strategy to the current diffusion timestep, allowing the network to handle different noise levels with different computational paths.
02 High-frequency preservation. The method is designed to retain fine detail during model distillation, a process that often blurs sharp edges and textures in image generation.
03 2× early training speedup. When paired with REPA, DAR cuts early-stage training time in half compared to standard residual connections, though the preprint does not specify hardware or dataset details.
04 Targets large text-to-image models. The technique is framed for distilling and accelerating models at the scale of FLUX, Stable Diffusion 3, and similar transformer-based diffusion architectures.
05 Preprint stage only. No code, weights, or implementation details have been released. The arXiv submission remains the sole public artifact.
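REPA, the method the preprint pairs with DAR, comes from earlier work: it adds a regularizer that aligns a diffusion transformer's intermediate hidden states, after a small trainable projection, with patch features from a frozen pretrained visual encoder, using negative mean cosine similarity. The sketch below shows only that loss term in plain Python. The linear projection (REPA itself uses a small MLP), the weighting `lam`, and all names are illustrative assumptions, and nothing here describes how the preprint combines REPA with DAR.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity, with eps guarding against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def project(h: Sequence[float], weight: Matrix) -> Vector:
    """Linear projection; a simplified stand-in for REPA's trainable MLP projector."""
    return [sum(w * hi for w, hi in zip(row, h)) for row in weight]


def repa_alignment_loss(hidden_tokens: List[Vector],
                        encoder_tokens: List[Vector],
                        weight: Matrix) -> float:
    """Negative mean per-patch cosine similarity between projected transformer
    hidden states and frozen encoder features (one vector per image patch)."""
    sims = [cosine(project(h, weight), y) for h, y in zip(hidden_tokens, encoder_tokens)]
    return -sum(sims) / len(sims)


def total_loss(denoising_loss: float, alignment_loss: float, lam: float = 0.5) -> float:
    """Denoising objective plus the weighted alignment term; lam is illustrative."""
    return denoising_loss + lam * alignment_loss


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    hidden = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[2.0, 0.0], [0.0, 3.0]]     # same directions as the hidden states
    orthogonal = [[0.0, 1.0], [1.0, 0.0]]  # perpendicular directions
    print(repa_alignment_loss(hidden, aligned, identity))
    print(repa_alignment_loss(hidden, orthogonal, identity))
```

Features pointing in the same directions reach the minimum loss of -1, while orthogonal features give 0; minimizing the term therefore pushes the transformer's internal representations toward the encoder's.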
