AsymFlow hits 1.57 FID on ImageNet, enables pixel fine-tuning from FLUX.2
A new flow-based architecture restricts noise prediction to a low-rank subspace, achieving state-of-the-art pixel-space image generation and enabling direct fine-tuning from latent models like FLUX.2.
AsymFlow, a flow-based generation method presented in a preprint on arXiv this week, rethinks velocity prediction in high-dimensional spaces with a rank-asymmetric parameterization: noise prediction happens in a low-rank subspace while data prediction stays full-dimensional, and the network analytically recovers the full velocity field without architectural changes or modified sampling logic. On ImageNet 256×256, AsymFlow reaches 1.57 FID, outperforming prior DiT and JiT pixel diffusion models by a wide margin.
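The core idea can be illustrated with a minimal sketch. This assumes (the preprint's exact conventions may differ) the standard linear-interpolation flow x_t = (1 − t)·x + t·ε, whose target velocity is v = ε − x; the rank r, the basis P, and the function names are illustrative stand-ins, not details from the paper:

```python
import numpy as np

# Hypothetical sketch of a rank-asymmetric velocity parameterization.
# Assumed flow: x_t = (1 - t) * x + t * eps, target velocity v = eps - x.
rng = np.random.default_rng(0)
D, r = 64 * 64 * 3, 256          # pixel dimension, noise-subspace rank (illustrative)

# Orthonormal basis for the low-rank noise subspace (D x r).
P, _ = np.linalg.qr(rng.standard_normal((D, r)))

def velocity(x_hat_full, eps_coeffs):
    """Analytically recover the full velocity field:
    the data prediction is full-dimensional (D,), while the noise
    prediction lives in the r-dim subspace as coefficients (r,)."""
    eps_hat = P @ eps_coeffs     # lift low-rank noise prediction to pixel space
    return eps_hat - x_hat_full  # v = eps - x under the assumed flow

# Stand-ins for the two network outputs:
x_hat = rng.standard_normal(D)   # full-rank data prediction
z = rng.standard_normal(r)       # low-rank noise coefficients
v = velocity(x_hat, z)           # full-dimensional velocity, shape (D,)
```

The sampler never changes: it consumes the recovered velocity `v` exactly as it would from a conventional flow model, which is why no architectural or sampling modifications are needed.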
The practical breakthrough is fine-tuning. AsymFlow opens the first working path to convert pretrained latent flow models into pixel-space models. By aligning the low-rank pixel subspace to the latent space, the pixel model inherits the latent model's high-level structure and only needs to learn low-level corrections during fine-tuning. A pixel AsymFlow model fine-tuned from FLUX.2 klein 9B beats its latent base on HPSv3, DPG-Bench, and GenEval while showing sharper visual realism in qualitative comparisons. The 1.57 FID is the lowest reported for a pixel diffusion model to date.
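One way to picture the alignment step, as a toy sketch only: if the latent decoder is treated as locally linear, the pixel-space directions it can reach span a low-dimensional subspace, and that span can serve as the noise subspace basis. Everything here (the least-squares fit, the dimensions, the variable names) is an assumption for illustration, not the paper's procedure:

```python
import numpy as np

# Hypothetical illustration: align the low-rank pixel subspace to a latent
# space by fitting a linear map from latents to pixels on paired samples,
# then orthonormalizing its column space to get the subspace basis P.
rng = np.random.default_rng(1)
D, d = 64 * 64 * 3, 256          # pixel dimension, latent dimension (illustrative)
n = 1024                         # paired (latent, pixel) samples

# Toy "decoder": pixels are a linear function of latents plus small noise.
W_true = rng.standard_normal((D, d))
Z = rng.standard_normal((n, d))                        # latent codes
X = Z @ W_true.T + 0.01 * rng.standard_normal((n, D))  # decoded pixels

# Least-squares fit: solve Z @ B = X for B (shape (d, D)), so B.T maps
# latents to pixels; its column space defines the aligned subspace.
B, *_ = np.linalg.lstsq(Z, X, rcond=None)
P, _ = np.linalg.qr(B.T)                               # (D, d) orthonormal basis

# Fraction of pixel variance the aligned subspace captures:
X_proj = (X @ P) @ P.T
captured = 1 - np.linalg.norm(X - X_proj) ** 2 / np.linalg.norm(X) ** 2
```

In this toy setting `captured` is close to 1: the subspace carries the latent model's high-level structure, so fine-tuning only has to supply the residual low-level detail outside it.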
