Microsoft Lens generates 1440×1440 images with 3.8B parameters and four-step turbo mode
Microsoft released Lens, a 3.8-billion-parameter text-to-image model that generates 1440×1440 images, alongside a four-step turbo variant for faster inference.
This week Microsoft released Lens, a 3.8-billion-parameter text-to-image model that generates 1440×1440-pixel images. The release includes three checkpoints: a base model, a standard Lens checkpoint, and Lens-Turbo, a four-step distilled version designed for faster generation.
All three checkpoints are available on Hugging Face under Microsoft's research license, which permits academic and non-commercial use; commercial deployment requires a separate agreement. A public demo Space runs the turbo variant using Diffusers pipelines, which should make it easier to adapt the model for existing ComfyUI or A1111 workflows.
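For readers who want to script the checkpoints rather than use the demo Space, the following is a minimal sketch of what a Diffusers-based call might look like. It rests on several assumptions: that the checkpoints load through the generic `DiffusionPipeline.from_pretrained` entry point, that the pipeline accepts the common `num_inference_steps`, `height`, and `width` arguments, and that a CUDA GPU is available. The helper names `generation_kwargs` and `generate` are hypothetical, and the repository id is left as a required argument because the release notes summarized here do not give one.

```python
from typing import Any, Dict


def generation_kwargs(turbo: bool) -> Dict[str, Any]:
    """Build call arguments for a 1440x1440 generation.

    Assumption: the pipeline accepts the common Diffusers arguments
    `num_inference_steps`, `height`, and `width`.
    """
    return {
        # Four steps for the distilled variant; 30 is an arbitrary pick
        # inside the typical 20-50 step range for non-distilled checkpoints.
        "num_inference_steps": 4 if turbo else 30,
        "height": 1440,
        "width": 1440,
    }


def generate(repo_id: str, prompt: str, turbo: bool = True):
    """Load a checkpoint by Hub repository id and render one image.

    `repo_id` must be supplied by the caller; no repository name is
    assumed here.
    """
    # Imported lazily so generation_kwargs() works without torch/diffusers.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    pipe.to("cuda")  # assumes a CUDA GPU with enough memory
    result = pipe(prompt=prompt, **generation_kwargs(turbo))
    return result.images[0]
```

Keeping the step count and resolution in one small helper makes it easy to switch between the turbo and non-turbo settings without touching the loading code. If the checkpoints ship a custom pipeline class, the loading call would need to change accordingly.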
What stands out
- Parameter count: 3.8 billion parameters places Lens in the mid-weight class, larger than SDXL's 2.6B U-Net but smaller than FLUX.1's 12B transformer.
- Native resolution: 1440×1440 output exceeds SDXL's 1024×1024 default and matches the square format many practitioners use for social media and print.
- Turbo inference: The four-step distilled variant targets real-time workflows. Standard diffusion models typically require 20-50 steps; four-step distillation trades some fidelity for 5-10× faster generation.
- Open weights: Base, standard, and turbo checkpoints are all downloadable and runnable locally.
- Diffusers integration: The demo Space uses Diffusers pipelines, enabling drop-in adoption across existing community tools.
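
The figures in the list above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, the step ratio is an upper bound that ignores fixed costs such as text encoding and image decoding, and the weight estimate assumes 16-bit storage for the 3.8B parameters only.

```python
def step_speedup(baseline_steps: int, distilled_steps: int = 4) -> float:
    """Upper-bound speedup from running fewer denoising steps."""
    return baseline_steps / distilled_steps


def pixel_ratio(side_a: int, side_b: int) -> float:
    """How many times more pixels a side_a square has than a side_b square."""
    return (side_a * side_a) / (side_b * side_b)


def weight_gigabytes(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GB (10^9 bytes); 2 bytes = 16-bit."""
    return params_billion * bytes_per_param


if __name__ == "__main__":
    for steps in (20, 50):
        print(f"{steps} -> 4 steps: {step_speedup(steps):.1f}x fewer steps")
    print(f"1440^2 vs 1024^2 pixels: {pixel_ratio(1440, 1024):.2f}x")
    print(f"3.8B params at 16-bit: ~{weight_gigabytes(3.8):.1f} GB")
```

Going from 20-50 steps to four cuts the step count by 5× to 12.5×, which is roughly in line with the 5-10× figure once per-image overhead is counted. A 1440×1440 image has about 1.98 times the pixels of a 1024×1024 one, and 3.8B parameters at 16-bit precision come to roughly 7.6 GB of weights.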


