Ideogram 4 open-weights: 9.3B DiT with Qwen vision encoder, native 2K | UncensoredHub

Ideogram released its first open-weight text-to-image model, a 9.3B-parameter single-stream Diffusion Transformer trained from scratch with multilingual text rendering, JSON-structured prompts, and explicit color control.

By Alex Sokoloff · June 6, 2026

Ideogram 4, a 9.3-billion-parameter text-to-image model, is now open-weight. Released this week, it is Ideogram's first open-weight release and the first model the company trained from scratch rather than fine-tuned from an existing base. The weights ship in nf4 (CUDA) and fp8 formats, with additional quantizations promised. At 9.3B parameters, Ideogram 4 is substantially smaller than Qwen-Image (20B) or FLUX.2 dev (32B) and small enough to run on consumer GPUs.
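To put the parameter counts in perspective, here is a back-of-the-envelope estimate of weight storage per format. It is a rough sketch, not a measured figure: it assumes exactly 4 bits per weight for nf4 and 8 bits for fp8, adds bf16 as a baseline the release does not mention, and ignores quantization metadata, the text encoder, the VAE, and activation memory. Applying the same formats to Qwen-Image and FLUX.2 dev is purely for comparison and says nothing about what those projects actually ship.

```python
# Rough weight-storage estimate: parameters x bits per weight.
# Assumption: nf4 = 4 bits, fp8 = 8 bits; bf16 is an added baseline.
BITS_PER_WEIGHT = {"nf4": 4, "fp8": 8, "bf16": 16}

# Parameter counts as quoted in the article.
MODELS = {"Ideogram 4": 9.3e9, "Qwen-Image": 20e9, "FLUX.2 dev": 32e9}


def weight_gb(num_params, bits):
    """Storage for the weights alone, in decimal gigabytes."""
    return num_params * bits / 8 / 1e9


for name, params in MODELS.items():
    row = ", ".join(
        f"{fmt}: {weight_gb(params, bits):.1f} GB"
        for fmt, bits in BITS_PER_WEIGHT.items()
    )
    print(f"{name:<12} {row}")
```

Under these assumptions the Ideogram 4 transformer weights come to about 4.65 GB in nf4 and about 9.3 GB in fp8. Real memory use will be higher once the encoder and activations are loaded.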

The architecture is a fully single-stream Diffusion Transformer with 34 layers. Text and image tokens are concatenated into a unified sequence and processed through the same transformer, with no separate branches. Instead of a text-only encoder like CLIP or T5, Ideogram 4 uses Qwen3-VL-8B-Instruct, a full vision-language model that provides richer understanding of visual concepts. The model was trained on JSON-structured prompt annotations and includes a built-in prompt enhancer and prompt guide.
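The single-stream idea can be illustrated with a toy forward pass. This is a minimal sketch, not Ideogram's implementation: the attention uses identity projections with no learned weights, there are no MLP or normalization layers, and the choice to return only the image positions is an assumption. The only figure taken from the article is the layer count of 34.

```python
import math

NUM_LAYERS = 34  # layer count from the article; the rest is illustrative


def self_attention(seq):
    """Toy single-head attention with identity projections (no learned weights)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w / total * v[i] for w, v in zip(weights, seq)) for i in range(d)]
        )
    return out


def single_stream_forward(text_tokens, image_tokens, num_layers=NUM_LAYERS):
    """Concatenate text and image tokens and run them through the same layers."""
    seq = text_tokens + image_tokens  # one unified sequence, no separate branches
    for _ in range(num_layers):
        attended = self_attention(seq)
        # Residual connection around the shared attention layer.
        seq = [[x + y for x, y in zip(row, att)] for row, att in zip(seq, attended)]
    # Assumption: only the image positions are read out at the end.
    return seq[len(text_tokens):]


text = [[0.1, 0.2], [0.3, 0.4]]                # 2 text tokens, width 2
image = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]   # 3 image tokens, width 2
print(len(single_stream_forward(text, image, num_layers=2)))
```

The point of the sketch is that every token, text or image, attends to every other token in every layer, which is what distinguishes a single-stream design from one with separate text and image branches.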

What stands out

01 Native 2K resolution and extreme aspect ratios. The model generates images at 2048px natively and supports aspect ratios up to 6:1, wider than most open-weight competitors.
02 Multilingual text rendering. Ideogram 4 ships with what the team calls "best-in-class" multilingual text rendering, a capability that has historically been weak in open models.
03 Explicit color palette control. The JSON prompt interface allows direct specification of color palettes, giving users fine-grained control over output aesthetics.
04 Vision-language encoder. Using Qwen3-VL-8B-Instruct as the text encoder instead of a text-only model is a structural departure from FLUX, SDXL, and most other open DiTs.
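A JSON-structured prompt with an explicit palette might be assembled along these lines. The article does not document the schema, so every field name here (`description`, `color_palette`, `aspect_ratio`) and the `build_prompt` helper are hypothetical. The only constraint taken from the article is the 6:1 aspect-ratio ceiling.

```python
import json
import re

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")
MAX_ASPECT = 6.0  # the article cites aspect ratios up to 6:1


def build_prompt(description, palette, aspect_ratio="1:1"):
    """Return a JSON prompt string. All field names are hypothetical."""
    bad = [color for color in palette if not HEX_COLOR.match(color)]
    if bad:
        raise ValueError(f"not #RRGGBB hex colors: {bad}")
    width, height = (int(part) for part in aspect_ratio.split(":"))
    if width <= 0 or height <= 0:
        raise ValueError("aspect ratio parts must be positive")
    if max(width, height) / min(width, height) > MAX_ASPECT:
        raise ValueError(f"aspect ratio {aspect_ratio} exceeds 6:1")
    return json.dumps(
        {
            "description": description,      # hypothetical field
            "color_palette": palette,        # hypothetical field
            "aspect_ratio": aspect_ratio,    # hypothetical field
        },
        indent=2,
    )


print(build_prompt("panoramic harbor at dusk", ["#0B3954", "#FF6663"], "6:1"))
```

Validating the palette and ratio before serializing keeps malformed requests from reaching the model. The real prompt guide shipped with the weights should be treated as the authority on field names.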
