Wan 2.2 Remix fine-tune adds image-to-video control to open-weight 14B model
A new fine-tune of Wan AI's 14B text-to-video model appeared on HuggingFace this week, extending the Wan 2.1 base with image-to-video and remix features.
Wan 2.2 Remix, a fine-tuned text-to-video model from creator rorge120ac, builds on Wan AI's open-weight Wan 2.1 14B checkpoint. The model adds image-to-video conditioning and remix capabilities to the base architecture, targeting practitioners who want more control over video synthesis workflows. The weights went live on HuggingFace on May 15, 2026.
Wan AI's 2.1 release was a 14-billion-parameter diffusion model trained for text-to-video generation. The Remix fine-tune preserves that text-driven pipeline while adding an image input path, letting users seed video clips from a reference frame or stylistic guide. The model card tags the release for art and video-generation use cases, suggesting it's aimed at creative workflows rather than photorealistic output.
Open-weight fine-tuning
Wan 2.1 is an open-weight checkpoint, meaning practitioners can download the full parameter set and fine-tune locally without API restrictions. That unrestricted access is what enabled the Remix variant: rorge120ac could modify the conditioning layers and retrain on a custom dataset. The original 2.1 model supported variable-length video synthesis of up to several seconds per clip.
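The "modify the conditioning layers and retrain" pattern can be sketched in a few lines of PyTorch. This is an illustrative toy, not the actual Wan architecture or training code: the module names (`TinyBackbone`, `text_proj`, `cond_proj`) are hypothetical stand-ins for a frozen text pathway and a newly trained image-conditioning pathway.

```python
import torch
import torch.nn as nn

# Toy stand-in for a diffusion backbone. The real Wan 2.1 model is far
# larger; "cond_proj" is a hypothetical name for an image-conditioning
# layer, not an actual module name from the checkpoint.
class TinyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_proj = nn.Linear(16, 32)  # frozen base (text) pathway
        self.cond_proj = nn.Linear(16, 32)  # new image-conditioning pathway

    def forward(self, txt, img):
        return self.text_proj(txt) + self.cond_proj(img)

model = TinyBackbone()

# Freeze everything, then unfreeze only the conditioning layers, so the
# optimizer updates just the new image-input path during fine-tuning.
for p in model.parameters():
    p.requires_grad = False
for p in model.cond_proj.parameters():
    p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Only the parameters left with `requires_grad=True` would receive gradient updates, which keeps the fine-tune cheap relative to retraining all 14B parameters.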
The Remix checkpoint is hosted under rorge120ac's HuggingFace namespace with no explicit license listed on the model card. Wan AI's 2.1 base carried a permissive research license, but fine-tune derivatives sometimes adopt stricter terms. Practitioners planning commercial use should verify licensing before deploying the weights in production.
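A simple pre-deployment check can flag the missing-license situation described above. This sketch assumes the model-card metadata has already been fetched as a plain dict (for example via `huggingface_hub`'s `model_info`); the `"license"` key matches the standard HuggingFace model-card field, and the helper name `license_status` is made up for illustration.

```python
# Hypothetical helper: given model-card metadata as a dict, report the
# declared license or flag that none is listed.
def license_status(card_data: dict) -> str:
    lic = card_data.get("license")
    if not lic:
        return "unspecified: verify terms with the author before commercial use"
    return lic

# A base model with a declared license vs. a card with none listed.
print(license_status({"license": "apache-2.0"}))
print(license_status({}))
```

Wiring a check like this into a deployment pipeline makes the "no explicit license" case a hard stop rather than something discovered after the weights are already in production.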
