Wan 2.2 In ComfyUI: What You're Actually Building
A ComfyUI workflow for Wan 2.2 takes a text prompt or input image, runs it through the Wan diffusion transformer via the ComfyUI-WanVideo custom-node package, and outputs an mp4 clip. The A14B variants need 48 GB of VRAM at fp8 quantization; the TI2V 5B variant runs on 24 GB. That's the whole shape of the build — the rest of this guide is plumbing.
Wan 2.2 is the open-weights video model from Alibaba's Wan-AI team, released in mid-2025. It comes in three checkpoints with different trade-offs between VRAM, speed, and capability. The reference workflows ship from the Wan-AI team and from kijai's ComfyUI-WanVideoWrapper, the de facto custom-node package the community settled on.
Why Local Wan 2.2 Still Matters In 2026
In October 2025, ArtificialAnalysis confirmed on X what the Wan team had been telegraphing for weeks: Wan 2.5 ships closed-weights. Alibaba pivoted the 2.5 line to a SaaS/API product. No checkpoint download, no local inference, prompt logging by default, content moderation enforced at the API layer. The 2.5 release is the moment Wan stopped being open.
That makes Wan 2.2 the open ceiling for local video generation through 2026. Tencent's HunyuanVideo is the only competing 80GB-tier open video model and it's heavier to run; LTX Video covers the 16 GB tier but trades quality for the lower memory footprint. Between them, Wan 2.2 sits in the prosumer 24-48 GB sweet spot — the band that matches a 4090, a 5090, or an A6000 workstation.
Local Wan 2.2 has no content filter. No usage cap. No prompt logging. No telemetry to a vendor that can change the rules in a quarterly product review. You download the safetensors file once and the model never gets worse, never adds a new "safety" layer, never decides one Tuesday morning that your past prompts violate a new policy. For users who left closed AI specifically because of moderation, that's the entire pitch.
If you were waiting for Wan 2.5 to fix Wan 2.2's rough edges, stop waiting. The version of Wan you can actually own is the one already on Hugging Face.
The Three Wan 2.2 Variants — Which One You Run
T2V A14B — text-to-video, 14B parameters, 48 GB VRAM at fp8. Roughly 4-8 minutes for a 5-second 720p clip on an A6000. Pure prompt-to-video, no image conditioning. Best for fully synthetic shots where you want the model to invent the entire scene from text.
I2V A14B — image-to-video, 14B parameters, 48 GB VRAM at fp8. You feed it a still — typically the output of FLUX, Pony, or Illustrious — and the model animates it. This is the most-used Wan variant in 2026 because the dominant prosumer pipeline is "perfect-still in an image model, then animate via Wan I2V." You get the precise composition and aesthetic control of an image model and the temporal coherence of a video model. T2V can't match that level of art direction.
TI2V 5B — hybrid 5B model handling both text-to-video and image-to-video from a single checkpoint, 24 GB VRAM at fp8, 2-4 minutes per clip. Quality is meaningfully lower than the A14B variants — softer details, more motion artifacts on complex scenes — but TI2V is the only Wan checkpoint that fits on a single 4090 without offloading. If you're on consumer hardware, this is your entry point.
Prerequisites
- ComfyUI installed (portable Windows build, or `git clone https://github.com/comfyanonymous/ComfyUI` on Linux/Mac)
- ComfyUI-Manager installed — strongly recommended for managing custom nodes and dependencies
- 24-48 GB VRAM — RTX 4090, RTX 5090, A6000, dual 3090s with NVLink, or equivalent
- 100 GB free disk space for model weights, VAE, text encoder, and ComfyUI cache
- Python 3.10+ with a CUDA-matched PyTorch build (CUDA 12.4 or 12.6 are the typical targets in 2026)
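Before installing anything Wan-specific, it's worth confirming that the PyTorch build matches the CUDA toolkit you expect and can actually see the GPU. A quick sanity check from the environment ComfyUI runs in:

```bash
# print the PyTorch version, the CUDA version it was built against, and whether a GPU is visible
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

# confirm the card and its total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
```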
Step 1: Install The ComfyUI-WanVideo Custom Node
The custom-node package everyone uses is kijai/ComfyUI-WanVideoWrapper. Install it via the Manager or clone manually.
Via ComfyUI-Manager: open Manager → Install Custom Nodes → search "WanVideo" → install ComfyUI-WanVideoWrapper by kijai. Restart ComfyUI when prompted.
Manual install:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
cd ComfyUI-WanVideoWrapper
pip install -r requirements.txt
```
Restart ComfyUI. The new node categories — WanVideo Model Loader, WanVideo Sampler, WanVideo TextEncode, and friends — will show up in the right-click node menu.
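To confirm the wrapper actually registered its nodes without hunting through the menu, you can ask the running ComfyUI server for its node registry. A minimal check, assuming ComfyUI is listening on the default port 8188 and `jq` is installed:

```bash
# list every node class the server exposes and keep only the WanVideo wrapper's entries
curl -s http://127.0.0.1:8188/object_info | jq -r 'keys[] | select(test("WanVideo"))'
```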

Step 2: Download The Wan 2.2 Weights
Pull from the official Wan-AI organization on Hugging Face: Wan-AI/Wan2.2-I2V-A14B, Wan-AI/Wan2.2-T2V-A14B, and Wan-AI/Wan2.2-TI2V-5B. The recommended quantization is fp8 — specifically the _fp8_e4m3fn files. For Wan 2.2 I2V A14B that's Wan2.2-I2V-A14B_fp8_e4m3fn.safetensors, roughly 13-15 GB on disk. fp8 is the sweet spot: visually indistinguishable from fp16 in side-by-side tests, but half the VRAM and faster sampling.
Three things matter for placement. The diffusion model goes in ComfyUI/models/diffusion_models/ — not checkpoints/. The file contains only the diffusion transformer; Wan ships the VAE and text encoder as separate files rather than bundling everything into the single all-in-one checkpoint that checkpoints/ expects. The VAE goes in ComfyUI/models/vae/. The UMT5-XXL text encoder goes in ComfyUI/models/text_encoders/. For the I2V variants you also need CLIP-Vision-H in ComfyUI/models/clip_vision/.
Sample directory layout for an I2V setup:
```text
ComfyUI/models/
├── diffusion_models/
│   └── Wan2.2-I2V-A14B_fp8_e4m3fn.safetensors
├── vae/
│   └── Wan2.1_VAE.safetensors
├── text_encoders/
│   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
└── clip_vision/
    └── clip_vision_h.safetensors
```
Note the VAE is named Wan2.1_VAE.safetensors — Wan 2.2 reuses the 2.1 VAE intentionally. Don't try to swap in an SDXL or FLUX VAE; you'll get black frames.
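If you prefer the command line to the Hugging Face web UI, `huggingface-cli` can pull files straight into the folders above. A sketch for the I2V weights; the exact file names and repo layout vary between the official repos and community fp8 repacks, so check the model card and adjust the `--include` pattern before running it:

```bash
pip install -U "huggingface_hub[cli]"

# fetch only the fp8 diffusion weights into ComfyUI's diffusion_models folder
# (the filename pattern below matches the layout shown above; your repo may differ)
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B \
  --include "*fp8_e4m3fn*.safetensors" \
  --local-dir ComfyUI/models/diffusion_models/
```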

Step 3: Load The Reference Workflow
The Wan-AI team and the WanVideoWrapper repo both publish reference workflow JSONs in custom_nodes/ComfyUI-WanVideoWrapper/example_workflows/. Drag the JSON file directly onto the ComfyUI canvas and the entire graph reconstructs — model loader, text encoder, sampler, decoder, video combine, all wired up.
Three reference workflows to know by name:
- `wanvideo_T2V_workflow.json` — pure text-to-video with the A14B checkpoint
- `wanvideo_I2V_workflow.json` — image-to-video with A14B
- `wanvideo_TI2V_workflow.json` — hybrid 5B for either mode
Each workflow loads the right model, text encoder, VAE, and sampler chain for its variant. Start with the reference graph; don't try to wire WanVideo nodes from scratch on first attempt.
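Exact JSON filenames drift between wrapper releases; if the three above don't match what you have, list the folder that ships with the node pack and load whichever variant is present:

```bash
ls custom_nodes/ComfyUI-WanVideoWrapper/example_workflows/
```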
Step 4: The I2V Workflow Walkthrough
Walking through the I2V graph node by node, because it's the workflow most people actually run:
- WanVideo Model Loader — points at the I2V A14B fp8 safetensors file and loads it into VRAM. Set precision to fp8_e4m3fn here.
- WanVideo TextEncode — UMT5-XXL encodes positive and negative prompts into the text-conditioning embedding the diffusion transformer expects. Use one TextEncode node per prompt direction.
- Load Image — your input still. Output of an image model, a photograph, or a previous video frame all work.
- WanVideo I2V Image Encode — encodes the still and extracts CLIP-Vision features. This is where the model gets the spatial conditioning that anchors the animation to your input image.
- WanVideo Sampler — the diffusion step itself. Steps, CFG, sampler, scheduler, frame count, seed all live here. This is the node you'll spend the most time tuning.
- WanVideo Decode — the VAE decode that turns the latent video tensor into pixel frames.
- Video Combine — frames-to-mp4 encoding via ffmpeg. Set fps and codec here.
For source stills, the standard prosumer pipeline is to render the keyframe in an image model first and then animate it.
Recommended Sampler Settings
A working starting point for I2V A14B:
```text
Sampler: euler
Scheduler: simple or beta (beta gives slightly cleaner motion)
Steps: 25-30 (more than 30 stops helping)
CFG: 6.0 (range 5.5-7 OK; below 5 loses prompt adherence, above 7 burns)
Frame count: 81 (5-second clip at 16 fps)
Resolution: 720x1280 portrait or 1280x720 landscape (16:9 / 9:16)
Seed: random for exploration, fixed for reproducibility
```
Full settings comparison across the three variants:
| Parameter | T2V A14B | I2V A14B | TI2V 5B |
|---|---|---|---|
| Min VRAM (fp8) | 38 GB | 40 GB | 22 GB |
| Recommended steps | 25-30 | 25-30 | 20-25 |
| CFG | 6.0 | 6.0 | 5.5 |
| Frame count | 81 (5s @ 16fps) | 81 | 81 |
| Default resolution | 1280x720 | matches input | 768x1280 |
| Sampler | euler / unipc | euler / unipc | euler |
The "Min VRAM" column assumes fp8 quantization with sage attention enabled. Without sage attention, add roughly 4-6 GB to each row.
Common Errors And Fixes
- "CUDA out of memory" — drop to fp8 if you're on fp16, reduce frame count from 81 to 49 or 33, enable sage attention or xformers. If you're already at fp8 with 49 frames and still OOM-ing, you're on the wrong variant — switch to TI2V 5B.
- "NoneType has no attribute" in WanVideo Sampler — the text encoder didn't load. Check the WanVideo TextEncode node has UMT5-XXL selected, not a stale CLIP reference from a SDXL workflow.
- Black frames in output — wrong VAE. Wan needs `Wan2.1_VAE.safetensors`. SDXL VAE, FLUX VAE, or any other VAE produces black or noise frames. Re-download the VAE from `Wan-AI/Wan2.2-I2V-A14B` and place it in `ComfyUI/models/vae/`.
- Choppy or flickering motion — too few steps, or aggressive quantization. Q4 GGUF degrades temporal coherence in ways you'll see immediately in the output. Stay on fp8. If you're already at fp8, push steps from 20 to 30.
- "Module not found" on first run — custom-node dependencies didn't install. Activate the venv ComfyUI uses and run
pip install -r requirements.txtincustom_nodes/ComfyUI-WanVideoWrapper. On portable Windows builds, use the embedded Python:python_embeded\python.exe -m pip install -r ....
Performance Tuning
sageattention — install via pip install sageattention and the WanVideoWrapper will pick it up automatically. Replaces standard scaled-dot-product attention with a kernel-fused version. Roughly 30-40% memory reduction on the attention path and a modest speed win. This is the single highest-impact optimization for Wan workflows.
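A minimal install-and-verify pass, run from the same environment ComfyUI uses (portable Windows builds should substitute the embedded `python_embeded\python.exe` as shown earlier):

```bash
pip install sageattention
# if this import fails, the wrapper can't use it and you lose the memory savings
python -c "import sageattention; print('sageattention import OK')"
```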
xformers — alternative attention optimization. Slightly less effective than sageattention but more compatible with older PyTorch builds and pre-Ada GPUs. Good fallback if sageattention won't compile on your stack.
fp8 vs Q4 GGUF — fp8 is the recommended quantization. Q4 GGUF saves disk space and lets the model fit on smaller cards, but it visibly degrades temporal coherence — frames lose internal consistency, motion becomes lurchy. fp8 is the floor; don't go lower.
TeaCache — timestep-based caching custom node that skips redundant computation between similar timesteps. Roughly 2x speedup at a small quality cost. Worth enabling for prototyping passes when you're hunting for the right prompt; disable for final renders.
Frame interpolation (RIFE / FILM) — post-process Wan's 16fps output to 32fps or 48fps for smoother playback. Standard RIFE and FILM nodes exist in the ComfyUI ecosystem and slot in after Video Combine. Cheaper than rendering more frames natively.
Workflow Patterns That Work
Iterate at low resolution, render at high. Generate dozens of variations at 768x768 with 25 steps to find the prompt and seed that work. Then re-render the keepers at full 1280x720 with 30 steps. Wan sampling time scales hard with both resolution and step count; treat the low-res pass as the search and the high-res pass as the commit.
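One way to automate that search pass is to drive ComfyUI headlessly over its HTTP API: export the graph with "Save (API Format)", then queue it repeatedly with different seeds. A rough sketch, assuming `jq` is installed, the export is saved as `wan_i2v_api.json`, and the sampler node's ID in your export is `"37"` (check your own JSON; both the filename and node ID here are placeholders):

```bash
# queue eight seed variations of an API-format workflow against a local ComfyUI server
for seed in $(shuf -i 1-2147483647 -n 8); do
  jq --argjson s "$seed" '.["37"].inputs.seed = $s' wan_i2v_api.json \
    | jq '{prompt: .}' \
    | curl -s -X POST http://127.0.0.1:8188/prompt \
        -H "Content-Type: application/json" -d @-
done
```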
I2V chain for longer clips. Wan 2.2 caps comfortably at around 5 seconds per generation. For longer outputs, generate the first 5-second clip, take the last frame, feed it back as the I2V input for the next 5 seconds, and repeat. Stitch the segments together for ~20-30 seconds of consistent motion. Quality drift accumulates — by clip four or five the subject starts shifting — but for short-form output it works.
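A sketch of the chaining mechanics with ffmpeg: grab the final frame of each finished segment to seed the next I2V generation, then concatenate the segments once they are all rendered with matching settings (the filenames here are placeholders):

```bash
# pull the last frame of the previous clip to use as the next I2V input image
ffmpeg -sseof -0.1 -i clip_01.mp4 -frames:v 1 -update 1 last_frame_01.png

# once every segment exists, stitch them in order without re-encoding
printf "file '%s'\n" clip_0*.mp4 > segments.txt
ffmpeg -f concat -safe 0 -i segments.txt -c copy stitched.mp4
```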
Hybrid pipeline. The standard 2026 prosumer stack is: image model (Pony, Illustrious, or FLUX) for the keyframe → Wan I2V for the animation → RIFE for frame interpolation → ffmpeg encoding to web-friendly mp4. Each stage does what it's best at. Don't try to make Wan do composition work an image model handles better, and don't try to make an image model do motion.
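The final stage of that stack is just an encode. A typical command, assuming the interpolation step wrote numbered PNGs into a `frames/` directory at 32 fps (adjust paths and frame rate to your pipeline):

```bash
# H.264 in yuv420p with faststart plays in essentially every browser and player
ffmpeg -framerate 32 -i frames/%05d.png \
  -c:v libx264 -pix_fmt yuv420p -crf 18 -movflags +faststart out.mp4
```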
Why Not Use Wan 2.5 / Veo / Sora?
Wan 2.5 is closed-weights, API-only, with prompt logging and content moderation at the API layer. Sora 2 and Veo 3 — same architecture of access. Closed APIs, gated keys, mandatory moderation, terms-of-service that can change without notice. Kling, Runway, Pika round out the pack and they all moderate, all log, all close.
The trade is honest. Closed APIs produce slightly cleaner output today — call it a 12-month lead on cinematic polish, sharper detail, more reliable motion. Local Wan 2.2 is uncensored, untracked, and untouchable by future policy changes. For users who left closed AI specifically because of moderation, the choice is already made. For the cinematic-polish crowd that doesn't care about moderation, the closed APIs are fine. This guide is not for them.
The other consideration: open-weights models don't get worse. The Wan 2.2 checkpoint on your disk in 2026 is the same checkpoint in 2027 and 2030. Closed APIs degrade in subtle ways — moderation tightens, prompts that worked last quarter get rejected this quarter, the model gets quietly swapped for a cheaper one. Local inference is the only path where the model you tested is the model you ship.
Alternatives if Wan 2.2 doesn't fit your hardware:
- Tencent's HunyuanVideo, the 80 GB-tier open alternative. Comparable openness, but heavier to run than Wan 2.2.
- LTX Video, which covers the 16 GB tier. It trades quality for the lower memory footprint but fits mid-range consumer cards.