ComfyUI dual-GPU setup yields no video speed gains despite 28GB VRAM
A ComfyUI user pairing an RTX 4080 Super with an RTX 3080 Ti eGPU reports no meaningful video generation speedup: the second card only takes over text encoding, and roughly 40% of the model weights still sit in shared memory.
A ComfyUI user testing dual-GPU video generation found that adding a second card via eGPU barely moves the needle on iteration time. The setup pairs a 16GB RTX 4080 Super with a 12GB RTX 3080 Ti connected through an OCuLink cage, with both cards detected and working under fresh drivers. Video generation with Wan 2.1 still clocks 32 seconds per iteration, unchanged from single-GPU runs on the 4080 Super alone.
The bottleneck appears structural. ComfyUI's current memory scheduler can move the text encoder to the second GPU and offload the VAE (a 200-300MB footprint for Wan 2.1), but roughly 40 percent of the model weights remain cached in shared system memory rather than split across the two cards' VRAM. Even an 8GB quantized checkpoint that fits entirely in VRAM shows no iteration-time improvement, suggesting the denoising loop itself isn't parallelizing across cards. The test system runs an i7-14700KF with 64GB DDR5, so CPU and RAM bandwidth are unlikely to be the constraint. In practice the second GPU acts as a text-encoding coprocessor: it frees VRAM headroom on the primary card but does not share the diffusion math that dominates video synthesis time.
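A rough back-of-the-envelope sketch shows why offloading the auxiliary stages cannot help much. Only the 32 seconds per iteration comes from the report. The step count and the text-encode and VAE-decode times below are illustrative assumptions, not measurements from the user's system.

```python
# Amdahl-style estimate: how much can offloading the text encoder and VAE help?
# Only STEP_S comes from the report; every other number is an assumption.

def total_time(encode_s, step_s, steps, decode_s):
    """Wall-clock seconds for one generation when every stage runs in sequence."""
    return encode_s + step_s * steps + decode_s


def best_case_speedup(encode_s, step_s, steps, decode_s):
    """Upper bound on speedup if text encoding and VAE decode became free."""
    before = total_time(encode_s, step_s, steps, decode_s)
    after = step_s * steps  # the denoising loop still runs on one card
    return before / after


STEP_S = 32.0    # seconds per iteration, as reported
STEPS = 20       # assumed sampler step count
ENCODE_S = 10.0  # assumed one-off text-encoding time
DECODE_S = 15.0  # assumed one-off VAE decode time

baseline = total_time(ENCODE_S, STEP_S, STEPS, DECODE_S)
bound = best_case_speedup(ENCODE_S, STEP_S, STEPS, DECODE_S)
print(f"baseline: {baseline:.0f} s, best-case speedup: {bound:.3f}x")
```

Under these assumed numbers the ceiling is about a 4 percent gain, and the per-iteration figure does not move at all, because text encoding and VAE decode happen once per generation rather than once per step.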
The user also flagged that Crystools, a popular node pack for memory profiling, broke during the setup, forcing a rollback of certain workflows. Multi-GPU video generation in ComfyUI remains a work-in-progress feature. The next round of improvements will likely need native model-parallel sampling or tensor-split support in the underlying diffusion backend, so that the work of each denoising step is distributed across cards rather than only the auxiliary components being offloaded. Until then, dual-GPU rigs buy VRAM capacity but not speed.
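To make the idea of splitting weights across cards concrete, here is a minimal sketch that assigns contiguous model blocks to each GPU in proportion to its VRAM. It is not ComfyUI's API or scheduler. The function name `split_blocks` and the block count of 40 are hypothetical, chosen only for illustration. Note that a contiguous split like this only decides where weights live; the blocks still run one after another, so true speed gains would additionally need the per-step math itself to be parallelized.

```python
# Hypothetical planner: divide a model's blocks across GPUs by VRAM share.
# Not a ComfyUI function; names and the block count are illustrative.

def split_blocks(num_blocks, vram_gb):
    """Return one range of block indices per device, sized by VRAM share."""
    total = sum(vram_gb)
    # Floor of each device's proportional share.
    counts = [num_blocks * v // total for v in vram_gb]
    # Hand any leftover blocks to the largest devices first.
    leftover = num_blocks - sum(counts)
    by_size = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in by_size[:leftover]:
        counts[i] += 1
    # Convert the counts into contiguous index ranges.
    ranges, start = [], 0
    for c in counts:
        ranges.append(range(start, start + c))
        start += c
    return ranges


# 16GB RTX 4080 Super + 12GB RTX 3080 Ti, with an assumed 40 blocks.
plan = split_blocks(40, [16, 12])
for device, blocks in enumerate(plan):
    print(f"gpu {device}: blocks {blocks.start}..{blocks.stop - 1} ({len(blocks)} total)")
```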
