What ComfyUI Is
ComfyUI is a node-graph workflow runtime for diffusion models. It was released in January 2023 by an anonymous developer who goes by comfyanonymous, it is free and open source under the GPL-3.0 license, and in 2026 it is the de facto runtime for every serious open-weights image and video model — Stable Diffusion 1.5, SDXL and its fine-tunes, FLUX, Pony, Illustrious, Wan, HunyuanVideo, LTX Video, and whatever drops next month.
The phrase "node graph" means you do not fill in a form and press Generate. You wire together a pipeline. Each node performs one step — load a checkpoint, encode a prompt, run the sampler, decode the latent, save the image — and you connect the outputs of one node to the inputs of the next. The result is a directed graph that ComfyUI executes from left to right. Workflows are JSON files. You can save them, share them, drag them onto someone else's canvas and the entire graph rebuilds.
That is the whole concept. The complexity that follows is just more nodes wired into the same idea.
Why People Run ComfyUI Instead Of Automatic1111
Automatic1111 — usually called A1111 — is a web form. You see a big text box for the prompt, a smaller box for the negative prompt, sliders for steps and CFG, and a Generate button. For a beginner who wants to type "a cat in a hat" and get a cat in a hat, this is the path of least resistance.
ComfyUI is not that. The first time you open it you see a canvas with seven boxes connected by colored lines and you have no idea which one to touch. The learning curve is real. Every guide on Reddit warning you it is intimidating is correct.
Here is the tradeoff in 2026, after watching the ecosystem for two more years:
- A1111 is faster to start, easier for "type prompt → get image", and has fallen visibly behind on new model support. FLUX dev took months to get usable A1111 support after release. Wan video is partial at best. HunyuanVideo never arrived. The A1111 development pace slowed in 2024 and never recovered.
- ComfyUI has a steeper learning curve, but every new open-weights model in the last eighteen months has shipped with a reference ComfyUI workflow on day one. Full pipeline visibility means you can see exactly which step is breaking when something goes wrong. Workflows are portable JSON, which means the entire community shares techniques as files, not screenshots of settings.
- Forge is the middle path — an A1111 fork with much better SDXL and FLUX memory management. It was the smart compromise in 2024. It has been losing ground steadily through 2025 and 2026 as ComfyUI absorbed the cutting edge.
The honest summary: A1111 is the runtime you would have picked in 2023. ComfyUI is the runtime you pick in 2026.
How A Workflow Works
Open ComfyUI for the first time and you get a default text-to-image graph with about seven nodes. Reading left to right:
1. Load Checkpoint — points at a `.safetensors` file in your `models/checkpoints/` folder. Outputs three things: the model itself, the CLIP text encoder, and the VAE.
2. CLIP Text Encode (Positive) — takes the CLIP encoder from the checkpoint plus your prompt string, outputs a conditioning tensor. This is where you type what you want.
3. CLIP Text Encode (Negative) — same node, second instance, for what you do not want.
4. Empty Latent Image — defines the canvas. Width, height, batch size. Outputs an empty latent tensor at that resolution.
5. KSampler — the actual diffusion. Takes the model, the positive conditioning, the negative conditioning, the empty latent, plus parameters (steps, CFG, sampler, scheduler, seed). Outputs a denoised latent. This is the node that does the real work.
6. VAE Decode — takes the denoised latent and the VAE, outputs an RGB image in pixel space.
7. Save Image — writes the RGB tensor to disk as a PNG.
The wires carry data between them: model → sampler, CLIP → encoders → sampler, latent → sampler → VAE → save. Once you internalize this seven-node spine, every advanced workflow you will ever see is the same spine with extra nodes spliced in. A LoRA is one node between Checkpoint and Sampler. ControlNet is two nodes feeding extra conditioning into the sampler. Hires fix is a second sampler chained after the first. Video is a different sampler and a different decoder. The pattern does not change.
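For readers who want to see what that spine looks like on disk, here is a trimmed sketch of the same graph in ComfyUI's API-format JSON (the file you save from the canvas also carries layout data; the API format strips it down to class types and inputs). The node IDs, checkpoint filename, prompt strings, and sampler settings below are placeholders; each `["1", 0]` pair means "output 0 of node 1", which is how the wires are encoded:

```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sdxl_base.safetensors"}},
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a cat in a hat", "clip": ["1", 1]}},
  "3": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, lowres", "clip": ["1", 1]}},
  "4": {"class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
  "5": {"class_type": "KSampler",
        "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                   "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                   "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
  "6": {"class_type": "VAEDecode",
        "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
  "7": {"class_type": "SaveImage",
        "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}}
}
```

Note how the checkpoint loader's three outputs (model, CLIP, VAE) fan out to the sampler, the text encoders, and the decoder, exactly as described above.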
This is why people who push through the first week of ComfyUI rarely go back. Once you can read the graph, you can read any workflow on the internet.
The Models You Actually Run
ComfyUI is a runtime. The models are separate files you download and drop into folders. In 2026 the live model categories look like this:
SDXL fine-tunes — the workhorse for image generation on consumer hardware. Anime and illustration models like Pony Diffusion and Illustrious. Realism fine-tunes from various community trainers. Eight to twelve gigabytes of VRAM is enough.
FLUX — Black Forest Labs' twelve-billion-parameter image model. Better prompt adherence and text rendering than SDXL, hungrier on VRAM. Dev is the high-quality variant; schnell is the four-step distilled version.
Open video — the entire reason 2025 mattered. Wan 2.2 from Alibaba is the current sweet spot for image-to-video and text-to-video on prosumer hardware. HunyuanVideo from Tencent is the heavyweight. LTX Video is the speed champion.
Each of these ships with reference ComfyUI workflows on the day of release. That is not a coincidence. That is the model teams treating ComfyUI as the canonical runtime.
Custom Nodes — The Real Power
The base ComfyUI install does maybe thirty percent of what people actually do. The other seventy percent comes from custom nodes — community-built extensions that add new node types to the canvas.
The standard entry point is ComfyUI-Manager, a third-party but universally adopted node pack that gives you a GUI for installing other custom nodes. You install it once, then use it to install everything else.
What custom nodes add:
- LoRA loaders — chain LoRAs into the model pipeline, control their strength per prompt
- ControlNet — pose, depth, canny, OpenPose, scribble conditioning
- IP-Adapter — image prompts, style transfer, face reference
- Video samplers — ComfyUI-WanVideoWrapper, ComfyUI-HunyuanVideoWrapper, ComfyUI-LTXVideo, plus the official native video implementations
- Upscalers — Ultimate SD Upscale, RealESRGAN nodes, RIFE for video frame interpolation
- Face detailers — ADetailer-style automatic face and hand fixing
- Region prompting — different prompts for different parts of the image
- Model utilities — checkpoint merging, LoRA extraction, format converters
- Workflow utilities — math nodes, text manipulation, conditional routing, batch loops
There are several thousand custom node packs on the registry. Most of them are abandoned or experimental. About fifty are essential. ComfyUI-Manager flags which ones are actively maintained.
This is the asymmetry that makes ComfyUI versus A1111 not a fair comparison in 2026. A1111 has extensions too, but the ComfyUI custom node ecosystem is where the entire research community ships their inference code now. New papers drop with ComfyUI nodes attached.
Hardware: What You Need
VRAM is the constraint. Everything else is secondary.
- 8 GB VRAM — minimum for SDXL, Pony, Illustrious at 1024×1024. Tight but workable.
- 12 GB VRAM — comfortable for SDXL workflows with LoRAs and ControlNet stacked.
- 16 GB VRAM — FLUX dev runs with NF4 or fp8 quantization. Some video at low resolution.
- 24 GB VRAM — FLUX dev at full precision. Wan 2.2 with quantization. The 4090/5090 sweet spot.
- 48 GB VRAM — Wan 2.2 A14B at reasonable quality. RTX 6000 Ada territory.
- 80 GB VRAM — HunyuanVideo at native settings. H100 / A100 / dual-GPU rigs.
If you do not have the VRAM, quantization buys you a tier or two. NF4 and fp8 cut memory roughly in half with minor quality loss. GGUF quants from the llama.cpp world have crossed over into image and video models in 2025-2026 and let you run FLUX and Wan on cards you have no business running them on, at the cost of generation speed.
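A rough way to reason about what a quantization tier buys: weight memory is roughly parameter count times bytes per parameter, and everything else (text encoders, VAE, activations, CUDA overhead) sits on top. A back-of-the-envelope sketch, using FLUX dev's roughly twelve billion parameters; the figures are estimates, not measurements:

```python
# Rough VRAM estimate: weights = params * bytes per param. Everything else is overhead.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * (bits_per_param / 8) / 1024**3

flux_params = 12  # FLUX dev, roughly 12B parameters
for name, bits in [("fp16", 16), ("fp8", 8), ("NF4", 4)]:
    print(f"{name}: ~{weight_gb(flux_params, bits):.1f} GB for weights alone")

# fp16: ~22.4 GB, fp8: ~11.2 GB, NF4: ~5.6 GB -- before text encoders, VAE,
# activations, and driver overhead, which is why 16 GB cards still lean on
# quantization or offload for the full model.
```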
CPU and system RAM matter mostly for offload — if your VRAM is not enough, ComfyUI will spill to system RAM, and that is slow but functional. 32 GB of system RAM is a reasonable floor. NVMe SSD is recommended because models are large (FLUX dev is twenty-three gigabytes, video models are larger) and load time off a spinning disk is painful.
How To Install ComfyUI
There are two paths.
Portable Windows. Download the ComfyUI portable .7z from the GitHub releases page, extract it, run run_nvidia_gpu.bat. Python is bundled, dependencies are bundled, and updates are a one-click affair. This is what every Windows beginner should use.
Manual install. For Linux, Mac, or anyone who wants to manage their own Python environment:
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py
```
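One caveat on the pip install step: if pip resolves a CPU-only PyTorch build (common on Windows manual installs), generation will crawl. In that case install the CUDA build explicitly before running requirements.txt; the exact index URL depends on your CUDA version, so check pytorch.org for the current one:

```bash
# Example for a CUDA 12.x setup -- verify the index URL against pytorch.org
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
```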
That last command starts a local web server, default port 8188. Open http://127.0.0.1:8188 in a browser and you have the canvas.
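A few launch flags are worth knowing early; run `python main.py --help` for the full list. The low-VRAM options trade speed for fitting larger models:

```bash
python main.py --listen 0.0.0.0 --port 8188   # expose the UI to other machines on your network
python main.py --lowvram                      # aggressive offload for small cards
python main.py --cpu                          # no GPU at all: very slow, but it runs
```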
After the base install, install ComfyUI-Manager:
```bash
cd custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
```
Restart ComfyUI. There will now be a Manager button on the canvas. Use it to install everything else.
Mac users with Apple Silicon: it works, MPS backend is supported, but you will be slower than an equivalent NVIDIA card and some video custom nodes assume CUDA. Linux with NVIDIA is the smoothest setup.
Models Folder Structure
Models live under ComfyUI/models/. The subdirectory tells ComfyUI what kind of model each file is, which determines which nodes can see it.
- `checkpoints/` — base models. Pony, Illustrious, FLUX dev, Wan, HunyuanVideo `.safetensors` files go here.
- `loras/` — LoRA fine-tunes. Hundreds of megabytes each.
- `vae/` — separate VAE files. Most modern models bake the VAE into the checkpoint, so this folder is often empty.
- `controlnet/` — ControlNet models for pose, depth, canny, etc.
- `upscale_models/` — upscaler weights. 4x-UltraSharp, RealESRGAN variants, RealESRGAN-anime.
- `clip/` and `clip_vision/` — text and image encoders that some workflows load separately, especially FLUX and IP-Adapter.
- `embeddings/` — textual inversions. Less common in 2026 than in 2023.
Drop a file into the right folder, refresh the relevant node, the file appears in the dropdown. There is no registration step, no config edit. The folder is the registry.
If you already have models from an A1111 install and do not want to duplicate them, edit extra_model_paths.yaml in the ComfyUI root to point at your existing folders.
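ComfyUI ships a template for this as `extra_model_paths.yaml.example` with an A1111 section already stubbed out; rename it and fill in your paths. A minimal sketch, assuming a stock A1111 folder layout; the base_path and subfolder names below are illustrative and should match your actual install:

```yaml
a111:
  base_path: /path/to/stable-diffusion-webui/
  checkpoints: models/Stable-diffusion
  vae: models/VAE
  loras: models/Lora
  upscale_models: models/ESRGAN
  embeddings: embeddings
  controlnet: models/ControlNet
```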
Workflow Examples — Where To Find Them
The single highest-leverage thing a ComfyUI beginner can do is stop trying to build workflows from scratch. Every model team publishes reference workflows. Use them.
The sources, in order of reliability:
1. The model's official HuggingFace card. FLUX, Wan, HunyuanVideo, LTX — all of them publish a workflow JSON in the repo. Download, drag onto canvas, the graph appears.
2. The custom node pack's GitHub README. ComfyUI-WanVideoWrapper and ComfyUI-HunyuanVideoWrapper ship with example workflows in their `examples/` folder.
3. Civitai. Filter by ComfyUI workflow. Quality varies. Read what the uploader says before running random JSON.
4. The official ComfyUI examples site — comfyanonymous maintains a page of canonical workflows for every supported model architecture.
The drag-and-drop trick is worth highlighting. ComfyUI embeds the workflow into the PNG metadata of every image it generates. If you find an image you like and the uploader did not strip the metadata, drag the PNG itself onto the canvas and the workflow that produced it rebuilds. The community shares techniques as image files, which is genuinely elegant.
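If you are curious what that embedding actually is: ComfyUI writes the graph into PNG text chunks, one holding the editor graph and one holding the API-format graph. A small Pillow sketch that pulls it back out; the chunk names ("workflow", "prompt") and the filename are assumptions to verify against your own output files:

```python
import json
from PIL import Image

# Read the workflow JSON that ComfyUI embeds in its output PNGs.
img = Image.open("ComfyUI_00001_.png")
meta = img.info  # PNG text chunks show up here as plain strings

for key in ("workflow", "prompt"):  # editor graph / API-format graph
    if key in meta:
        graph = json.loads(meta[key])
        print(f"{key}: {len(graph)} top-level entries")
```

Image hosts that strip metadata break this, which is why serious workflow sharing happens on Civitai, GitHub, and Discord attachments rather than recompressed social media uploads.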
Alternatives To ComfyUI
A brief honest tour of what else exists, because the question always comes up:
- Automatic1111 / Forge — easier interface, slower at adopting new models, fine for SDXL workflows that are not pushing the envelope. Not viable for FLUX dev at full quality, not viable for Wan or HunyuanVideo.
- Fooocus — opinionated A1111 fork with auto-tuned defaults. Beginner-friendly, almost zero setup, very limited customization. The right tool for "I just want pretty pictures and I do not want to learn anything."
- InvokeAI — polished commercial-leaning UX, narrower model coverage, lagging on the open-weights cutting edge.
- SwarmUI — relatively new, uses ComfyUI as a backend with a simpler UI on top. Worth knowing about for users who want ComfyUI's model support without the node graph in their face.
- SD.Next — another A1111 fork, smaller community, niche.
In 2026: ComfyUI for serious work, Fooocus or SwarmUI for beginners who want a soft landing, ignore the rest unless you have a specific reason.
Frequently Asked Questions
What is ComfyUI used for?
Running diffusion models locally on your own hardware. Image generation with Stable Diffusion, SDXL, FLUX, Pony, Illustrious. Video generation with Wan, HunyuanVideo, LTX. Anything from simple text-to-image to complex multi-stage pipelines with LoRAs, ControlNet, upscaling, and post-processing — all wired together as a node graph.
Is ComfyUI free?
Yes. ComfyUI is open source under the GPL-3.0 license, free to download, free to use, free to modify, and free for commercial use. There is no paid tier. The models you run on it are separately licensed — most open-weights models are free, some have usage restrictions in their license text.
Is ComfyUI better than Automatic1111?
For new model support, debugging, and serious workflows: yes, by a wide margin in 2026. For "type a prompt and get an image with no learning curve": no, A1111 is still easier on day one. The decision is whether you plan to use this software for an afternoon or for a year.
How much VRAM does ComfyUI need?
ComfyUI itself uses negligible VRAM. The model determines the requirement. SDXL family runs comfortably on 8-12 GB. FLUX dev wants 16-24 GB. Wan 2.2 wants 24-48 GB. HunyuanVideo wants 48-80 GB. Quantization (NF4, fp8, GGUF) and CPU offload can push these down at the cost of speed or quality.
Can ComfyUI generate videos?
Yes. Wan 2.2, HunyuanVideo, and LTX Video all run in ComfyUI through their respective custom node packs. Image-to-video and text-to-video both work. Length and resolution are bounded by your VRAM. Frame interpolation with RIFE and upscaling are typically applied as post-processing nodes in the same workflow.
Do I need to know coding to use ComfyUI?
No. You will run terminal commands during installation, you will edit the occasional config file, you will read GitHub READMEs. None of that is programming. You do not write code to build workflows — you wire nodes on a canvas. The mental model is closer to a synthesizer patch or a Houdini graph than to writing Python.