ComfyUI is a free, open-source node-based interface for running AI image generation models locally. It supports Stable Diffusion 1.5, SDXL, SD 3.5, Flux, and many other models with full control over every step of the generation pipeline.

How much VRAM do I need for ComfyUI?

It depends on the model. SD 1.5 needs about 4GB, SDXL needs 7GB, and Flux needs 9-33GB depending on quantization. An 8GB GPU like the RTX 4060 runs SDXL comfortably. For Flux, 12GB or more is recommended.

Can I run ComfyUI on a Mac?

Yes. ComfyUI supports Apple Silicon Macs through the MPS backend. Performance is slower than on NVIDIA GPUs but functional. Macs with 16GB or more unified memory run SDXL well, and 24GB or more handles quantized Flux models.

What is the best model to start with in ComfyUI?

SDXL 1.0 is the best starting model. It produces high-quality 1024x1024 images, needs only 7GB VRAM, has a massive ecosystem of LoRAs and ControlNets, and works on most modern GPUs.

How do I add ControlNet to ComfyUI?

Download a ControlNet model (like SDXL canny or depth) and place it in ComfyUI/models/controlnet/. Then add ControlNet loader and apply nodes to your workflow, connecting a preprocessed control image as input.

March 25, 2025 | comfyui, image-generation, tutorial, sdxl, flux, beginners

ComfyUI Beginner's Guide — Set Up Local AI Image Generation

Step-by-step ComfyUI installation and setup guide for local AI image generation. Learn text-to-image workflows, ControlNets, LoRAs, and VRAM optimization with SDXL, Flux, and SD 3.5.

ComfyUI is one of the most powerful tools for running AI image generation locally. Unlike simple one-click interfaces, ComfyUI gives you a visual node graph where every part of the generation pipeline (text encoding, denoising, VAE decoding) is an explicit, configurable node. This guide walks you from zero to generating images with SDXL, ControlNets, and LoRAs.

Why ComfyUI?

ComfyUI has become the standard for local image generation for several reasons:

Full control: Every step of the pipeline is visible and adjustable
Model support: Works with SD 1.5, SDXL, SD 3.5, Flux, PixArt, and more
Memory efficient: Only loads what you use — no wasted VRAM on unused components
Extensible: Hundreds of community custom nodes for upscaling, inpainting, video, and more
Free and open source: No subscriptions, no usage limits, runs entirely on your hardware

If you have used Automatic1111 (A1111) before, ComfyUI is the next step up. It replaces A1111's settings panels with a visual graph that is more complex initially but far more powerful once learned.

Installation

Windows

The easiest path on Windows is the portable package:

Download the latest release from github.com/comfyanonymous/ComfyUI
Extract the zip to a folder (e.g., C:\ComfyUI)
Run run_nvidia_gpu.bat (or run_cpu.bat if you have no GPU)
Open http://127.0.0.1:8188 in your browser

Linux

# Clone the repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Install ComfyUI dependencies
pip install -r requirements.txt

# Start ComfyUI
python main.py

macOS (Apple Silicon)

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

python -m venv venv
source venv/bin/activate

# Install PyTorch for MPS (Apple Silicon)
pip install torch torchvision torchaudio

pip install -r requirements.txt

# Start ComfyUI with fp16 precision forced (the MPS backend is detected automatically)
python main.py --force-fp16

After starting, open http://127.0.0.1:8188 in your browser. You should see the ComfyUI node editor with a default workflow.
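
If the page loads but generation is unexpectedly slow, it can help to confirm which backend PyTorch actually sees. The following is a minimal sketch rather than anything ComfyUI provides: `pick_backend` and `detect_backend` are hypothetical helper names, and the sketch assumes it runs inside the same virtual environment where PyTorch was installed.

```python
def pick_backend(cuda_ok: bool, mps_ok: bool) -> str:
    """Return the preferred backend name: CUDA first, then MPS, then CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_backend() -> str:
    """Ask PyTorch which accelerators are usable on this machine."""
    # Imported lazily so pick_backend() stays usable without torch installed.
    import torch

    return pick_backend(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

Calling `detect_backend()` from a Python prompt inside the venv should report `cuda` on a working NVIDIA setup and `mps` on Apple Silicon. A result of `cpu` usually points at a PyTorch build that does not match the hardware.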

Downloading Your First Model

For beginners, start with SDXL 1.0. It produces excellent 1024x1024 images, needs only 7GB VRAM, and has the largest ecosystem of LoRAs and ControlNets.

Download SDXL

# Download SDXL base model (~6.9GB); run this from the folder that contains ComfyUI/
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
  sd_xl_base_1.0.safetensors \
  --local-dir ComfyUI/models/checkpoints/

Alternatively, download manually from HuggingFace and place the .safetensors file in ComfyUI/models/checkpoints/.
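
The same download can be scripted from Python. This is a sketch under assumptions: it relies on the `huggingface_hub` package being installed in the active environment, and `checkpoint_dir` and `download_sdxl` are hypothetical helper names introduced only for illustration.

```python
from pathlib import Path

REPO_ID = "stabilityai/stable-diffusion-xl-base-1.0"
FILENAME = "sd_xl_base_1.0.safetensors"


def checkpoint_dir(comfy_root: str) -> Path:
    """Folder where ComfyUI looks for full checkpoint files."""
    return Path(comfy_root) / "models" / "checkpoints"


def download_sdxl(comfy_root: str = "ComfyUI") -> str:
    """Download the SDXL base checkpoint and return the local file path."""
    # Lazy import: requires `pip install huggingface_hub`.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=checkpoint_dir(comfy_root),
    )
```

Pass the path to your ComfyUI folder as `comfy_root` if you run the script from somewhere other than its parent directory.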

ComfyUI Model Directory Structure

ComfyUI/
  models/
    checkpoints/      # Full model files (SD 1.5, SDXL, etc.)
    diffusion_models/ # Standalone transformers (Flux GGUF, etc.)
    clip/             # Text encoders (T5, CLIP)
    vae/              # VAE decoders
    controlnet/       # ControlNet models
    loras/            # LoRA adapter files
    upscale_models/   # Upscaling models (RealESRGAN, etc.)
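
A quick way to check that files landed in the right place is to walk this layout from a script. The sketch below only uses the folder names listed above. The function names are hypothetical, and it assumes model files use the .safetensors extension.

```python
from pathlib import Path

# Subfolder names taken from the directory listing above.
MODEL_SUBDIRS = [
    "checkpoints", "diffusion_models", "clip", "vae",
    "controlnet", "loras", "upscale_models",
]


def missing_model_dirs(comfy_root: str) -> list[str]:
    """Return the expected model subfolders that do not exist yet."""
    models = Path(comfy_root) / "models"
    return [name for name in MODEL_SUBDIRS if not (models / name).is_dir()]


def list_models(comfy_root: str, subdir: str) -> list[str]:
    """Return the .safetensors file names found in one model subfolder."""
    folder = Path(comfy_root) / "models" / subdir
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))
```

For example, `list_models("ComfyUI", "checkpoints")` should include `sd_xl_base_1.0.safetensors` after the download step.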

Basic Text-to-Image Workflow

ComfyUI opens with a default text-to-image workflow. Here is how to use it with SDXL:

Load the default workflow — ComfyUI opens with a basic text-to-image graph
Select your model — Click the checkpoint loader node and select sd_xl_base_1.0.safetensors
Enter your prompt — Type in the positive prompt node (e.g., "A majestic mountain landscape at sunset, photorealistic, 8k detail")
Set negative prompt — Common negatives: "blurry, low quality, distorted, deformed"
Configure settings:
- Resolution: 1024x1024 (SDXL's native resolution)
- Steps: 25-30
- CFG Scale: 7.0
- Sampler: euler (fast) or dpmpp_2m (quality)
- Scheduler: karras
Click Queue Prompt — generation starts

Your first image should appear in 5-15 seconds depending on your GPU.
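
The same graph can also be queued without the browser. The sketch below expresses the workflow above in ComfyUI's API (JSON) format and posts it to the local server. The node class names and the `/prompt` endpoint follow the API example script in the ComfyUI repository as I understand it, so treat them as assumptions that may vary between versions. The function names and node ids are arbitrary choices for this example.

```python
import json
import urllib.request


def build_txt2img_workflow(positive: str, negative: str, seed: int = 42) -> dict:
    """Build a text-to-image graph; links are [source_node_id, output_index]."""
    return {
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "6": {"class_type": "CLIPTextEncode",  # positive prompt
              "inputs": {"text": positive, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode",  # negative prompt
              "inputs": {"text": negative, "clip": ["4", 1]}},
        "3": {"class_type": "KSampler",
              "inputs": {"model": ["4", 0], "positive": ["6", 0],
                         "negative": ["7", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "karras",
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}},
    }


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server and return its reply."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With ComfyUI running, `queue_prompt(build_txt2img_workflow("A majestic mountain landscape at sunset", "blurry, low quality"))` should queue one image. Reusing the same seed and settings reproduces the same result.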

Key Settings Explained

Setting	What It Does	Recommended
Steps	Number of denoising iterations	25-30 for SDXL
CFG Scale	How closely to follow the prompt	6-8 for SDXL
Sampler	Denoising algorithm	euler or dpmpp_2m
Scheduler	Noise schedule	karras
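Seed	Reproducibility: the same seed and settings give the same image	Fixed value to reproduce a result; set control_after_generate to randomize for a new seed each run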

Adding ControlNets

ControlNets let you guide image generation with structural inputs — edges, depth maps, poses, or reference images. They are essential for consistent, controllable output.

Setup

Download a ControlNet model:

# SDXL Canny ControlNet (~2.5GB); run this from the folder that contains ComfyUI/
huggingface-cli download diffusers/controlnet-canny-sdxl-1.0 \
  diffusion_pytorch_model.fp16.safetensors \
  --local-dir ComfyUI/models/controlnet/

In ComfyUI, add these nodes to your workflow:
- Load ControlNet Model — points to your downloaded ControlNet file
- Apply ControlNet — connects between your conditioning and the sampler
- Load Image — your control image (edge map, depth map, etc.)
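
In ComfyUI's API (JSON) format, those three nodes could be wired roughly as follows. This is a sketch, not the only way to do it: the class names `ControlNetLoader`, `ControlNetApply`, and `LoadImage` are the built-in node types as I understand them (newer versions also offer an advanced apply node), the node ids "20" to "22" are assumed to be unused in your graph, and `add_controlnet` is a hypothetical helper.

```python
def add_controlnet(workflow: dict, sampler_id: str, controlnet_file: str,
                   control_image: str, strength: float = 0.8) -> dict:
    """Return a copy of an API-format workflow with a ControlNet applied
    to the sampler's positive conditioning."""
    # Copy each node and its inputs so the caller's workflow is untouched.
    wf = {node_id: {**node, "inputs": dict(node["inputs"])}
          for node_id, node in workflow.items()}
    positive_link = wf[sampler_id]["inputs"]["positive"]

    wf["20"] = {"class_type": "ControlNetLoader",
                "inputs": {"control_net_name": controlnet_file}}
    # LoadImage reads a file name from ComfyUI's input folder.
    wf["21"] = {"class_type": "LoadImage",
                "inputs": {"image": control_image}}
    wf["22"] = {"class_type": "ControlNetApply",
                "inputs": {"conditioning": positive_link,
                           "control_net": ["20", 0],
                           "image": ["21", 0],
                           "strength": strength}}

    # The sampler now receives the ControlNet-modified conditioning.
    wf[sampler_id]["inputs"]["positive"] = ["22", 0]
    return wf
```

The control image is expected to be preprocessed already (for canny, an edge map), matching the Load Image step above.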

Common ControlNet Types

Type	Input	Use Case
Canny	Edge detection image	Maintain structure/outlines
Depth	Depth map (MiDaS)	3D spatial composition
Pose (OpenPose)	Skeleton keypoints	Character poses
Tile	Low-res image	Upscaling with detail
Inpaint	Masked region	Selective regeneration

Start with canny — it is the most intuitive. Take any image, run it through a canny edge detector (available as a ComfyUI node), and use the edge map to guide generation. The AI fills in the details while respecting your structure.

Using LoRAs for Style

LoRAs (Low-Rank Adaptations) are small adapter files that modify a model's style or teach it new concepts. They are typically 10-300MB and add about 0.1-0.3GB to VRAM usage.

Setup

Download LoRA files from CivitAI or HuggingFace
Place .safetensors files in ComfyUI/models/loras/
Add a Load LoRA node after your checkpoint loader: feed the loader's MODEL and CLIP outputs into it, then connect its MODEL and CLIP outputs to the sampler and the prompt nodes
Set the strength (0.5-1.0 is typical — start at 0.7)
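
Expressed in ComfyUI's API (JSON) format, inserting a LoRA means rerouting every MODEL and CLIP link that used to come from the checkpoint loader. The sketch below assumes the built-in `LoraLoader` node type with `strength_model` and `strength_clip` inputs. `add_lora` and the node id "10" are hypothetical choices for this example.

```python
def add_lora(workflow: dict, checkpoint_id: str, lora_file: str,
             strength: float = 0.7, lora_id: str = "10") -> dict:
    """Return a copy of an API-format workflow with a LoRA inserted
    between the checkpoint loader and everything that uses its
    MODEL (output 0) or CLIP (output 1)."""
    wf = {node_id: {**node, "inputs": dict(node["inputs"])}
          for node_id, node in workflow.items()}

    # Redirect existing MODEL/CLIP links to the LoRA node's outputs.
    # The VAE link (output 2) stays on the checkpoint loader.
    for node in wf.values():
        for name, value in list(node["inputs"].items()):
            if value in ([checkpoint_id, 0], [checkpoint_id, 1]):
                node["inputs"][name] = [lora_id, value[1]]

    wf[lora_id] = {"class_type": "LoraLoader",
                   "inputs": {"lora_name": lora_file,
                              "strength_model": strength,
                              "strength_clip": strength,
                              "model": [checkpoint_id, 0],
                              "clip": [checkpoint_id, 1]}}
    return wf
```

Calling the function again with a different `lora_id`, and the first LoRA's id as `checkpoint_id`, chains a second LoRA, but the VAE link then needs separate handling because the LoRA node has no VAE output.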

Tips for LoRAs

Check compatibility: SDXL LoRAs only work with SDXL models, SD 1.5 LoRAs only with SD 1.5
Trigger words: Many LoRAs require specific words in your prompt to activate (listed on the download page)
Strength matters: Too high (above 1.0) causes artifacts; too low (below 0.3) has no effect
Stack carefully: You can use multiple LoRAs, but each one adds to VRAM usage. Three to four LoRAs at 0.5-0.7 strength is a practical limit
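
The stacking cost is easy to estimate from the figures in this section. The sketch below multiplies the 0.1-0.3GB per-LoRA range quoted earlier and flags strengths outside the 0.3-1.0 window from the tips. The function names are hypothetical and the numbers are the article's rules of thumb, not measurements.

```python
# Per-LoRA VRAM range in GB, as quoted in this section.
LORA_VRAM_GB = (0.1, 0.3)


def lora_vram_overhead(count: int) -> tuple[float, float]:
    """Return the (low, high) extra VRAM in GB for `count` stacked LoRAs."""
    low, high = LORA_VRAM_GB
    return round(count * low, 1), round(count * high, 1)


def strength_warnings(strengths: list[float]) -> list[str]:
    """Flag strengths outside the 0.3-1.0 range suggested in the tips."""
    warnings = []
    for value in strengths:
        if value > 1.0:
            warnings.append(f"{value}: above 1.0, may cause artifacts")
        elif value < 0.3:
            warnings.append(f"{value}: below 0.3, may have no visible effect")
    return warnings
```

By this estimate, four stacked LoRAs add roughly 0.4 to 1.2GB, which matters most on 8GB cards already running SDXL.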

VRAM Optimization Tips

Running out of VRAM is the most common issue in ComfyUI. Here are practical solutions:

Choose the Right Model for Your GPU

VRAM	Recommended Models
6 GB	SD 1.5, PixArt-alpha
8 GB	SDXL 1.0, SD 3.5 Medium (FP8)
12 GB	SDXL + ControlNets, Flux GGUF Q4-Q5
16 GB	SD 3.5 Large (FP8), Flux GGUF Q6-Q8
24 GB	Flux FP8, any model with ControlNets
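
The table can be turned into a small lookup. This is a sketch that encodes only the rows above, and `recommend_models` is a hypothetical helper that returns the highest tier a given card satisfies.

```python
from typing import Optional

# (minimum VRAM in GB, recommended models), copied from the table above.
VRAM_TIERS = [
    (6, "SD 1.5, PixArt-alpha"),
    (8, "SDXL 1.0, SD 3.5 Medium (FP8)"),
    (12, "SDXL + ControlNets, Flux GGUF Q4-Q5"),
    (16, "SD 3.5 Large (FP8), Flux GGUF Q6-Q8"),
    (24, "Flux FP8, any model with ControlNets"),
]


def recommend_models(vram_gb: float) -> Optional[str]:
    """Return the recommendation for the highest tier that fits, or None
    if the card has less VRAM than the smallest tier in the table."""
    best = None
    for minimum, models in VRAM_TIERS:
        if vram_gb >= minimum:
            best = models
    return best
```

A 10GB card falls between rows, so the lookup returns the 8GB recommendation. Models from lower tiers remain usable on larger cards.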

Optimization Techniques

Use FP8 precision: Add --fp8_e4m3fn-unet to your ComfyUI launch command. This roughly halves the diffusion model's VRAM use compared with FP16, with minimal quality loss.
Enable sequential offloading: Launch ComfyUI with the --lowvram flag. It moves unused model components to CPU RAM during generation.
Use GGUF quantized models: For Flux, the GGUF versions from city96 (loaded through the ComfyUI-GGUF custom node) dramatically reduce VRAM. Q4 brings Flux from 33GB down to about 9GB.
Close other GPU applications: Browser hardware acceleration, Discord, and video players all consume VRAM. Close them before generating.
Reduce resolution: Generate at 768x768 instead of 1024x1024 to save 30-40% VRAM, then upscale the result.
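
Two of these tips lend themselves to a quick sketch: assembling the launch command from the flags named above, and the pixel arithmetic behind the resolution tip. The helper names are hypothetical, and only flags already mentioned in this guide are used.

```python
def launch_command(low_vram: bool = False, fp8_unet: bool = False,
                   force_fp16: bool = False) -> list[str]:
    """Assemble a ComfyUI launch command from the flags in this guide."""
    command = ["python", "main.py"]
    if low_vram:
        command.append("--lowvram")
    if fp8_unet:
        command.append("--fp8_e4m3fn-unet")
    if force_fp16:
        command.append("--force-fp16")
    return command


def pixel_reduction(small_side: int, large_side: int) -> float:
    """Fraction of pixels saved by a smaller square resolution."""
    return 1 - (small_side ** 2) / (large_side ** 2)
```

`pixel_reduction(768, 1024)` gives 0.4375, so 768x768 has about 44% fewer pixels than 1024x1024. The VRAM saving is smaller than that because the model weights take the same space at any resolution.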

Next Steps

Once you are comfortable with basic generation, explore:

Inpainting — Selectively regenerate parts of an image
Upscaling — Use models like RealESRGAN to upscale to 4K
IP-Adapter — Use reference images to guide style without ControlNet structure
AnimateDiff — Generate short animations from your image workflows
Custom nodes — Install community nodes from ComfyUI Manager

Summary

ComfyUI is the most flexible way to run AI image generation locally. Start with SDXL on an 8GB GPU, learn the node workflow, then expand to ControlNets and LoRAs as you get comfortable. When you are ready for the highest quality output, move to Flux with a 12GB or larger GPU.

