Will It Run AI
comfyui, image-generation, tutorial, sdxl, flux, beginners

ComfyUI Beginner's Guide — Set Up Local AI Image Generation

Step-by-step ComfyUI installation and setup guide for local AI image generation. Learn text-to-image workflows, ControlNets, LoRAs, and VRAM optimization with SDXL, Flux, and SD 3.5.

ComfyUI is the most powerful tool for running AI image generation locally. Unlike simple one-click interfaces, ComfyUI gives you a visual node graph where every part of the generation pipeline — text encoding, denoising, VAE decoding — is an explicit, configurable node. This guide walks you from zero to generating images with models, ControlNets, and LoRAs.


Why ComfyUI?

ComfyUI has become the standard for local image generation for several reasons:

  • Full control: Every step of the pipeline is visible and adjustable
  • Model support: Works with SD 1.5, SDXL, SD 3.5, Flux, PixArt, and more
  • Memory efficient: Only loads what you use — no wasted VRAM on unused components
  • Extensible: Hundreds of community custom nodes for upscaling, inpainting, video, and more
  • Free and open source: No subscriptions, no usage limits, runs entirely on your hardware

If you have used Automatic1111 (A1111) before, ComfyUI is the next step up. It replaces A1111's settings panels with a visual graph that is more complex initially but far more powerful once learned.


Installation

Windows

The easiest path on Windows is the portable package:

  1. Download the latest release from github.com/comfyanonymous/ComfyUI
  2. Extract the zip to a folder (e.g., C:\ComfyUI)
  3. Run run_nvidia_gpu.bat (or run_cpu.bat if you have no GPU)
  4. Open http://127.0.0.1:8188 in your browser

Linux

# Clone the repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Install ComfyUI dependencies
pip install -r requirements.txt

# Start ComfyUI
python main.py

macOS (Apple Silicon)

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

python -m venv venv
source venv/bin/activate

# Install PyTorch for MPS (Apple Silicon)
pip install torch torchvision torchaudio

pip install -r requirements.txt

# Start ComfyUI with fp16 forced (the MPS backend is picked up automatically on Apple Silicon)
python main.py --force-fp16

After starting, open http://127.0.0.1:8188 in your browser. You should see the ComfyUI node editor with a default workflow.
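
If generation turns out to be unexpectedly slow, it can help to confirm which backend PyTorch actually sees. The following is a minimal sketch to run inside the same virtual environment; `detect_backend` is a hypothetical helper name, not part of ComfyUI.

```python
def detect_backend() -> str:
    """Return 'cuda', 'mps', 'cpu', or 'missing' depending on what PyTorch reports."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return "missing"
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend only exists in PyTorch builds with Apple Silicon support
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print("PyTorch backend:", detect_backend())
```

A result of `cpu` on a machine with an NVIDIA GPU usually means a CPU-only PyTorch build was installed, in which case reinstalling with the CUDA index URL shown above is the likely fix.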


Downloading Your First Model

For beginners, start with SDXL 1.0. It produces excellent 1024x1024 images, needs only 7GB VRAM, and has the largest ecosystem of LoRAs and ControlNets.

Download SDXL

# Download SDXL base model (~6.9GB)
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
  sd_xl_base_1.0.safetensors \
  --local-dir ComfyUI/models/checkpoints/

Alternatively, download manually from HuggingFace and place the .safetensors file in ComfyUI/models/checkpoints/.

ComfyUI Model Directory Structure

ComfyUI/
  models/
    checkpoints/      # Full model files (SD 1.5, SDXL, etc.)
    diffusion_models/ # Standalone transformers (Flux GGUF, etc.)
    clip/             # Text encoders (T5, CLIP)
    vae/              # VAE decoders
    controlnet/       # ControlNet models
    loras/            # LoRA adapter files
    upscale_models/   # Upscaling models (RealESRGAN, etc.)
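
To check that downloaded files landed in the right folders, a short script can walk this layout. This is a sketch under assumptions: it takes the ComfyUI root as a relative path, and it only looks for `.safetensors` and `.gguf` files. `list_models` is a hypothetical helper name.

```python
from pathlib import Path

# Folder names taken from the layout above
MODEL_FOLDERS = ["checkpoints", "diffusion_models", "clip", "vae",
                 "controlnet", "loras", "upscale_models"]


def list_models(comfy_root: str) -> dict:
    """Map each model folder to the sorted model file names found in it."""
    models_dir = Path(comfy_root) / "models"
    found = {}
    for folder in MODEL_FOLDERS:
        path = models_dir / folder
        if not path.is_dir():
            found[folder] = None  # folder is missing entirely
            continue
        found[folder] = sorted(
            p.name for p in path.iterdir()
            if p.suffix in {".safetensors", ".gguf"}
        )
    return found


for folder, files in list_models("ComfyUI").items():
    status = "missing" if files is None else (", ".join(files) or "empty")
    print(f"{folder:17} {status}")
```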

Basic Text-to-Image Workflow

ComfyUI opens with a default text-to-image workflow. It works with SDXL once you select the SDXL checkpoint and set the resolution. Here is how to use it:

  1. Load the default workflow — ComfyUI opens with a basic text-to-image graph
  2. Select your model — Click the checkpoint loader node and select sd_xl_base_1.0.safetensors
  3. Enter your prompt — Type in the positive prompt node (e.g., "A majestic mountain landscape at sunset, photorealistic, 8k detail")
  4. Set negative prompt — Common negatives: "blurry, low quality, distorted, deformed"
  5. Configure settings:
    • Resolution: 1024x1024 (SDXL's native resolution)
    • Steps: 25-30
    • CFG Scale: 7.0
    • Sampler: euler (fast) or dpmpp_2m (quality)
    • Scheduler: karras
  6. Click Queue Prompt — generation starts

Your first image should appear in roughly 5-15 seconds depending on your GPU. The very first run takes longer because the checkpoint has to be loaded from disk.
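
The same graph can also be queued without the browser, because a running ComfyUI server accepts workflows as JSON on its `/prompt` endpoint. The sketch below assumes the stock node class names (`CheckpointLoaderSimple`, `CLIPTextEncode`, `EmptyLatentImage`, `KSampler`, `VAEDecode`, `SaveImage`) and the default address used in this guide. The node ids and the helper names `build_txt2img` and `queue_prompt` are arbitrary choices for illustration.

```python
import json
import urllib.request


def build_txt2img(positive, negative, seed=0, steps=25, cfg=7.0,
                  ckpt="sd_xl_base_1.0.safetensors"):
    """Return a text-to-image graph in ComfyUI's API (JSON) format.

    Each key is a node id; ["4", 1] means "output 1 of node 4".
    """
    return {
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["4", 1]}},
        "3": {"class_type": "KSampler",
              "inputs": {"model": ["4", 0], "positive": ["6", 0],
                         "negative": ["7", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "karras",
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "ComfyUI"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


workflow = build_txt2img(
    "A majestic mountain landscape at sunset, photorealistic, 8k detail",
    "blurry, low quality, distorted, deformed",
    seed=42,
)
print(json.dumps(workflow["3"]["inputs"], indent=2))
```

With the server running, calling `queue_prompt(workflow)` submits the job, and the finished image is written to ComfyUI's output folder. The script only builds and prints the graph until that call is made, so it is safe to run offline.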

Key Settings Explained

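| Setting   | What It Does                                            | Recommended                                          |
|-----------|---------------------------------------------------------|------------------------------------------------------|
| Steps     | Number of denoising iterations                          | 25-30 for SDXL                                       |
| CFG Scale | How closely to follow the prompt                        | 6-8 for SDXL                                         |
| Sampler   | Denoising algorithm                                     | euler or dpmpp_2m                                    |
| Scheduler | Noise schedule                                          | karras                                               |
| Seed      | Reproducibility: the same seed gives the same image     | Fixed to reproduce; "randomize" after generate for variety |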
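
To make the CFG Scale row more concrete: under classifier-free guidance the sampler makes two noise predictions per step, one for the positive prompt and one for the negative prompt, and the scale sets how far the result is pushed from the second toward the first. The snippet below is a simplified illustration with toy numbers, not ComfyUI's actual sampler code. `apply_cfg` is a hypothetical name.

```python
def apply_cfg(uncond, cond, scale):
    """Combine negative-prompt and positive-prompt noise predictions.

    guided = uncond + scale * (cond - uncond), applied element-wise.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.10, -0.20]   # toy prediction for the negative prompt
cond = [0.30, -0.10]     # toy prediction for the positive prompt

# Scale 1 gives back the positive-prompt prediction (up to float rounding)
print(apply_cfg(uncond, cond, 1.0))
# Scale 7 moves much further along the same direction
print(apply_cfg(uncond, cond, 7.0))
```

This is why very high CFG values tend to look over-saturated or harsh: the difference between the two predictions is amplified at every step.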

Adding ControlNets

ControlNets let you guide image generation with structural inputs — edges, depth maps, poses, or reference images. They are essential for consistent, controllable output.

Setup

  1. Download a ControlNet model:
# SDXL Canny ControlNet (~2.5GB)
huggingface-cli download diffusers/controlnet-canny-sdxl-1.0 \
  diffusion_pytorch_model.fp16.safetensors \
  --local-dir ComfyUI/models/controlnet/
  2. In ComfyUI, add these nodes to your workflow:
    • Load ControlNet Model — points to your downloaded ControlNet file
    • Apply ControlNet — connects between your conditioning and the sampler
    • Load Image — your control image (edge map, depth map, etc.)
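
In ComfyUI's API (JSON) graph format, that wiring might look like the sketch below. It assumes the stock class names `ControlNetLoader`, `LoadImage`, and `ControlNetApplyAdvanced` (names may differ between ComfyUI versions), and that the ids "10" to "12" are unused in your graph. `add_controlnet` and the file name `edges.png` are hypothetical.

```python
def add_controlnet(workflow, sampler_id, controlnet_file, image_file,
                   strength=0.8):
    """Wire ControlNet loader, image, and apply nodes in front of a KSampler."""
    sampler = workflow[sampler_id]["inputs"]
    workflow["10"] = {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": controlnet_file}}
    workflow["11"] = {
        "class_type": "LoadImage",          # file name inside ComfyUI/input/
        "inputs": {"image": image_file}}
    workflow["12"] = {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {"positive": sampler["positive"],
                   "negative": sampler["negative"],
                   "control_net": ["10", 0],
                   "image": ["11", 0],
                   "strength": strength,
                   "start_percent": 0.0, "end_percent": 1.0}}
    # The sampler now reads its conditioning from the ControlNet node
    sampler["positive"] = ["12", 0]
    sampler["negative"] = ["12", 1]
    return workflow


# Minimal stand-in graph: only the sampler's conditioning inputs matter here
graph = {"3": {"class_type": "KSampler",
               "inputs": {"positive": ["6", 0], "negative": ["7", 0]}}}
add_controlnet(graph, "3", "diffusion_pytorch_model.fp16.safetensors",
               "edges.png")
print(graph["3"]["inputs"])
```

The key point is the rerouting: the prompt conditioning no longer goes straight to the sampler but passes through the apply node, which is what "connects between your conditioning and the sampler" means in the node editor.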

Common ControlNet Types

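| Type            | Input                | Use Case                    |
|-----------------|----------------------|-----------------------------|
| Canny           | Edge detection image | Maintain structure/outlines |
| Depth           | Depth map (MiDaS)    | 3D spatial composition      |
| Pose (OpenPose) | Skeleton keypoints   | Character poses             |
| Tile            | Low-res image        | Upscaling with detail       |
| Inpaint         | Masked region        | Selective regeneration      |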

Start with canny — it is the most intuitive. Take any image, run it through a canny edge detector (available as a ComfyUI node), and use the edge map to guide generation. The AI fills in the details while respecting your structure.
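
In API-format terms, the edge detector is just one more node placed between the loaded image and the ControlNet apply node. The sketch below assumes a graph that already has those two nodes, and it assumes the built-in node is registered as `Canny` with `low_threshold` and `high_threshold` inputs between 0 and 1. Preprocessor nodes from custom node packs use other names and parameters. `add_canny` and `photo.png` are hypothetical.

```python
def add_canny(workflow, apply_id, low=0.4, high=0.8, node_id="13"):
    """Insert an edge detector between the image and the ControlNet apply node."""
    apply_inputs = workflow[apply_id]["inputs"]
    workflow[node_id] = {
        "class_type": "Canny",
        "inputs": {"image": apply_inputs["image"],   # the original photo
                   "low_threshold": low, "high_threshold": high}}
    # The apply node now receives the edge map instead of the raw photo
    apply_inputs["image"] = [node_id, 0]
    return workflow


# Stand-in graph with only the two nodes this helper touches
graph = {
    "11": {"class_type": "LoadImage", "inputs": {"image": "photo.png"}},
    "12": {"class_type": "ControlNetApplyAdvanced",
           "inputs": {"image": ["11", 0], "strength": 0.8}},
}
add_canny(graph, "12")
print(graph["12"]["inputs"]["image"])
```

Lower thresholds keep more fine edges and constrain the result more tightly; higher thresholds keep only strong outlines and leave the model more freedom.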


Using LoRAs for Style

LoRAs (Low-Rank Adaptations) are small adapter files that modify a model's style or teach it new concepts. They are typically 10-300MB and add about 0.1-0.3GB to VRAM usage.
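
The small file size follows from the low-rank idea: instead of storing a changed copy of a weight matrix W, a LoRA stores two thin matrices A and B, and the model uses W + strength × (B·A). The arithmetic below is a sketch with a hypothetical layer width and rank, chosen only to show the order of magnitude.

```python
def lora_params(d_in, d_out, rank):
    """Parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_params(d_in, d_out):
    """Parameters in the full weight matrix that the LoRA pair adapts."""
    return d_in * d_out


d_in = d_out = 1280   # hypothetical layer width
rank = 16             # hypothetical LoRA rank
ratio = lora_params(d_in, d_out, rank) / full_params(d_in, d_out)
print(f"LoRA stores {ratio:.1%} of the layer's parameters")
```

With these numbers the adapter holds only a few percent of what the full layer does, which is why a LoRA weighs megabytes while a checkpoint weighs gigabytes.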

Setup

  1. Download LoRA files from CivitAI or HuggingFace
  2. Place .safetensors files in ComfyUI/models/loras/
  3. Add a Load LoRA node after your checkpoint loader: route the loader's MODEL and CLIP outputs through it, then connect the LoRA node's outputs to the sampler and the prompt nodes
  4. Set the strength (0.5-1.0 is typical — start at 0.7)
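
Expressed in ComfyUI's API (JSON) graph format, step 3 amounts to repointing every consumer of the checkpoint's MODEL and CLIP outputs at the LoRA node. The sketch below assumes the stock `LoraLoader` class name and an unused node id "20". `add_lora` and `my_style_lora.safetensors` are hypothetical.

```python
def add_lora(workflow, ckpt_id, lora_file, strength=0.7, node_id="20"):
    """Route a checkpoint's MODEL and CLIP outputs through a LoraLoader node."""
    # Repoint every consumer of checkpoint outputs 0 (MODEL) and 1 (CLIP)
    for node in workflow.values():
        for name, value in node["inputs"].items():
            if value in ([ckpt_id, 0], [ckpt_id, 1]):
                node["inputs"][name] = [node_id, value[1]]
    # Add the LoRA node last so its own inputs still point at the checkpoint
    workflow[node_id] = {
        "class_type": "LoraLoader",
        "inputs": {"lora_name": lora_file,
                   "strength_model": strength, "strength_clip": strength,
                   "model": [ckpt_id, 0], "clip": [ckpt_id, 1]}}
    return workflow


# Stand-in graph with only the nodes that touch the checkpoint
graph = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a watercolor fox", "clip": ["4", 1]}},
    "3": {"class_type": "KSampler", "inputs": {"model": ["4", 0]}},
}
add_lora(graph, "4", "my_style_lora.safetensors")
print(graph["3"]["inputs"]["model"], graph["6"]["inputs"]["clip"])
```

The VAE output of the checkpoint (index 2) is left untouched, since a LoRA only modifies the diffusion model and the text encoder.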

Tips for LoRAs

  • Check compatibility: SDXL LoRAs only work with SDXL models, SD 1.5 LoRAs only with SD 1.5
  • Trigger words: Many LoRAs require specific words in your prompt to activate (listed on the download page)
  • Strength matters: Too high (above 1.0) often causes artifacts; too low (below 0.3) has little visible effect
  • Stack carefully: You can use multiple LoRAs, but each adds VRAM. Three to four LoRAs at 0.5-0.7 strength is a practical limit

VRAM Optimization Tips

Running out of VRAM is the most common issue in ComfyUI. Here are practical solutions:

Choose the Right Model for Your GPU

| VRAM  | Recommended Models                   |
|-------|--------------------------------------|
| 6 GB  | SD 1.5, PixArt-alpha                 |
| 8 GB  | SDXL 1.0, SD 3.5 Medium (FP8)        |
| 12 GB | SDXL + ControlNets, Flux GGUF Q4-Q5  |
| 16 GB | SD 3.5 Large (FP8), Flux GGUF Q6-Q8  |
| 24 GB | Flux FP8, any model with ControlNets |
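
For a GPU that falls between two rows, the safe reading is to take the highest row that still fits. A small lookup makes that rule explicit. The data is copied from the table above and `recommend` is a hypothetical helper name.

```python
# (minimum VRAM in GB, models) pairs copied from the table above
VRAM_TIERS = [
    (6, "SD 1.5, PixArt-alpha"),
    (8, "SDXL 1.0, SD 3.5 Medium (FP8)"),
    (12, "SDXL + ControlNets, Flux GGUF Q4-Q5"),
    (16, "SD 3.5 Large (FP8), Flux GGUF Q6-Q8"),
    (24, "Flux FP8, any model with ControlNets"),
]


def recommend(vram_gb):
    """Return the highest table row that fits, or None below 6 GB."""
    best = None
    for minimum, models in VRAM_TIERS:
        if vram_gb >= minimum:
            best = models
    return best


for size in (4, 8, 10, 24):
    print(size, "GB:", recommend(size))
```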

Optimization Techniques

  1. Use FP8 precision: Add --fp8_e4m3fn-unet to your ComfyUI launch command. This roughly halves the diffusion model's weight memory compared with FP16, usually with minimal quality loss.

  2. Enable sequential offloading: Launch ComfyUI with the --lowvram flag. It moves model components that are not in use to CPU RAM during generation, trading some speed for memory.

  3. Use GGUF quantized models: For Flux, the GGUF versions from city96 (loaded through the ComfyUI-GGUF custom node) dramatically reduce VRAM. Q4 brings Flux from 33GB down to about 9GB.

  4. Close other GPU applications: Browsers with hardware acceleration, Discord, and video players all consume VRAM. Close them before generating.

  5. Reduce resolution: Generate at 768x768 instead of 1024x1024 to save roughly 30-40% VRAM, then upscale the result.
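
The arithmetic behind the precision and resolution tips can be sketched in a few lines. The estimate covers model weights only (parameters × bytes per parameter) and ignores text encoders, the VAE, and activations. The 12-billion parameter count for the Flux transformer is an assumption stated for illustration, and both function names are hypothetical.

```python
def weight_gb(params_billion, bytes_per_param):
    """Approximate weight memory in GB, taking 1 GB as 1e9 bytes."""
    # (params_billion * 1e9 parameters * bytes each) / 1e9 bytes per GB
    return params_billion * bytes_per_param


def pixel_ratio(small_side, large_side):
    """Fraction of pixels in a small square image relative to a large one."""
    return (small_side ** 2) / (large_side ** 2)


FLUX_PARAMS_B = 12  # assumption: roughly 12B parameters in the Flux transformer
print("FP16 weights:", weight_gb(FLUX_PARAMS_B, 2), "GB")
print("FP8 weights: ", weight_gb(FLUX_PARAMS_B, 1), "GB")
print("768x768 vs 1024x1024 pixels:", pixel_ratio(768, 1024))
```

Dropping to 768x768 cuts the pixel count by about 44%, but only the activation memory scales with resolution while the weights stay the same size, so the overall saving is smaller than the pixel reduction suggests.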


Next Steps

Once you are comfortable with basic generation, explore:

  • Inpainting — Selectively regenerate parts of an image
  • Upscaling — Use models like RealESRGAN to upscale to 4K
  • IP-Adapter — Use reference images to guide style without ControlNet structure
  • AnimateDiff — Generate short animations from your image workflows
  • Custom nodes — Install community nodes from ComfyUI Manager

Summary

ComfyUI is the most flexible way to run AI image generation locally. Start with SDXL on an 8GB GPU, learn the node workflow, then expand to ControlNets and LoRAs as you get comfortable. When you are ready for the highest quality output, move to Flux with a 12GB or larger GPU.


Frequently Asked Questions

What is ComfyUI?

ComfyUI is a free, open-source node-based interface for running AI image generation models locally. It supports Stable Diffusion 1.5, SDXL, SD 3.5, Flux, and many other models with full control over every step of the generation pipeline.

How much VRAM do I need for ComfyUI?

It depends on the model. SD 1.5 needs about 4GB, SDXL needs 7GB, and Flux needs 9-33GB depending on quantization. An 8GB GPU like the RTX 4060 runs SDXL comfortably. For Flux, 12GB or more is recommended.

Can I run ComfyUI on a Mac?

Yes. ComfyUI supports Apple Silicon Macs through the MPS backend. Performance is slower than NVIDIA GPUs but functional. Macs with 16GB or more unified memory run SDXL well, and 24GB or more handles Flux quantized models.

What is the best model to start with in ComfyUI?

SDXL 1.0 is the best starting model. It produces high-quality 1024x1024 images, needs only 7GB VRAM, has a massive ecosystem of LoRAs and ControlNets, and works on most modern GPUs.

How do I add ControlNet to ComfyUI?

Download a ControlNet model (like SDXL canny or depth) and place it in ComfyUI/models/controlnet/. Then add ControlNet loader and apply nodes to your workflow, connecting a preprocessed control image as input.