ComfyUI Beginner's Guide — Set Up Local AI Image Generation
Step-by-step ComfyUI installation and setup guide for local AI image generation. Learn text-to-image workflows, ControlNets, LoRAs, and VRAM optimization with SDXL, Flux, and SD 3.5.
ComfyUI is the most powerful tool for running AI image generation locally. Unlike simple one-click interfaces, ComfyUI gives you a visual node graph where every part of the generation pipeline — text encoding, denoising, VAE decoding — is an explicit, configurable node. This guide walks you from zero to generating images with models, ControlNets, and LoRAs.
Why ComfyUI?
ComfyUI has become the standard for local image generation for several reasons:
- Full control: Every step of the pipeline is visible and adjustable
- Model support: Works with SD 1.5, SDXL, SD 3.5, Flux, PixArt, and more
- Memory efficient: Only loads what you use — no wasted VRAM on unused components
- Extensible: Hundreds of community custom nodes for upscaling, inpainting, video, and more
- Free and open source: No subscriptions, no usage limits, runs entirely on your hardware
If you have used Automatic1111 (A1111) before, ComfyUI is the next step up. It replaces A1111's settings panels with a visual graph that is more complex initially but far more powerful once learned.
Installation
Windows
The easiest path on Windows is the portable package:
- Download the latest release from github.com/comfyanonymous/ComfyUI
- Extract the zip to a folder (e.g., C:\ComfyUI)
- Run run_nvidia_gpu.bat (or run_cpu.bat if you have no GPU)
- Open http://127.0.0.1:8188 in your browser
Linux
# Clone the repository
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Install ComfyUI dependencies
pip install -r requirements.txt
# Start ComfyUI
python main.py
macOS (Apple Silicon)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python -m venv venv
source venv/bin/activate
# Install PyTorch for MPS (Apple Silicon)
pip install torch torchvision torchaudio
pip install -r requirements.txt
# Start ComfyUI, forcing fp16 precision
python main.py --force-fp16
After starting, open http://127.0.0.1:8188 in your browser. You should see the ComfyUI node editor with a default workflow.
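If the page does not load, a quick sanity check can narrow down whether the problem is PyTorch or the server. The sketch below is a hypothetical helper script, not part of ComfyUI: it reports which PyTorch backend is visible and whether anything answers on the default address. Run it inside the same virtual environment you installed ComfyUI into.

```python
import urllib.error
import urllib.request


def detect_torch_device() -> str:
    """Return 'cuda', 'mps', 'cpu', or 'torch-missing' for the current environment."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is not installed (or failed to load) in this environment
        return "torch-missing"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on newer PyTorch builds, so look it up defensively
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def server_is_up(url: str = "http://127.0.0.1:8188", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at the given URL with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or malformed URL
        return False
```

Calling `detect_torch_device()` and `server_is_up()` from a Python prompt gives two independent signals: a result of "cpu" on a machine with an NVIDIA GPU usually points to a CPU-only PyTorch build, while `False` from `server_is_up()` means ComfyUI is not listening on that port.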
Downloading Your First Model
For beginners, start with SDXL 1.0. It produces excellent 1024x1024 images, needs only 7GB VRAM, and has the largest ecosystem of LoRAs and ControlNets.
Download SDXL
# Download SDXL base model (~6.9GB)
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 \
sd_xl_base_1.0.safetensors \
--local-dir ComfyUI/models/checkpoints/
Alternatively, download manually from HuggingFace and place the .safetensors file in ComfyUI/models/checkpoints/.
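Multi-gigabyte downloads sometimes get cut off. As a rough check, the sketch below reads the header of a .safetensors file (an 8-byte little-endian length followed by a JSON table of tensors) and compares the declared data size with the bytes actually on disk. The function name and return shape are illustrative assumptions, and the check does not verify the file's contents.

```python
import json
import struct
from pathlib import Path


def inspect_safetensors(path: Path) -> dict:
    """Return the tensor count and whether the file holds all the bytes its header declares."""
    size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > size - 8:
            raise ValueError("header length exceeds file size")
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    # data_offsets are [begin, end) positions relative to the end of the header
    data_end = max((t["data_offsets"][1] for t in tensors.values()), default=0)
    return {"tensors": len(tensors), "complete": size >= 8 + header_len + data_end}
```

For example, `inspect_safetensors(Path("ComfyUI/models/checkpoints/sd_xl_base_1.0.safetensors"))` should report `complete` as `True` for a finished download.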
ComfyUI Model Directory Structure
ComfyUI/
models/
checkpoints/ # Full model files (SD 1.5, SDXL, etc.)
diffusion_models/ # Standalone transformers (Flux GGUF, etc.)
clip/ # Text encoders (T5, CLIP)
vae/ # VAE decoders
controlnet/ # ControlNet models
loras/ # LoRA adapter files
upscale_models/ # Upscaling models (RealESRGAN, etc.)
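If you script your downloads, the layout above can be captured as a small lookup. This is a minimal sketch: the folder names come from the tree above, while the keys ("checkpoint", "lora", and so on) and the function name are labels invented for this example.

```python
from pathlib import Path

# Folder names follow the directory tree above; the keys are hypothetical labels
MODEL_DIRS = {
    "checkpoint": "checkpoints",
    "diffusion_model": "diffusion_models",
    "text_encoder": "clip",
    "vae": "vae",
    "controlnet": "controlnet",
    "lora": "loras",
    "upscaler": "upscale_models",
}


def model_destination(comfy_root: str, kind: str, filename: str) -> Path:
    """Return the path where a model file of the given kind should be placed."""
    if kind not in MODEL_DIRS:
        raise ValueError(f"unknown model kind: {kind!r}")
    return Path(comfy_root) / "models" / MODEL_DIRS[kind] / filename
```

For instance, `model_destination("ComfyUI", "lora", "style.safetensors")` points at `ComfyUI/models/loras/style.safetensors`.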
Basic Text-to-Image Workflow
ComfyUI opens with a default text-to-image workflow that works with SDXL once you select the checkpoint and set the resolution. Here is how to use it:
- Load the default workflow: ComfyUI opens with a basic text-to-image graph
- Select your model: Click the checkpoint loader node and select sd_xl_base_1.0.safetensors
- Enter your prompt: Type in the positive prompt node (e.g., "A majestic mountain landscape at sunset, photorealistic, 8k detail")
- Set negative prompt: Common choices are "blurry, low quality, distorted, deformed"
- Configure settings:
  - Resolution: 1024x1024 (SDXL's native resolution)
  - Steps: 25-30
  - CFG Scale: 7.0
  - Sampler: euler (fast) or dpmpp_2m (quality)
  - Scheduler: karras
- Click Queue Prompt: generation starts
Your first image should appear in 5-15 seconds depending on your GPU.
Key Settings Explained
| Setting | What It Does | Recommended |
|---|---|---|
| Steps | Number of denoising iterations | 25-30 for SDXL |
| CFG Scale | How closely to follow the prompt | 6-8 for SDXL |
| Sampler | Denoising algorithm | euler or dpmpp_2m |
| Scheduler | Noise schedule | karras |
| Seed | Reproducibility: the same seed with the same settings gives the same image | Randomize (control_after_generate) for variety; fix it to reproduce a result |
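The same settings can also be expressed as a workflow in ComfyUI's API format, where each node is a dictionary entry and connections are written as [node_id, output_index]. The sketch below builds such a graph with the recommended SDXL values. The node class names and input names reflect ComfyUI's built-in nodes as far as I know; treat them as assumptions and compare against a workflow exported from your own install in API format. The function names and the "guide" filename prefix are made up for this example.

```python
import json
import urllib.request


def build_txt2img_workflow(positive: str, negative: str, seed: int,
                           ckpt_name: str = "sd_xl_base_1.0.safetensors",
                           width: int = 1024, height: int = 1024,
                           steps: int = 28, cfg: float = 7.0,
                           sampler_name: str = "dpmpp_2m",
                           scheduler: str = "karras") -> dict:
    """Build a text-to-image graph; links are [source_node_id, output_index]."""
    return {
        # Checkpoint loader outputs: 0 = MODEL, 1 = CLIP, 2 = VAE
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": sampler_name, "scheduler": scheduler, "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage", "inputs": {"images": ["6", 0], "filename_prefix": "guide"}},
    }


def queue_prompt(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, `queue_prompt(build_txt2img_workflow("A majestic mountain landscape at sunset", "blurry, low quality", seed=42))` should queue one image. Passing the same seed again with unchanged settings is how you reproduce a result.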
Adding ControlNets
ControlNets let you guide image generation with structural inputs — edges, depth maps, poses, or reference images. They are essential for consistent, controllable output.
Setup
- Download a ControlNet model:
# SDXL Canny ControlNet (~2.5GB)
huggingface-cli download diffusers/controlnet-canny-sdxl-1.0 \
diffusion_pytorch_model.fp16.safetensors \
--local-dir ComfyUI/models/controlnet/
- In ComfyUI, add these nodes to your workflow:
- Load ControlNet Model — points to your downloaded ControlNet file
- Apply ControlNet — connects between your conditioning and the sampler
- Load Image — your control image (edge map, depth map, etc.)
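In graph terms, the three nodes above sit between the positive prompt and the sampler. A minimal sketch of that rewiring on an API-format workflow dictionary follows. It assumes node IDs are numeric strings and that the class names ControlNetLoader, ControlNetApply, and LoadImage match your ComfyUI version; the function itself is hypothetical. The image file is expected to be an already-prepared control image in ComfyUI's input folder.

```python
import copy


def add_controlnet(workflow: dict, sampler_id: str, controlnet_file: str,
                   image_file: str, strength: float = 0.8) -> dict:
    """Return a copy of the workflow with a ControlNet applied to the sampler's positive input."""
    wf = copy.deepcopy(workflow)
    # Assumes node IDs are numeric strings, as in exported API workflows
    next_id = max(int(k) for k in wf) + 1
    loader_id, image_id, apply_id = (str(next_id + i) for i in range(3))
    # Remember what the sampler was using as its positive conditioning
    original_positive = wf[sampler_id]["inputs"]["positive"]
    wf[loader_id] = {"class_type": "ControlNetLoader",
                     "inputs": {"control_net_name": controlnet_file}}
    wf[image_id] = {"class_type": "LoadImage", "inputs": {"image": image_file}}
    wf[apply_id] = {"class_type": "ControlNetApply",
                    "inputs": {"conditioning": original_positive,
                               "control_net": [loader_id, 0],
                               "image": [image_id, 0],
                               "strength": strength}}
    # The sampler now reads the ControlNet-adjusted conditioning instead
    wf[sampler_id]["inputs"]["positive"] = [apply_id, 0]
    return wf
```

The original workflow is left untouched, so you can compare results with and without the ControlNet by queueing both versions with the same seed.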
Common ControlNet Types
| Type | Input | Use Case |
|---|---|---|
| Canny | Edge detection image | Maintain structure/outlines |
| Depth | Depth map (MiDaS) | 3D spatial composition |
| Pose (OpenPose) | Skeleton keypoints | Character poses |
| Tile | Low-res image | Upscaling with detail |
| Inpaint | Masked region | Selective regeneration |
Start with canny — it is the most intuitive. Take any image, run it through a canny edge detector (available as a ComfyUI node), and use the edge map to guide generation. The AI fills in the details while respecting your structure.
Using LoRAs for Style
LoRAs (Low-Rank Adaptations) are small adapter files that modify a model's style or teach it new concepts. They are typically 10-300MB and add about 0.1-0.3GB to VRAM usage.
Setup
- Download LoRA files from CivitAI or HuggingFace
- Place .safetensors files in ComfyUI/models/loras/
- Add a Load LoRA node between your checkpoint loader and the nodes that use its MODEL and CLIP outputs
- Set the strength (0.5-1.0 is typical — start at 0.7)
Tips for LoRAs
- Check compatibility: SDXL LoRAs only work with SDXL models, SD 1.5 LoRAs only with SD 1.5
- Trigger words: Many LoRAs require specific words in your prompt to activate (listed on the download page)
- Strength matters: Too high (above 1.0) tends to cause artifacts; too low (below 0.3) has little visible effect
- Stack carefully: You can use multiple LoRAs, but each adds VRAM. Three to four LoRAs at 0.5-0.7 strength is a practical limit
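Stacking LoRAs means chaining Load LoRA nodes: each one takes the MODEL and CLIP from the previous node, and everything downstream reads from the last one. The sketch below does this on an API-format workflow dictionary. It assumes the built-in node is named LoraLoader with strength_model and strength_clip inputs, and that the checkpoint loader exposes MODEL at output 0 and CLIP at output 1; the helper function is hypothetical.

```python
import copy


def add_loras(workflow: dict, checkpoint_id: str, loras: list) -> dict:
    """Chain LoRA nodes after the checkpoint loader; loras is a list of (file_name, strength)."""
    wf = copy.deepcopy(workflow)
    next_id = max(int(k) for k in wf) + 1  # assumes numeric string node IDs
    model_src, clip_src = [checkpoint_id, 0], [checkpoint_id, 1]
    added = set()
    for name, strength in loras:
        node_id = str(next_id)
        next_id += 1
        wf[node_id] = {"class_type": "LoraLoader",
                       "inputs": {"model": list(model_src), "clip": list(clip_src),
                                  "lora_name": name,
                                  "strength_model": strength, "strength_clip": strength}}
        added.add(node_id)
        # The next LoRA (or the rest of the graph) reads from this node
        model_src, clip_src = [node_id, 0], [node_id, 1]
    # Point every pre-existing consumer of the checkpoint's MODEL/CLIP at the end of the chain
    for node_id, node in wf.items():
        if node_id in added:
            continue
        for key, value in node["inputs"].items():
            if value == [checkpoint_id, 0]:
                node["inputs"][key] = list(model_src)
            elif value == [checkpoint_id, 1]:
                node["inputs"][key] = list(clip_src)
    return wf
```

The VAE connection is deliberately left on the checkpoint loader, since the LoRA node only passes MODEL and CLIP through.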
VRAM Optimization Tips
Running out of VRAM is the most common issue in ComfyUI. Here are practical solutions:
Choose the Right Model for Your GPU
| VRAM | Recommended Models |
|---|---|
| 6 GB | SD 1.5, PixArt-alpha |
| 8 GB | SDXL 1.0, SD 3.5 Medium (FP8) |
| 12 GB | SDXL + ControlNets, Flux GGUF Q4-Q5 |
| 16 GB | SD 3.5 Large (FP8), Flux GGUF Q6-Q8 |
| 24 GB | Flux FP8, any model with ControlNets |
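The table can be read as "pick the highest tier your card reaches". A small sketch of that lookup, using only the values from the table above (the function name is made up for illustration):

```python
from typing import Optional

# (minimum VRAM in GB, recommendation) pairs copied from the table above
VRAM_TIERS = [
    (6, "SD 1.5, PixArt-alpha"),
    (8, "SDXL 1.0, SD 3.5 Medium (FP8)"),
    (12, "SDXL + ControlNets, Flux GGUF Q4-Q5"),
    (16, "SD 3.5 Large (FP8), Flux GGUF Q6-Q8"),
    (24, "Flux FP8, any model with ControlNets"),
]


def recommend_models(vram_gb: float) -> Optional[str]:
    """Return the recommendation for the highest tier the GPU meets, or None below 6 GB."""
    best = None
    for minimum, models in VRAM_TIERS:
        if vram_gb >= minimum:
            best = models
    return best
```

A 10 GB card, for example, falls in the 8 GB tier, so `recommend_models(10)` returns the SDXL 1.0 row.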
Optimization Techniques
- Use FP8 precision: Add --fp8_e4m3fn-unet to your ComfyUI launch command. Halves model VRAM with minimal quality loss.
- Enable sequential offloading: Use the --lowvram flag when launching ComfyUI. Moves unused model components to CPU RAM during generation.
- Use GGUF quantized models: For Flux, GGUF versions from city96 dramatically reduce VRAM. Q4 brings Flux from 33GB down to about 9GB.
- Close other GPU applications: Browser hardware acceleration, Discord, and video players all consume VRAM. Close them before generating.
- Reduce resolution: Generate at 768x768 instead of 1024x1024 to save 30-40% VRAM, then upscale the result.
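Two of these savings come down to simple arithmetic. The sketch below estimates weight memory as parameter count times bytes per parameter, and compares pixel counts between resolutions. It counts only the weights of a single model component, so it will not match full-pipeline totals that include text encoders, the VAE, and activations. The function names are illustrative.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0}


def weight_gib(num_params: float, precision: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


def pixel_ratio(width: int, height: int, ref_width: int, ref_height: int) -> float:
    """Fraction of pixels at (width, height) relative to the reference resolution."""
    return (width * height) / (ref_width * ref_height)
```

For a model with roughly 12 billion parameters, `weight_gib(12e9, "fp16")` is about twice `weight_gib(12e9, "fp8")`, which is where the "halves model VRAM" figure comes from. `pixel_ratio(768, 768, 1024, 1024)` shows that 768x768 has a little over half the pixels of 1024x1024; the overall VRAM saving is smaller than that because the model weights stay the same size.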
Next Steps
Once you are comfortable with basic generation, explore:
- Inpainting — Selectively regenerate parts of an image
- Upscaling — Use models like RealESRGAN to upscale to 4K
- IP-Adapter — Use reference images to guide style without ControlNet structure
- AnimateDiff — Generate short animations from your image workflows
- Custom nodes — Install community nodes from ComfyUI Manager
Summary
ComfyUI is the most flexible way to run AI image generation locally. Start with SDXL on an 8GB GPU, learn the node workflow, then expand to ControlNets and LoRAs as you get comfortable. When you are ready for the highest quality output, move to Flux with a 12GB or larger GPU.