Image Generation VRAM Requirements 2026 — Flux, SDXL, SD 3.5
Flux.1 needs 7 GB at GGUF Q4; SDXL fits on 8 GB. FP16/FP8/GGUF VRAM tables for Flux.1, Flux.2, SDXL, SD 3.5, Pony Diffusion — GPU picks by tier for 2026.
VRAM is the deciding factor for local image generation. Every model has a hard floor: if it does not fit, you get a black image or an out-of-memory crash. FP8 quantization and GGUF formats for diffusion models have lowered those floors dramatically. This guide maps out the approximate VRAM requirements for every major image model in 2026, at every practical precision level.
Use our diffusion model calculator to check any specific GPU and model combination instantly.
The Complete VRAM Table
This is the reference table. All numbers reflect total VRAM usage during generation (model weights + VAE + text encoder + working memory), not just checkpoint file size.
At 512x512 Resolution
| Model | Params | FP16 | FP8 | GGUF Q4 |
|---|---|---|---|---|
| SD 1.5 | 0.86B | ~4 GB | — | — |
| SDXL 1.0 | 2.6B | ~7.5 GB | — | — |
| Pony Diffusion V6 XL | 2.6B | ~7.5 GB | — | — |
| Illustrious XL | 2.6B | ~7.5 GB | — | — |
| SD 3.5 Medium | 2.5B | ~6 GB | — | — |
| SD 3.5 Large | 8B | ~10 GB | ~7 GB | — |
| Flux.1 Schnell | 12B | ~33 GB | ~13 GB | ~7 GB |
| Flux.1 Dev | 12B | ~33 GB | ~13 GB | ~7 GB |
| Flux.2 Dev | 12B | ~33 GB | ~13 GB | ~7 GB |
At 1024x1024 Resolution
Higher resolution increases intermediate tensor memory. UNet/DiT activations scale with pixel count, so moving from 512x512 to 1024x1024 adds roughly 0.5 to 2 GB in the table below, depending on architecture.
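To see why the increase is real but bounded, it helps to count the spatial positions the denoiser works on. The sketch below assumes the VAE downsamples each side by 8x (the `vae_downscale` parameter is an illustrative name, not taken from any specific library). Activation memory grows with this count, while model weights stay fixed, which is why total VRAM rises by a few GB rather than quadrupling.

```python
def latent_positions(width: int, height: int, vae_downscale: int = 8) -> int:
    """Number of spatial positions in the latent grid for a given image size.

    Assumes the VAE shrinks each side by `vae_downscale` (8x is an assumption here).
    """
    return (width // vae_downscale) * (height // vae_downscale)


small = latent_positions(512, 512)    # 64 x 64 latent grid
large = latent_positions(1024, 1024)  # 128 x 128 latent grid
ratio = large / small

print(small, large, ratio)  # 4096 16384 4.0
```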
| Model | Params | FP16 | FP8 | GGUF Q4 |
|---|---|---|---|---|
| SD 1.5 | 0.86B | ~5 GB | — | — |
| SDXL 1.0 | 2.6B | ~8 GB | — | — |
| Pony Diffusion V6 XL | 2.6B | ~8 GB | — | — |
| Illustrious XL | 2.6B | ~8 GB | — | — |
| SD 3.5 Medium | 2.5B | ~7 GB | — | — |
| SD 3.5 Large | 8B | ~12 GB | ~9 GB | — |
| Flux.1 Schnell | 12B | ~35 GB | ~15 GB | ~9 GB |
| Flux.1 Dev | 12B | ~35 GB | ~15 GB | ~9 GB |
| Flux.2 Dev | 12B | ~35 GB | ~15 GB | ~9 GB |
Key observations:
- SD 1.5 and SDXL-based models (including Pony and Illustrious) do not benefit from FP8 or GGUF because they already fit on consumer GPUs at FP16.
- Flux models are the primary beneficiaries of quantization: FP8 brings a ~33 GB model within reach of 16 GB hardware, and GGUF extends that to 8 to 12 GB cards.
- SD 3.5 Large straddles the line: FP16 is tight on 12 GB, but FP8 fits comfortably.
Understanding Each Model
SD 1.5 — The Lightweight Veteran
Stable Diffusion 1.5 runs on virtually anything. At 0.86B parameters, it needs just 4 to 5 GB VRAM and generates images in under 2 seconds on modern GPUs. The ecosystem is massive, with thousands of checkpoints, LoRAs, and ControlNets. Image quality is the weakest on this list, but fine-tunes like Realistic Vision and DreamShaper close the gap.
Best for: 4 to 6 GB GPUs, fast iteration, ControlNet-heavy workflows.
SDXL 1.0, Pony Diffusion V6 XL, and Illustrious XL
SDXL 1.0 is the workhorse of the 8 GB tier. Its 2.6B UNet produces significantly better images than SD 1.5, with stronger composition and more detail. Pony Diffusion V6 XL and Illustrious XL are SDXL fine-tunes — same architecture, same VRAM footprint, but trained on different data. Pony excels at anime and stylized art with Danbooru-style tagging. Illustrious targets high-fidelity anime illustration with improved hand and face rendering.
All three need 7 to 8 GB VRAM at FP16. Quantization is not needed at this size, and quantized builds are not part of the standard releases.
Best for: 8 GB GPUs, anime workflows (Pony/Illustrious), large LoRA ecosystem.
SD 3.5 Medium and Large
SD 3.5 Medium uses an MMDiT architecture at 2.5B parameters — roughly the same weight count as SDXL but with better text rendering and prompt understanding. At ~6 GB FP16, it slots below SDXL in VRAM requirements while offering improved quality.
SD 3.5 Large scales to 8B parameters. Like Medium, it is a diffusion transformer rather than a UNet. The transformer alone takes ~10 GB, but with its triple text encoder (CLIP-L, CLIP-G, T5-XXL) loaded alongside it, total VRAM climbs to 16 to 18 GB in full precision. FP8 brings the transformer down to ~7 GB, for a total of around 12 to 14 GB. The ecosystem remains small, with far fewer LoRAs and ControlNets than SDXL or Flux.
Best for: 12 to 16 GB GPUs wanting better text rendering than SDXL without the VRAM cost of Flux.
Flux.1 Dev, Flux.1 Schnell, and Flux.2 Dev
The Flux family uses a 12B DiT (Diffusion Transformer) architecture that delivers the best image quality in local generation. Flux.1 Dev is the high-quality variant (28 steps). Flux.1 Schnell is the fast variant (4 steps, Apache 2.0 license). Flux.2 Dev is the successor with refined training and improved detail.
At FP16, all three need ~33 GB — beyond any consumer GPU. This is where quantization transforms accessibility:
| Precision | VRAM (1024x1024) | Quality Impact | Recommended GPU |
|---|---|---|---|
| FP16 | ~35 GB | Baseline | A100, H100 |
| FP8 | ~15 GB | Negligible | RTX 4080 16GB, RTX 5070 Ti |
| GGUF Q6 | ~11 GB | Minimal | RTX 4070 12GB |
| GGUF Q5 | ~10 GB | Very slight | RTX 4070 12GB |
| GGUF Q4 | ~9 GB | Noticeable softening | RTX 4060 8GB (fits at 512x512 only) |
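The table above can be read as a simple selection rule: take the highest-quality precision whose footprint fits your card. A minimal sketch, using the 1024x1024 figures from the table and a hypothetical `headroom_gb` margin (1 GB is an assumed default, not a measured value) for the desktop, LoRAs, and similar overhead:

```python
from typing import Optional

# Approximate total VRAM at 1024x1024 from the table above, best quality first.
FLUX_VRAM_GB = [
    ("FP16", 35),
    ("FP8", 15),
    ("GGUF Q6", 11),
    ("GGUF Q5", 10),
    ("GGUF Q4", 9),
]


def pick_flux_precision(vram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the highest-quality Flux precision that fits, or None if none do."""
    for name, needed_gb in FLUX_VRAM_GB:
        if needed_gb + headroom_gb <= vram_gb:
            return name
    return None


print(pick_flux_precision(16))  # FP8
print(pick_flux_precision(12))  # GGUF Q6
print(pick_flux_precision(8))   # None
```

An 8 GB card returns `None` here because the rule only looks at 1024x1024 figures; at 512x512 the Q4 footprint is lower.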
Compare these models directly: Flux.1 Dev vs SDXL 1.0
FP8 and GGUF Quantization Explained
FP8 — The Sweet Spot for Flux
FP8 (8-bit floating point) halves the memory footprint of model weights compared to FP16. For diffusion models, the quality impact is nearly invisible — most users cannot distinguish FP8 from FP16 outputs in blind comparisons. Both ComfyUI and Forge UI load FP8 checkpoints natively.
FP8 is available for:
- Flux.1 Dev / Schnell / Flux.2 Dev — the primary use case, dropping from 33 GB to ~13 GB
- SD 3.5 Large — useful on 12 GB GPUs where FP16 is too tight
FP8 is not useful for SD 1.5 or SDXL because they already fit comfortably at FP16.
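The halving follows directly from bytes per parameter. The sketch below is weights-only arithmetic (2 bytes per weight at FP16, 1 byte at FP8); it deliberately ignores the VAE, text encoders, and working memory, so the results are lower than the totals in the tables. The function name is illustrative.

```python
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}


def weight_gb(params_billion: float, precision: str) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes).

    Excludes VAE, text encoders, and activations.
    """
    return params_billion * BYTES_PER_PARAM[precision]


flux_fp16 = weight_gb(12, "fp16")  # 12B parameters at 2 bytes each
flux_fp8 = weight_gb(12, "fp8")    # 12B parameters at 1 byte each

print(flux_fp16, flux_fp8)  # 24.0 12.0
```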
GGUF — Pushing Below 12 GB
GGUF quantization (originally from the LLM space via llama.cpp) has been adapted for Flux models. It offers more aggressive compression than FP8, with quality levels from Q8 (near-lossless) down to Q4 (noticeable softening). GGUF enables Flux on 8 to 10 GB GPUs — a tier previously limited to SDXL.
The trade-off is real: Q4 Flux images show softer fine details and slightly less accurate text rendering compared to FP8 or FP16. But the core composition and prompt adherence remain strong. For most use cases, Q5 or Q6 offers the best balance between VRAM savings and quality preservation.
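GGUF levels are not exactly 4, 5, or 6 bits per weight, because each block of weights also stores scale data. As a rough illustration, the sketch below uses the block layouts of a few llama.cpp quantization types; mapping this guide's "Q4", "Q5", "Q6", and "Q8" onto `Q4_0`, `Q5_0`, `Q6_K`, and `Q8_0` is an assumption, and real Flux GGUF files may keep some tensors at higher precision, so treat the results as weights-only estimates.

```python
# (bytes per block, weights per block) for a few llama.cpp block formats.
GGUF_BLOCKS = {
    "Q4_0": (18, 32),    # 16 bytes of packed 4-bit values + 2-byte scale
    "Q5_0": (22, 32),    # 16 bytes low bits + 4 bytes high bits + 2-byte scale
    "Q6_K": (210, 256),  # 6-bit values plus per-group scales in a 256-weight block
    "Q8_0": (34, 32),    # 32 bytes of 8-bit values + 2-byte scale
}


def bits_per_weight(fmt: str) -> float:
    """Effective bits per weight, including the per-block scale overhead."""
    block_bytes, weights = GGUF_BLOCKS[fmt]
    return block_bytes * 8 / weights


def gguf_weight_gb(params_billion: float, fmt: str) -> float:
    """Weights-only size in GB (1 GB = 1e9 bytes) if every tensor used `fmt`."""
    return params_billion * bits_per_weight(fmt) / 8


for fmt in GGUF_BLOCKS:
    print(fmt, bits_per_weight(fmt), round(gguf_weight_gb(12, fmt), 2))
```

For a 12B model this gives about 6.75 GB of weights at `Q4_0`, which lines up with why Q4 is the first level that approaches 8 GB cards once the other components are added.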
Every image model page on Will It Run AI now includes FP8 VRAM tables so you can see the exact numbers for your specific hardware.
GPU Recommendations by VRAM Tier
8 GB — RTX 4060, RTX 3070, RTX 4060 Ti 8GB
| Model | Status |
|---|---|
| SD 1.5 | Runs perfectly, fast generation |
| SDXL 1.0 | Runs well, tight with LoRAs + ControlNet |
| Pony Diffusion V6 XL | Same as SDXL |
| Illustrious XL | Same as SDXL |
| Flux.1 Dev GGUF Q4 | Runs at 512x512 (~7 GB); 1024x1024 (~9 GB) exceeds 8 GB without offloading |
| SD 3.5 Medium | Runs comfortably |
Verdict: SDXL and its fine-tunes are the best fit. Flux at GGUF Q4 is possible but pushes limits. SD 1.5 is the fastest option.
12 GB — RTX 4070, RTX 3060 12GB, RTX 4070 Super
| Model | Status |
|---|---|
| All SDXL-based models | Comfortable, room for LoRAs + ControlNet |
| Flux.1 Dev GGUF Q5-Q6 | Sweet spot — good quality, fits with headroom |
| SD 3.5 Large FP8 | Fits with text encoder offloading |
| SD 3.5 Medium | Very comfortable |
Verdict: The best value tier. Flux at GGUF Q5-Q6 delivers excellent quality. SDXL runs with full LoRA and ControlNet stacks.
16 GB — RTX 4080, RTX 5070 Ti, RTX 4060 Ti 16GB
| Model | Status |
|---|---|
| Flux.1 Dev FP8 | Runs well at 1024x1024 |
| Flux.2 Dev FP8 | Runs well at 1024x1024 |
| SD 3.5 Large FP16 | Fits but tight with T5-XXL encoder |
| All SDXL-based models | Comfortable with every addon |
Verdict: FP8 Flux becomes practical. This tier unlocks the best image quality without GGUF compromises. SD 3.5 Large in full precision is feasible.
24 GB+ — RTX 4090, RTX 3090, RTX 5090 (32 GB)
| Model | Status |
|---|---|
| Flux.1 Dev FP8 | Full comfort, room for ControlNets |
| Flux.2 Dev FP8 | Full comfort, room for ControlNets |
| SD 3.5 Large FP16 | Comfortable with all encoders loaded |
| All SDXL-based models | Overkill — batching and high-res become practical |
Verdict: No compromises. FP8 Flux with ControlNets, IP-Adapter, and LoRAs all loaded simultaneously. This is the tier for professional workflows.
Quick Decision Guide
If you want to skip the details:
- Under 6 GB VRAM: SD 1.5 fine-tunes only.
- 8 GB: SDXL or Pony Diffusion. Flux GGUF Q4 for experimentation.
- 12 GB: Flux GGUF Q5-Q6 for best quality. SDXL for ecosystem breadth.
- 16 GB: Flux FP8. Best quality-to-VRAM ratio available.
- 24 GB+: Flux FP8 with full addon stack. Batch generation. No limits.
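The same guide as a lookup, for anyone scripting a setup check. This is only a sketch of the list above: the function name is made up, and treating every card under 8 GB as the SD 1.5 tier is an assumption, since the list does not cover 6 to 8 GB explicitly.

```python
def recommend(vram_gb: float) -> str:
    """Map a VRAM size in GB to the matching line of the quick decision guide."""
    if vram_gb >= 24:
        return "Flux FP8 with full addon stack"
    if vram_gb >= 16:
        return "Flux FP8"
    if vram_gb >= 12:
        return "Flux GGUF Q5-Q6, or SDXL for ecosystem breadth"
    if vram_gb >= 8:
        return "SDXL or Pony Diffusion; Flux GGUF Q4 for experimentation"
    # Assumption: anything below 8 GB falls back to the SD 1.5 tier.
    return "SD 1.5 fine-tunes"


for size in (6, 8, 12, 16, 24):
    print(size, "GB:", recommend(size))
```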
Check your exact GPU compatibility with our diffusion model calculator, or compare any two models head-to-head in the image model comparison tool.
What About Apple Silicon?
Apple Silicon Macs use unified memory shared between CPU and GPU. An M4 Pro with 24 GB handles Flux at FP8 comfortably. An M4 Max with 48 or 64 GB can run Flux at FP16. Generation speed is slower than NVIDIA — roughly 2 to 4x slower per image — but the large unified memory pool means fewer quantization compromises. SDXL runs on any M-series Mac with 16 GB or more.
Bottom Line
The image generation VRAM landscape in 2026 comes down to two tiers: models that fit on 8 GB at FP16 (SDXL family, SD 1.5, SD 3.5 Medium) and models that need quantization to fit on mainstream 8 to 16 GB cards (Flux family, SD 3.5 Large). FP8 quantization has made Flux practical on 16 GB GPUs with near-zero quality loss, GGUF Q5 and Q6 push it down to 12 GB with modest trade-offs, and Q4 reaches 8 GB cards at 512x512 with visible softening.
For the most current VRAM numbers for any model on any GPU, check the model pages on Will It Run AI — every image model now includes FP8 VRAM tables alongside FP16 baselines.