How much VRAM do I need for Flux.1 Dev?

Flux.1 Dev at FP16 needs approximately 33 GB VRAM, which exceeds every consumer GPU, including the 32 GB RTX 5090. At FP8 it drops to around 13 GB, fitting on 16 GB cards such as the RTX 4070 Ti Super or RTX 4080. GGUF Q4 quantization brings it down to roughly 7 GB, making it accessible on 8 GB GPUs like the RTX 4060. Quality loss at Q4 is noticeable, but the results are still very usable.

Can I run SDXL on an 8 GB GPU?

Yes. SDXL 1.0 requires approximately 7 to 8 GB VRAM at FP16, which fits within an 8 GB GPU like the RTX 4060 or RTX 3070. You may need to disable preview features or limit batch size to one image at a time. For more headroom with LoRAs and ControlNets, a 12 GB GPU is more comfortable.

What is FP8 quantization for image models?

FP8 is a floating-point format that uses 8 bits per weight instead of the standard 16. For diffusion models like Flux, FP8 halves the memory taken by the model weights with minimal quality loss, often imperceptible in blind tests. ComfyUI and Forge both support FP8 natively. It is the recommended precision for Flux models on 16 GB GPUs; on 12 GB cards, GGUF Q5 or Q6 is the better fit.

Is GGUF quantization available for image generation models?

GGUF quantization is available for Flux models thanks to community work that adapted llama.cpp's GGUF format to diffusion backends. You can find GGUF versions of Flux.1 Dev, Flux.1 Schnell, and Flux.2 Dev on CivitAI and Hugging Face. SDXL and SD 1.5 do not typically use GGUF since they already fit comfortably on consumer GPUs at FP16.

Which image model gives the best quality on a 12 GB GPU?

On 12 GB VRAM, Flux.1 Dev at GGUF Q5 or Q6 offers the best image quality that fits on the card. It outperforms SDXL and SD 3.5 Medium in prompt adherence, detail, and text rendering. If you need a large LoRA ecosystem or anime-specific workflows, SDXL with fine-tunes like Pony Diffusion runs comfortably at 12 GB with room for ControlNets.

How much VRAM does Stable Diffusion 3.5 need?

SD 3.5 comes in two sizes. SD 3.5 Medium needs approximately 6 GB VRAM at FP16, making it accessible on most modern GPUs. SD 3.5 Large requires around 10 GB at FP16 for the diffusion transformer alone, but with its three text encoders (CLIP-L, CLIP-G, T5-XXL) the total climbs to 16 to 18 GB in practice. A 16 GB GPU can run SD 3.5 Large, though FP16 is tight with T5-XXL loaded, while a 12 GB GPU needs FP8 weights plus offloading of the text encoders to system RAM.

April 2, 2026
Tags: vram, image-generation, flux, sdxl, stable-diffusion, gpu, fp8, gguf, quantization

Image Generation VRAM Requirements 2026 — Flux, SDXL, SD 3.5

Flux.1 needs 7 GB at GGUF Q4; SDXL fits on 8 GB. FP16/FP8/GGUF VRAM tables for Flux.1, Flux.2, SDXL, SD 3.5, Pony Diffusion — GPU picks by tier for 2026.

VRAM is the deciding factor for local image generation. Every model has a hard floor: if it does not fit, generation fails with an out-of-memory error or slows to a crawl as memory spills into system RAM. With the arrival of FP8 quantization and GGUF formats for diffusion models, those floors have shifted dramatically. This guide maps out the approximate VRAM requirements for every major image model in 2026, at every practical precision level.

Use our diffusion model calculator to check any specific GPU and model combination instantly.

The Complete VRAM Table

This is the reference table. All numbers reflect total VRAM usage during generation (model weights + VAE + text encoder + working memory), not just checkpoint file size.
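To see where a figure like ~33 GB for Flux at FP16 comes from, here is a minimal back-of-the-envelope sketch: weight memory is parameter count times bytes per parameter, summed over the components. Only the 12B transformer size comes from this guide; the T5-XXL (~4.7B), CLIP-L (~0.12B) and VAE (~0.08B) sizes are rounded assumptions for illustration, and activation memory is left out entirely.

```python
# Back-of-the-envelope weight memory for a Flux.1-style pipeline at FP16.
# Parameter counts other than the 12B transformer are rounded assumptions.
COMPONENTS_BILLIONS = {
    "transformer (12B DiT)": 12.0,
    "T5-XXL text encoder (assumed ~4.7B)": 4.7,
    "CLIP-L text encoder (assumed ~0.12B)": 0.12,
    "VAE (assumed ~0.08B)": 0.08,
}

FP16_BYTES = 2.0  # bytes per parameter at FP16


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in decimal GB: billions of params x bytes per param."""
    return params_billion * bytes_per_param


def pipeline_weights_gb(bytes_per_param: float) -> float:
    """Total weight memory across all components; activations are not included."""
    return sum(weights_gb(p, bytes_per_param) for p in COMPONENTS_BILLIONS.values())


if __name__ == "__main__":
    for name, params in COMPONENTS_BILLIONS.items():
        print(f"{name:38s} {weights_gb(params, FP16_BYTES):6.2f} GB")
    print(f"{'total weights at FP16':38s} {pipeline_weights_gb(FP16_BYTES):6.2f} GB")
```

The weights alone come to about 33.8 GB under these assumptions, the same range as the FP16 figures in the tables. The 24 GB transformer term dominates, which is why quantizing the transformer is what makes Flux fit on consumer cards.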

At 512x512 Resolution

Model	Params	FP16	FP8	GGUF Q4
SD 1.5	0.86B	~4 GB	—	—
SDXL 1.0	2.6B	~7.5 GB	—	—
Pony Diffusion V6 XL	2.6B	~7.5 GB	—	—
Illustrious XL	2.6B	~7.5 GB	—	—
SD 3.5 Medium	2.5B	~6 GB	—	—
SD 3.5 Large	8B	~10 GB	~7 GB	—
Flux.1 Schnell	12B	~33 GB	~13 GB	~7 GB
Flux.1 Dev	12B	~33 GB	~13 GB	~7 GB
Flux.2 Dev	12B	~33 GB	~13 GB	~7 GB

At 1024x1024 Resolution

Higher resolution increases the intermediate tensor memory. UNet/DiT activations scale with pixel count, adding 1 to 4 GB depending on architecture.
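For transformer-based models the growth is easy to quantify. This sketch counts image tokens, assuming a VAE that downsamples by 8 in each dimension and 2x2 latent patches per token, the layout used by Flux and SD 3.5; the function name is illustrative. Text tokens, which these joint-attention models append to the sequence, are not counted.

```python
def latent_tokens(width: int, height: int, vae_factor: int = 8, patch: int = 2) -> int:
    """Image tokens seen by a DiT, assuming an 8x VAE and 2x2 latent patches."""
    cell = vae_factor * patch  # pixels covered by one token along each axis (16 by default)
    if width % cell or height % cell:
        raise ValueError(f"width and height must be multiples of {cell}")
    return (width // cell) * (height // cell)


if __name__ == "__main__":
    for side in (512, 1024):
        n = latent_tokens(side, side)
        # Each attention layer scores every token against every other token.
        print(f"{side}x{side}: {n} tokens, {n * n:,} attention score pairs per head")
```

Going from 512x512 to 1024x1024 raises the count from 1,024 to 4,096 tokens, and the number of attention score pairs grows sixteenfold; how much of that becomes extra VRAM depends on the attention kernel in use.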

Model	Params	FP16	FP8	GGUF Q4
SD 1.5	0.86B	~5 GB	—	—
SDXL 1.0	2.6B	~8 GB	—	—
Pony Diffusion V6 XL	2.6B	~8 GB	—	—
Illustrious XL	2.6B	~8 GB	—	—
SD 3.5 Medium	2.5B	~7 GB	—	—
SD 3.5 Large	8B	~12 GB	~9 GB	—
Flux.1 Schnell	12B	~35 GB	~15 GB	~9 GB
Flux.1 Dev	12B	~35 GB	~15 GB	~9 GB
Flux.2 Dev	12B	~35 GB	~15 GB	~9 GB

Key observations:

SD 1.5 and SDXL-based models (including Pony and Illustrious) do not benefit from FP8 or GGUF because they already fit on consumer GPUs at FP16.
Flux models are the primary beneficiaries of quantization: FP8 and GGUF make a 33 GB model accessible on 8 to 16 GB hardware.
SD 3.5 Large straddles the line: FP16 needs a 16 GB card once its text encoders are loaded, while FP8 brings it within reach of 12 GB with text-encoder offloading.

Understanding Each Model

SD 1.5 — The Lightweight Veteran

Stable Diffusion 1.5 runs on virtually anything. At 0.86B parameters, it needs just 4 to 5 GB VRAM and generates images in under 2 seconds on modern GPUs. The ecosystem is massive, with thousands of checkpoints, LoRAs, and ControlNets. Image quality is the weakest on this list, but fine-tunes like Realistic Vision and DreamShaper close the gap.

Best for: 4 to 6 GB GPUs, fast iteration, ControlNet-heavy workflows.

SDXL 1.0, Pony Diffusion V6 XL, and Illustrious XL

SDXL 1.0 is the workhorse of the 8 GB tier. Its 2.6B UNet produces significantly better images than SD 1.5, with stronger composition and more detail. Pony Diffusion V6 XL and Illustrious XL are SDXL fine-tunes — same architecture, same VRAM footprint, but trained on different data. Pony excels at anime and stylized art with Danbooru-style tagging. Illustrious targets high-fidelity anime illustration with improved hand and face rendering.

All three need 7 to 8 GB VRAM at FP16. Quantization is unnecessary at this size, and the standard checkpoints are not distributed in quantized form.

Best for: 8 GB GPUs, anime workflows (Pony/Illustrious), large LoRA ecosystem.

SD 3.5 Medium and Large

SD 3.5 Medium uses an MMDiT architecture at 2.5B parameters — roughly the same weight count as SDXL but with better text rendering and prompt understanding. At ~6 GB FP16, it slots below SDXL in VRAM requirements while offering improved quality.

SD 3.5 Large scales to 8B parameters. The diffusion transformer alone takes ~10 GB, but with its triple text encoder (CLIP-L, CLIP-G, T5-XXL), total VRAM climbs to 16 to 18 GB at FP16. FP8 brings the transformer down to ~7 GB, making the total around 12 to 14 GB. The ecosystem remains small, with far fewer LoRAs and ControlNets than SDXL or Flux.

Best for: 12 to 16 GB GPUs wanting better text rendering than SDXL without the VRAM cost of Flux.

Flux.1 Dev, Flux.1 Schnell, and Flux.2 Dev

The Flux family uses a 12B DiT (Diffusion Transformer) architecture that delivers the best image quality in local generation. Flux.1 Dev is the high-quality variant (28 steps). Flux.1 Schnell is the fast variant (4 steps, Apache 2.0 license). Flux.2 Dev is the successor with refined training and improved detail.

At FP16, all three need ~33 GB — beyond any consumer GPU. This is where quantization transforms accessibility:

Precision	VRAM (1024x1024)	Quality Impact	Recommended GPU
FP16	~35 GB	Baseline	A100, H100
FP8	~15 GB	Negligible	RTX 4080 16GB, RTX 5070 Ti
GGUF Q6	~11 GB	Minimal	RTX 4070 12GB
GGUF Q5	~10 GB	Very slight	RTX 4070 12GB
GGUF Q4	~9 GB	Noticeable softening	RTX 4060 8GB (tight)
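The table above can be turned into a simple lookup. This sketch picks the highest-precision Flux variant whose 1024x1024 figure fits in a given amount of VRAM after reserving some headroom. The figures are copied from the table and are approximations; the 1 GB default headroom (desktop, LoRAs, driver overhead) and the function name are assumptions for illustration.

```python
from typing import Optional

# (precision, approx. VRAM in GB at 1024x1024), highest quality first.
# Values are the approximate figures from the Flux precision table.
FLUX_VARIANTS = [
    ("FP16", 35.0),
    ("FP8", 15.0),
    ("GGUF Q6", 11.0),
    ("GGUF Q5", 10.0),
    ("GGUF Q4", 9.0),
]


def pick_flux_precision(vram_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the best-quality precision that fits, or None if even Q4 does not."""
    budget = vram_gb - headroom_gb
    for precision, need_gb in FLUX_VARIANTS:
        if need_gb <= budget:
            return precision
    return None


if __name__ == "__main__":
    for vram in (8, 12, 16, 24, 48):
        print(f"{vram} GB -> {pick_flux_precision(vram)}")
```

With these inputs a 16 GB card gets FP8 and a 12 GB card gets GGUF Q6, matching the tier recommendations that follow; an 8 GB card gets None at this resolution, in line with Q4 being very tight at 1024x1024.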

FP8 and GGUF Quantization Explained

FP8 — The Sweet Spot for Flux

FP8 (8-bit floating point) halves the memory footprint of model weights compared to FP16. For diffusion models, the quality impact is nearly invisible — most users cannot distinguish FP8 from FP16 outputs in blind comparisons. Both ComfyUI and Forge UI load FP8 checkpoints natively.
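To make "8 bits per weight" concrete, here is a pure-Python sketch that rounds single values to FP8 E4M3 (4 exponent bits, 3 mantissa bits, largest finite value 448), one of the two common FP8 layouts. It assumes round-half-to-even with saturation at the maximum; real FP8 checkpoints may also store scale factors, which this sketch ignores.

```python
import math

E4M3_MAX = 448.0        # largest finite E4M3 value (1.75 * 2**8)
E4M3_MIN_EXP = -6       # exponent of the smallest normal number
E4M3_MANTISSA_BITS = 3  # explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, ties to even, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= E4M3_MAX:
        return sign * E4M3_MAX
    _, e = math.frexp(a)                # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)      # subnormals reuse the min-normal spacing
    step = 2.0 ** (exp - E4M3_MANTISSA_BITS)
    # Python's round() rounds half to even, matching IEEE-style rounding.
    return sign * min(round(a / step) * step, E4M3_MAX)


if __name__ == "__main__":
    for w in (0.1, -0.0372, 1.3, 300.0):
        q = round_to_e4m3(w)
        print(f"{w:>8} -> {q:<10} relative error {abs(q - w) / abs(w):.2%}")
```

With 3 mantissa bits the step is 1/8 of each power-of-two interval, so a normal value is never off by more than 1/16 (6.25%). That outputs still look nearly identical is an empirical result about diffusion models, not something this arithmetic alone guarantees.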

FP8 is available for:

Flux.1 Dev / Schnell / Flux.2 Dev — the primary use case, dropping from 33 GB to ~13 GB
SD 3.5 Large — useful on 12 GB GPUs where FP16 is too tight

FP8 is not useful for SD 1.5 or SDXL because they already fit comfortably at FP16.

GGUF — Pushing Below 12 GB

GGUF quantization (originally from the LLM space via llama.cpp) has been adapted for Flux models. It offers more aggressive compression than FP8, with quality levels from Q8 (near-lossless) down to Q4 (noticeable softening). GGUF enables Flux on 8 to 10 GB GPUs — a tier previously limited to SDXL.
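File size per GGUF level follows from its bits per weight. This sketch uses the llama.cpp block layouts: Q8_0, Q5_0 and Q4_0 store 32 weights per block plus one FP16 scale (8.5, 5.5 and 4.5 bits per weight), and Q6_K averages 6.5625 bits per weight. Treat the output as an estimate for the 12B transformer only; published Flux GGUF files may keep some small tensors at higher precision, and the text encoder and VAE are extra.

```python
# Average bits per weight for common llama.cpp GGUF quantization types.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,     # 32 x int8 + one fp16 scale per 32-weight block
    "Q6_K": 6.5625,  # 210-byte super-block per 256 weights
    "Q5_0": 5.5,     # 32 x 5-bit + one fp16 scale per block
    "Q4_0": 4.5,     # 32 x 4-bit + one fp16 scale per block
}


def gguf_size_gb(params_billion: float, quant: str) -> float:
    """Approximate tensor data size in decimal GB, ignoring metadata and unquantized tensors."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{gguf_size_gb(12.0, quant):.2f} GB")
```

For a 12B transformer this gives about 12.75 GB at Q8_0 and 6.75 GB at Q4_0, consistent with the ~7 GB Q4 figure above.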

The trade-off is real: Q4 Flux images show softer fine details and slightly less accurate text rendering compared to FP8 or FP16. But the core composition and prompt adherence remain strong. For most use cases, Q5 or Q6 offers the best balance between VRAM savings and quality preservation.
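The trade-off can be illustrated numerically. This sketch applies a simplified symmetric absmax block quantizer (one scale per block of 32, the block size GGUF's Q4_0/Q5_0/Q8_0 use) to synthetic Gaussian weights and measures the round-trip error at several bit widths. It is not the exact GGUF encoding (Q4_0 stores offset 4-bit values, and K-quants use nested scales), and the synthetic weights are only a stand-in for a real checkpoint.

```python
import math
import random


def synthetic_weights(n: int, seed: int = 0, std: float = 0.02) -> list:
    """Gaussian stand-in for real model weights (real distributions differ)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]


def quantize_block(block: list, bits: int) -> list:
    """Symmetric absmax quantization of one block; returns the dequantized values."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / qmax                 # one scale per block
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in block]


def rms_error(weights: list, bits: int, block_size: int = 32) -> float:
    """Root-mean-square error after quantizing in blocks of block_size."""
    total = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        for original, restored in zip(block, quantize_block(block, bits)):
            total += (original - restored) ** 2
    return math.sqrt(total / len(weights))


if __name__ == "__main__":
    w = synthetic_weights(32 * 1024)
    for bits in (8, 6, 5, 4):
        print(f"{bits}-bit blocks: RMS error {rms_error(w, bits):.2e}")
```

The error roughly doubles with each bit removed, which is the numeric side of the softening described above; how visible it is in finished images still has to be judged by testing.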

Every image model page on Will It Run AI now includes FP8 VRAM tables so you can see the exact numbers for your specific hardware.

GPU Recommendations by VRAM Tier

8 GB — RTX 4060, RTX 3070, RTX 4060 Ti 8GB

Model	Status
SD 1.5	Runs perfectly, fast generation
SDXL 1.0	Runs well, tight with LoRAs + ControlNet
Pony Diffusion V6 XL	Same as SDXL
Illustrious XL	Same as SDXL
Flux.1 Dev GGUF Q4	Runs at 512x512, very tight at 1024x1024
SD 3.5 Medium	Runs comfortably

Verdict: SDXL and its fine-tunes are the best fit. Flux at GGUF Q4 is possible but pushes limits. SD 1.5 is the fastest option.

12 GB — RTX 4070, RTX 3060 12GB, RTX 4070 Super

Model	Status
All SDXL-based models	Comfortable, room for LoRAs + ControlNet
Flux.1 Dev GGUF Q5-Q6	Sweet spot — good quality, fits with headroom
SD 3.5 Large FP8	Fits with text-encoder offloading
SD 3.5 Medium	Very comfortable

Verdict: The best value tier. Flux at GGUF Q5-Q6 delivers excellent quality. SDXL runs with full LoRA and ControlNet stacks.

16 GB — RTX 4080, RTX 5070 Ti, RTX 4060 Ti 16GB

Model	Status
Flux.1 Dev FP8	Runs well at 1024x1024
Flux.2 Dev FP8	Runs well at 1024x1024
SD 3.5 Large FP16	Fits but tight with T5-XXL encoder
All SDXL-based models	Comfortable with every addon

Verdict: FP8 Flux becomes practical. This tier unlocks the best image quality without GGUF compromises. SD 3.5 Large at FP16 is feasible, though tight.

24 GB — RTX 4090, RTX 5090 (32 GB), RTX 3090

Model	Status
Flux.1 Dev FP8	Full comfort, room for ControlNets
Flux.2 Dev FP8	Full comfort, room for ControlNets
SD 3.5 Large FP16	Comfortable with all encoders loaded
All SDXL-based models	Overkill — batching and high-res become practical

Verdict: No compromises. FP8 Flux with ControlNets, IP-Adapter, and LoRAs all loaded simultaneously. This is the tier for professional workflows.

Quick Decision Guide

If you want to skip the details:

Under 6 GB VRAM: SD 1.5 fine-tunes only.
8 GB: SDXL or Pony Diffusion. Flux GGUF Q4 for experimentation.
12 GB: Flux GGUF Q5-Q6 for best quality. SDXL for ecosystem breadth.
16 GB: Flux FP8. Best quality-to-VRAM ratio available.
24 GB+: Flux FP8 with full addon stack. Batch generation. No limits.

Check your exact GPU compatibility with our diffusion model calculator, or compare any two models head-to-head in the image model comparison tool.

What About Apple Silicon?

Apple Silicon Macs use unified memory shared between CPU and GPU. An M4 Pro with 24 GB handles Flux at FP8 comfortably. An M4 Max with 48 or 64 GB can run Flux at FP16. Generation speed is slower than NVIDIA — roughly 2 to 4x slower per image — but the large unified memory pool means fewer quantization compromises. SDXL runs on any M-series Mac with 16 GB or more.

Bottom Line

The image generation VRAM landscape in 2026 comes down to two tiers: models that fit on 8 GB (SDXL family, SD 1.5, SD 3.5 Medium) and models that need quantization to fit on consumer hardware (Flux family, SD 3.5 Large). FP8 quantization has made Flux practical on 16 GB GPUs with near-zero quality loss, GGUF Q5 and Q6 push it down to 12 GB with modest trade-offs, and Q4 brings it within reach of 8 GB cards at the cost of softer detail.

For the most current VRAM numbers for any model on any GPU, check the model pages on Will It Run AI — every image model now includes FP8 VRAM tables alongside FP16 baselines.