Flux vs SDXL vs SD 3.5 — Which Image Model Should You Choose?
Side-by-side comparison of Flux.1, Stable Diffusion XL, and SD 3.5 for local image generation. Quality, VRAM requirements, speed, ecosystem, licensing, and recommendations by use case.
Flux.1, SDXL, and SD 3.5 are the three most important image generation architectures for local use today. Each makes different trade-offs between quality, accessibility, and ecosystem maturity. This comparison breaks down exactly where each model wins and loses, so you can choose based on your hardware and workflow.
The Quick Comparison
| | Flux.1 Dev | SDXL 1.0 | SD 3.5 Large |
|---|---|---|---|
| Architecture | DiT (12B) | UNet (2.6B) | MMDiT (8B) |
| VRAM (FP16) | 33 GB | 8 GB | 18 GB |
| VRAM (Optimized) | 12 GB (GGUF Q4) | 7 GB | 18 GB |
| Steps | 28 (Dev) / 4 (Schnell) | 30 | 28 |
| Quality | Excellent | Good | Very good |
| Text Rendering | Excellent | Poor | Good |
| ControlNets | 3 types | 5+ types | 1 type |
| LoRAs (CivitAI) | ~500 | 5,000+ | ~50 |
| License | Non-commercial (Dev) | OpenRAIL++ | Community |
| Speed (RTX 4090) | 12 sec | 4.5 sec | 8 sec |
Quality
Photorealism
Flux.1 Dev leads decisively. Its 12B DiT architecture produces images with finer details, more natural lighting, and better skin textures than either SDXL or SD 3.5. The gap is most visible in close-up portraits and detailed product photography.
SD 3.5 Large sits in the middle — better composition and detail than SDXL, thanks to its MMDiT architecture and triple text encoder. It handles complex scenes more coherently.
SDXL's base model shows its age, but community fine-tunes like RealVisXL v5 and Juggernaut XL v9 close the gap significantly. A well-tuned SDXL checkpoint can approach SD 3.5 quality for specific styles.
Ranking: Flux > SD 3.5 > SDXL (fine-tuned, depending on style) > SDXL (base)
Text Rendering
This is where architecture differences show most clearly.
Flux.1 renders text in images accurately. Signs, labels, watermarks, and typography come out legible and correctly spelled in most cases. This is a breakthrough capability that SDXL fundamentally cannot match.
SD 3.5 renders text reasonably well — better than SDXL but below Flux. Its T5-XXL text encoder gives it stronger language understanding.
SDXL struggles with text. Letters are frequently garbled, misspelled, or nonsensical. This is a known architectural limitation of the UNet + CLIP text encoder combination.
Ranking: Flux >> SD 3.5 > SDXL
Prompt Adherence
Flux's combination of T5-XXL (4.7B) and CLIP-L encoders gives it the best prompt understanding. Complex, detailed prompts with multiple subjects and specific spatial relationships are handled well.
SD 3.5's triple encoder (T5-XXL + CLIP-L + OpenCLIP-G, 5.5B combined) provides excellent prompt following — occasionally rivaling Flux on compositional accuracy.
SDXL's dual CLIP encoder (0.82B combined) is the weakest link. It handles straightforward prompts well but drops details or misinterprets complex compositions.
Ranking: Flux > SD 3.5 > SDXL
VRAM Requirements
This is where SDXL's advantage is overwhelming.
| Model | Minimum VRAM | Comfortable | Ideal |
|---|---|---|---|
| SDXL 1.0 | 7 GB | 8 GB | 12 GB |
| SD 3.5 Large | 18 GB | 24 GB | 24 GB |
| Flux.1 Dev (GGUF Q4) | 9 GB | 12 GB | 16 GB |
| Flux.1 Dev (FP8) | 17 GB | 20 GB | 24 GB |
| Flux.1 Dev (FP16) | 33 GB | 40 GB | 48 GB |
SDXL runs on virtually any modern GPU with 8GB. An RTX 4060, RX 7600, or even older cards like the RTX 3060 handle it comfortably. This accessibility is why SDXL remains the most widely used image model.
SD 3.5 Large at 18GB effectively requires a 24GB GPU — RTX 4090, RTX 3090, or equivalent. There is no quantization path to reduce this significantly.
Flux is the most flexible despite its large size. GGUF quantization from city96 brings usable Flux down to 12GB GPUs. The quality trade-off at Q4-Q6 is modest for most use cases.
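The Flux rows in the table above can be roughly sanity-checked from parameter counts. The sketch below is a back-of-the-envelope estimate, not a measurement. It counts weights only, using the 12B transformer and 4.7B T5-XXL encoder sizes quoted in this article, and it ignores CLIP-L, the VAE, activations, and framework overhead. The 4.5 bits-per-weight value for Q4 is an assumption, since GGUF files mix precisions and store scale factors.

```python
# Back-of-the-envelope weight memory: parameters x bits per weight / 8.
# Parameter counts are the ones quoted in this article; CLIP-L, the VAE,
# activations, and framework overhead are deliberately ignored.

FLUX_TRANSFORMER_B = 12.0  # Flux.1 Dev DiT, billions of parameters
T5_XXL_ENCODER_B = 4.7     # T5-XXL text encoder, billions of parameters

# Assumed average bits per weight. The Q4 value is a rough guess.
BITS = {"FP16": 16.0, "FP8": 8.0, "Q4 (assumed ~4.5 bits)": 4.5}


def weight_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


if __name__ == "__main__":
    for name, bits in BITS.items():
        dit = weight_gb(FLUX_TRANSFORMER_B, bits)
        both = dit + weight_gb(T5_XXL_ENCODER_B, bits)
        print(f"{name}: transformer {dit:.1f} GB, with T5-XXL {both:.1f} GB")
```

Under these assumptions the FP16 total comes to about 33 GB and the FP8 total to about 17 GB, which lines up with the table's minimum column. Treat that agreement as a plausibility check rather than proof, because real usage depends on resolution, offloading, and the runtime.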
Speed
At the same resolution (1024x1024) on an RTX 4090:
| Model | Steps | Time | Images/Minute |
|---|---|---|---|
| SDXL 1.0 | 30 | 4.5 sec | ~13 |
| SD 3.5 Large | 28 | 8 sec | ~7.5 |
| Flux.1 Schnell | 4 | 2 sec | ~30 |
| Flux.1 Dev | 28 | 12 sec | ~5 |
Flux.1 Schnell is the fastest by a wide margin — 4 steps versus 28-30 for the others. For iteration-heavy workflows where you generate many candidates, Schnell is exceptional.
SDXL is the fastest "full quality" model. Its UNet architecture is computationally efficient, and years of community optimization have refined its pipelines.
Flux.1 Dev is the slowest, but 12 seconds per image on an RTX 4090 is still very usable for single-image workflows.
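The Images/Minute column is simply 60 divided by the time per image. The short sketch below reproduces it from the benchmark table and also computes a naive seconds-per-step figure. The helper names are illustrative, and the per-step number is only an average that folds fixed costs such as text encoding and VAE decoding into each step.

```python
# (steps, seconds per image) at 1024x1024 on an RTX 4090, from the table above.
BENCHMARKS = {
    "SDXL 1.0": (30, 4.5),
    "SD 3.5 Large": (28, 8.0),
    "Flux.1 Schnell": (4, 2.0),
    "Flux.1 Dev": (28, 12.0),
}


def images_per_minute(seconds_per_image):
    """Throughput for back-to-back generation of single images."""
    return 60.0 / seconds_per_image


def seconds_per_step(steps, seconds_per_image):
    """Naive average that includes fixed per-image overhead."""
    return seconds_per_image / steps


def fastest_model(benchmarks):
    """Name of the model with the lowest time per image."""
    return min(benchmarks, key=lambda name: benchmarks[name][1])


if __name__ == "__main__":
    for name, (steps, secs) in BENCHMARKS.items():
        print(f"{name}: {images_per_minute(secs):.1f} img/min, "
              f"{seconds_per_step(steps, secs):.2f} s/step")
    print("Fastest per image:", fastest_model(BENCHMARKS))
```

On these numbers, Schnell's average time per step is not lower than Dev's. Its lead comes from needing only 4 steps, which matches the explanation above.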
Ecosystem
LoRAs
SDXL's LoRA ecosystem dwarfs everything else. Over 5,000 LoRAs on CivitAI alone cover every style, character, concept, and quality modifier imaginable. Want a specific anime character? There are dozens of options. Need a particular art style? It exists.
Flux has roughly 500 LoRAs and growing. The essentials are covered — realism, anime, specific styles — but the selection is a fraction of SDXL's.
SD 3.5 has approximately 50 LoRAs. The ecosystem is nascent and may not develop significantly, given that the model's VRAM requirements limit its user base.
ControlNets
| Control Type | SD 1.5 | SDXL | Flux.1 Dev | SD 3.5 |
|---|---|---|---|---|
| Canny | Yes | Yes | Yes | Yes |
| Depth | Yes | Yes | Yes | No |
| OpenPose | Yes | Yes | No | No |
| IP-Adapter | Yes | Yes | Limited | No |
| Union (Multi) | No | Yes | Yes | No |
| Scribble | Yes | No | No | No |
| Lineart | Yes | No | No | No |
| Tile/Upscale | Yes | No | No | No |
| Inpaint | Yes | No | No | No |
SDXL offers the best balance of ControlNet variety and image quality. SD 1.5 has the most ControlNet types but lower base quality. Flux has fewer options but the quality of controlled generations is the highest.
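If a workflow needs several control types at once, the table can be treated as a lookup. The sketch below transcribes the support matrix into Python sets and returns the models that cover every required type. The function name and the `allow_limited` flag are illustrative choices, and the data reflects only what this article's table states.

```python
# Support matrix transcribed from the table above ("Yes" entries only).
FULL_SUPPORT = {
    "SD 1.5": {"Canny", "Depth", "OpenPose", "IP-Adapter", "Scribble",
               "Lineart", "Tile/Upscale", "Inpaint"},
    "SDXL": {"Canny", "Depth", "OpenPose", "IP-Adapter", "Union"},
    "Flux.1 Dev": {"Canny", "Depth", "Union"},
    "SD 3.5": {"Canny"},
}

# Entries the table marks as "Limited".
LIMITED_SUPPORT = {"Flux.1 Dev": {"IP-Adapter"}}


def models_supporting(required, allow_limited=False):
    """Return models (in table order) that cover every type in `required`."""
    matches = []
    for model, supported in FULL_SUPPORT.items():
        available = set(supported)
        if allow_limited:
            available |= LIMITED_SUPPORT.get(model, set())
        if set(required) <= available:
            matches.append(model)
    return matches


if __name__ == "__main__":
    print(models_supporting({"Canny", "Depth"}))
    print(models_supporting({"OpenPose", "IP-Adapter"}))
    print(models_supporting({"IP-Adapter"}, allow_limited=True))
```

A pose-plus-reference workflow (OpenPose and IP-Adapter) narrows the choice to SD 1.5 and SDXL, which is why SDXL is the practical pick when both control variety and image quality matter.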
Community Fine-Tunes
SDXL has the richest ecosystem of fine-tuned checkpoints:
- Photorealism: RealVisXL v5, Juggernaut XL v9
- Versatile: DreamShaper XL
- Anime: Animagine XL 3.1, Pony Diffusion V6 XL
Flux and SD 3.5 do not have significant community fine-tune ecosystems yet.
Licensing
| Model | License | Commercial Use | Training Use |
|---|---|---|---|
| Flux.1 Schnell | Apache 2.0 | Yes, unrestricted | Yes |
| SDXL 1.0 | OpenRAIL++ | Yes, with restrictions | Yes |
| SD 3.5 Large | Stability Community | Limited | Limited |
| Flux.1 Dev | Non-commercial | No | No |
If commercial use is a requirement, Flux.1 Schnell and SDXL are the clear choices. Flux Dev is explicitly non-commercial. SD 3.5's community license has limitations worth reviewing for commercial projects.
Best For Each Use Case
Photorealism
Best: Flux.1 Dev (quality leader) or RealVisXL v5 on SDXL (best on 8GB VRAM)
Flux produces the most naturally photorealistic images. But RealVisXL v5, a fine-tuned SDXL checkpoint, delivers remarkable photorealism at a fraction of the VRAM cost.
Anime and Illustration
Best: SDXL with Animagine XL 3.1 or Pony Diffusion V6 XL
The anime LoRA ecosystem is concentrated around SDXL. Thousands of character LoRAs, style modifiers, and quality enhancers make SDXL the practical choice for anime workflows.
Text in Images
Best: Flux.1 Dev
If you need readable text in generated images — signs, labels, typography, watermarks — Flux is the only model that handles this reliably. SDXL cannot do this well. SD 3.5 is a distant second.
Budget Hardware (8GB VRAM)
Best: SDXL 1.0 or its fine-tunes
SDXL is the only model in this comparison that runs well on 8GB. SD 3.5 and Flux at full precision are out of reach. Quantized Flux (GGUF Q4) needs about 9GB at minimum, so on an 8GB card the fit is tight at best.
Controlled Generation (ControlNet Workflows)
Best: SDXL or SD 1.5
If your workflow depends on pose control, edge guidance, IP-adapter, or inpainting via ControlNets, SDXL has the broadest toolkit. SD 1.5 has even more ControlNet types but lower base quality.
Commercial Projects
Best: Flux.1 Schnell (quality + Apache 2.0) or SDXL (ecosystem + OpenRAIL++)
Flux.1 Schnell gives you Flux-quality generation with a permissive Apache 2.0 license. SDXL's OpenRAIL++ license also permits commercial use with some restrictions.
Decision Table
| Your Situation | Recommended Model |
|---|---|
| 8GB VRAM, need flexibility | SDXL 1.0 + fine-tunes |
| 12GB VRAM, want best quality possible | Flux.1 Dev GGUF Q4-Q5 |
| 16GB VRAM, balanced use | Flux.1 Dev GGUF Q6 |
| 24GB VRAM, no compromises | Flux.1 Dev FP8 |
| Need text in images | Flux.1 Dev |
| Anime workflow | SDXL 1.0 + Animagine/Pony |
| Commercial project, fast | Flux.1 Schnell |
| Maximum ControlNet flexibility | SDXL 1.0 |
| Budget hardware, still decent | SD 1.5 fine-tunes |
There is no single winner. Flux leads on quality, SDXL leads on ecosystem and accessibility, and SD 3.5 sits in a narrow middle ground that is worth considering if you have the VRAM and do not need ecosystem depth.
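For readers who prefer code to tables, the decision table can be written as a small function. This is a sketch under stated assumptions: the article does not say how to combine rows, so the ordering here (task-specific needs first, then VRAM tiers) is a hypothetical choice, as is mapping cards under 8GB to the "budget hardware" row. It does not check whether a task-specific pick actually fits the given VRAM.

```python
def recommend(vram_gb, need_text=False, anime=False,
              commercial=False, controlnet_heavy=False):
    """Map a situation to a row of the decision table.

    The precedence of the checks is an assumption, not something the
    table specifies. Task-specific rows are checked before VRAM tiers.
    """
    if commercial:
        return "Flux.1 Schnell"
    if need_text:
        return "Flux.1 Dev"
    if anime:
        return "SDXL 1.0 + Animagine/Pony"
    if controlnet_heavy:
        return "SDXL 1.0"
    # General-purpose tiers by available VRAM.
    if vram_gb >= 24:
        return "Flux.1 Dev FP8"
    if vram_gb >= 16:
        return "Flux.1 Dev GGUF Q6"
    if vram_gb >= 12:
        return "Flux.1 Dev GGUF Q4-Q5"
    if vram_gb >= 8:
        return "SDXL 1.0 + fine-tunes"
    # Assumed mapping: below 8GB falls back to the budget row.
    return "SD 1.5 fine-tunes"


if __name__ == "__main__":
    for vram in (6, 8, 12, 16, 24):
        print(f"{vram} GB -> {recommend(vram)}")
    print("Anime on 16 GB ->", recommend(16, anime=True))
```

Reordering the checks changes the answer for mixed cases, such as a commercial anime project, so adjust the precedence to match your own priorities.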
Update (March 2026): Flux 2 Dev is now available as the successor to Flux 1, with improved quality. Flux 2 Klein 4B offers Apache 2.0 licensing for commercial use.