SDXL Turbo

Stable

by Stability AI

SDXL Turbo is an adversarially distilled version of SDXL built for near real-time image generation. It uses a 2.6B-parameter UNet and needs only 1 to 4 sampling steps. Quality is lower than SDXL base, but generation is almost instant, which makes it well suited to real-time previewing.

VRAM requirements, GPU fit, and setup notes for SDXL Turbo, including 8GB/12GB fit guidance where relevant. Recommended runtimes: ComfyUI and Diffusers. Download size: ~6.9 GB at FP16.

  • 1-step generation — near real-time
  • Based on SDXL architecture
  • Great for interactive prototyping
  • Quality tradeoff for speed

Runtimes: ComfyUI, Diffusers
Format: FP16 safetensors

Parameters: 2.6B
Max Resolution: 512×512
Default Steps: 1
Architecture: UNet
License: stability-ai-non-commercial

Image Quality Benchmarks

Measured quality metrics for SDXL Turbo outputs.

Human Preference Score: 60%
How often humans prefer this model's output (0-100%).

Aesthetic Score: 6.0
Visual quality and composition rating (5-9 scale).

VRAM Requirements by Resolution and Precision

Compare which GPUs can run SDXL Turbo at different precisions. FP8 uses less memory than FP16 when available, and the grade shows how comfortably each GPU handles the workload.

FP16 (half precision, unquantized)

Resolution   VRAM Required   RTX 4090 24GB   RTX 3060 12GB   RTX 4060 8GB   MacBook Pro M4 Pro 24GB
512×512      6.7 GB          S               S               A              S
768×768      6.8 GB          S               S               A              S
1024×1024    7.0 GB          S               S               A              S
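
The fit check behind this table can be sketched as a simple headroom calculation. The figures are copied from the FP16 rows above. The `headroom_gb` helper and the dictionary names are illustrative assumptions, and the page does not state which thresholds map headroom to an S or A grade, so this sketch reports headroom only.

```python
# FP16 VRAM requirements from the table above, in GB.
REQUIRED_GB = {"512x512": 6.7, "768x768": 6.8, "1024x1024": 7.0}

# Nominal memory of the GPUs listed in the table, in GB.
# (Usable memory is lower in practice; the MacBook figure is unified memory.)
GPU_MEMORY_GB = {
    "RTX 4090 24GB": 24.0,
    "RTX 3060 12GB": 12.0,
    "RTX 4060 8GB": 8.0,
    "MacBook Pro M4 Pro 24GB": 24.0,
}


def headroom_gb(gpu: str, resolution: str) -> float:
    """Memory left after the model's requirement; negative means it does not fit."""
    return round(GPU_MEMORY_GB[gpu] - REQUIRED_GB[resolution], 1)


for gpu in GPU_MEMORY_GB:
    print(f"{gpu}: {headroom_gb(gpu, '1024x1024')} GB spare at 1024x1024")
```

The 8 GB card is the only one left with about 1 GB of headroom at 1024×1024, which is consistent with it being the only GPU graded A rather than S.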

Run with Python (diffusers)
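import torch
from diffusers import StableDiffusionXLPipeline

# Load the FP16 weights (about 6.9 GB download).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# SDXL Turbo is distilled to run without classifier-free guidance,
# so guidance_scale is set to 0.0. One step matches the default above.
image = pipe(
    prompt="your prompt here",
    num_inference_steps=1,
    guidance_scale=0.0,
    height=512,
    width=512,
).images[0]
image.save("output.png")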
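
For cards with less headroom, a variant of the snippet above might look like the following sketch. It assumes the `accelerate` package is installed for `enable_model_cpu_offload()`. The helper names `turbo_call_kwargs` and `load_pipeline` are hypothetical, and `guidance_scale=0.0` follows the usage commonly shown for this model rather than anything stated on this page. The 1 to 4 step range comes from the description at the top.

```python
def turbo_call_kwargs(prompt: str, steps: int = 1, size: int = 512) -> dict:
    """Build pipeline call arguments, keeping steps inside SDXL Turbo's 1-4 range."""
    if not 1 <= steps <= 4:
        raise ValueError("SDXL Turbo is intended for 1 to 4 steps")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "guidance_scale": 0.0,  # assumption: guidance disabled for Turbo
        "height": size,
        "width": size,
    }


def load_pipeline(low_vram: bool = False):
    """Load SDXL Turbo; imports are local so the helper above runs without a GPU."""
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
    )
    if low_vram:
        # Keeps sub-models on the CPU until needed; slower, but lowers peak VRAM.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")
    return pipe
```

A call would then look like `load_pipeline(low_vram=True)(**turbo_call_kwargs("a red fox")).images[0]`.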

Get started

Setup instructions for running SDXL Turbo locally

1. Download the model

Get the FP16 safetensors checkpoint from Hugging Face.

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py
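
Steps 1 and 2 can also be scripted. The sketch below uses `hf_hub_download` from the `huggingface_hub` package to fetch the checkpoint straight into the ComfyUI checkpoints folder. The filename is an assumption and should be checked against the model page, and `checkpoint_dir` and `download_checkpoint` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "stabilityai/sdxl-turbo"
# Assumed name of the FP16 checkpoint file; verify it on the model page.
FILENAME = "sd_xl_turbo_1.0_fp16.safetensors"


def checkpoint_dir(comfyui_root: str) -> Path:
    """Return the folder ComfyUI scans for checkpoints (step 2 above)."""
    return Path(comfyui_root) / "models" / "checkpoints"


def download_checkpoint(comfyui_root: str) -> str:
    """Download the checkpoint into ComfyUI's checkpoints folder and return its path."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    target = checkpoint_dir(comfyui_root)
    target.mkdir(parents=True, exist_ok=True)
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=target)
```

Calling `download_checkpoint("ComfyUI")` from the directory that contains your ComfyUI checkout would place the file where step 2 expects it.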

ComfyUI Workflow

Basic txt2img workflow for SDXL Turbo

7 nodes

Drag & drop into ComfyUI or use File → Import
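
The workflow file itself is not reproduced on this page. As a rough guess at what a 7-node txt2img graph could contain, the sketch below builds one in ComfyUI's API (prompt) format and can queue it on a locally running server. The node selection, sampler settings, checkpoint filename, and the `build_workflow` and `queue_prompt` names are all assumptions, not the catalog's actual workflow.

```python
import json
import urllib.request

# Assumed checkpoint filename inside ComfyUI/models/checkpoints/.
CKPT = "sd_xl_turbo_1.0_fp16.safetensors"


def build_workflow(prompt: str, seed: int = 0) -> dict:
    """Return a 7-node txt2img graph; links are [source_node_id, output_index]."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": CKPT}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
            "latent_image": ["4", 0], "seed": seed,
            "steps": 1, "cfg": 1.0,  # cfg 1.0 leaves guidance effectively off
            "sampler_name": "euler_ancestral", "scheduler": "normal", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "sdxl_turbo"}},
    }


def queue_prompt(workflow: dict, host: str = "http://127.0.0.1:8188") -> bytes:
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/prompt", data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

With ComfyUI launched via `python main.py`, `queue_prompt(build_workflow("a red fox"))` would submit the job to the default local address.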

Memory Breakdown

VRAM allocation at 1024×1024 on RTX 4090 24GB

Required: 7.0 GB
Available: 24.0 GB

Weights: 5.2 GB
VAE: 0.2 GB
Text Encoder: 1.6 GB
Activations: 0.5 GB
Overhead: 0.5 GB
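
The weights figure follows from the parameter count: 2.6B parameters at 2 bytes each in FP16 comes to 5.2 GB. The sketch below checks that arithmetic and totals the breakdown. The variable names are illustrative. Note that the five listed components add up to 8.0 GB, which is more than the 7.0 GB "Required" figure; the page does not explain the difference, so the breakdown is best read as approximate.

```python
PARAMS = 2.6e9        # parameter count from the spec table
BYTES_PER_PARAM = 2   # FP16 stores each parameter in 2 bytes

# Weight size in decimal gigabytes.
weights_gb = round(PARAMS * BYTES_PER_PARAM / 1e9, 1)

# Component figures as listed in the breakdown above, in GB.
BREAKDOWN_GB = {
    "Weights": 5.2,
    "VAE": 0.2,
    "Text Encoder": 1.6,
    "Activations": 0.5,
    "Overhead": 0.5,
}
total_gb = round(sum(BREAKDOWN_GB.values()), 1)

print(f"FP16 weights from parameter count: {weights_gb} GB")
print(f"Sum of listed components: {total_gb} GB")
```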

Estimated Generation Time

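Time per image at 1024×1024, 28 steps, FP16. The spec table lists 1 step as SDXL Turbo's default, so runs at the default setting should take less time than shown here.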

RTX 4090 24GB: 500 ms
RTX 3060 12GB: ~1.9 s
RTX 4060 8GB: ~7.6 s
MacBook Pro M4 Pro 24GB: ~4 s
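
To compare these estimates, it can help to convert them to throughput and to a slowdown factor relative to the RTX 4090. The sketch below does that with the times listed above. The function names are illustrative, and the results are only as reliable as the estimates they are derived from.

```python
# Estimated seconds per image, taken from the list above.
TIME_PER_IMAGE_S = {
    "RTX 4090 24GB": 0.5,
    "RTX 3060 12GB": 1.9,
    "RTX 4060 8GB": 7.6,
    "MacBook Pro M4 Pro 24GB": 4.0,
}


def images_per_minute(gpu: str) -> float:
    """Convert seconds per image into images per minute."""
    return round(60.0 / TIME_PER_IMAGE_S[gpu], 1)


def slowdown_vs(reference: str, gpu: str) -> float:
    """How many times longer `gpu` takes per image than `reference`."""
    return round(TIME_PER_IMAGE_S[gpu] / TIME_PER_IMAGE_S[reference], 1)


for gpu in TIME_PER_IMAGE_S:
    print(f"{gpu}: {images_per_minute(gpu)} img/min, "
          f"{slowdown_vs('RTX 4090 24GB', gpu)}x the RTX 4090 time")
```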

Available Formats, Downloads & Setup

Download SDXL Turbo in the precision that matches your GPU. Lower precision usually means less VRAM pressure, while higher precision keeps more quality.

Format                      Precision   Size     Provider
safetensors (Recommended)   FP16        6.9 GB   official

LoRA Ecosystem

Limited

Few LoRAs are available specifically for Turbo. Some SDXL LoRAs work but may degrade quality.

FAQ — SDXL Turbo VRAM, Runtimes & Fit

How much VRAM does SDXL Turbo need?

SDXL Turbo (2.6B parameters) requires approximately 7.0 GB of VRAM at FP16 precision for standard 1024×1024 image generation. If you want a lighter setup, lower precisions like FP8 can reduce memory use when available.

Can I run SDXL Turbo on an 8GB GPU?

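Yes, but with little headroom. At FP16, SDXL Turbo needs 6.7 GB at 512×512 and 7.0 GB at 1024×1024, so it fits in 8 GB with about 1 GB to spare; the RTX 4060 8GB is graded A rather than S in the VRAM table above. Check that table for the exact resolution and precision trade-off.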

Does SDXL Turbo work in ComfyUI and Diffusers?

SDXL Turbo is marked for ComfyUI and Diffusers support in our catalog, so those are the runtimes we recommend first for local setup. If your workflow uses another front end, check the model's available formats and workflow notes above before downloading.

Can I run SDXL Turbo on RTX 4090?

Yes, the RTX 4090 (24 GB VRAM) can run SDXL Turbo comfortably at FP16. Expected generation time is around 500ms per image at 1024×1024.

Does SDXL Turbo support ControlNet?

There are currently no known ControlNet adapters for SDXL Turbo. Check Hugging Face and Civitai for community-contributed adapters.

Does SDXL Turbo have LoRA support?

Support is limited. Few LoRAs are available specifically for Turbo; some SDXL LoRAs work but may degrade quality. Each LoRA adds roughly 0.2 GB of extra VRAM.
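
As a back-of-the-envelope example, the per-LoRA figure can be combined with the 7.0 GB FP16 requirement quoted earlier in this FAQ. The sketch below works in integer megabytes to avoid rounding surprises. The function names are illustrative, and the estimate ignores anything else that occupies GPU memory.

```python
BASE_MB = 7000      # ~7.0 GB at FP16, 1024x1024 (from the FAQ above)
PER_LORA_MB = 200   # ~0.2 GB of extra VRAM per LoRA


def vram_with_loras_gb(n_loras: int) -> float:
    """Estimated VRAM in GB with n LoRAs loaded."""
    return (BASE_MB + n_loras * PER_LORA_MB) / 1000


def max_loras(gpu_gb: int) -> int:
    """Upper bound on LoRAs that fit in the given GPU memory, by this estimate."""
    spare_mb = gpu_gb * 1000 - BASE_MB
    return max(spare_mb // PER_LORA_MB, 0)


print(vram_with_loras_gb(2))  # two LoRAs on top of the base model
print(max_loras(8))           # rough ceiling on an 8 GB card
```

By this estimate two LoRAs bring usage to about 7.4 GB, which leaves very little room on an 8 GB card.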

How fast is SDXL Turbo?

On a reference GPU (RTX 4090 24GB), SDXL Turbo generates a 1024×1024 image in approximately 500ms at FP16 with 28 inference steps. Faster GPUs with higher memory bandwidth will produce images more quickly.

About SDXL Turbo

Use cases: fast-generation, real-time, prototyping
Recommended runtimes: ComfyUI, Diffusers
