Best AI Models for 8GB VRAM — What Actually Runs on RTX 4060, RTX 3070, Arc B580
15 AI models that fit in 8GB VRAM: Qwen 3.5 4B, Phi-4 Mini, Gemma 3 4B for LLMs. SD 1.5, SDXL for images. Complete VRAM breakdown at Q4, Q5, Q8.
You have 8GB of VRAM. That is one of the most common GPU tiers, covering millions of RTX 4060, RTX 3060 8GB, and RTX 3070 cards, with the 12GB Intel Arc B580 competing at the same price. It is enough to run real AI models locally, but you need to choose models and quantization levels carefully.
This guide ranks the best models that actually fit in 8GB, based on our compatibility engine that scores VRAM fit, speed, and quality for every combination.
Best LLMs for 8GB VRAM
Tier 1: Fits Perfectly (under 5GB at Q4)
These models leave plenty of headroom for context and KV cache.
| Model | Params | VRAM (Q4) | VRAM (Q8) | Best For |
|---|---|---|---|---|
| Qwen 3.5 4B | 4B | 2.5 GB | 4.6 GB | Best all-rounder at this size |
| Phi-4 Mini Reasoning 4B | 3.8B | 2.3 GB | 4.3 GB | Reasoning and math |
| Gemma 3 4B | 4B | 2.5 GB | 4.6 GB | Multilingual, multimodal |
| Qwen 3 4B | 4B | 2.5 GB | 4.6 GB | Chinese + English |
| Llama 3.2 3B | 3B | 1.9 GB | 3.4 GB | Fast, lightweight assistant |
Recommended pick: Qwen 3.5 4B — best quality at this size, runs at Q8 with plenty of headroom on 8GB.
Quick start:
ollama run qwen3.5:4b
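To sanity-check numbers like the ones in the table, you can estimate weight memory from parameter count and bits per weight. The sketch below is a rough approximation, not the compatibility engine behind this guide. The bits-per-weight values assume llama.cpp-style block formats (Q4_0, Q5_0, Q8_0), and the 1 GB reserve for KV cache and runtime buffers is an assumed placeholder. Real GGUF files keep some tensors at higher precision, so actual sizes run a little larger.

```python
# Assumed effective bits per weight, based on Q4_0 / Q5_0 / Q8_0 block layouts
# (32 weights per block plus one 16-bit scale). Approximations only.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5, "FP16": 16.0}


def estimate_weights_gb(params_billions, quant):
    """Approximate weight memory in GB for a model of the given size."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


def fits(params_billions, quant, vram_gb=8.0, reserve_gb=1.0):
    """True if the weights plus an assumed reserve fit in the given VRAM."""
    return estimate_weights_gb(params_billions, quant) + reserve_gb <= vram_gb


if __name__ == "__main__":
    for size in (3, 4, 8, 12):
        for quant in ("Q4", "Q5", "Q8"):
            gb = estimate_weights_gb(size, quant)
            print(f"{size}B {quant}: ~{gb:.1f} GB, fits 8GB: {fits(size, quant)}")
```

The estimates land close to the table but will not match it exactly, since each model's architecture and quantization mix differs.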
Tier 2: Fits at Q4 (4-7GB, tight but functional)
These models fit but leave limited room for large contexts.
| Model | Params | VRAM (Q4) | Best For |
|---|---|---|---|
| Qwen 3 8B | 8B | 4.6 GB | Best 8B all-rounder |
| Qwen 3.5 9B | 9B | 5.1 GB | Updated 8B replacement |
| Gemma 3 12B | 12B | 6.7 GB | Tight fit, high quality |
| Mistral 7B | 7B | 4.2 GB | Proven, fast, well-supported |
| Llama 3.1 8B | 8B | 4.3 GB | Strong instruction following |
Recommended pick: Qwen 3 8B at Q4_K_M — 4.6GB leaves ~3.4GB for context. Great quality for the size.
ollama run qwen3:8b
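How far does ~3.4GB of leftover VRAM go? The KV cache grows linearly with context length, so you can estimate a ceiling from the model's attention shape. This is a minimal sketch: the layer count, KV head count, and head dimension below are illustrative assumptions for an 8B-class model with grouped-query attention, so read the real values from the model's config before relying on the result. It also ignores compute buffers and whatever your desktop is already using.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache per token: keys and values for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_context_tokens(free_gb, per_token_bytes):
    """Upper bound on context length that fits in the free VRAM."""
    return int(free_gb * 1e9 // per_token_bytes)


# Assumed shape for illustration only (not taken from this guide).
per_token = kv_cache_bytes_per_token(n_layers=36, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    print(f"{per_token / 1024:.0f} KiB of KV cache per token at FP16")
    print(f"~{max_context_tokens(3.4, per_token)} tokens fit in 3.4 GB")
```

Under these assumptions each token costs 144 KiB, so 3.4GB covers a context in the low tens of thousands of tokens. Quantizing the KV cache to 8-bit (`bytes_per_value=1`) roughly doubles that.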
Coding-Specific Models
| Model | Params | VRAM (Q4) | Notes |
|---|---|---|---|
| Qwen 3 Coder 8B | 8B | 4.6 GB | Best coding under 8GB |
| DeepSeek Coder V2 Lite 16B | 16B | — | MoE, 2.4B active — check fit |
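Why "check fit" for a mixture-of-experts model? All experts have to be resident in memory even though only a few are used per token, so VRAM scales with total parameters while per-token work scales with active parameters. The sketch below illustrates that split. The 4.5 bits per weight is an assumed Q4-style figure, and the function name is hypothetical.

```python
def moe_footprint_gb(total_params_b, active_params_b, bits_per_weight):
    """Split an MoE model's memory picture into resident and per-token parts."""
    gb_per_billion = bits_per_weight / 8
    return {
        # Every expert must be loaded, so this is what has to fit in memory.
        "resident_gb": total_params_b * gb_per_billion,
        # Only the active experts are read for each generated token.
        "read_per_token_gb": active_params_b * gb_per_billion,
    }


if __name__ == "__main__":
    # 16B total and 2.4B active come from the table; 4.5 bits is an assumption.
    print(moe_footprint_gb(16, 2.4, 4.5))
```

Under these assumptions the weights alone exceed 8GB, so expect to offload some layers to system RAM. The small active set is what keeps generation speed reasonable when you do.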
Best Image Models for 8GB VRAM
| Model | VRAM (FP16) | VRAM (quantized) | Best For |
|---|---|---|---|
| Stable Diffusion 1.5 | 4-5 GB | — | Fastest, huge ecosystem |
| SDXL 1.0 | 7.5-8 GB | — | Best quality at FP16 |
| Flux.1 Schnell | 32+ GB (FP16) | ~7 GB (GGUF Q4) | Fastest Flux, needs quant |
| Pony Diffusion V6 XL | 7.4-7.8 GB | — | Anime/stylized art |
| AnimateDiff v1.5.3 | 6 GB | — | Short video clips |
Key insight: SDXL is the sweet spot for 8GB because it runs unquantized at FP16 (half precision). Flux needs GGUF quantization, but with it Flux.1 Schnell fits in about 7 GB.
Hardware Recommendations for 8GB
| GPU | Price | Bandwidth | Best Choice If... |
|---|---|---|---|
| RTX 4060 8GB | ~$300 | 272 GB/s | Budget, power-efficient |
| RTX 3070 8GB | ~$250 used | 448 GB/s | Speed priority (higher bandwidth) |
| Intel Arc B580 12GB | ~$250 | 456 GB/s | Actually 12GB — better value |
| RTX 3060 8GB | ~$200 used | 240 GB/s | Cheapest option |
Pro tip: The Intel Arc B580 has 12GB and costs about the same as the 8GB cards. If you are buying new, the extra 4GB makes it the better value for fitting models, though it does not run CUDA, so confirm that your tools support Intel GPUs.
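Why does bandwidth matter for speed? When generating text, the GPU reads roughly the full set of weights for every token, so memory bandwidth divided by model size gives a rough ceiling on tokens per second. The sketch below applies that rule of thumb to bandwidth figures from the table and a 4.6 GB model. It is a theoretical upper bound under that assumption, not a benchmark, and real throughput will be lower.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s, weights_gb):
    """Rough upper bound: each token requires one pass over the weights."""
    return bandwidth_gb_s / weights_gb


# Bandwidth figures taken from the table above.
GPUS = {"RTX 4060 8GB": 272, "RTX 3070 8GB": 448, "Intel Arc B580 12GB": 456}

if __name__ == "__main__":
    model_gb = 4.6  # the Tier 2 recommended pick at Q4
    for name, bandwidth in GPUS.items():
        ceiling = decode_tokens_per_sec_ceiling(bandwidth, model_gb)
        print(f"{name}: at most ~{ceiling:.0f} tokens/s")
```

The ratio between cards is more useful than the absolute numbers: at the same model size, the RTX 3070's ceiling is about 1.6 times that of the RTX 4060.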
Upgrade Path
If 8GB feels limiting, here are the next steps:
- 12GB (RTX 4070, ~$500): Fits 14B models at Q4, SDXL with headroom
- 16GB (RTX 4070 Ti Super, ~$700): Comfortable 14B at Q6, Flux at FP8
- 24GB (RTX 4090, ~$1600): 30B models, Flux at FP16
Use our VRAM calculator to check exactly which models fit your hardware, or browse models by VRAM tier.
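If you want a quick local check first, you can read how much VRAM is actually free before loading a model. This sketch assumes an NVIDIA card with `nvidia-smi` on the PATH; it will not work on the Arc B580. The helper names are hypothetical.

```python
import subprocess


def parse_nvidia_smi_memory(csv_text):
    """Turn lines like '8188, 1020' (total MiB, used MiB) into free GiB per GPU."""
    free_gib = []
    for line in csv_text.strip().splitlines():
        total_mib, used_mib = (float(field) for field in line.split(","))
        free_gib.append((total_mib - used_mib) / 1024)
    return free_gib


def query_free_vram_gib():
    """Ask nvidia-smi for total and used memory, one CSV line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_nvidia_smi_memory(result.stdout)
```

Call `query_free_vram_gib()` with your usual desktop apps open. Browsers and compositors can hold several hundred MB, which matters for the tight-fit models above.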