DeepSeek R1 VRAM Requirements — 7B to 671B Distills at Q4/Q8 (GPU Guide)
DeepSeek R1 14B needs ~8GB at Q4, 32B needs ~18GB, full 671B needs ~376GB. Complete VRAM tables for every distill with GPU and quantization recommendations.
DeepSeek R1 is one of the most capable open-weight reasoning models available. Released by DeepSeek in early 2025, it brought frontier-level reasoning to the open-source community. But with variants ranging from 7B to 671B parameters, the hardware requirements vary dramatically.
This guide covers every DeepSeek R1 variant with exact VRAM requirements, hardware recommendations, and setup instructions.
DeepSeek R1 Model Family
DeepSeek R1 comes in two forms:
- Full model (671B): A massive Mixture of Experts (MoE) model with 671B total parameters and ~37B active per token. This is the flagship with the best reasoning capabilities.
- Distilled variants: Smaller dense models fine-tuned to reproduce R1's reasoning style. DeepSeek released them at 1.5B, 7B, 8B, 14B, 32B, and 70B; this guide covers the 7B, 14B, 32B, and 70B sizes.
The distilled variants are not just smaller models. They are Qwen and Llama base models fine-tuned on reasoning traces generated by the full 671B model, which is why they perform well above their size class on reasoning tasks.
VRAM Requirements by Variant
Here's what each DeepSeek R1 variant needs at different quantization levels:
| Variant | Type | Q4_K_M | Q5_K_M | Q6_K | Q8_0 | F16 |
|---|---|---|---|---|---|---|
| R1 7B | Dense | 4.0 GB | 4.8 GB | 5.7 GB | 7.4 GB | 14.0 GB |
| R1 14B | Dense | 7.8 GB | 9.7 GB | 11.3 GB | 14.8 GB | 28.0 GB |
| R1 32B | Dense | 18.4 GB | 22.6 GB | 26.6 GB | 34.8 GB | 65.6 GB |
| R1 70B | Dense | 39.5 GB | 48.7 GB | 57.2 GB | 74.8 GB | 141 GB |
| R1 671B | MoE | 375.8 GB | 463 GB | 543 GB | 711 GB | 1,342 GB |
Note: These are model weight sizes. Add ~1-2 GB for KV cache and runtime overhead at default context lengths.
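The arithmetic in the note above can be sketched in a few lines of Python. The snippet copies the weight sizes from the table, adds the upper end of the 1-2 GB overhead, and lists which quantizations fit a given VRAM budget. The function name `quants_that_fit` and the 2 GB default are illustrative choices, not part of any tool, and longer context windows would need a larger overhead.

```python
# Weight sizes in GB, copied from the table above.
WEIGHTS_GB = {
    "7b":   {"Q4_K_M": 4.0,   "Q5_K_M": 4.8,   "Q6_K": 5.7,   "Q8_0": 7.4,   "F16": 14.0},
    "14b":  {"Q4_K_M": 7.8,   "Q5_K_M": 9.7,   "Q6_K": 11.3,  "Q8_0": 14.8,  "F16": 28.0},
    "32b":  {"Q4_K_M": 18.4,  "Q5_K_M": 22.6,  "Q6_K": 26.6,  "Q8_0": 34.8,  "F16": 65.6},
    "70b":  {"Q4_K_M": 39.5,  "Q5_K_M": 48.7,  "Q6_K": 57.2,  "Q8_0": 74.8,  "F16": 141.0},
    "671b": {"Q4_K_M": 375.8, "Q5_K_M": 463.0, "Q6_K": 543.0, "Q8_0": 711.0, "F16": 1342.0},
}


def quants_that_fit(variant, vram_gb, overhead_gb=2.0):
    """Return the quantizations whose weights plus overhead fit in vram_gb."""
    return [
        quant
        for quant, weights_gb in WEIGHTS_GB[variant].items()
        if weights_gb + overhead_gb <= vram_gb
    ]


print(quants_that_fit("7b", 8))    # an 8 GB card
print(quants_that_fit("32b", 24))  # a 24 GB card
```

With the 2 GB assumption, an 8 GB card holds the 7B distill up to Q6_K, and a 24 GB card holds the 32B distill only at Q4_K_M.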
Hardware Recommendations
DeepSeek R1 7B — Entry-Level Reasoning
The 7B distill is the easiest way to experience R1's reasoning style. It fits on virtually any modern GPU.
Recommended hardware:
- RTX 4060 8GB: runs at Q4 to Q6 with room for context (Q8 weights alone take 7.4 GB, leaving too little for the KV cache)
- RTX 4070 12GB: comfortable at Q8, excellent performance
- Any Mac with 16GB+ unified memory
Quick start:
ollama run deepseek-r1:7b
DeepSeek R1 14B — Best Quality Under 16GB
The 14B distill offers significantly better reasoning than 7B while staying accessible to mainstream GPUs.
Recommended hardware:
- RTX 4070 12GB: fits at Q4, good for most tasks
- RTX 4070 Ti Super 16GB: fits at Q6_K for better quality (Q8 at 14.8 GB leaves almost no headroom)
- Mac M4 Pro 24GB: comfortable at Q5+
Quick start:
ollama run deepseek-r1:14b
DeepSeek R1 32B — Sweet Spot for Serious Users
The 32B variant hits a remarkable quality level — close to the full 671B on many reasoning benchmarks. This is the most popular R1 variant for users with high-end consumer hardware.
Recommended hardware:
- RTX 4090 24GB — fits at Q4 with tight margins
- RTX 5090 32GB — comfortable at Q4, room for context
- Mac M4 Max 36GB — good unified memory fit
- Mac M4 Max 64GB — runs at Q6+ easily
Quick start:
ollama run deepseek-r1:32b
DeepSeek R1 70B — Near-Frontier Reasoning
The 70B distill approaches the full model's capability on most benchmarks. Requires substantial hardware.
Recommended hardware:
- Mac M4 Max 64GB: fits at Q4 (39.5 GB of weights within the ~46GB usable for the model)
- A100 80GB: comfortable at Q4-Q5
- RTX 4090 24GB with part of the model offloaded to system RAM (slower but works)
Quick start:
ollama run deepseek-r1:70b
DeepSeek R1 671B — The Full Model
The complete 671B MoE model delivers the best reasoning performance. Its Mixture of Experts architecture means only ~37B parameters are active per token, but all 671B must fit in memory.
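To make the gap between resident memory and per-token work concrete, here is a rough sketch using only the figures from this guide. It derives the effective bits per weight from the Q4_K_M size in the VRAM table, then estimates how many gigabytes of weights are actually read for one token. It assumes 1 GB = 1e9 bytes and treats the ~37B active figure as exact, so read the output as an estimate.

```python
TOTAL_PARAMS = 671e9    # total parameters (from the text above)
ACTIVE_PARAMS = 37e9    # approximate active parameters per token
Q4_WEIGHTS_GB = 375.8   # Q4_K_M size from the VRAM table


def effective_bits_per_weight(weights_gb, n_params):
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return weights_gb * 1e9 * 8 / n_params


bits = effective_bits_per_weight(Q4_WEIGHTS_GB, TOTAL_PARAMS)
active_gb = ACTIVE_PARAMS * bits / 8 / 1e9      # weights read for one token
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # share of the model used per token

print(f"effective bits per weight: {bits:.2f}")
print(f"resident weights: {Q4_WEIGHTS_GB} GB, read per token: ~{active_gb:.1f} GB")
print(f"active fraction: {active_fraction:.1%}")
```

Which experts are active changes from token to token, which is why the full set of weights has to stay in memory even though only a small fraction is used at each step.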
Recommended hardware:
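- A100 80GB × 8 GPUs (tensor parallel): 640 GB total; 4 × 80 GB = 320 GB is below the ~376 GB of Q4 weights
- H100 80GB × 8 GPUs
- MI300X 192GB × 3-4 GPUs: 2 × 192 GB = 384 GB leaves almost no room for KV cache at Q4
- Mac Studio with 192GB unified memory: Q2-Q3 only, with part of the weights offloaded to disk (slow but works)
At Q2_K quantization (~210 GB), the weights exceed 192GB of unified memory, so a Mac Studio with that configuration can load the model only by keeping part of it on disk, for example through memory-mapping. Generation speed is then limited by disk reads as well as memory bandwidth.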
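A quick way to sanity-check a multi-GPU plan is to divide the weight size by the usable memory per card. The sketch below does that for the Q4_K_M and Q8_0 sizes from the VRAM table. The helper `gpus_needed` and the 2 GB per-GPU overhead are assumptions for illustration; real deployments also need room for a larger KV cache.

```python
import math


def gpus_needed(weights_gb, gpu_mem_gb, overhead_per_gpu_gb=2.0):
    """Smallest GPU count whose combined usable memory holds the weights."""
    usable_gb = gpu_mem_gb - overhead_per_gpu_gb
    return math.ceil(weights_gb / usable_gb)


Q4_671B_GB = 375.8  # Q4_K_M weights from the VRAM table
Q8_671B_GB = 711.0  # Q8_0 weights from the VRAM table

for name, mem_gb in [("80GB card (A100/H100)", 80), ("192GB card (MI300X)", 192)]:
    print(name, "Q4:", gpus_needed(Q4_671B_GB, mem_gb), "Q8:", gpus_needed(Q8_671B_GB, mem_gb))
```

Tensor-parallel runtimes often require a GPU count that divides the model's attention heads evenly, so a computed minimum of 5 usually means running on 8 cards in practice.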
Choosing the Right Quantization
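DeepSeek R1 is a reasoning model that produces long chains of thought before answering. Small errors introduced by quantization can compound over a long derivation, so R1 tends to be more sensitive to quantization than a typical chat model and benefits from higher precision.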
Our recommendations for R1 variants:
- If VRAM allows → Q6_K (preserves reasoning quality)
- Tight fit → Q5_K_M (good balance, minimal reasoning degradation)
- Need to squeeze → Q4_K_M (acceptable, some complex reasoning may degrade)
- Avoid Q3 and below for serious reasoning tasks
For more details, read our quantization guide.
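The preference order above can be written as a small selection routine. This is a minimal sketch: `pick_quant` is a hypothetical helper, the sizes come from the VRAM table, and the 2 GB overhead is the upper end of the range quoted earlier.

```python
# Best quality first, following the recommendations above.
PREFERENCE = ["Q6_K", "Q5_K_M", "Q4_K_M"]


def pick_quant(weights_gb_by_quant, vram_gb, overhead_gb=2.0):
    """Return the first preferred quantization that fits, or None if none do."""
    for quant in PREFERENCE:
        if weights_gb_by_quant[quant] + overhead_gb <= vram_gb:
            return quant
    return None


# Weight sizes in GB from the VRAM table.
R1_14B = {"Q4_K_M": 7.8, "Q5_K_M": 9.7, "Q6_K": 11.3}
R1_32B = {"Q4_K_M": 18.4, "Q5_K_M": 22.6, "Q6_K": 26.6}

print(pick_quant(R1_14B, 16))  # 16 GB card
print(pick_quant(R1_32B, 24))  # 24 GB card
print(pick_quant(R1_32B, 16))  # too small for any recommended level
```

A `None` result means the variant does not fit at any recommended level, and stepping down to a smaller distill is likely a better choice than dropping to Q3.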
DeepSeek R1 vs Other Reasoning Models
How does R1 compare to alternatives?
| Model | Params | VRAM (Q4) | Reasoning Quality | Best For |
|---|---|---|---|---|
| DeepSeek R1 7B | 7B | 4 GB | Good | Entry-level reasoning |
| QwQ 32B | 32B | 18 GB | Very good | Similar hardware to R1 32B |
| DeepSeek R1 32B | 32B | 18 GB | Excellent | Math, coding, logic |
| DeepSeek R1 70B | 70B | 40 GB | Near-frontier | Advanced reasoning |
| DeepSeek R1 671B | 671B | 376 GB | Frontier | Best open reasoning model |
Performance Expectations
Decode speed depends heavily on your hardware's memory bandwidth:
| Hardware | R1 7B | R1 14B | R1 32B |
|---|---|---|---|
| RTX 4060 8GB | ~45 tok/s | — | — |
| RTX 4070 12GB | ~55 tok/s | ~30 tok/s | — |
| RTX 4090 24GB | ~80 tok/s | ~50 tok/s | ~25 tok/s |
| Mac M4 Pro 24GB | ~35 tok/s | ~20 tok/s | ~12 tok/s |
| Mac M4 Max 64GB | ~40 tok/s | ~25 tok/s | ~18 tok/s |
Approximate values with Q4_K_M quantization. Actual performance varies by runtime and configuration.
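The link between memory bandwidth and decode speed can be approximated with a simple ceiling: for a dense model, generating one token reads roughly all of the weights once, so tokens per second cannot exceed bandwidth divided by weight size. The sketch below applies this to the Q4_K_M sizes from the VRAM table. The 1008 GB/s figure is an assumed spec-sheet bandwidth for the RTX 4090 and is not taken from this guide.

```python
def decode_upper_bound_tok_s(bandwidth_gb_s, weights_gb):
    """Rough ceiling on decode speed: every token streams all weights once."""
    return bandwidth_gb_s / weights_gb


# Assumption: spec-sheet memory bandwidth for an RTX 4090, in GB/s.
RTX_4090_GB_S = 1008

# Q4_K_M weight sizes in GB from the VRAM table.
for variant, weights_gb in [("R1 7B", 4.0), ("R1 14B", 7.8), ("R1 32B", 18.4)]:
    ceiling = decode_upper_bound_tok_s(RTX_4090_GB_S, weights_gb)
    print(f"{variant}: at most ~{ceiling:.0f} tok/s")
```

Measured speeds, such as those in the table above, sit well below this ceiling because of KV cache reads, kernel overhead, and imperfect bandwidth utilization. The bound is most useful for comparing hardware, not for predicting exact numbers.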
Getting Started
- Check compatibility: Use our VRAM calculator to see which R1 variant fits your hardware
- Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
- Run your variant:
ollama run deepseek-r1:7b (or :14b, :32b, :70b)
- Test reasoning: Try math problems, coding challenges, or logical puzzles
For a complete setup walkthrough, read our getting started guide.
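Once a model is running, it can also be queried from a script through Ollama's local HTTP API. The sketch below uses only the Python standard library and assumes Ollama is listening on its default address, `http://localhost:11434`. The helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt, timeout=600):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires a running Ollama server with the model pulled):
# print(ask("deepseek-r1:7b", "What is 17 * 23? Show your reasoning."))
```

The generous timeout is deliberate: R1 variants write out their reasoning before the final answer, so a single response can take much longer than with a typical chat model.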