DeepSeek R1 VRAM Requirements — 7B to 671B Hardware Guide
Exact VRAM requirements for every DeepSeek R1 variant at Q4, Q8, and FP16 quantization. Covers the 7B, 14B, 32B, 70B distills and the full 671B MoE model with GPU recommendations for each size.
DeepSeek R1 is one of the most capable open-weight reasoning models available. With its chain-of-thought approach to math, coding, and logic problems, it competes with frontier closed-source models. But the variants covered here span from a compact 7B distill to a massive 671B Mixture of Experts model, and their Q4 weight footprints range from about 4GB to about 376GB.
This guide gives you the exact numbers for every variant at every common quantization level, plus specific GPU recommendations.
DeepSeek R1 Variants
The DeepSeek R1 family has two types of models:
- Distilled variants (7B, 14B, 32B, 70B): Dense models fine-tuned on reasoning traces generated by the full R1. The 7B, 14B, and 32B are built on Qwen 2.5 and the 70B on Llama 3.3. DeepSeek also released 1.5B and 8B distills, which this guide does not cover. The distills are practical for consumer hardware and retain strong reasoning ability.
- Full R1 (671B): A Mixture of Experts model with 671B total parameters but only ~37B active per token. This is the flagship model with the strongest reasoning, but it requires datacenter-class hardware or creative multi-GPU setups.
The distilled variants are not just "smaller R1" — they were specifically trained to mimic R1's reasoning patterns, making them some of the best reasoning models at their respective sizes.
Complete VRAM Requirements
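Here are the memory requirements for the model weights alone at each quantization level. Budget another 1-2GB for KV cache and runtime overhead at short contexts. The KV cache grows linearly with context length, so long prompts need more.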
| Variant | Parameters | Q4_K_M | Q8_0 | FP16 |
|---|---|---|---|---|
| R1 7B | 7.6B | 4.3 GB | 8.1 GB | 15.2 GB |
| R1 14B | 14.8B | 8.3 GB | 15.7 GB | 29.6 GB |
| R1 32B | 32.8B | 18.4 GB | 34.8 GB | 65.6 GB |
| R1 70B | 70.6B | 39.5 GB | 74.8 GB | 141 GB |
| R1 671B | 671B (MoE) | 375.8 GB | 711 GB | 1,342 GB |
What the quantization levels mean:
- Q4_K_M — 4-bit quantization. Best balance of size and quality. Minor quality loss compared to FP16.
- Q8_0 — 8-bit quantization. Near-lossless. Recommended when you have the VRAM for it.
- FP16 — Full 16-bit precision. No quality loss. Requires 2x the memory of Q8.
For a deeper explanation, see our quantization guide.
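The weight sizes in the table follow from simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below reproduces that calculation. The `BITS_PER_WEIGHT` values are assumptions: 8.5 bits for Q8_0 comes from its block layout, while the Q4_K_M figure is back-derived from the table rows above, and published GGUF files can come out somewhat larger. The function name `weight_gb` is illustrative only.

```python
# Approximate bits stored per weight for each format.
# Q8_0 packs 32 int8 weights plus one fp16 scale (34 bytes / 32 weights = 8.5 bits).
# The Q4_K_M figure is an assumption back-derived from the table in this guide.
BITS_PER_WEIGHT = {"Q4_K_M": 4.48, "Q8_0": 8.5, "FP16": 16.0}


def weight_gb(params_billion: float, quant: str) -> float:
    """Estimate weight memory in GB (10^9 bytes) for a given parameter count."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


if __name__ == "__main__":
    for name, params in [("R1 32B", 32.8), ("R1 70B", 70.6), ("R1 671B", 671.0)]:
        sizes = ", ".join(
            f"{quant}: {weight_gb(params, quant):.1f} GB" for quant in BITS_PER_WEIGHT
        )
        print(f"{name} -> {sizes}")
```

The result covers weights only. KV cache and runtime buffers come on top, which is why the fit recommendations below leave headroom.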
Best GPU for Each Variant
DeepSeek R1 7B — Any Modern GPU
The 7B distill is the easiest entry point. At Q4, it needs only about 4GB plus overhead.
| GPU | Best Quant | Fit |
|---|---|---|
| RTX 4060 8GB | Q4_K_M | Comfortable, room for 8K context |
| RTX 3060 12GB | Q8_0 | Excellent fit with large context |
| RTX 4070 12GB | Q8_0 | Fast inference, great experience |
ollama run deepseek-r1:7b
DeepSeek R1 14B — Mid-Range GPUs
The 14B distill is based on Qwen 2.5 and offers a meaningful step up in reasoning quality.
| GPU | Best Quant | Fit |
|---|---|---|
| RTX 4060 Ti 16GB | Q6_K | Good fit, moderate context |
| RTX 4070 Ti Super 16GB | Q6_K | Fast and comfortable |
| RTX 4090 24GB | Q8_0 | Plenty of headroom |
ollama run deepseek-r1:14b
DeepSeek R1 32B — The Consumer Sweet Spot
The 32B distill is where R1's reasoning really shines. It is based on Qwen 2.5 32B and is widely considered the best reasoning model you can run on a single consumer GPU.
| GPU | Best Quant | Fit |
|---|---|---|
| RTX 4090 24GB | Q4_K_M | Fits at Q4 with context up to 8K |
| RTX 5090 32GB | Q6_K | Higher quality quant with good headroom |
| RTX A6000 48GB | Q8_0 | Near-lossless, professional card |
| M4 Max 64GB | Q8_0 | Excellent fit on Mac |
ollama run deepseek-r1:32b
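How much context fits alongside the weights depends on the KV cache, which stores one key and one value vector per layer for every token. The sketch below estimates its size for an FP16 cache. The attention shape in `QWEN25_32B` (64 layers, 8 KV heads, head dimension 128) is an assumption about the Qwen 2.5 32B base model. Check the model's `config.json` before relying on it.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GiB: one key and one value vector per layer per token."""
    total_bytes = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1024**3


# Assumed attention shape for the Qwen 2.5 32B base (verify against config.json).
QWEN25_32B = {"layers": 64, "kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    for ctx in (2048, 8192, 32768):
        size = kv_cache_gib(context_tokens=ctx, **QWEN25_32B)
        print(f"{ctx:>6} tokens -> {size:.2f} GiB")
```

Under these assumptions an 8K context costs about 2 GiB, which is consistent with the 32B fitting on a 24GB card at Q4 with context up to 8K. Runtimes that quantize the KV cache would need proportionally less.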
DeepSeek R1 70B — High-End Hardware Required
At Q4, the 70B needs about 40GB for weights alone, which exceeds the VRAM of any single consumer GPU (the RTX 5090 tops out at 32GB).
| Hardware | Best Quant | Notes |
|---|---|---|
| RTX A6000 48GB | Q4_K_M | Fits with moderate context |
| 2x RTX 4090 (48GB total) | Q4_K_M | Split across two GPUs via llama.cpp |
| M4 Max 64GB | Q4_K_M | Good fit, 8-12 tok/s |
| M4 Max 128GB | Q8_0 | Excellent quality, comfortable fit |
For multi-GPU setups, see our multi-GPU inference guide.
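When weights are split across cards, each GPU holds a share of the model and keeps whatever VRAM remains for KV cache and buffers. The sketch below works through that arithmetic for a split proportional to each card's VRAM. The helper names `split_weights` and `headroom` are hypothetical. llama.cpp exposes a `--tensor-split` option for setting the proportions, but the actual placement it produces may differ from this simple model.

```python
def split_weights(weights_gb: float, gpu_vram_gb: list[float]) -> list[float]:
    """Divide weight memory across GPUs in proportion to each card's VRAM."""
    total_vram = sum(gpu_vram_gb)
    return [weights_gb * vram / total_vram for vram in gpu_vram_gb]


def headroom(weights_gb: float, gpu_vram_gb: list[float]) -> list[float]:
    """VRAM left on each GPU after it receives its share of the weights."""
    shares = split_weights(weights_gb, gpu_vram_gb)
    return [vram - share for vram, share in zip(gpu_vram_gb, shares)]


if __name__ == "__main__":
    # 70B at Q4_K_M (39.5 GB of weights) on two 24GB cards.
    print(split_weights(39.5, [24.0, 24.0]))
    print(headroom(39.5, [24.0, 24.0]))
```

For the 2x RTX 4090 row, this leaves roughly 4GB per card, so context length is the main constraint on that setup.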
DeepSeek R1 671B — The Full Model
The full 671B is a Mixture of Experts model. Despite having 671B total parameters, only ~37B are active per token, which means it can run faster than you might expect — if you can fit it in memory.
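Memory requirement: about 376GB for weights at Q4_K_M. Setups with less memory have to drop to Q2-Q3 quants or offload part of the model to CPU RAM.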
| Hardware | Quant | Estimated Speed |
|---|---|---|
| 4x A100 80GB (320GB) | Q4 with offloading | 10-15 tok/s |
| 8x A100 80GB (640GB) | Q4_K_M to Q6_K | 15-25 tok/s |
| M2 Ultra 192GB | Q2-Q3 with offloading | 2-4 tok/s |
| 4x RTX 4090 + CPU offload | Q2-Q3 | 1-3 tok/s |
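
The gap between 671B total and ~37B active parameters explains why the full model needs so much memory yet can still generate at usable speeds. All experts must be resident, but each token only reads the active ones. The sketch below separates those two quantities and derives a rough bandwidth-bound ceiling on decode speed. The 4.48 bits per weight for Q4_K_M and the bandwidth value passed in are assumptions, and the ceiling ignores compute, interconnect, and KV cache traffic, so real throughput is far lower.

```python
TOTAL_PARAMS_B = 671.0   # all experts, must be held in memory
ACTIVE_PARAMS_B = 37.0   # parameters actually read per token
Q4_BITS = 4.48           # assumed effective bits per weight for Q4_K_M


def resident_gb(bits: float = Q4_BITS) -> float:
    """Memory needed to hold every expert's weights."""
    return TOTAL_PARAMS_B * bits / 8


def read_per_token_gb(bits: float = Q4_BITS) -> float:
    """Weight data read to generate one token (active parameters only)."""
    return ACTIVE_PARAMS_B * bits / 8


def max_tokens_per_sec(bandwidth_gb_s: float, bits: float = Q4_BITS) -> float:
    """Bandwidth-bound upper limit on decode speed; an idealized ceiling."""
    return bandwidth_gb_s / read_per_token_gb(bits)


if __name__ == "__main__":
    print(f"resident: {resident_gb():.0f} GB")
    print(f"read per token: {read_per_token_gb():.1f} GB")
    # 1000 GB/s is a hypothetical memory bandwidth, not a measured figure.
    print(f"ceiling at 1000 GB/s: {max_tokens_per_sec(1000):.0f} tok/s")
```

A dense model of the same total size would read all of its weights for every token, so its ceiling at the same bandwidth would be roughly 18 times lower.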
Running the full 671B on consumer hardware is technically possible but slow, on the order of 1-4 tok/s with heavy quantization and offloading. The distilled variants are the practical choice for most users. The 32B distill captures a surprising amount of R1's reasoning capability at a fraction of the hardware cost.
Distilled vs Full: Which Should You Run?
For most users, the answer is simple: run the largest distilled variant that fits your GPU.
- 8GB VRAM: R1 7B at Q4
- 12-16GB VRAM: R1 14B at Q4 (12GB) to Q6 (16GB)
- 24GB VRAM: R1 32B at Q4 — this is the sweet spot
- 48GB+ VRAM: R1 70B at Q4-Q6
- Multiple datacenter GPUs: Full R1 671B
The 32B distill on an RTX 4090 is probably the best reasoning experience available on consumer hardware. It handles complex math, multi-step logic, and coding problems with impressive accuracy.
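The "largest distilled variant that fits" rule can be written as a small lookup. The sketch below estimates weight sizes from parameter counts, adds a fixed overhead allowance, and returns the biggest variant and highest quant that fit. The bits-per-weight values, the 2GB default overhead, and the name `best_fit` are all assumptions for illustration, not output from any sizing tool.

```python
# Parameter counts (billions) for the distilled variants in this guide.
PARAMS_B = {"7B": 7.6, "14B": 14.8, "32B": 32.8, "70B": 70.6}
# Assumed effective bits per weight; real GGUF files may be somewhat larger.
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.48}


def best_fit(vram_gb: float, overhead_gb: float = 2.0):
    """Return (variant, quant) for the largest model that fits, or None.

    Larger variants win over higher-precision quants of smaller ones;
    within a variant, Q8_0 is preferred over Q4_K_M when it fits.
    """
    for variant in sorted(PARAMS_B, key=PARAMS_B.get, reverse=True):
        for quant in ("Q8_0", "Q4_K_M"):
            needed = PARAMS_B[variant] * BITS_PER_WEIGHT[quant] / 8 + overhead_gb
            if needed <= vram_gb:
                return variant, quant
    return None


if __name__ == "__main__":
    for vram in (8, 16, 24, 48):
        print(f"{vram}GB -> {best_fit(vram)}")
```

The fixed overhead is the weakest part of this estimate. Long contexts can need several more gigabytes of KV cache, so treat a borderline fit as a reason to step down one quant level.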
Check Your Hardware
Use our tools to see exactly how DeepSeek R1 fits on your specific setup:
- Can I run DeepSeek R1 on an RTX 4090? — Detailed fit analysis
- Hardware Detection — Auto-detect your GPU and get personalized recommendations
- Hardware Calculator — Check any model against any GPU
- Browse DeepSeek Models — See all DeepSeek variants with fit estimates
- Best models for 24GB VRAM — See what fits alongside DeepSeek R1
Popular Hardware Checks
- DeepSeek R1 32B on RTX 5090 — Best consumer GPU for R1 32B
- DeepSeek R1 14B on RTX 4070 — Mid-range sweet spot
- DeepSeek R1 70B on M4 Max 64GB — Mac option for the 70B
DeepSeek R1 proved that open-weight models can match frontier reasoning. With the distilled variants, that capability is accessible on hardware most enthusiasts already own.