Will It Run AI
deepseek, vram, gpu, reasoning

DeepSeek R1 VRAM Requirements — 7B to 671B Hardware Guide

Exact VRAM requirements for every DeepSeek R1 variant at Q4, Q8, and FP16 quantization. Covers the 7B, 14B, 32B, 70B distills and the full 671B MoE model with GPU recommendations for each size.

DeepSeek R1 is one of the most capable open-weight reasoning models available. With its chain-of-thought approach to math, coding, and logic problems, it competes with frontier closed-source models. But the family spans from a compact 7B distill to a massive 671B Mixture of Experts model — and the VRAM requirements are wildly different across that range.

This guide gives estimated memory figures for every variant at each common quantization level, plus specific GPU recommendations.


DeepSeek R1 Variants

The DeepSeek R1 family has two types of models:

  • Distilled variants (7B, 14B, 32B, 70B): Dense models fine-tuned on reasoning samples generated by the full R1. The 7B, 14B, and 32B are built on Qwen 2.5 base models and the 70B on Llama 3.3. They are practical for consumer hardware and retain strong reasoning ability.
  • Full R1 (671B): A Mixture of Experts model with 671B total parameters but only ~37B active per token. This is the flagship model with the strongest reasoning, but it requires datacenter-class hardware or creative multi-GPU setups.

The distilled variants are not just "smaller R1" — they were specifically trained to mimic R1's reasoning patterns, making them some of the best reasoning models at their respective sizes.


Complete VRAM Requirements

These are the approximate memory requirements for the model weights alone at each quantization level. Add roughly 1-2GB for KV cache and runtime overhead at modest context lengths; longer contexts need more.

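| Variant | Parameters | Q4_K_M | Q8_0 | FP16 |
| --- | --- | --- | --- | --- |
| R1 7B | 7.6B | 4.3 GB | 8.1 GB | 15.2 GB |
| R1 14B | 14.8B | 8.3 GB | 15.7 GB | 29.6 GB |
| R1 32B | 32.8B | 18.4 GB | 34.8 GB | 65.6 GB |
| R1 70B | 70.6B | 39.5 GB | 74.8 GB | 141 GB |
| R1 671B | 671B (MoE) | 375.8 GB | 711 GB | 1,342 GB |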
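The weight figures in the table follow a simple rule: parameter count times bytes per parameter. The Python sketch below reproduces them, assuming the per-parameter factors implied by the 32B, 70B, and 671B rows (about 0.56 bytes for Q4_K_M, 1.06 for Q8_0, and 2 for FP16). Treat these factors as approximations; actual GGUF file sizes vary by a few percent depending on which tensors a quant keeps at higher precision.

```python
# Approximate bytes per parameter for each quantization level.
# Assumption: factors inferred from this guide's 32B/70B/671B rows,
# not taken from any official file-size list.
BYTES_PER_PARAM = {
    "Q4_K_M": 0.56,
    "Q8_0": 1.06,
    "FP16": 2.0,
}

# Parameter counts (in billions) as listed in the table.
VARIANTS = {
    "R1 7B": 7.6,
    "R1 14B": 14.8,
    "R1 32B": 32.8,
    "R1 70B": 70.6,
    "R1 671B": 671.0,
}


def weight_gb(params_billion: float, quant: str) -> float:
    """Weights-only memory in decimal GB (1 GB = 1e9 bytes), excluding KV cache."""
    return params_billion * BYTES_PER_PARAM[quant]


if __name__ == "__main__":
    for name, params in VARIANTS.items():
        cells = "  ".join(f"{q}: {weight_gb(params, q):8.1f} GB" for q in BYTES_PER_PARAM)
        print(f"{name:<8} {cells}")
```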

What the quantization levels mean:

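  • Q4_K_M: 4-bit "k-quant". A few tensors are kept at higher precision, so it averages a little over 4 bits per weight. Best balance of size and quality, with minor quality loss compared to FP16.
  • Q8_0: 8-bit quantization (8.5 bits per weight once the per-block scales are counted). Near-lossless. Recommended when you have the VRAM for it.
  • FP16: 16-bit floating point, the full-quality baseline. Requires roughly twice the memory of Q8_0.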
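To see why Q8_0 is near-lossless, here is a minimal Python sketch of block-wise absmax quantization in the style of GGUF's Q8_0: weights are grouped into blocks of 32, each block stores one scale plus 32 signed 8-bit integers, and the scale is the block's largest magnitude divided by 127. It illustrates the idea rather than reproducing llama.cpp's code; for instance, Python's `round()` rounds halves to even, while C's `roundf` rounds them away from zero, and the scale here stays a Python float instead of FP16.

```python
import random

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_block(block: list[float]) -> tuple[float, list[int]]:
    """Absmax-quantize one block to a float scale plus int8-range values."""
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(x) for x in block)
    scale = amax / 127.0                  # largest magnitude maps to +/-127
    inv = 1.0 / scale if scale else 0.0   # an all-zero block stays all zero
    return scale, [round(x * inv) for x in block]


def dequantize_block(scale: float, q: list[int]) -> list[float]:
    """Recover approximate weights by multiplying each integer by the scale."""
    return [scale * v for v in q]


# Storage: 32 one-byte integers plus one 2-byte (FP16) scale = 34 bytes per block.
BITS_PER_WEIGHT = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE  # 8.5

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{BITS_PER_WEIGHT} bits/weight, scale {scale:.2e}, max error {max_err:.2e}")
```

The rounding error per weight is at most half a scale step, about 0.4% of the block's largest magnitude, which is why 8-bit output is hard to tell apart from FP16.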

For a deeper explanation, see our quantization guide.


Best GPU for Each Variant

DeepSeek R1 7B — Any Modern GPU

The 7B distill is the easiest entry point. At Q4, it needs just 4GB plus overhead.

| GPU | Best Quant | Fit |
| --- | --- | --- |
| RTX 4060 8GB | Q4_K_M | Comfortable, room for 8K context (Q8_0 leaves no room for overhead) |
| RTX 3060 12GB | Q8_0 | Excellent fit with large context |
| RTX 4070 12GB | Q8_0 | Fast inference, great experience |

ollama run deepseek-r1:7b

DeepSeek R1 14B — Mid-Range GPUs

The 14B distill is based on Qwen 2.5 and offers a meaningful step up in reasoning quality.

| GPU | Best Quant | Fit |
| --- | --- | --- |
| RTX 4060 Ti 16GB | Q6_K | Good fit, moderate context (Q8_0 plus overhead exceeds 16GB) |
| RTX 4070 Ti Super 16GB | Q6_K | Fast and comfortable |
| RTX 4090 24GB | Q8_0 | Plenty of headroom |

ollama run deepseek-r1:14b

DeepSeek R1 32B — The Consumer Sweet Spot

The 32B distill is where R1's reasoning really shines. It is based on Qwen 2.5 32B and is widely considered the best reasoning model you can run on a single consumer GPU.

| GPU | Best Quant | Fit |
| --- | --- | --- |
| RTX 4090 24GB | Q4_K_M | Fits at Q4 with context up to 8K |
| RTX 5090 32GB | Q6_K | Higher quality quant with good headroom |
| RTX A6000 48GB | Q8_0 | Near-lossless, professional card |
| M4 Max 64GB | Q8_0 | Excellent fit on Mac |

ollama run deepseek-r1:32b
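The context limit in the RTX 4090 row is driven mostly by the KV cache, which grows linearly with context length: 2 (keys and values) × layers × KV heads × head dimension × bytes per element × tokens. The Python sketch below assumes a Qwen 2.5 32B-style shape of 64 layers, 8 KV heads, and a head dimension of 128 with an FP16 cache; these values are our reading of the base model's published config and should be checked against the model's `config.json`.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Size of the K and V tensors for every layer and every cached token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * n_tokens


# Assumed Qwen 2.5 32B-style attention shape (verify against config.json).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
WEIGHTS_GB = 18.4  # Q4_K_M weights for R1 32B, from the table

for ctx in (4096, 8192, 32768):
    kv_gb = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / 1e9
    print(f"{ctx:>6} tokens: KV cache {kv_gb:5.2f} GB, "
          f"weights + KV {WEIGHTS_GB + kv_gb:5.2f} GB")
```

Under these assumptions, 8K tokens keeps weights plus cache around 20.5GB, leaving a few GB on a 24GB card for runtime buffers, while 32K climbs past 24GB. Runtimes that quantize the KV cache shrink these numbers.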

DeepSeek R1 70B — High-End Hardware Required

At Q4, the 70B needs about 40GB for weights alone. No single consumer GPU can hold it entirely in VRAM; even the 32GB RTX 5090 falls short.

| Hardware | Best Quant | Notes |
| --- | --- | --- |
| RTX A6000 48GB | Q4_K_M | Fits with moderate context |
| 2x RTX 4090 (48GB total) | Q4_K_M | Split across two GPUs via llama.cpp |
| M4 Max 64GB | Q4_K_M | Good fit, 8-12 tok/s |
| M4 Max 128GB | Q8_0 | Excellent quality, comfortable fit |

For multi-GPU setups, see our multi-GPU inference guide.
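When a model is split across GPUs, each card holds a share of the layers. llama.cpp exposes this through its `--tensor-split` option, which takes relative proportions per GPU. The Python sketch below only illustrates the proportional arithmetic (it is not llama.cpp's allocation code). It assumes an 80-layer model, which we believe matches the Llama 3.3 70B base of the 70B distill, and spreads the table's ~39.5GB of Q4_K_M weights evenly across layers, ignoring that embedding and output tensors are not evenly sized.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Give each GPU a number of layers proportional to its VRAM."""
    total = sum(vram_gb)
    ideal = [n_layers * v / total for v in vram_gb]
    counts = [int(x) for x in ideal]
    # Hand leftover layers to the GPUs with the largest fractional remainders.
    leftover = n_layers - sum(counts)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


N_LAYERS = 80        # assumption: layer count of the 70B distill's base model
WEIGHTS_GB = 39.5    # Q4_K_M weights for R1 70B, from the table
gpus = [24.0, 24.0]  # 2x RTX 4090

for idx, layers in enumerate(split_layers(N_LAYERS, gpus)):
    share_gb = WEIGHTS_GB * layers / N_LAYERS
    print(f"GPU {idx}: {layers} layers, ~{share_gb:.1f} GB of weights on a {gpus[idx]:.0f} GB card")
```

Each card keeps roughly 4GB free for its part of the KV cache and buffers, which is why two 24GB cards handle Q4 but not Q8_0, whose ~75GB of weights would need about 37GB per card.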

DeepSeek R1 671B — The Full Model

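The full 671B is a Mixture of Experts model. Despite having 671B total parameters, only ~37B are active per token, so each token reads and computes with roughly the weights of a 37B dense model. That means it can run much faster than its total size suggests, provided you can fit the whole model in memory.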
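A rough way to see the MoE advantage: single-stream token generation is usually limited by memory bandwidth, so an upper bound on speed is bandwidth divided by the bytes of weights read per token. For a dense model that is every weight; for R1 it is only the ~37B active parameters. The Python sketch below uses the table's ~0.56 bytes per parameter at Q4_K_M and a purely hypothetical 2,000 GB/s of aggregate bandwidth. It ignores attention, compute, and inter-GPU traffic, so real speeds are lower.

```python
BYTES_PER_PARAM_Q4 = 0.56  # assumption: Q4_K_M factor implied by the table
BANDWIDTH_GB_S = 2000.0    # hypothetical aggregate memory bandwidth


def max_tokens_per_sec(active_params_billion: float,
                       bandwidth_gb_s: float = BANDWIDTH_GB_S,
                       bytes_per_param: float = BYTES_PER_PARAM_Q4) -> float:
    """Bandwidth-bound upper limit: every active weight is read once per token."""
    gb_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_per_token


dense = max_tokens_per_sec(671.0)  # if all 671B parameters were read per token
moe = max_tokens_per_sec(37.0)     # R1 reads only the ~37B active parameters
print(f"dense-equivalent: {dense:.1f} tok/s, MoE: {moe:.1f} tok/s, ratio {moe / dense:.1f}x")
```

The ratio is simply 671/37, about 18x, regardless of the bandwidth you plug in, as long as all of the weights sit in fast memory.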

Minimum requirements: about 376GB for the weights alone at Q4_K_M, before KV cache and overhead.

| Hardware | Quant | Estimated Speed |
| --- | --- | --- |
| 4x A100 80GB (320GB) | Q4 with offloading | 10-15 tok/s |
| 8x A100 80GB (640GB) | Q4_K_M (Q8_0 needs 711GB) | 15-25 tok/s |
| M3/M4 Ultra 192GB | Q2-Q3 with offloading | 2-4 tok/s |
| 4x RTX 4090 + CPU offload | Q2-Q3 | 1-3 tok/s |

Running the full 671B on consumer hardware is technically possible but extremely slow. The distilled variants are the practical choice for most users. The 32B distill captures a surprising amount of R1's reasoning capability at a fraction of the hardware cost.
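Offloading is slow because every token still has to read the offloaded weights, and system RAM is far slower than VRAM. A minimal Python sketch of that effect, assuming the bytes read per token split between GPU and CPU in the same proportion as the weights (a simplification, especially for MoE, where which experts land on the GPU matters). The bandwidth figures are hypothetical round numbers, not measurements.

```python
def offload_tokens_per_sec(gb_per_token: float, gpu_fraction: float,
                           gpu_bw_gb_s: float, cpu_bw_gb_s: float) -> float:
    """Per-token time is the GPU read time plus the CPU read time."""
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    gpu_gb = gb_per_token * gpu_fraction
    cpu_gb = gb_per_token - gpu_gb
    seconds = gpu_gb / gpu_bw_gb_s + cpu_gb / cpu_bw_gb_s
    return 1.0 / seconds


GB_PER_TOKEN = 37.0 * 0.56      # ~37B active params at the table's Q4_K_M factor
GPU_BW, CPU_BW = 1000.0, 80.0   # hypothetical GB/s for VRAM and system RAM

for frac in (1.0, 0.75, 0.5, 0.25):
    tps = offload_tokens_per_sec(GB_PER_TOKEN, frac, GPU_BW, CPU_BW)
    print(f"{frac:4.0%} on GPU: ~{tps:.1f} tok/s upper bound")
```

In this example, moving just a quarter of the per-token reads to system RAM cuts the ceiling by roughly 4x, and real setups also pay for PCIe transfers and CPU compute.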


Distilled vs Full: Which Should You Run?

For most users, the answer is simple: run the largest distilled variant that fits your GPU.

  • 8GB VRAM: R1 7B at Q4, or Q6 with a short context
  • 12-16GB VRAM: R1 14B at Q4-Q6
  • 24GB VRAM: R1 32B at Q4 — this is the sweet spot
  • 48GB+ VRAM: R1 70B at Q4-Q6
  • Multiple datacenter GPUs: Full R1 671B
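The rule of thumb above can be written as a small search: walk the distills from largest to smallest and stop at the first one whose weights plus overhead fit. This Python sketch assumes the table's factors for Q4_K_M and Q8_0, roughly 0.82 bytes per parameter for Q6_K (about 6.56 bits per weight), and a 2GB allowance for KV cache and runtime overhead, the top of the 1-2GB range quoted earlier. The 671B model is left out because it needs multi-GPU or offloaded setups.

```python
from typing import Optional, Tuple

# Approximate bytes per parameter, ordered from lowest to highest precision.
# Assumption: Q6_K ~= 6.56 bits/weight; the others follow this guide's table.
BYTES_PER_PARAM = {"Q4_K_M": 0.56, "Q6_K": 0.82, "Q8_0": 1.06}

# Distilled variants, largest first, with parameter counts in billions.
DISTILLS = [("R1 70B", 70.6), ("R1 32B", 32.8), ("R1 14B", 14.8), ("R1 7B", 7.6)]


def pick_variant(vram_gb: float, overhead_gb: float = 2.0) -> Optional[Tuple[str, str]]:
    """Return (variant, best quant) for the largest distill that fits, or None."""
    for name, params in DISTILLS:
        fitting = [q for q, bpp in BYTES_PER_PARAM.items()
                   if params * bpp + overhead_gb <= vram_gb]
        if fitting:
            return name, fitting[-1]  # highest-precision quant that still fits
    return None


for vram in (8, 12, 16, 24, 48):
    print(f"{vram:>2} GB -> {pick_variant(vram)}")
```

The variant choices match the list above; a different overhead allowance shifts the boundaries, so treat the thresholds as rough.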

The 32B distill on an RTX 4090 is probably the best reasoning experience available on consumer hardware. It handles complex math, multi-step logic, and coding problems with impressive accuracy.


DeepSeek R1 proved that open-weight models can match frontier reasoning. With the distilled variants, that capability is accessible on hardware most enthusiasts already own.

Frequently Asked Questions

How much VRAM does DeepSeek R1 need?

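DeepSeek R1 7B needs about 4GB at Q4 and 15GB at FP16. The 14B needs about 8GB at Q4 and 30GB at FP16. The 32B needs about 18GB at Q4 and 66GB at FP16. The 70B needs about 40GB at Q4. The full 671B needs about 376GB at Q4. These figures cover the weights only; add 1-2GB for KV cache and runtime overhead.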

Can I run DeepSeek R1 on 8GB VRAM?

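Yes. DeepSeek R1 7B fits on 8GB of VRAM at Q4, or at Q6 with a short context. At Q8 the weights alone take up essentially the entire card, leaving no room for the KV cache and runtime overhead. The 14B variant exceeds 8GB at Q4 once overhead is included, so it needs partial CPU offloading. For 8GB cards, the 7B distill is the recommended choice.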

What GPU do I need for DeepSeek R1 32B?

An RTX 4090 (24GB) runs DeepSeek R1 32B at Q4 with room for context. An RTX 3090 (24GB) also works. For Q8 quality you need 48GB or more, such as the RTX A6000 or a 64GB Mac.

Can I run DeepSeek R1 671B at home?

The full 671B requires 376GB at Q4 minimum. Consumer options include an Apple M3/M4 Ultra with 192GB (at Q2-Q3 with offloading) or 4x RTX 3090/4090 with CPU offloading. It is technically possible but very slow on consumer hardware.

What is the difference between DeepSeek R1 and the distilled variants?

The full DeepSeek R1 is a 671B Mixture of Experts model with ~37B active parameters per token. The distilled variants (7B, 14B, 32B, 70B) are dense models trained to replicate R1's reasoning behavior. They are much smaller but retain strong reasoning capabilities.