Will It Run AI

DeepSeek R1 VRAM Requirements — 7B to 671B Distills at Q4/Q8 (GPU Guide)

DeepSeek R1 14B needs ~8GB at Q4, 32B needs ~18GB, full 671B needs ~376GB. Complete VRAM tables for every distill with GPU and quantization recommendations.

DeepSeek R1 is one of the most capable open-weight reasoning models available. Released by DeepSeek in January 2025, it brought frontier-level reasoning to the open-source community. But with the variants covered here ranging from 7B to 671B parameters, the hardware requirements vary dramatically.

This guide covers each of these DeepSeek R1 variants with approximate VRAM requirements per quantization level, hardware recommendations, and setup instructions.

DeepSeek R1 Model Family

DeepSeek R1 comes in two forms:

  • Full model (671B): A massive Mixture of Experts (MoE) model with 671B total parameters and ~37B active per token. This is the flagship with the best reasoning capabilities.
  • Distilled variants: Smaller dense models trained to replicate R1's reasoning ability. This guide covers the 7B, 14B, 32B, and 70B sizes; DeepSeek also released 1.5B and 8B distills, which are not covered here.

The distilled variants are not just smaller models trained from scratch: they are existing Qwen and Llama models fine-tuned on reasoning traces generated by the full 671B model, which is why they perform well above their size class on reasoning tasks.

VRAM Requirements by Variant

Here's what each DeepSeek R1 variant needs at different quantization levels:

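| Variant | Type | Q4_K_M | Q5_K_M | Q6_K | Q8_0 | F16 |
|---|---|---|---|---|---|---|
| R1 7B | Dense | 4.0 GB | 4.8 GB | 5.7 GB | 7.4 GB | 14.0 GB |
| R1 14B | Dense | 7.8 GB | 9.7 GB | 11.3 GB | 14.8 GB | 28.0 GB |
| R1 32B | Dense | 18.4 GB | 22.6 GB | 26.6 GB | 34.8 GB | 65.6 GB |
| R1 70B | Dense | 39.5 GB | 48.7 GB | 57.2 GB | 74.8 GB | 141 GB |
| R1 671B | MoE | 375.8 GB | 463 GB | 543 GB | 711 GB | 1,342 GB |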

Note: These are model weight sizes. Add ~1-2 GB for KV cache and runtime overhead at default context lengths.
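
The figures above can be roughly reproduced from parameter count and bits per weight. The sketch below is an estimate under stated assumptions, not the method used to build the table: the bits-per-weight values are back-fitted to the table, and the layer count, KV-head count, and head dimension in the KV-cache example are assumed values for a Qwen2.5-style 7B model.

```python
# Assumed effective bits per weight, back-fitted to the table above.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.5,
    "Q5_K_M": 5.5,
    "Q6_K": 6.5,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size: one key and one value vector per layer per cached token."""
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


for quant in BITS_PER_WEIGHT:
    print(f"7B at {quant}: ~{weights_gb(7, quant):.1f} GB of weights")

# Assumed architecture: 28 layers, 4 KV heads, head_dim 128, FP16 cache.
print(f"KV cache at 8192 tokens: ~{kv_cache_gb(28, 4, 128, 8192):.2f} GB")
```

Because the KV cache grows linearly with context length, long reasoning traces can push usage past the ~1-2 GB allowance, so treat that allowance as a default-context figure.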

Hardware Recommendations

DeepSeek R1 7B — Entry-Level Reasoning

The 7B distill is the easiest way to experience R1's reasoning style. It fits on virtually any modern GPU.

Recommended hardware:

  • RTX 4060 8GB: runs Q6_K (5.7 GB) with room for context; Q8_0 (7.4 GB) leaves almost no headroom in 8GB
  • RTX 4070 12GB: comfortable at Q8_0, excellent performance
  • Any Mac with 16GB+ unified memory

Quick start:

ollama run deepseek-r1:7b

DeepSeek R1 14B — Best Quality Under 16GB

The 14B distill offers significantly better reasoning than 7B while staying accessible to mainstream GPUs.

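Recommended hardware (going by the VRAM table above):

  • RTX 4070 12GB: fits Q4_K_M (7.8 GB) or Q5_K_M (9.7 GB)
  • 16GB GPUs: fit Q6_K (11.3 GB) with room for context
  • RTX 4090 24GB: fits Q8_0 (14.8 GB) comfortably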

Quick start:

ollama run deepseek-r1:14b

DeepSeek R1 32B — Sweet Spot for Serious Users

The 32B variant hits a remarkable quality level — close to the full 671B on many reasoning benchmarks. This is the most popular R1 variant for users with high-end consumer hardware.

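Recommended hardware (going by the VRAM table above):

  • RTX 4090 24GB: fits Q4_K_M (18.4 GB); Q5_K_M (22.6 GB) leaves almost no room for context
  • Mac M4 Max 64GB: fits Q4_K_M through Q8_0 (34.8 GB)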

Quick start:

ollama run deepseek-r1:32b

DeepSeek R1 70B — Near-Frontier Reasoning

The 70B distill approaches the full model's capability on most benchmarks. Requires substantial hardware.

Recommended hardware:

  • Mac M4 Max 64GB — fits at Q4 with ~46GB usable
  • A100 80GB — comfortable at Q4-Q5
  • RTX 4090 with aggressive offloading (slower but works)

Quick start:

ollama run deepseek-r1:70b

DeepSeek R1 671B — The Full Model

The complete 671B MoE model delivers the best reasoning performance. Its Mixture of Experts architecture means only ~37B parameters are active per token, but all 671B must fit in memory.
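
To make that distinction concrete, here is a back-of-the-envelope sketch. The 4.5 bits-per-weight figure is an assumption that roughly matches the Q4_K_M column in the table above; the parameter counts are the ones quoted in this guide.

```python
TOTAL_PARAMS_B = 671    # every expert must be resident in memory
ACTIVE_PARAMS_B = 37    # parameters actually used for each token
BITS_PER_WEIGHT = 4.5   # assumption: approximate Q4_K_M average


def size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Size in GB (1 GB = 1e9 bytes) of a given number of parameters."""
    return params_billion * bits_per_weight / 8


resident_gb = size_gb(TOTAL_PARAMS_B, BITS_PER_WEIGHT)
read_per_token_gb = size_gb(ACTIVE_PARAMS_B, BITS_PER_WEIGHT)

print(f"Must fit in memory:     ~{resident_gb:.0f} GB")
print(f"Read per decoded token: ~{read_per_token_gb:.1f} GB")
```

In other words, memory capacity has to be sized for a 671B model, while per-token memory traffic is closer to that of a ~37B dense model.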

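Recommended hardware: a multi-GPU datacenter server. At Q4_K_M the weights alone (~376 GB) exceed the combined memory of four 80GB GPUs.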

Even at Q2_K quantization (~210 GB), the model exceeds the 192GB of unified memory on a high-memory Mac Studio, so it can only run there with part of the weights offloaded, and generation speed will be limited accordingly.

Choosing the Right Quantization

DeepSeek R1 is a reasoning model, and reasoning models tend to be more sensitive to quantization than chat models: small errors can compound across a long chain of thought, so the reasoning process benefits from higher precision.

Our recommendations for R1 variants:

  • If VRAM allows → Q6_K (preserves reasoning quality)
  • Tight fit → Q5_K_M (good balance, minimal reasoning degradation)
  • Need to squeeze → Q4_K_M (acceptable, some complex reasoning may degrade)
  • Avoid Q3 and below for serious reasoning tasks
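
The preference order above can be written as a small lookup. This is a minimal sketch: the weight sizes are copied from the VRAM table in this guide, while `pick_quant` and the 2 GB overhead default (the upper end of the ~1-2 GB note) are illustrative assumptions.

```python
# Weight sizes in GB, copied from the VRAM table in this guide.
WEIGHTS_GB = {
    "7b":  {"Q4_K_M": 4.0, "Q5_K_M": 4.8, "Q6_K": 5.7},
    "14b": {"Q4_K_M": 7.8, "Q5_K_M": 9.7, "Q6_K": 11.3},
    "32b": {"Q4_K_M": 18.4, "Q5_K_M": 22.6, "Q6_K": 26.6},
    "70b": {"Q4_K_M": 39.5, "Q5_K_M": 48.7, "Q6_K": 57.2},
}

# Highest precision first, matching the recommendations above.
PREFERENCE = ["Q6_K", "Q5_K_M", "Q4_K_M"]


def pick_quant(variant: str, vram_gb: float, overhead_gb: float = 2.0):
    """Return the highest-precision quant whose weights plus overhead fit, else None."""
    for quant in PREFERENCE:
        if WEIGHTS_GB[variant][quant] + overhead_gb <= vram_gb:
            return quant
    return None


for variant, vram in [("7b", 8), ("14b", 12), ("32b", 24), ("70b", 24)]:
    print(f"R1 {variant} on {vram} GB: {pick_quant(variant, vram)}")
```

A result of `None` means the variant does not fit at Q4_K_M or above, in which case a smaller variant is usually a better choice than dropping to Q3.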

For more details, read our quantization guide.

DeepSeek R1 vs Other Reasoning Models

How does R1 compare to alternatives?

| Model | Params | VRAM (Q4) | Reasoning Quality | Best For |
|---|---|---|---|---|
| DeepSeek R1 7B | 7B | 4 GB | Good | Entry-level reasoning |
| QwQ 32B | 32B | 18 GB | Very good | Similar hardware to R1 32B |
| DeepSeek R1 32B | 32B | 18 GB | Excellent | Math, coding, logic |
| DeepSeek R1 70B | 70B | 40 GB | Near-frontier | Advanced reasoning |
| DeepSeek R1 671B | 671B | 376 GB | Frontier | Best open reasoning model |

Performance Expectations

Decode speed depends heavily on your hardware's memory bandwidth:

| Hardware | R1 7B | R1 14B | R1 32B |
|---|---|---|---|
| RTX 4060 8GB | ~45 tok/s | - | - |
| RTX 4070 12GB | ~55 tok/s | ~30 tok/s | - |
| RTX 4090 24GB | ~80 tok/s | ~50 tok/s | ~25 tok/s |
| Mac M4 Pro 24GB | ~35 tok/s | ~20 tok/s | ~12 tok/s |
| Mac M4 Max 64GB | ~40 tok/s | ~25 tok/s | ~18 tok/s |

Approximate values with Q4_K_M quantization. Actual performance varies by runtime and configuration.
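
The bandwidth dependence can be sketched as a simple upper bound: each decoded token has to stream the model weights from memory once, so tokens per second cannot exceed bandwidth divided by weight size. The 1008 GB/s figure below is the commonly quoted RTX 4090 memory bandwidth and should be treated as an assumption; the weight sizes are the Q4_K_M values from the VRAM table.

```python
def max_decode_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Theoretical ceiling on decode speed for a dense model."""
    return bandwidth_gb_s / weights_gb


RTX_4090_BANDWIDTH_GB_S = 1008  # assumption: vendor spec-sheet value

for name, size_gb in [("R1 7B", 4.0), ("R1 14B", 7.8), ("R1 32B", 18.4)]:
    ceiling = max_decode_tok_s(RTX_4090_BANDWIDTH_GB_S, size_gb)
    print(f"{name}: at most ~{ceiling:.0f} tok/s")
```

Real throughput lands well below this ceiling because of KV-cache reads, kernel overhead, and sampling, which is consistent with the approximate figures in the table above.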

Getting Started

  1. Check compatibility: Use our VRAM calculator to see which R1 variant fits your hardware
  2. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  3. Run your variant: ollama run deepseek-r1:7b (or :14b, :32b, :70b)
  4. Test reasoning: Try math problems, coding challenges, or logical puzzles

For a complete setup walkthrough, read our getting started guide.
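
Once a variant is running, it can also be queried from code. The sketch below assumes Ollama is serving on its default local port and that the model wraps its reasoning in `<think>...</think>` tags inside the response text; depending on the Ollama version and settings, the reasoning may arrive in a separate field instead. The helper names `build_request`, `split_reasoning`, and `ask` are illustrative.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "deepseek-r1:7b") -> urllib.request.Request:
    """Build a non-streaming generate request (POST, because a body is attached)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> block from the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No tags found: treat the whole text as the answer.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


def ask(prompt: str, model: str = "deepseek-r1:7b") -> tuple[str, str]:
    """Send the prompt to a running Ollama server; return (reasoning, answer)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        body = json.loads(resp.read())
    return split_reasoning(body["response"])
```

With the server running, `reasoning, answer = ask("What is 17 * 23?")` returns the chain of thought and the final answer separately, which makes it easier to log the reasoning while showing users only the answer.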

Frequently Asked Questions

How much VRAM does DeepSeek R1 need?

It depends on the variant. DeepSeek R1 7B needs ~4GB at Q4. The 32B distill needs ~18GB at Q4. The full 671B model needs ~376GB at Q4, requiring multiple GPUs or a high-memory Mac with CPU offloading.

Can I run DeepSeek R1 on an RTX 4090?

Yes. The RTX 4090 (24GB) can run DeepSeek R1 7B and 14B at Q8 (7.4 GB and 14.8 GB of weights), and the 32B distill at Q4 (18.4 GB) with a few GB left for context. The full 671B model doesn't fit on a single 4090.

What is the best DeepSeek R1 variant for my hardware?

For 8GB VRAM: R1 7B. For 12-16GB: R1 14B. For 24GB: R1 32B (Q4). For 48GB+: R1 70B (Q4). The full 671B needs ~376GB at Q4 (and ~210GB even at Q2), so it is best suited to multi-GPU datacenter servers.

Is DeepSeek R1 good for coding?

DeepSeek R1 excels at reasoning tasks including coding, math, and logic. The distilled variants (7B, 14B, 32B, 70B) retain strong reasoning capabilities. For dedicated coding, also consider DeepSeek Coder V2.

Can I run the full DeepSeek R1 671B locally?

Yes, but you need significant hardware. At Q2-Q3 quantization it needs ~210-280GB. A Mac Studio with 192GB unified memory can run it only with heavy quantization and part of the model offloaded; otherwise you need multiple datacenter GPUs.