How much VRAM does DeepSeek R1 need?

It depends on the variant. DeepSeek R1 7B needs ~4GB at Q4. The 32B distill needs ~18GB at Q4. The full 671B model needs ~376GB at Q4, requiring multiple GPUs or a high-memory Mac with CPU offloading.

Can I run DeepSeek R1 on an RTX 4090?

Yes. The RTX 4090 (24GB) can run DeepSeek R1 7B at Q8, the 14B distill at Q6 or Q8, and the 32B distill at Q4 (~18GB of weights) with limited headroom for context. The full 671B model doesn't fit on a single 4090.

What is the best DeepSeek R1 variant for my hardware?

For 8GB VRAM: R1 7B. For 12-16GB: R1 14B. For 24GB: R1 32B (Q4). For 48GB+: R1 70B. The full 671B needs ~376GB at Q4 (and still ~210GB at Q2), so it is best suited to multi-GPU datacenter setups or high-memory Macs with offloading.

Is DeepSeek R1 good for coding?

DeepSeek R1 excels at reasoning tasks including coding, math, and logic. The distilled variants (7B, 14B, 32B, 70B) retain strong reasoning capabilities. For dedicated coding, also consider DeepSeek Coder V2.

Can I run the full DeepSeek R1 671B locally?

Yes, but you need significant hardware. At Q2-Q3 quantization it needs ~210-280GB. A Mac Studio M4 Ultra with 192GB unified memory can run it with heavy quantization and offloading, or you can use multiple datacenter GPUs.

March 18, 2026
Tags: deepseek, vram, gpu-requirements, reasoning

DeepSeek R1 VRAM Requirements — 7B to 671B Distills at Q4/Q8 (GPU Guide)

DeepSeek R1 14B needs ~8GB at Q4, 32B needs ~18GB, full 671B needs ~376GB. Complete VRAM tables for every distill with GPU and quantization recommendations.

DeepSeek R1 is one of the most capable open-weight reasoning models available. Released by DeepSeek in early 2025, it brought frontier-level reasoning to the open-source community. But with variants ranging from 7B to 671B parameters, the hardware requirements vary dramatically.

This guide covers the full model and the main distilled variants, with approximate VRAM requirements, hardware recommendations, and setup instructions.

DeepSeek R1 Model Family

DeepSeek R1 comes in two forms:

Full model (671B): A massive Mixture of Experts (MoE) model with 671B total parameters and ~37B active per token. This is the flagship with the best reasoning capabilities.
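Distilled variants: Smaller dense models, built on Qwen and Llama base models and trained to replicate R1's reasoning ability. Available at 7B, 14B, 32B, and 70B sizes (DeepSeek also released 1.5B and 8B distills, which are not covered here).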

The distilled variants are not just smaller models: they were fine-tuned on reasoning data generated by the full 671B model, so they reason noticeably better than typical models of their size.

VRAM Requirements by Variant

Here's what each DeepSeek R1 variant needs at different quantization levels:

Variant	Type	Q4_K_M	Q5_K_M	Q6_K	Q8_0	F16
R1 7B	Dense	4.0 GB	4.8 GB	5.7 GB	7.4 GB	14.0 GB
R1 14B	Dense	7.8 GB	9.7 GB	11.3 GB	14.8 GB	28.0 GB
R1 32B	Dense	18.4 GB	22.6 GB	26.6 GB	34.8 GB	65.6 GB
R1 70B	Dense	39.5 GB	48.7 GB	57.2 GB	74.8 GB	141 GB
R1 671B	MoE	375.8 GB	463 GB	543 GB	711 GB	1,342 GB

Note: These are model weight sizes. Add ~1-2 GB for KV cache and runtime overhead at default context lengths.
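The sizes in the table follow a simple relationship: weight size is roughly parameter count times bits per weight, divided by eight. The sketch below illustrates that arithmetic. The function names are hypothetical, and the 2 GB overhead default is an assumption taken from the upper end of the note above, not a measured value.

```python
def effective_bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Back out the average bits stored per parameter from a weight size."""
    return size_gb * 8 / params_billion


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate weight size in GB (10^9 bytes) at a given precision."""
    return params_billion * bits_per_weight / 8


def total_vram_gb(size_gb: float, overhead_gb: float = 2.0) -> float:
    """Weights plus an assumed allowance for KV cache and runtime overhead."""
    return size_gb + overhead_gb


# Q4_K_M sizes from the table above: 7B -> 4.0 GB, 70B -> 39.5 GB
print(round(effective_bits_per_weight(4.0, 7), 2))    # 4.57
print(round(effective_bits_per_weight(39.5, 70), 2))  # 4.51
print(weight_size_gb(7, 16))                          # F16 for 7B: 14.0
print(total_vram_gb(18.4))                            # 32B at Q4 plus overhead
```

Q4_K_M works out to roughly 4.5 bits per weight rather than exactly 4, which is why the 7B model needs about 4.0 GB instead of 3.5 GB.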

Hardware Recommendations

DeepSeek R1 7B — Entry-Level Reasoning

The 7B distill is the easiest way to experience R1's reasoning style. It fits on virtually any modern GPU.

Recommended hardware:

RTX 4060 8GB — runs at Q6 with room for context (Q8 needs 7.4 GB for weights alone, leaving almost no headroom)
RTX 4070 12GB — comfortable at Q8, excellent performance
Any Mac with 16GB+ unified memory

Quick start:

ollama run deepseek-r1:7b

DeepSeek R1 14B — Best Quality Under 16GB

The 14B distill offers significantly better reasoning than 7B while staying accessible to mainstream GPUs.

Recommended hardware:

RTX 4070 12GB — fits at Q4, good for most tasks
RTX 4070 Ti Super 16GB — fits at Q6+, better quality
Mac M4 Pro 24GB — comfortable at Q5+

Quick start:

ollama run deepseek-r1:14b

DeepSeek R1 32B — Sweet Spot for Serious Users

The 32B variant comes close to the full 671B model on many reasoning benchmarks, which makes it a popular choice for users with high-end consumer hardware.

Recommended hardware:

RTX 4090 24GB — fits at Q4 with tight margins
RTX 5090 32GB — comfortable at Q4, room for context
Mac M4 Max 36GB — good unified memory fit
Mac M4 Max 64GB — runs at Q6+ easily

Quick start:

ollama run deepseek-r1:32b

DeepSeek R1 70B — Near-Frontier Reasoning

The 70B distill approaches the full model's capability on most benchmarks. It requires substantial hardware: about 40 GB for the weights alone at Q4.

Recommended hardware:

Mac M4 Max 64GB — fits at Q4 (the ~39.5 GB of weights sit within the roughly 46GB usable for the model)
A100 80GB — comfortable at Q4-Q5
RTX 4090 with aggressive offloading (slower but works)

Quick start:

ollama run deepseek-r1:70b

DeepSeek R1 671B — The Full Model

The complete 671B MoE model delivers the best reasoning performance. Its Mixture of Experts architecture means only ~37B parameters are active per token, but all 671B must fit in memory.

Recommended hardware:

A100 80GB × 4-8 GPUs (tensor parallel)
H100 80GB × 4-8 GPUs
MI300X 192GB × 2-4 GPUs
Mac M4 Ultra 192GB — fits at Q2-Q3 with offloading (slow but works)

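Even at Q2_K quantization (~210 GB), the weights exceed the 192GB of unified memory on a Mac Studio with M4 Ultra. The model can technically run there only with part of the weights offloaded, and generation speed will be limited by memory bandwidth and the offloading overhead.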
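The GPU counts in the list above can be sanity-checked with simple division. This is a rough sketch, not a deployment tool: `min_gpus` is a hypothetical helper, the 2 GB per-card headroom is an assumption, and it ignores any constraints a tensor-parallel runtime may place on the number of cards.

```python
import math


def min_gpus(weights_gb: float, vram_per_gpu_gb: float,
             headroom_gb_per_gpu: float = 2.0) -> int:
    """Smallest GPU count whose combined usable VRAM holds the weights.

    headroom_gb_per_gpu is an assumed per-card reserve for KV cache and
    runtime buffers; real deployments may need considerably more.
    """
    usable = vram_per_gpu_gb - headroom_gb_per_gpu
    if usable <= 0:
        raise ValueError("headroom must be smaller than per-GPU VRAM")
    return math.ceil(weights_gb / usable)


Q4_K_M_WEIGHTS_GB = 375.8  # R1 671B, from the VRAM table

print(min_gpus(Q4_K_M_WEIGHTS_GB, 80))   # A100 / H100 80GB -> 5
print(min_gpus(Q4_K_M_WEIGHTS_GB, 192))  # MI300X 192GB -> 2
```

Under these assumptions, Q4_K_M needs at least five 80GB cards. Four 80GB cards are only enough at around Q3 (~280 GB), and two MI300X cards leave very little room for context.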

Choosing the Right Quantization

DeepSeek R1 is a reasoning model, which tends to make it more sensitive to quantization than chat models. Small errors can compound over a long chain of thought, so the reasoning process benefits from higher precision.

Our recommendations for R1 variants:

If VRAM allows → Q6_K (preserves reasoning quality)
Tight fit → Q5_K_M (good balance, minimal reasoning degradation)
Need to squeeze → Q4_K_M (acceptable, some complex reasoning may degrade)
Avoid Q3 and below for serious reasoning tasks

For more details, read our quantization guide.
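The preference order above (Q6_K, then Q5_K_M, then Q4_K_M) can be expressed as a small lookup. This is a minimal sketch using the weight sizes from the VRAM table; `pick_quant` and the 2 GB overhead default are hypothetical, and the overhead should be raised for long contexts.

```python
# Weight sizes in GB, copied from the VRAM table in this guide.
R1_WEIGHTS_GB = {
    "7B":  {"Q4_K_M": 4.0,  "Q5_K_M": 4.8,  "Q6_K": 5.7},
    "14B": {"Q4_K_M": 7.8,  "Q5_K_M": 9.7,  "Q6_K": 11.3},
    "32B": {"Q4_K_M": 18.4, "Q5_K_M": 22.6, "Q6_K": 26.6},
    "70B": {"Q4_K_M": 39.5, "Q5_K_M": 48.7, "Q6_K": 57.2},
}

# Highest recommended precision first.
PREFERENCE = ["Q6_K", "Q5_K_M", "Q4_K_M"]


def pick_quant(variant: str, vram_gb: float, overhead_gb: float = 2.0):
    """Return the highest-preference quantization that fits, or None."""
    for quant in PREFERENCE:
        if R1_WEIGHTS_GB[variant][quant] + overhead_gb <= vram_gb:
            return quant
    return None


print(pick_quant("14B", 16))  # Q6_K
print(pick_quant("14B", 12))  # Q5_K_M
print(pick_quant("32B", 24))  # Q4_K_M
print(pick_quant("70B", 24))  # None: does not fit without offloading
```

Returning `None` rather than falling back to Q3 reflects the advice above to avoid Q3 and below for serious reasoning work.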

DeepSeek R1 vs Other Reasoning Models

How does R1 compare to alternatives?

Model	Params	VRAM (Q4)	Reasoning Quality	Best For
DeepSeek R1 7B	7B	4 GB	Good	Entry-level reasoning
QwQ 32B	32B	18 GB	Very good	Similar hardware to R1 32B
DeepSeek R1 32B	32B	18 GB	Excellent	Math, coding, logic
DeepSeek R1 70B	70B	40 GB	Near-frontier	Advanced reasoning
DeepSeek R1 671B	671B	376 GB	Frontier	Best open reasoning model

Performance Expectations

Decode speed depends heavily on your hardware's memory bandwidth:

Hardware	R1 7B	R1 14B	R1 32B
RTX 4060 8GB	~45 tok/s	—	—
RTX 4070 12GB	~55 tok/s	~30 tok/s	—
RTX 4090 24GB	~80 tok/s	~50 tok/s	~25 tok/s
Mac M4 Pro 24GB	~35 tok/s	~20 tok/s	~12 tok/s
Mac M4 Max 64GB	~40 tok/s	~25 tok/s	~18 tok/s

Approximate values with Q4_K_M quantization. Actual performance varies by runtime and configuration.
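A rough ceiling on decode speed comes from dividing memory bandwidth by the bytes read per token. The sketch below assumes each generated token reads the active weights once and ignores KV cache reads and compute. The function name is hypothetical, and the 1008 GB/s figure for the RTX 4090 is an assumption taken from its published specification.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float,
                         active_fraction: float = 1.0) -> float:
    """Upper bound on tokens/s if each token reads the active weights once.

    active_fraction is 1.0 for dense models; for a Mixture of Experts
    model it is roughly active parameters / total parameters.
    """
    if not 0 < active_fraction <= 1:
        raise ValueError("active_fraction must be in (0, 1]")
    return bandwidth_gb_s / (weights_gb * active_fraction)


# R1 32B at Q4_K_M (18.4 GB) on an RTX 4090 (assumed 1008 GB/s)
print(round(decode_ceiling_tok_s(1008, 18.4), 1))  # 54.8
```

The ~25 tok/s listed for this pairing sits well below the ceiling, which is expected because real runtimes never reach peak bandwidth. For the 671B model, passing `active_fraction=37/671` approximates the effect of only ~37B parameters being active per token.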

Getting Started

Check compatibility: Use our VRAM calculator to see which R1 variant fits your hardware
Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
Run your variant: ollama run deepseek-r1:7b (or :14b, :32b, :70b)
Test reasoning: Try math problems, coding challenges, or logical puzzles
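Once a model is running, the reasoning test in the last step can also be scripted against Ollama's local HTTP API. The sketch below assumes the server is listening on the default port 11434 and that the reply text contains R1's reasoning inside `<think>` tags. Depending on the Ollama version and settings, the reasoning may be returned separately, in which case `split_reasoning` simply returns an empty reasoning string. Both helper names are hypothetical.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def ask(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """Send one non-streaming prompt to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> block from the final answer.

    Returns (reasoning, answer); reasoning is empty if no block is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()
    answer = text[:match.start()] + text[match.end():]
    return match.group(1).strip(), answer.strip()


# Example (needs a running Ollama server with the model pulled):
# reasoning, answer = split_reasoning(ask("deepseek-r1:7b", "What is 17 * 23?"))
```

Keeping the reasoning separate from the answer makes it easier to compare final answers across quantization levels without the chain of thought getting in the way.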

For a complete setup walkthrough, read our getting started guide.
