How much VRAM does DeepSeek R1 need?

DeepSeek R1 7B needs 4GB at Q4 and 14GB at FP16. The 14B needs 8GB at Q4 and 28GB at FP16. The 32B needs 18GB at Q4 and 66GB at FP16. The 70B needs 40GB at Q4. The full 671B needs 376GB at Q4.

Can I run DeepSeek R1 on 8GB VRAM?

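Yes. DeepSeek R1 7B fits comfortably on 8GB VRAM at Q4 or Q6 quantization. Q8 (7.4GB of weights) and the 14B at Q4 (7.8GB of weights) exceed 8GB once KV cache and runtime overhead are added, so they need partial CPU offloading. For 8GB cards, the 7B distill at Q4 is the recommended choice.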

What GPU do I need for DeepSeek R1 32B?

An RTX 4090 (24GB) runs DeepSeek R1 32B at Q4 with room for context. An RTX 3090 (24GB) also works. For Q8 quality you need 48GB or more, such as the RTX A6000 or a 64GB Mac.

Can I run DeepSeek R1 671B at home?

The full 671B requires 376GB at Q4 minimum. Consumer options include an Apple M3/M4 Ultra with 192GB (at Q2-Q3 with offloading) or 4x RTX 3090/4090 with CPU offloading. It is technically possible but very slow on consumer hardware.

What is the difference between DeepSeek R1 and the distilled variants?

The full DeepSeek R1 is a 671B Mixture of Experts model with ~37B active parameters per token. The distilled variants (7B, 14B, 32B, 70B) are dense models trained to replicate R1's reasoning behavior. They are much smaller but retain strong reasoning capabilities.

March 28, 2026
Tags: deepseek, vram, gpu, reasoning

DeepSeek R1 VRAM Requirements — 7B to 671B Hardware Guide

VRAM requirements for every DeepSeek R1 variant at Q4_K_M, Q8_0, and FP16. Covers the 7B, 14B, 32B, and 70B distills and the full 671B MoE model, with GPU recommendations for each size.

DeepSeek R1 is one of the most capable open-weight reasoning models available. With its chain-of-thought approach to math, coding, and logic problems, it competes with frontier closed-source models. But the family spans from a compact 7B distill to a massive 671B Mixture of Experts model, and the memory needed for the weights ranges from about 4GB to over 1.3TB across that range.

This guide gives the weight-memory figures for every variant at Q4_K_M, Q8_0, and FP16, plus specific GPU recommendations.

DeepSeek R1 Variants

The DeepSeek R1 family has two types of models:

Distilled variants (7B, 14B, 32B, 70B): Dense models trained via knowledge distillation from the full R1. The 7B, 14B, and 32B are built on Qwen 2.5 models and the 70B on Llama. They are practical for consumer hardware and retain strong reasoning ability.
Full R1 (671B): A Mixture of Experts model with 671B total parameters but only ~37B active per token. This is the flagship model with the strongest reasoning, but it requires datacenter-class hardware or creative multi-GPU setups.

The distilled variants are not just "smaller R1" — they were specifically trained to mimic R1's reasoning patterns, making them some of the best reasoning models at their respective sizes.

Complete VRAM Requirements

Here are the memory requirements for the model weights alone at each quantization level. Add 1-2GB for KV cache and runtime overhead at short context lengths; longer contexts need more.

Variant	Parameters	Q4_K_M	Q8_0	FP16
R1 7B	7.6B	4.0 GB	7.4 GB	14.0 GB
R1 14B	14.8B	7.8 GB	14.8 GB	28.0 GB
R1 32B	32.8B	18.4 GB	34.8 GB	65.6 GB
R1 70B	70.6B	39.5 GB	74.8 GB	141 GB
R1 671B	671B (MoE)	375.8 GB	711 GB	1,342 GB

What the quantization levels mean:

Q4_K_M: 4-bit quantization. Best balance of size and quality. Minor quality loss compared to FP16.
Q8_0: 8-bit quantization. Near-lossless. Recommended when you have the VRAM for it.
FP16: 16-bit floating point, the unquantized baseline. No quality loss. Requires roughly 2x the memory of Q8_0.

For a deeper explanation, see our quantization guide.
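
As a rough cross-check, the table's figures can be reproduced by multiplying parameter count by an average number of bytes per weight. The sketch below assumes about 0.56 bytes per weight for Q4_K_M and 1.06 for Q8_0. These values are back-calculated from the 32B, 70B, and 671B rows of the table rather than taken from any official specification, and the function names are illustrative.

```python
# Average bytes per weight, back-calculated from the table in this guide.
# Assumed values for illustration; real GGUF files vary by model.
BYTES_PER_WEIGHT = {"Q4_K_M": 0.56, "Q8_0": 1.06, "FP16": 2.0}


def weight_memory_gb(params_billions: float, quant: str) -> float:
    """Estimate the size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_WEIGHT[quant]


def total_memory_gb(params_billions: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Weights plus a flat allowance for KV cache and runtime overhead."""
    return weight_memory_gb(params_billions, quant) + overhead_gb


if __name__ == "__main__":
    for name, params in [("R1 32B", 32.8), ("R1 70B", 70.6), ("R1 671B", 671.0)]:
        row = ", ".join(f"{q}: {weight_memory_gb(params, q):.1f} GB" for q in BYTES_PER_WEIGHT)
        print(f"{name}: {row}")
```

The 7B and 14B rows of the table appear to line up with the nominal sizes (7 and 14 billion) rather than the listed 7.6B and 14.8B, so those two rows are best read as slightly optimistic.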

Best GPU for Each Variant

DeepSeek R1 7B — Any Modern GPU

The 7B distill is the easiest entry point. At Q4, it needs just 4GB plus overhead.

GPU	Best Quant	Fit
RTX 4060 8GB	Q4_K_M	Comfortable, room for 8K context
RTX 3060 12GB	Q8_0	Excellent fit with large context
RTX 4070 12GB	Q8_0	Fast inference, great experience

ollama run deepseek-r1:7b

DeepSeek R1 14B — Mid-Range GPUs

The 14B distill is based on Qwen 2.5 and offers a meaningful step up in reasoning quality.

GPU	Best Quant	Fit
RTX 4060 Ti 16GB	Q8_0	Tight fit (14.8GB of weights), short context only
RTX 4070 Ti Super 16GB	Q8_0	Fast, but equally tight at Q8_0
RTX 4090 24GB	Q8_0	Plenty of headroom

ollama run deepseek-r1:14b

DeepSeek R1 32B — The Consumer Sweet Spot

The 32B distill is where R1's reasoning really shines. It is based on Qwen 2.5 32B and is widely considered the best reasoning model you can run on a single consumer GPU.

GPU	Best Quant	Fit
RTX 4090 24GB	Q4_K_M	Fits at Q4 with context up to 8K
RTX 5090 32GB	Q6_K	Higher quality quant with good headroom
RTX A6000 48GB	Q8_0	Near-lossless, professional card
M4 Max 64GB	Q8_0	Excellent fit on Mac

ollama run deepseek-r1:32b
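
To see why 8K of context still fits next to 18.4GB of weights on a 24GB card, the KV cache can be estimated from the model's attention shape. This is a sketch: the layer count, KV head count, and head dimension below are assumed values for a Qwen 2.5 32B-style configuration and should be checked against the model's config.json. The cache is also assumed to be stored in FP16.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Estimate KV cache size in GB (1 GB = 1e9 bytes) for one sequence."""
    # Factor of 2: one key tensor and one value tensor per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    return total_bytes / 1e9


# Assumed architecture values (not taken from this guide).
ASSUMED_32B = {"n_layers": 64, "n_kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    cache = kv_cache_gb(context_tokens=8192, **ASSUMED_32B)
    weights = 18.4  # Q4_K_M weights for R1 32B, from the table in this guide
    print(f"KV cache at 8K: {cache:.2f} GB, weights + cache: {weights + cache:.2f} GB")
```

Under these assumptions the cache comes to roughly 2GB at 8K tokens, which keeps the total just over 20GB and leaves a few GB of a 24GB card for runtime buffers. Doubling the context doubles the cache.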

DeepSeek R1 70B — High-End Hardware Required

At Q4, the 70B needs about 40GB for its weights. No single consumer GPU can hold it entirely in VRAM.

Hardware	Best Quant	Notes
RTX A6000 48GB	Q4_K_M	Fits with moderate context
2x RTX 4090 (48GB total)	Q4_K_M	Split across two GPUs via llama.cpp
M4 Max 64GB	Q4_K_M	Good fit, 8-12 tok/s
M4 Max 128GB	Q8_0	Excellent quality, comfortable fit

For multi-GPU setups, see our multi-GPU inference guide.

DeepSeek R1 671B — The Full Model

The full 671B is a Mixture of Experts model. All 671B parameters must be held in memory, but only ~37B are active per token, so the per-token work is closer to that of a ~37B dense model. It can run faster than its size suggests, if you can fit it in memory.

Memory for weights: 376GB at Q4_K_M. The Q2-Q3 rows below trade quality for a smaller footprint.

Hardware	Quant	Estimated Speed
4x A100 80GB (320GB)	Q4 with offloading	10-15 tok/s
8x A100 80GB (640GB)	Q4_K_M, fully in VRAM (Q8_0 at 711GB does not fit)	15-25 tok/s
M3/M4 Ultra 192GB	Q2-Q3 with offloading	2-4 tok/s
4x RTX 4090 + CPU offload	Q2-Q3	1-3 tok/s

Running the full 671B on consumer hardware is technically possible but extremely slow. The distilled variants are the practical choice for most users. The 32B distill captures a surprising amount of R1's reasoning capability at a fraction of the hardware cost.
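
A 671B MoE model can generate faster than its size suggests because each token only reads the active experts' weights, even though all weights must stay resident. A rough sketch of that arithmetic follows, assuming decoding is limited by memory bandwidth and reusing the approximate 0.56 bytes per weight that matches this guide's Q4_K_M figures. The bandwidth values in the example are hypothetical placeholders, not measurements.

```python
Q4_BYTES_PER_WEIGHT = 0.56  # assumed average, matches the Q4_K_M figures in this guide


def resident_gb(total_params_b: float) -> float:
    """Memory needed to hold every expert, in GB."""
    return total_params_b * Q4_BYTES_PER_WEIGHT


def active_gb_per_token(active_params_b: float) -> float:
    """Weights actually read to produce one token, in GB."""
    return active_params_b * Q4_BYTES_PER_WEIGHT


def bandwidth_bound_tok_s(active_params_b: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if decoding were purely bandwidth-limited.

    Ignores compute, KV cache reads, expert routing, and GPU-to-GPU or
    GPU-to-CPU transfers, so real throughput will be lower.
    """
    return bandwidth_gb_s / active_gb_per_token(active_params_b)


if __name__ == "__main__":
    print(f"Resident weights: {resident_gb(671):.1f} GB")
    print(f"Read per token:   {active_gb_per_token(37):.1f} GB")
    for bw in (100, 1000):  # hypothetical bandwidths in GB/s
        print(f"{bw} GB/s -> at most {bandwidth_bound_tok_s(37, bw):.1f} tok/s")
```

The same bound shows why offloading hurts: once part of the active weights is served from slower system memory, the effective bandwidth in the formula drops sharply.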

Distilled vs Full: Which Should You Run?

For most users, the answer is simple: run the largest distilled variant that fits your GPU.

8GB VRAM: R1 7B at Q4 or Q6
12-16GB VRAM: R1 14B at Q4-Q8
24GB VRAM: R1 32B at Q4 — this is the sweet spot
48GB+ VRAM: R1 70B at Q4-Q6
Multiple datacenter GPUs: Full R1 671B
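
The rule of thumb above can be sketched as a small lookup. The example uses the Q4_K_M weight sizes from this guide's table and a flat 2GB allowance for KV cache and runtime overhead. The function name and the allowance are illustrative assumptions, and longer contexts would need a larger allowance.

```python
# Q4_K_M weight sizes in GB, copied from the table in this guide.
Q4_WEIGHTS_GB = {
    "R1 7B": 4.0,
    "R1 14B": 7.8,
    "R1 32B": 18.4,
    "R1 70B": 39.5,
    "R1 671B": 375.8,
}


def largest_variant_that_fits(vram_gb: float, overhead_gb: float = 2.0):
    """Return the largest variant whose Q4_K_M weights plus overhead fit, or None."""
    fitting = [
        (size_gb, name)
        for name, size_gb in Q4_WEIGHTS_GB.items()
        if size_gb + overhead_gb <= vram_gb
    ]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for vram in (8, 16, 24, 48):
        print(f"{vram} GB -> {largest_variant_that_fits(vram)}")
```

This only picks a size at Q4_K_M. With VRAM to spare, moving the same variant up to Q6 or Q8 is the alternative use of that headroom.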

The 32B distill on an RTX 4090 is probably the best reasoning experience available on consumer hardware. It handles complex math, multi-step logic, and coding problems with impressive accuracy.

Check Your Hardware

Use our tools to see exactly how DeepSeek R1 fits on your specific setup:

Can I run DeepSeek R1 on an RTX 4090? — Detailed fit analysis
Hardware Detection — Auto-detect your GPU and get personalized recommendations
Hardware Calculator — Check any model against any GPU
Browse DeepSeek Models — See all DeepSeek variants with fit estimates
Best models for 24GB VRAM — See what fits alongside DeepSeek R1
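
For a quick local check on NVIDIA cards, total VRAM can also be read from nvidia-smi. The sketch below is independent of the tools listed above, and the helper names are made up for illustration. It assumes nvidia-smi is installed and on the PATH, and it does not cover AMD GPUs or Apple Silicon.

```python
import subprocess


def parse_gpu_list(csv_text: str):
    """Parse 'name, memory_mib' lines into (name, vram_gb) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem_mib = line.rsplit(",", 1)
        # nvidia-smi reports MiB; convert to GB (1 GB = 1e9 bytes).
        gpus.append((name.strip(), int(mem_mib) * 1024 * 1024 / 1e9))
    return gpus


def detect_nvidia_gpus():
    """Query nvidia-smi for GPU names and total memory (NVIDIA only)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_list(out)


if __name__ == "__main__":
    for name, vram_gb in detect_nvidia_gpus():
        print(f"{name}: {vram_gb:.1f} GB")
```

Compare the reported figure with the weight sizes in the table, leaving a few GB free for KV cache and runtime overhead.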

Popular Hardware Checks

DeepSeek R1 32B on RTX 5090 — Best consumer GPU for R1 32B
DeepSeek R1 14B on RTX 4070 — Mid-range sweet spot
DeepSeek R1 70B on M4 Max 64GB — Mac option for the 70B

DeepSeek R1 proved that open-weight models can match frontier reasoning. With the distilled variants, that capability is accessible on hardware most enthusiasts already own.