How much VRAM does Qwen 3 need?

It varies by variant. Qwen 3 8B needs ~4.6GB at Q4. Qwen 3 14B needs ~8.3GB at Q4. The MoE variant Qwen 3 30B-A3B needs ~16.8GB at Q4. Qwen 3 235B-A22B needs ~132GB at Q4.

Can I run Qwen 3 on an RTX 4090?

Yes. The RTX 4090 (24GB) comfortably runs Qwen 3 8B at Q8, Qwen 3 14B at Q6, and Qwen 3 30B-A3B (MoE) at Q4.

What is Qwen 3 30B-A3B?

Qwen 3 30B-A3B is a Mixture of Experts model with 30B total parameters but only 3B active per token. It delivers quality comparable to much larger dense models while fitting in ~17GB at Q4 — making it ideal for RTX 4090 and similar hardware.

Where can I find Qwen 3.5 or Qwen 3.6 VRAM numbers?

See the dedicated pages: /blog/qwen-35-vram-requirements-complete-guide for the Qwen 3.5 family, /blog/qwen-3-6-vram-requirements for the Qwen 3.6 35B-A3B MoE, and /blog/qwen-3-6-27b-vram-requirements for the Qwen 3.6 dense 27B.

Is Qwen 3 good for coding?

Qwen 3 Coder variants are specifically optimized for coding. Qwen 3 Coder 30B-A3B is excellent for programming tasks. The base Qwen 3 models also handle coding well, especially the 14B and 30B-A3B variants.

What is the difference between Qwen 3, Qwen 3.5, and Qwen 3.6?

Qwen 3 is the original 2025 family (0.6B–235B-A22B). Qwen 3.5 is a 2026 refresh with improved sizes and tuning. Qwen 3.6 (April 2026) is a newer generation with a 1M-token native context — it has its own VRAM math and dedicated guides at /blog/qwen-3-6-vram-requirements and /blog/qwen-3-6-27b-vram-requirements.

April 14, 2026 | Updated May 20, 2026 | qwen, alibaba, vram, gpu-requirements, qwen-3

Qwen 3 GPU Requirements — Original Family (0.6B–235B) VRAM Guide (2026)

VRAM tables for the original Qwen 3 family (0.6B to 235B-A22B), with GPU and Mac recommendations. For the newer Qwen 3.5 and Qwen 3.6 generations, see the dedicated pages linked below.

This page covers the original Qwen 3 family (released 2025): dense and MoE models from 0.6B up to 235B-A22B. For the Qwen 3.5 refresh it provides only a jump table that links to the dedicated guides. The newest Qwen 3.6 generation also has its own guides, and this page no longer covers 3.6 specs.

Running Qwen 3.6? It's a separate generation with its own VRAM math. Go straight to the dedicated guides: Qwen 3.6 27B (dense) and Qwen 3.6 35B-A3B (MoE).

Skip to your size — Qwen 3 / 3.5 jump table

If you searched for…	Q4_K_M VRAM	Dedicated guide
qwen 3.5 9b	~5.7 GB	Qwen 3.5 9B VRAM
qwen 3.5 27b	~16.5 GB	Qwen 3.5 27B VRAM
qwen 3.5 35b-a3b	~19.6 GB	Qwen 3.5 35B-A3B VRAM
qwen 3.5 122b-a10b	~74 GB	Qwen 3.5 122B-A10B VRAM
qwen 3 8B / 14B / 30B-A3B / 32B / 235B-A22B (original)	see tables below	this page

Quick answers for the original Qwen 3

Qwen 3 8B: ~4.6 GB at Q4_K_M, ~8.5 GB at Q8_0
Qwen 3 14B: ~8.3 GB at Q4_K_M, ~15.7 GB at Q8_0
Qwen 3 30B-A3B: ~16.8 GB at Q4_K_M, ~31.6 GB at Q8_0
Qwen 3 32B: ~19.1 GB at Q4_K_M, ~36.1 GB at Q8_0
Qwen 3 235B-A22B: ~132 GB at Q4_K_M

Don't see your variant above? Try the VRAM Calculator — paste any model + GPU/Mac and get an exact fit verdict in seconds.

Qwen 3 is Alibaba's open-weight foundation family. It competes with Llama 4 and DeepSeek V3, offering a wide range of sizes — from compact 0.6B models that run on a phone to the flagship 235B MoE. The lineup combines dense models (efficient, predictable VRAM) with Mixture of Experts models (MoE, more capable per byte of VRAM).

Qwen 3 Model Family

Alibaba released Qwen 3 as a complete lineup covering every hardware tier:

Model	Type	Parameters	Active Params	Best For
Qwen 3 0.6B	Dense	0.6B	0.6B	Edge devices, always-on agents
Qwen 3 1.7B	Dense	1.7B	1.7B	Lightweight local assistants
Qwen 3 4B	Dense	4B	4B	Mid-range phones, low-VRAM desktops
Qwen 3 8B	Dense	8B	8B	Flagship small model, great all-rounder
Qwen 3 14B	Dense	14B	14B	Mid-range performance, strong reasoning
Qwen 3 30B-A3B	MoE	30B	3B	Best efficiency, MoE flagship
Qwen 3 32B	Dense	32B	32B	High-end dense, maximum dense quality
Qwen 3 235B-A22B	MoE	235B	22B	Flagship MoE, frontier-class quality
Qwen 3 Coder 8B	Dense	8B	8B	Coding-optimized small model
Qwen 3 Coder 14B	Dense	14B	14B	Coding mid-range
Qwen 3 Coder 30B-A3B	MoE	30B	3B	Best coding efficiency

The Coder variants share architecture with the base models but are fine-tuned specifically on programming tasks for better results on code generation, debugging, and technical documentation.

VRAM Requirements by Variant

Approximate VRAM for the model weights at different quantization levels for the original Qwen 3 family:

Variant	Q4_K_M	Q5_K_M	Q6_K	Q8_0	F16
Qwen 3 0.6B	0.5 GB	0.6 GB	0.7 GB	0.9 GB	1.3 GB
Qwen 3 1.7B	1.1 GB	1.3 GB	1.5 GB	1.9 GB	3.4 GB
Qwen 3 4B	2.5 GB	3.0 GB	3.6 GB	4.6 GB	8.0 GB
Qwen 3 8B	4.6 GB	5.6 GB	6.6 GB	8.5 GB	16.1 GB
Qwen 3 14B	8.3 GB	10.2 GB	12.0 GB	15.7 GB	28.0 GB
Qwen 3 30B-A3B	16.8 GB	20.6 GB	24.2 GB	31.6 GB	60.0 GB
Qwen 3 32B	19.1 GB	23.5 GB	27.6 GB	36.1 GB	64.4 GB
Qwen 3 235B-A22B	131.9 GB	162.2 GB	190.5 GB	249.0 GB	470.0 GB

Add ~1-2 GB for KV cache and runtime overhead at default context lengths.
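As a rough sketch of how the table and the overhead note combine, the snippet below lists which variants fit a given card at Q4_K_M. The weight sizes are copied from the table above. The helper names and the 2 GB default overhead (the upper end of the ~1-2 GB note) are illustrative assumptions, not a measured rule.

```python
# Q4_K_M weight sizes in GB, copied from the table above.
Q4_K_M_GB = {
    "Qwen 3 0.6B": 0.5,
    "Qwen 3 1.7B": 1.1,
    "Qwen 3 4B": 2.5,
    "Qwen 3 8B": 4.6,
    "Qwen 3 14B": 8.3,
    "Qwen 3 30B-A3B": 16.8,
    "Qwen 3 32B": 19.1,
    "Qwen 3 235B-A22B": 131.9,
}


def fits(weights_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Return True if the weights plus an assumed overhead fit in VRAM."""
    return weights_gb + overhead_gb <= vram_gb


def variants_that_fit(vram_gb: float, overhead_gb: float = 2.0) -> list:
    """List the variants whose Q4_K_M weights fit the given VRAM budget."""
    return [name for name, gb in Q4_K_M_GB.items() if fits(gb, vram_gb, overhead_gb)]
```

Calling `variants_that_fit(8.0)` returns everything up to Qwen 3 8B, and `variants_that_fit(24.0)` returns everything up to Qwen 3 32B. Raise `overhead_gb` if you plan to use long contexts.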

Hardware Recommendations

Qwen 3 0.6B and 1.7B — Anywhere and Everywhere

These micro-models are designed for constrained environments. They fit in less than 2 GB of VRAM, making them viable on integrated graphics, older GPUs, or even CPU-only inference.

Recommended hardware:

Any GPU with 4GB+ VRAM (even GTX 1650 4GB)
Mac M-series with 8GB unified memory
Modern CPUs with 8GB+ RAM (llama.cpp CPU mode)

Quick start:

ollama run qwen3:0.6b
ollama run qwen3:1.7b

Qwen 3 4B — Respectable Quality, Minimal Hardware

The 4B model delivers surprisingly capable responses for its size. At Q4, it needs only ~2.5 GB — perfect for 4-6 GB GPUs or low-memory Macs.

Recommended hardware:

RTX 3050 6GB / RTX 4060 8GB — fits at Q8 with headroom
Mac M2/M3 with 8GB unified memory
Intel Arc A770 16GB — excellent efficiency

Quick start:

ollama run qwen3:4b

Qwen 3 8B — Best Small Model

The 8B is the sweet spot of the Qwen 3 dense lineup. It punches well above its weight class on instruction following, coding, and multilingual tasks — particularly strong on Chinese and Japanese.

Recommended hardware:

RTX 4060 8GB — fits at Q4_K_M with a few GB of headroom; Q6 possible with careful context limits
RTX 4070 12GB — comfortable at Q6, excellent performance-per-watt
RTX 4070 Ti Super 16GB — fits at Q8 with room for large context
Any Mac with 16GB+ unified memory

Quick start:

ollama run qwen3:8b

Check compatibility: Qwen 3 8B on RTX 4070 | Qwen 3 8B on RTX 4060

Qwen 3 14B — Best Under 16GB VRAM

The 14B hits a quality tier noticeably above the 8B, especially for complex reasoning and coding tasks. At Q4 it needs ~8.3 GB, making it accessible on most mainstream GPUs.

Recommended hardware:

RTX 4070 12GB — fits at Q4_K_M; tight but functional
RTX 4070 Ti Super 16GB — comfortable at Q5, strong throughput
RTX 4080 Super 16GB — excellent at Q6+
Mac M4 Pro 24GB — fits Q8 comfortably, unified memory advantage

Quick start:

ollama run qwen3:14b

Check compatibility: Qwen 3 14B on RTX 4070 Ti Super

Qwen 3 30B-A3B — The MoE Efficiency Champion

This is one of the most interesting models in the Qwen 3 family. The 30B-A3B is a Mixture of Experts model with 30B total parameters but only 3B active per token. That means per-token compute is close to that of a 3B dense model, while quality rivals much larger dense models.

At Q4, the 30B-A3B needs ~17 GB — fitting comfortably on a 24GB GPU.

Recommended hardware:

RTX 4090 24GB — perfect fit at Q4, fast inference
RTX 5090 32GB — Q5+ with plenty of context headroom
Mac M4 Max 36GB — comfortable at Q5, excellent efficiency
Mac M4 Pro 24GB — fits at Q4 with good performance

Quick start:

ollama run qwen3:30b-a3b

Check compatibility: Qwen 3 30B-A3B on RTX 4090 | Qwen 3 30B-A3B on RTX 5090

Qwen 3 32B — Maximum Dense Quality

The dense 32B is the largest Qwen 3 model that doesn't use MoE, and it delivers the highest-quality dense inference in the family. At Q4 it needs ~19 GB, which fits on a 24GB GPU but leaves only about 5 GB for KV cache and runtime overhead, so long contexts will require some context length management.
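To see why context length matters on a 24GB card, here is a minimal sketch of the usual KV cache size estimate: two tensors (keys and values) per layer, each holding `kv_heads * head_dim` values per token. The architecture numbers in the example (64 layers, 8 KV heads, head dimension 128) are assumptions for the 32B model, so check them against the model's `config.json` before relying on the result.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Estimate KV cache size in GiB for a given context length.

    bytes_per_value=2 assumes an unquantized FP16 cache.
    """
    # Factor of 2 covers both the key tensor and the value tensor.
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens
    return total_bytes / 2**30


# Assumed 32B-class shape: 64 layers, 8 KV heads, head_dim 128.
short_context = kv_cache_gib(64, 8, 128, 8192)    # 2.0 GiB
long_context = kv_cache_gib(64, 8, 128, 32768)    # 8.0 GiB
```

Under those assumptions an 8K context adds about 2 GiB on top of the weights, while a 32K context adds about 8 GiB, which would no longer fit next to ~19 GB of weights on a 24GB card.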

Recommended hardware:

RTX 4090 24GB — fits at Q4 with limited headroom; keep the context length modest
RTX 5090 32GB — comfortable at Q4, room for large context
Mac M4 Max 36GB — fits at Q5, excellent for long documents
Mac M4 Max 64GB — Q6+ runs smoothly with full context

Quick start:

ollama run qwen3:32b

Check compatibility: Qwen 3 32B on RTX 5090

Qwen 3 235B-A22B — Frontier-Class Performance

The flagship MoE model. With 235B total parameters and 22B active per token, this is the most capable model in the original Qwen 3 lineup and competes with frontier proprietary models on many benchmarks.

Recommended hardware:

H100 80GB × 2 GPUs — fits at Q4, excellent throughput
A100 80GB × 2-4 GPUs
MI300X 192GB — fits at Q4 on a single GPU
Mac M4 Ultra 192GB — fits at Q4 with memory to spare
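The GPU counts above follow from simple division. The sketch below assumes the weights split evenly across identical GPUs and reserves a fixed overhead on each one. The 2 GB per-GPU overhead and the function name are illustrative assumptions, and real multi-GPU runtimes may need extra buffers.

```python
import math


def gpus_needed(weights_gb: float, vram_per_gpu_gb: float,
                overhead_per_gpu_gb: float = 2.0) -> int:
    """Minimum number of identical GPUs needed to hold the weights."""
    usable = vram_per_gpu_gb - overhead_per_gpu_gb
    if usable <= 0:
        raise ValueError("overhead exceeds per-GPU VRAM")
    return math.ceil(weights_gb / usable)


# Sizes taken from the VRAM table on this page (Qwen 3 235B-A22B).
q4_on_80gb = gpus_needed(131.9, 80)    # 2
q4_on_192gb = gpus_needed(131.9, 192)  # 1
q8_on_80gb = gpus_needed(249.0, 80)    # 4
```

This matches the recommendations above: two 80GB cards at Q4, or a single 192GB device.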

Quick start:

ollama run qwen3:235b-a22b

Newer Qwen generations — dedicated guides

Qwen 3.5 (2026 refresh) adds new sizes (2B, 9B, 27B dense; 35B-A3B, 122B-A10B, 397B-A17B MoE) with improved tuning. Qwen 3.6 (April 2026) introduces the 1M-token native context and a flagship-class dense 27B. Because the VRAM math and hardware picks differ per variant, each has its own page:

Qwen 3.5 Complete Guide
Qwen 3.5 9B VRAM
Qwen 3.5 27B VRAM
Qwen 3.5 35B-A3B VRAM
Qwen 3.5 122B-A10B VRAM
Qwen 3.6 27B VRAM & Hardware Requirements — the dense coding flagship
Qwen 3.6 VRAM & Hardware Requirements (35B-A3B MoE)
Qwen 3.6 35B-A3B Release Date

Understanding Qwen 3 MoE Variants

Mixture of Experts (MoE) is a key architectural innovation in Qwen 3. In a standard dense model, every parameter is used for every token. In a MoE model, the network is divided into "expert" subnetworks, and only a small fraction are activated per token.

Qwen 3 30B-A3B in practice:

Total parameters: 30B (must fit in VRAM)
Active parameters per token: 3B (determines inference speed)
Result: You load ~17 GB into VRAM, but per-token compute is close to that of a 3B model

This is why the 30B-A3B can outperform a dense 14B model while running at comparable speeds. The routing mechanism selects the most relevant experts for each token, concentrating compute where it matters.

Qwen 3 235B-A22B goes further: 235B total in memory, 22B active per token (comparable to a mid-size dense model) — frontier-level quality at workstation inference speeds.

The trade-off: MoE models use more total VRAM than their active parameter count suggests. You pay for capacity in memory, and you get quality + speed in return. Our VRAM calculator accounts for MoE architecture when estimating fit.
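The memory-versus-compute split can be sketched in a few lines. Memory scales with total parameters, while per-token work scales with active parameters. The 4.5 bits per weight used here for Q4_K_M is an assumption back-calculated from this page's own figures (16.8 GB for 30B parameters), not an official number.

```python
def weights_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB from parameter count (in billions)."""
    return total_params_b * bits_per_weight / 8


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_params_b / total_params_b


Q4_BITS = 4.5  # assumed effective bits per weight at Q4_K_M

# Qwen 3 30B-A3B: all 30B parameters must be resident, 3B are active.
moe_30b_gb = weights_gb(30, Q4_BITS)        # 16.875, close to the ~16.8 GB above
moe_30b_active = active_fraction(30, 3)     # about 0.10

# Qwen 3 235B-A22B: 235B resident, 22B active.
moe_235b_gb = weights_gb(235, Q4_BITS)      # 132.1875, close to the ~132 GB above
moe_235b_active = active_fraction(235, 22)  # about 0.094
```

In both models roughly a tenth of the weights are touched per token, which is where the speed advantage comes from, while the full weight size sets the VRAM requirement.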

Qwen 3 for Coding

The Qwen 3 Coder variants are fine-tuned on massive programming datasets. Performance differences versus the base models are most pronounced on:

Code generation: Larger, more complex functions from natural language
Bug finding and fixing: Static analysis-style reasoning over code
Repo-level tasks: Multi-file context and refactoring
Technical documentation: Accurate docstrings and API descriptions

Qwen 3 Coder 8B

Excellent for everyday coding assistance on constrained hardware. Uses the same VRAM as Qwen 3 8B (~4.6 GB at Q4). A solid choice for developers on RTX 4060 8GB or similar.

ollama run qwen3-coder:8b

Qwen 3 Coder 14B

The best coding model under 16 GB VRAM. Noticeably stronger than the 8B Coder on longer functions and multi-file reasoning.

ollama run qwen3-coder:14b

Qwen 3 Coder 30B-A3B

The most capable coding model for single-GPU setups. The MoE architecture gives it quality close to the 32B dense model while fitting in ~17 GB. If you have a 24GB GPU and write code for a living, this is the model to run.

ollama run qwen3-coder:30b-a3b

Check compatibility: Qwen 3 Coder 30B-A3B on RTX 4090

Choosing the Right Quantization

Unlike reasoning-heavy models such as DeepSeek R1, Qwen 3's base and Coder variants handle quantization gracefully. You can drop to Q4 without dramatic quality loss for most tasks.

General guidance:

VRAM Budget	Recommended Quant	Notes
Very tight (≤4 GB)	Q4_K_M	Functional, minimal headroom
Normal (4–12 GB)	Q5_K_M	Good quality-size balance
Comfortable (12–24 GB)	Q6_K	Near-lossless for most tasks
Generous (24 GB+)	Q8_0	Effectively identical to F16

For coding tasks, we recommend Q5_K_M or higher — code generation benefits from precision, especially for syntax-sensitive outputs. For casual chat and summarization, Q4_K_M is fine.
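One way to turn this guidance into a rule is to pick the most precise quant that still fits, and to skip Q4_K_M when the workload is coding. The sketch below uses the Qwen 3 14B sizes from the VRAM table on this page. The 2 GB overhead and the `best_quant` helper are illustrative assumptions.

```python
# Weight sizes in GB for Qwen 3 14B, copied from the VRAM table on this page.
QWEN3_14B_GB = {"Q4_K_M": 8.3, "Q5_K_M": 10.2, "Q6_K": 12.0, "Q8_0": 15.7}

# Quants ordered from most to least precise.
PREFERENCE = ["Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M"]


def best_quant(sizes_gb: dict, vram_gb: float,
               overhead_gb: float = 2.0, coding: bool = False):
    """Return the most precise quant that fits, or None if nothing fits."""
    for quant in PREFERENCE:
        if coding and quant == "Q4_K_M":
            continue  # the guidance above suggests Q5_K_M or higher for code
        if quant in sizes_gb and sizes_gb[quant] + overhead_gb <= vram_gb:
            return quant
    return None
```

For the 14B model this gives `Q6_K` on a 16 GB card and `Q4_K_M` on a 12 GB card. With `coding=True` on 12 GB it returns `None`, a hint to drop to a smaller model rather than a lower quant.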

Read our quantization guide for a deeper look at how different quant levels affect output quality.

Qwen 3 vs Other Leading Open Models

How does Qwen 3 stack up against the competition on hardware requirements?

Model	Params	VRAM (Q4)	Active Params	Architecture
Qwen 3 8B	8B	4.6 GB	8B	Dense
Llama 4 Scout	109B	~59 GB	17B	MoE
DeepSeek V3	671B	~376 GB	37B	MoE
Qwen 3 30B-A3B	30B	16.8 GB	3B	MoE
QwQ 32B	32B	18 GB	32B	Dense
Qwen 3 235B-A22B	235B	132 GB	22B	MoE

The 30B-A3B is especially compelling: it sits in a hardware tier similar to QwQ 32B but needs only the per-token compute of a 3B model thanks to MoE activation sparsity, so it generates tokens faster than a dense model of the same total size.

Performance Expectations

Inference speed depends on your hardware's memory bandwidth. Approximate token generation speeds with Q4_K_M:

Hardware	Qwen 3 8B	Qwen 3 14B	Qwen 3 30B-A3B
RTX 4060 8GB	~50 tok/s	—	—
RTX 4070 12GB	~60 tok/s	~35 tok/s	—
RTX 4090 24GB	~85 tok/s	~55 tok/s	~70 tok/s*
RTX 5090 32GB	~110 tok/s	~70 tok/s	~90 tok/s*
Mac M4 Pro 24GB	~38 tok/s	~22 tok/s	~35 tok/s*
Mac M4 Max 64GB	~45 tok/s	~28 tok/s	~42 tok/s*

* MoE models activate only 3B parameters per token, giving them a speed advantage over dense models of equivalent total size.
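Because generation is largely bound by memory bandwidth, a simple ceiling is bandwidth divided by the bytes read per token. The sketch below is an upper bound only, not a prediction. The 1008 GB/s figure is the RTX 4090's published memory bandwidth, and the 4.5 bits per weight for Q4_K_M is an assumption back-calculated from this page's VRAM figures.

```python
def ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on tokens/s if each token streams this many GB from memory."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Dense model: every token reads all weights (Qwen 3 8B at Q4_K_M, 4.6 GB).
dense_8b_ceiling = ceiling_tok_s(1008, 4.6)

# MoE model: only the active parameters are read per token.
# 3B active parameters at an assumed 4.5 bits per weight is about 1.7 GB.
moe_active_gb = 3 * 4.5 / 8
moe_ceiling = ceiling_tok_s(1008, moe_active_gb)
```

Measured speeds such as those in the table sit well below these ceilings, since compute, expert routing, and runtime overhead also cost time. The bound is mainly useful for comparing hardware and for seeing why active parameter count matters.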

Getting Started

Find your fit: Use the VRAM calculator to see which Qwen 3 variant matches your hardware
Install Ollama: curl -fsSL https://ollama.com/install.sh | sh

Run your chosen model:

ollama run qwen3:8b              # Small and fast
ollama run qwen3:30b-a3b         # Best balance (MoE)
ollama run qwen3-coder:30b-a3b   # Best coding on 24GB GPU
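Once a model is pulled, you can also call it from a script through Ollama's local HTTP API. This is a minimal sketch using only the Python standard library. It assumes the Ollama server is running on its default port (11434) and that the model tag has already been downloaded. The helper names are illustrative.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's reply text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("qwen3:8b", "Say hello")` returns the reply as a string. Setting `"stream": False` keeps the example simple by asking for a single JSON response instead of a stream of chunks.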
