Qwen 3 GPU Requirements — Original Family (0.6B–235B) VRAM Guide (2026)

VRAM tables for the original Qwen 3 family (0.6B to 235B-A22B), with GPU and Mac recommendations. For the newer Qwen 3.5 and Qwen 3.6 generations, see the dedicated pages linked below.

This page covers the original Qwen 3 family (released 2025): dense and MoE models from 0.6B up to 235B-A22B. For the Qwen 3.5 refresh it provides only a jump table to the dedicated guides. The newest Qwen 3.6 generation has its own dedicated guides, and this page no longer covers 3.6 specs.

Running Qwen 3.6? It's a separate generation with its own VRAM math. Go straight to the dedicated guides: Qwen 3.6 27B (dense) and Qwen 3.6 35B-A3B (MoE).

Skip to your size — Qwen 3 / 3.5 jump table

| If you searched for… | Q4_K_M VRAM | Dedicated guide |
|---|---|---|
| qwen 3.5 9b | ~5.7 GB | Qwen 3.5 9B VRAM |
| qwen 3.5 27b | ~16.5 GB | Qwen 3.5 27B VRAM |
| qwen 3.5 35b-a3b | ~19.6 GB | Qwen 3.5 35B-A3B VRAM |
| qwen 3.5 122b-a10b | ~74 GB | Qwen 3.5 122B-A10B VRAM |
| qwen 3 8B / 14B / 30B-A3B / 32B / 235B-A22B (original) | see tables below | this page |

Quick answers for the original Qwen 3

  • Qwen 3 8B: ~4.6 GB at Q4_K_M, ~8.5 GB at Q8_0
  • Qwen 3 14B: ~8.3 GB at Q4_K_M, ~15.7 GB at Q8_0
  • Qwen 3 30B-A3B: ~16.8 GB at Q4_K_M, ~31.6 GB at Q8_0
  • Qwen 3 32B: ~19.1 GB at Q4_K_M, ~36.1 GB at Q8_0
  • Qwen 3 235B-A22B: ~132 GB at Q4_K_M

Don't see your variant above? Try the VRAM Calculator — paste any model + GPU/Mac and get an exact fit verdict in seconds.

Qwen 3 is Alibaba's open-weight foundation family. It competes with Llama 4 and DeepSeek V3 and spans a wide range of sizes, from compact 0.6B models that run on a phone to the flagship 235B MoE. The lineup combines dense models (predictable VRAM and compute) with Mixture of Experts (MoE) models, which deliver more capability per unit of compute but still need VRAM for all of their parameters.

Qwen 3 Model Family

Alibaba released Qwen 3 as a complete lineup covering every hardware tier:

| Model | Type | Parameters | Active Params | Best For |
|---|---|---|---|---|
| Qwen 3 0.6B | Dense | 0.6B | 0.6B | Edge devices, always-on agents |
| Qwen 3 1.7B | Dense | 1.7B | 1.7B | Lightweight local assistants |
| Qwen 3 4B | Dense | 4B | 4B | Mid-range phones, low-VRAM desktops |
| Qwen 3 8B | Dense | 8B | 8B | Flagship small model, great all-rounder |
| Qwen 3 14B | Dense | 14B | 14B | Mid-range performance, strong reasoning |
| Qwen 3 30B-A3B | MoE | 30B | 3B | Best efficiency, MoE flagship |
| Qwen 3 32B | Dense | 32B | 32B | High-end dense, maximum dense quality |
| Qwen 3 235B-A22B | MoE | 235B | 22B | Flagship MoE, frontier-class quality |
| Qwen 3 Coder 8B | Dense | 8B | 8B | Coding-optimized small model |
| Qwen 3 Coder 14B | Dense | 14B | 14B | Coding mid-range |
| Qwen 3 Coder 30B-A3B | MoE | 30B | 3B | Best coding efficiency |

The Coder variants share architecture with the base models but are fine-tuned specifically on programming tasks for better results on code generation, debugging, and technical documentation.

VRAM Requirements by Variant

Approximate VRAM for the model weights at different quantization levels, for the original Qwen 3 family:

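| Variant | Q4_K_M | Q5_K_M | Q6_K | Q8_0 | F16 |
|---|---|---|---|---|---|
| Qwen 3 0.6B | 0.5 GB | 0.6 GB | 0.7 GB | 0.9 GB | 1.3 GB |
| Qwen 3 1.7B | 1.1 GB | 1.3 GB | 1.5 GB | 1.9 GB | 3.4 GB |
| Qwen 3 4B | 2.5 GB | 3.0 GB | 3.6 GB | 4.6 GB | 8.0 GB |
| Qwen 3 8B | 4.6 GB | 5.6 GB | 6.6 GB | 8.5 GB | 16.1 GB |
| Qwen 3 14B | 8.3 GB | 10.2 GB | 12.0 GB | 15.7 GB | 28.0 GB |
| Qwen 3 30B-A3B | 16.8 GB | 20.6 GB | 24.2 GB | 31.6 GB | 60.0 GB |
| Qwen 3 32B | 19.1 GB | 23.5 GB | 27.6 GB | 36.1 GB | 64.4 GB |
| Qwen 3 235B-A22B | 131.9 GB | 162.2 GB | 190.5 GB | 249.0 GB | 470.0 GB |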

Add ~1-2 GB for KV cache and runtime overhead at default context lengths.
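
As a rough illustration of how the table and the overhead rule combine, the sketch below checks whether a variant fits a given VRAM budget. The `fits` helper and the 2 GB default overhead (the upper end of the ~1-2 GB range) are illustrative assumptions, not part of any tool, and long contexts can need more than this.

```python
# Weight sizes in GB, copied from the table above (Q4_K_M and Q8_0 columns only).
WEIGHTS_GB = {
    "Qwen 3 8B": {"Q4_K_M": 4.6, "Q8_0": 8.5},
    "Qwen 3 14B": {"Q4_K_M": 8.3, "Q8_0": 15.7},
    "Qwen 3 30B-A3B": {"Q4_K_M": 16.8, "Q8_0": 31.6},
    "Qwen 3 32B": {"Q4_K_M": 19.1, "Q8_0": 36.1},
}


def fits(model: str, quant: str, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Return True if the weights plus the assumed overhead fit in vram_gb."""
    return WEIGHTS_GB[model][quant] + overhead_gb <= vram_gb


# Example: check a few combinations against common VRAM sizes.
for model, quant, vram in [
    ("Qwen 3 8B", "Q4_K_M", 8),
    ("Qwen 3 14B", "Q8_0", 16),
    ("Qwen 3 32B", "Q4_K_M", 24),
]:
    verdict = "fits" if fits(model, quant, vram) else "does not fit"
    print(f"{model} at {quant} on {vram} GB: {verdict}")
```

With the 2 GB allowance, the 14B at Q8_0 (15.7 GB) does not fit a 16 GB card even though the weights alone are under 16 GB, which is why the overhead line matters.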

Hardware Recommendations

Qwen 3 0.6B and 1.7B — Anywhere and Everywhere

These micro-models are designed for constrained environments. They fit in less than 2 GB of VRAM, making them viable on integrated graphics, older GPUs, or even CPU-only inference.

Recommended hardware:

  • Any GPU with 4GB+ VRAM (even GTX 1650 4GB)
  • Mac M-series with 8GB unified memory
  • Modern CPUs with 8GB+ RAM (llama.cpp CPU mode)

Quick start:

ollama run qwen3:0.6b
ollama run qwen3:1.7b

Qwen 3 4B — Respectable Quality, Minimal Hardware

The 4B model delivers surprisingly capable responses for its size. At Q4, it needs only ~2.5 GB — perfect for 4-6 GB GPUs or low-memory Macs.

Recommended hardware:

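  • RTX 3050 6GB / RTX 4060 8GB: Q8 (4.6 GB) fits on both, with comfortable headroom on the 8GB card and a tight fit on the 6GB card once KV cache and runtime overhead are added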
  • Mac M2/M3 with 8GB unified memory
  • Intel Arc A770 16GB — excellent efficiency

Quick start:

ollama run qwen3:4b

Qwen 3 8B — Best Small Model

The 8B is the sweet spot of the Qwen 3 dense lineup. It punches well above its weight class on instruction following, coding, and multilingual tasks — particularly strong on Chinese and Japanese.

Recommended hardware:

  • RTX 4060 8GB: fits at Q4_K_M (4.6 GB) with room for context; Q6_K (6.6 GB) is possible with careful context limits
  • RTX 4070 12GB — comfortable at Q6, excellent performance-per-watt
  • RTX 4070 Ti Super 16GB — fits at Q8 with room for large context
  • Any Mac with 16GB+ unified memory

Quick start:

ollama run qwen3:8b

Check compatibility: Qwen 3 8B on RTX 4070 | Qwen 3 8B on RTX 4060

Qwen 3 14B — Best Under 16GB VRAM

The 14B hits a quality tier noticeably above the 8B, especially for complex reasoning and coding tasks. At Q4 it needs ~8.3 GB, making it accessible on most mainstream GPUs.

Quick start:

ollama run qwen3:14b

Check compatibility: Qwen 3 14B on RTX 4070 Ti Super

Qwen 3 30B-A3B — The MoE Efficiency Champion

This is one of the most interesting models in the Qwen 3 family. The 30B-A3B is a Mixture of Experts model with 30B total parameters but only 3B active per token. That means per-token compute is close to that of a 3B dense model, so generation is much faster than a 30B dense model would be, while the quality rivals much larger dense models.

At Q4, the 30B-A3B needs ~17 GB — fitting comfortably on a 24GB GPU.

Quick start:

ollama run qwen3:30b-a3b

Check compatibility: Qwen 3 30B-A3B on RTX 4090 | Qwen 3 30B-A3B on RTX 5090

Qwen 3 32B — Maximum Dense Quality

The dense 32B is the largest Qwen 3 model that doesn't use MoE. It delivers the highest quality dense inference in the family. At Q4 it needs ~19 GB, which fits on a 24GB GPU but leaves only about 5 GB for KV cache and runtime overhead, so long contexts will require some context length management.

Quick start:

ollama run qwen3:32b

Check compatibility: Qwen 3 32B on RTX 5090

Qwen 3 235B-A22B — Frontier-Class Performance

The flagship MoE model. With 235B total parameters and 22B active per token, this is the most capable model in the original Qwen 3 lineup and competes with frontier proprietary models on many benchmarks.

Quick start:

ollama run qwen3:235b-a22b

Newer Qwen generations — dedicated guides

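Qwen 3.5 (late 2025 refresh) adds new sizes (2B, 9B, 27B dense; 35B-A3B, 122B-A10B, 397B-A17B MoE) with improved tuning. Qwen 3.6 (April 2026) introduces the 1M-token native context and a flagship-class dense 27B. Because the VRAM math and hardware picks differ per variant, each has its own page:

  • Qwen 3.5 family: /blog/qwen-35-vram-requirements-complete-guide
  • Qwen 3.6 35B-A3B (MoE): /blog/qwen-3-6-vram-requirements
  • Qwen 3.6 27B (dense): /blog/qwen-3-6-27b-vram-requirements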

Understanding Qwen 3 MoE Variants

Mixture of Experts (MoE) is a key architectural innovation in Qwen 3. In a standard dense model, every parameter is used for every token. In a MoE model, the network is divided into "expert" subnetworks, and only a small fraction are activated per token.

Qwen 3 30B-A3B in practice:

  • Total parameters: 30B (must fit in VRAM)
  • Active parameters per token: 3B (determines inference speed)
  • Result: You load ~17 GB into VRAM, but inference runs at the speed of a 3B model

This is why the 30B-A3B can outperform a dense 14B model while running at comparable speeds. The routing mechanism selects the most relevant experts for each token, concentrating compute where it matters.

Qwen 3 235B-A22B goes further: 235B total in memory, 22B active per token (comparable to a mid-size dense model) — frontier-level quality at workstation inference speeds.

The trade-off: MoE models use more total VRAM than their active parameter count suggests. You pay for capacity in memory, and you get quality + speed in return. Our VRAM calculator accounts for MoE architecture when estimating fit.
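
The trade-off can be made concrete with a little arithmetic on the figures already quoted on this page. The sketch below is illustrative only: `bits_per_weight` and `active_fraction` are hypothetical helper names, parameter counts are the nominal ones from the model names, and 1 GB is treated as 10^9 bytes.

```python
def bits_per_weight(size_gb: float, total_params_b: float) -> float:
    """Approximate bits stored per parameter (GB taken as 1e9 bytes)."""
    return size_gb * 8 / total_params_b


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of the parameters that take part in computing each token."""
    return active_params_b / total_params_b


# name: (total params in B, active params in B, Q4_K_M size in GB from this page)
MODELS = {
    "Qwen 3 14B": (14, 14, 8.3),
    "Qwen 3 30B-A3B": (30, 3, 16.8),
    "Qwen 3 235B-A22B": (235, 22, 131.9),
}

for name, (total, active, size_gb) in MODELS.items():
    print(
        f"{name}: {bits_per_weight(size_gb, total):.2f} bits/weight in memory, "
        f"{active_fraction(active, total):.0%} of parameters active per token"
    )
```

Under these assumptions, all three models store roughly 4.5 to 4.7 bits per parameter at Q4_K_M, so memory follows the total parameter count, while only about a tenth of the MoE parameters are used for any single token.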

Qwen 3 for Coding

The Qwen 3 Coder variants are fine-tuned on massive programming datasets. Performance differences versus the base models are most pronounced on:

  • Code generation: Larger, more complex functions from natural language
  • Bug finding and fixing: Static analysis-style reasoning over code
  • Repo-level tasks: Multi-file context and refactoring
  • Technical documentation: Accurate docstrings and API descriptions

Qwen 3 Coder 8B

Excellent for everyday coding assistance on constrained hardware. Uses the same VRAM as Qwen 3 8B (~4.6 GB at Q4). A solid choice for developers on RTX 4060 8GB or similar.

ollama run qwen3-coder:8b

Qwen 3 Coder 14B

The best coding model under 16 GB VRAM. Noticeably stronger than the 8B Coder on longer functions and multi-file reasoning.

ollama run qwen3-coder:14b

Qwen 3 Coder 30B-A3B

The most capable coding model for single-GPU setups. The MoE architecture gives it quality close to the 32B dense model while fitting in ~17 GB. If you have a 24GB GPU and write code for a living, this is the model to run.

ollama run qwen3-coder:30b-a3b

Check compatibility: Qwen 3 Coder 30B-A3B on RTX 4090

Choosing the Right Quantization

Unlike reasoning-heavy models such as DeepSeek R1, Qwen 3's base and Coder variants handle quantization gracefully. You can drop to Q4 without dramatic quality loss for most tasks.

General guidance:

| VRAM Budget | Recommended Quant | Notes |
|---|---|---|
| Very tight (≤4 GB) | Q4_K_M | Functional, minimal headroom |
| Normal (4–12 GB) | Q5_K_M | Good quality-size balance |
| Comfortable (12–24 GB) | Q6_K | Near-lossless for most tasks |
| Generous (24 GB+) | Q8_0 | Effectively identical to F16 |

For coding tasks, we recommend Q5_K_M or higher — code generation benefits from precision, especially for syntax-sensitive outputs. For casual chat and summarization, Q4_K_M is fine.
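
One way to turn this guidance into a concrete pick is to take the highest-precision quant whose weights plus overhead fit your card. This is a simpler rule than the budget bands above, not a restatement of them. The `best_quant` helper and the 2 GB overhead default are assumptions for illustration; the sizes are the 8B and 14B rows from the VRAM table on this page.

```python
from typing import Optional

# Highest precision first, so the first fit is the best fit.
QUANT_ORDER = ["F16", "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M"]

# Weight sizes in GB, copied from the VRAM table on this page.
SIZES_GB = {
    "Qwen 3 8B": {"Q4_K_M": 4.6, "Q5_K_M": 5.6, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.1},
    "Qwen 3 14B": {"Q4_K_M": 8.3, "Q5_K_M": 10.2, "Q6_K": 12.0, "Q8_0": 15.7, "F16": 28.0},
}


def best_quant(model: str, vram_gb: float, overhead_gb: float = 2.0) -> Optional[str]:
    """Return the highest-precision quant that fits, or None if nothing fits."""
    for quant in QUANT_ORDER:
        if SIZES_GB[model][quant] + overhead_gb <= vram_gb:
            return quant
    return None


for model, vram in [("Qwen 3 8B", 8), ("Qwen 3 8B", 12), ("Qwen 3 14B", 12)]:
    print(f"{model} on {vram} GB: {best_quant(model, vram)}")
```

A smaller `overhead_gb` makes the pick more aggressive; a larger one leaves more room for long contexts.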

Read our quantization guide for a deeper look at how different quant levels affect output quality.

Qwen 3 vs Other Leading Open Models

How does Qwen 3 stack up against the competition on hardware requirements?

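| Model | Params | VRAM (Q4) | Active Params | Architecture |
|---|---|---|---|---|
| Qwen 3 8B | 8B | 4.6 GB | 8B | Dense |
| Llama 4 Scout | 109B | ~59 GB | 17B | MoE |
| DeepSeek V3 | 671B | ~376 GB | 37B | MoE |
| Qwen 3 30B-A3B | 30B | 16.8 GB | 3B | MoE |
| QwQ 32B | 32B | 18 GB | 32B | Dense |
| Qwen 3 235B-A22B | 235B | 132 GB | 22B | MoE |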

The 30B-A3B is especially compelling: it sits in a hardware tier similar to QwQ 32B but runs inference at the speed of a 3B model thanks to MoE activation sparsity.

Performance Expectations

Inference speed depends on your hardware's memory bandwidth. Approximate token generation speeds with Q4_K_M:

| Hardware | Qwen 3 8B | Qwen 3 14B | Qwen 3 30B-A3B |
|---|---|---|---|
| RTX 4060 8GB | ~50 tok/s | | |
| RTX 4070 12GB | ~60 tok/s | ~35 tok/s | |
| RTX 4090 24GB | ~85 tok/s | ~55 tok/s | ~70 tok/s* |
| RTX 5090 32GB | ~110 tok/s | ~70 tok/s | ~90 tok/s* |
| Mac M4 Pro 24GB | ~38 tok/s | ~22 tok/s | ~35 tok/s* |
| Mac M4 Max 64GB | ~45 tok/s | ~28 tok/s | ~42 tok/s* |

* MoE models activate only 3B parameters per token, giving them a speed advantage over dense models of equivalent total size.
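
To see why memory bandwidth sets the pace, a common back-of-envelope model treats each generated token as one full read of the weights that are actually used. The sketch below applies that model. It is a simplification: it assumes the MoE reads only its active share of the weights, it ignores KV cache traffic and compute limits, and the 1000 GB/s bandwidth is a hypothetical round figure rather than the spec of any GPU listed above.

```python
def decode_ceiling_tok_s(
    bandwidth_gb_s: float, weights_gb: float, active_fraction: float = 1.0
) -> float:
    """Upper bound on tokens/s if each token requires reading the active weights once."""
    if bandwidth_gb_s <= 0 or weights_gb <= 0 or not 0 < active_fraction <= 1:
        raise ValueError("bandwidth and size must be positive; fraction must be in (0, 1]")
    return bandwidth_gb_s / (weights_gb * active_fraction)


BANDWIDTH_GB_S = 1000.0  # hypothetical round figure, for illustration only

# Dense 8B at Q4_K_M: all 4.6 GB are read for every token.
print(f"8B dense ceiling: {decode_ceiling_tok_s(BANDWIDTH_GB_S, 4.6):.0f} tok/s")

# 30B-A3B at Q4_K_M: 16.8 GB in memory, roughly 3B of 30B parameters active.
print(f"30B-A3B ceiling:  {decode_ceiling_tok_s(BANDWIDTH_GB_S, 16.8, 3 / 30):.0f} tok/s")
```

Real throughput lands well below these ceilings, and the approximate figures in the table show the MoE advantage is smaller in practice than this simple model suggests. Treat the result as a way to compare hardware and model sizes, not as a prediction.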

Getting Started

  1. Find your fit: Use the VRAM calculator to see which Qwen 3 variant matches your hardware
  2. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  3. Run your chosen model:
    ollama run qwen3:8b              # Small and fast
    ollama run qwen3:30b-a3b         # Best balance (MoE)
    ollama run qwen3-coder:30b-a3b   # Best coding on 24GB GPU
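
Once a model is running, you can also call it from a script instead of the interactive prompt. The sketch below uses Ollama's local REST endpoint (`/api/generate` on port 11434, the default) with only the Python standard library. The helper names `build_request` and `generate` are hypothetical, and the code assumes the Ollama server is running locally and the model has already been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, `print(generate("qwen3:8b", "Explain quantization in one sentence."))` sends a single prompt and prints the reply. The first call can be slow while the model loads into VRAM, which is why the timeout default is generous.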
    

Frequently Asked Questions

How much VRAM does Qwen 3 need?

It varies by variant. Qwen 3 8B needs ~4.6 GB at Q4. Qwen 3 14B needs ~8.3 GB at Q4. The MoE variant Qwen 3 30B-A3B needs ~16.8 GB at Q4. Qwen 3 235B-A22B needs ~132 GB at Q4.

Can I run Qwen 3 on an RTX 4090?

Yes. The RTX 4090 (24GB) comfortably runs Qwen 3 8B at Q8, Qwen 3 14B at Q6, and Qwen 3 30B-A3B (MoE) at Q4.

What is Qwen 3 30B-A3B?

Qwen 3 30B-A3B is a Mixture of Experts model with 30B total parameters but only 3B active per token. It delivers quality comparable to much larger dense models while fitting in ~17GB at Q4 — making it ideal for RTX 4090 and similar hardware.

Where can I find Qwen 3.5 or Qwen 3.6 VRAM numbers?

See the dedicated pages: /blog/qwen-35-vram-requirements-complete-guide for the Qwen 3.5 family, /blog/qwen-3-6-vram-requirements for the Qwen 3.6 35B-A3B MoE, and /blog/qwen-3-6-27b-vram-requirements for the Qwen 3.6 dense 27B.

Is Qwen 3 good for coding?

Qwen 3 Coder variants are specifically optimized for coding. Qwen 3 Coder 30B-A3B is excellent for programming tasks. The base Qwen 3 models also handle coding well, especially the 14B and 30B-A3B variants.

What is the difference between Qwen 3, Qwen 3.5, and Qwen 3.6?

Qwen 3 is the original 2025 family (0.6B–235B-A22B). Qwen 3.5 is a 2026 refresh with improved sizes and tuning. Qwen 3.6 (April 2026) is a newer generation with a 1M-token native context — it has its own VRAM math and dedicated guides at /blog/qwen-3-6-vram-requirements and /blog/qwen-3-6-27b-vram-requirements.