How much VRAM does Qwen 3.5 27B need?

Qwen 3.5 27B needs ~16.5 GB at Q4_K_M, ~19.4 GB at Q5_K_M, ~22.1 GB at Q6_K, and ~28.9 GB at Q8_0. An RTX 4090 (24 GB) runs it comfortably at Q4, and a 32 GB RTX 5090 handles Q5 with headroom.

Can RTX 4090 run Qwen 3.5?

Yes. The RTX 4090 (24 GB) runs Qwen 3.5 4B and 9B at Q8, Qwen 3.5 27B at Q4, and Qwen 3.5 35B-A3B (MoE) at Q4. It is one of the best consumer GPUs for the Qwen 3.5 family.

Can I run Qwen 3.5 on Mac?

Yes. Apple Silicon Macs with unified memory handle Qwen 3.5 well. An M4 with 16 GB runs the 4B and 9B variants. An M4 Pro 24 GB handles 27B at Q4. An M4 Max 128 GB fits the 122B-A10B MoE at Q4.

What's the best Qwen 3.5 model for 16GB VRAM?

Qwen 3.5 9B at Q8_0 (~9.6 GB) gives the best quality within 16 GB. Qwen 3.5 27B at Q4_K_M (~16.5 GB) exceeds 16 GB on weights alone, so it only runs on a 16 GB card if some layers are offloaded to system RAM, at a noticeable speed cost.

March 28, 2026
Tags: qwen, vram, gpu, hardware, apple-silicon

Qwen 3.5 VRAM Requirements — 9B, 27B, 35B-A3B, 122B-A10B

Exact VRAM for Qwen 3.5 9B, 27B, 35B-A3B, and 122B-A10B at Q4, Q5, Q6, Q8, and FP16. Includes RTX 4090, Apple Silicon, and MLX vs GGUF guidance.

If you are searching for Qwen 3.5 9B, 27B, 35B-A3B, or 122B-A10B VRAM requirements, this is the dedicated guide.

Dedicated variant pages (deeper dive with per-GPU matrices and setup commands):

Qwen 3.5 9B VRAM Requirements — best 8B-class, 5.5 GB at Q4

Qwen 3.5 27B VRAM Requirements — dense flagship, 16.5 GB at Q4

Qwen 3.5 35B-A3B VRAM Requirements — MoE, 21.4 GB at Q4

Qwen 3.5 122B-A10B VRAM Requirements — workstation MoE, 74.4 GB at Q4

Quick answers for Qwen 3.5

Qwen 3.5 9B: ~5.5 GB at Q4_K_M, ~9.6 GB at Q8_0
Qwen 3.5 27B: ~16.5 GB at Q4_K_M, ~28.9 GB at Q8_0
Qwen 3.5 35B-A3B: ~21.4 GB at Q4_K_M, ~37.5 GB at Q8_0
Qwen 3.5 122B-A10B: ~74.4 GB at Q4_K_M, ~130.5 GB at Q8_0

Qwen 3.5 is the latest generation of open-weight models from Alibaba Cloud. Building on the strong foundation of Qwen 3, the 3.5 family pushes quality and efficiency further with improved training data, better instruction following, and refined Mixture of Experts (MoE) routing. The lineup spans from a compact 4B dense model to the massive 397B-A17B MoE flagship, covering every hardware tier from budget laptops to multi-GPU data center rigs.

This guide provides estimated VRAM requirements for every Qwen 3.5 variant at all major quantization levels, along with hardware recommendations and Apple Silicon guidance.

If you need the original Qwen 3 lineup instead, read Qwen 3 VRAM Requirements.

Qwen 3.5 Model Family Overview

Alibaba's Qwen 3.5 release includes both dense and MoE architectures:

Model	Type	Total Params	Active Params	Best For
Qwen3.5-4B	Dense	4B	4B	Edge devices, lightweight assistants
Qwen3.5-9B	Dense	9B	9B	Strong all-rounder, great value
Qwen3.5-27B	Dense	27B	27B	High-quality dense reasoning
Qwen3.5-35B-A3B	MoE	35B	3B	Best single-GPU efficiency
Qwen3.5-122B-A10B	MoE	122B	10B	Professional workstation, Mac Pro
Qwen3.5-397B-A17B	MoE	397B	17B	Frontier-class, multi-GPU only

The MoE variants are the standout story. Qwen 3.5 35B-A3B activates only 3B of its 35B parameters per token, giving you dense-27B-class quality at roughly 3B-class per-token compute. The trade-off is memory: all 35B parameters still have to be loaded. The 122B-A10B and 397B-A17B follow the same principle at larger scale.
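To make the total-versus-active split concrete, here is a small sketch that computes what share of each MoE model's parameters is used per token, using the figures from the overview table. The function name `active_share` is made up for this example.

```shell
# active_share TOTAL_B ACTIVE_B -> percent of parameters active per token
# (illustrative helper, not part of any tool)
active_share() {
  awk -v t="$1" -v a="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

active_share 35 3     # 35B-A3B   -> 8.6
active_share 122 10   # 122B-A10B -> 8.2
active_share 397 17   # 397B-A17B -> 4.3
```

Memory use follows the total parameter count, while per-token compute follows the active count, which is why these models are heavy to load but quick to run.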

VRAM Requirements by Model and Quantization

These numbers represent model weight sizes calculated from calibrated per-parameter rates against real GGUF files. Add 1-2 GB for KV cache and runtime overhead at default context lengths.

Model	Q4_K_M	Q5_K_M	Q6_K	Q8_0	F16
Qwen3.5-4B	2.4 GB	2.9 GB	3.3 GB	4.3 GB	8.2 GB
Qwen3.5-9B	5.5 GB	6.5 GB	7.4 GB	9.6 GB	18.5 GB
Qwen3.5-27B	16.5 GB	19.4 GB	22.1 GB	28.9 GB	55.4 GB
Qwen3.5-35B-A3B (MoE)	21.4 GB	25.2 GB	28.7 GB	37.5 GB	71.8 GB
Qwen3.5-122B-A10B (MoE)	74.4 GB	87.8 GB	100.0 GB	130.5 GB	250.1 GB
Qwen3.5-397B-A17B (MoE)	242.2 GB	285.8 GB	325.5 GB	424.8 GB	813.9 GB
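If you want to estimate a size that is not in the table, the figures above are consistent with a fixed GB-per-billion-parameters rate for each quant. The sketch below uses rates back-derived from the table and rounded to two decimals. They are an assumption for illustration, not official figures, and results can differ from the table by about 0.1 GB.

```shell
# Approximate GB of weights per billion parameters.
# Assumption: back-derived from the table above and rounded to two decimals.
rate_for_quant() {
  case "$1" in
    Q4_K_M) echo 0.61 ;;
    Q5_K_M) echo 0.72 ;;
    Q6_K)   echo 0.82 ;;
    Q8_0)   echo 1.07 ;;
    F16)    echo 2.05 ;;
    *)      return 1 ;;
  esac
}

# weights_gb PARAMS_IN_BILLIONS QUANT -> estimated weight size in GB
weights_gb() {
  rate=$(rate_for_quant "$2") || { echo "unknown quant: $2" >&2; return 1; }
  awk -v p="$1" -v r="$rate" 'BEGIN { printf "%.1f\n", p * r }'
}

weights_gb 27 Q4_K_M    # prints 16.5
weights_gb 122 Q5_K_M   # prints 87.8
```

This covers weights only. KV cache and runtime overhead still come on top.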

Key takeaway: The 35B-A3B MoE needs 21.4 GB at Q4, so it fits on a 24 GB RTX 4090 with about 2.6 GB left for KV cache and runtime overhead. All 35B parameters are loaded into memory, but only 3B are active per token, so per-token compute is comparable to that of a 3B dense model.
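Headroom is simple subtraction: card memory minus weight size, minus whatever you allow for KV cache and runtime overhead. A minimal sketch, with `headroom_gb` as a made-up helper name and the 2 GB allowance taken from the upper end of the 1-2 GB range mentioned earlier:

```shell
# headroom_gb VRAM_GB WEIGHTS_GB [OVERHEAD_GB] -> GB left after weights and overhead
headroom_gb() {
  awk -v v="$1" -v w="$2" -v o="${3:-0}" 'BEGIN { printf "%.1f\n", v - w - o }'
}

headroom_gb 24 21.4      # 35B-A3B at Q4_K_M on a 24 GB card: 2.6
headroom_gb 24 21.4 2    # same, after a 2 GB overhead allowance: 0.6
headroom_gb 24 16.5      # 27B at Q4_K_M on a 24 GB card: 7.5
```

A small positive result after the allowance means the model loads, but long contexts will be the first thing to run out of room.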

Best GPU for Each Qwen 3.5 Model

Qwen3.5-4B — Runs Almost Anywhere

At 2.4 GB (Q4), the 4B fits on virtually any modern GPU. Even a GTX 1060 6GB or Intel Arc A380 handles it at Q8. Great for always-on assistants, edge deployments, or quick prototyping.

Recommended hardware:

Any GPU with 4 GB or more VRAM (RTX 3060, RTX 4060, GTX 1060)
Mac M-series with 8 GB unified memory
CPU-only inference with 8 GB or more system RAM

Quick start:

ollama run qwen3.5:4b

Qwen3.5-9B — Best Under 12GB

The 9B dense model is the sweet spot of the Qwen 3.5 lineup. It delivers strong instruction-following, multilingual, and coding performance while staying under 10 GB at Q8.

Recommended hardware:

RTX 4060 8GB — fits at Q5_K_M with moderate headroom
RTX 4070 12GB — comfortable at Q6, excellent performance
RTX 3060 12GB — fits at Q6 with the same headroom as the RTX 4070, slower but functional

Quick start:

ollama run qwen3.5:9b

Qwen3.5-27B — The 24GB GPU Sweet Spot

The 27B is the largest dense Qwen 3.5 model that fits on a single consumer GPU at Q4. At 16.5 GB it leaves about 7.5 GB of headroom on a 24 GB card, enough for standard context lengths. Quality is a clear step up from the 9B, particularly for complex reasoning and long-form generation.

Recommended hardware:

RTX 4090 24GB — fits at Q4 with good headroom
RTX 5090 32GB — comfortable at Q5, room for large context
RTX 3090 24GB — fits at Q4, slower but functional

Quick start:

ollama run qwen3.5:27b

Check compatibility: Can Qwen 3.5 27B run on RTX 4090?

Qwen3.5-35B-A3B (MoE) — Best Efficiency on 24GB

This is the model to watch. Despite having 35B total parameters, the MoE architecture activates only 3B per token. You load 21.4 GB into VRAM at Q4, which fits on a 24 GB GPU, but per-token compute is close to that of a 3B dense model, so inference is fast while quality approaches that of a much larger dense network.

Recommended hardware:

RTX 4090 24GB — fits at Q4 with ~2.5 GB headroom
RTX 5090 32GB — fits at Q5 with generous context window
Mac M4 Max 36GB — comfortable at Q4, unified memory advantage

Quick start:

ollama run qwen3.5:35b-a3b

Qwen3.5-122B-A10B (MoE) — Workstation and Apple Silicon Territory

With 74.4 GB at Q4, the 122B-A10B requires either professional GPUs or high-memory Apple Silicon. The payoff is substantial: 10B active parameters deliver quality that competes with frontier dense models, while inference speed remains practical.

Recommended hardware:

A100 80GB — fits at Q4 on a single GPU
H100 80GB — fits at Q4 with fast inference
Mac M4 Max 128GB — fits at Q4 with room to spare
Mac M4 Ultra 192GB — comfortable at Q5

Qwen3.5-397B-A17B (MoE) — Multi-GPU or Ultra-High-Memory Mac

The flagship model needs 242 GB at Q4. This is firmly in multi-GPU or Apple Silicon Ultra territory. With 17B active parameters per token, the 397B-A17B delivers frontier-class quality while remaining feasible on high-end local hardware.

Recommended hardware:

H100 80GB x 4 GPUs — fits at Q4 with tensor parallelism
A100 80GB x 4 GPUs — fits at Q4
MI300X 192GB x 2 GPUs — fits at Q4
Apple M3 Ultra 512GB (custom config) — fits at Q4 with unified memory

Apple Silicon Guide — Qwen 3.5 on Mac

Apple Silicon's unified memory architecture makes it one of the best platforms for running large models locally. Here is what each Mac configuration can handle:

Mac Configuration	Best Qwen 3.5 Model	Quantization	Notes
M4 16GB	Qwen3.5-4B	Q8_0 (4.3 GB)	Comfortable with headroom
M4 16GB	Qwen3.5-9B	Q4_K_M (5.5 GB)	Fits; leaves ~10.5 GB for OS and context
M4 Pro 24GB	Qwen3.5-9B	Q8_0 (9.6 GB)	Comfortable fit
M4 Pro 24GB	Qwen3.5-27B	Q4_K_M (16.5 GB)	Tight but functional
M4 Max 64GB	Qwen3.5-27B	F16 (55.4 GB)	Full precision; tight, leaves ~8.6 GB for OS and context
M4 Max 64GB	Qwen3.5-35B-A3B	Q4_K_M (21.4 GB)	Comfortable with large context
M4 Max 128GB	Qwen3.5-122B-A10B	Q4_K_M (74.4 GB)	Fits with headroom for context
M4 Ultra 192GB	Qwen3.5-122B-A10B	Q5_K_M (87.8 GB)	Generous headroom

MLX vs GGUF on Mac

Mac users have two main options for running Qwen 3.5 locally:

MLX (via mlx-lm or LM Studio): Apple's native ML framework. Optimized for Apple Silicon with minimal overhead. Best performance and memory efficiency on Mac. Look for mlx-community/Qwen3.5-* models on Hugging Face.
GGUF (via llama.cpp or Ollama): Cross-platform format. Slightly higher memory overhead on Mac than MLX but broader compatibility. Works well through Ollama's simple CLI.

For most Mac users, MLX is the better choice — it takes full advantage of the unified memory architecture and Metal GPU acceleration. GGUF via Ollama is the easier setup if you just want to get started quickly.
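If you go the MLX route, the mlx-lm package installs a `mlx_lm.generate` command for one-off prompts. The wrapper below is a sketch: `run_mlx` is a made-up name, and the model repo is left as an argument because this guide only names the mlx-community/Qwen3.5-* pattern, so check Hugging Face for the exact repo you want.

```shell
# Prerequisite (Apple Silicon Mac): pip install mlx-lm
# run_mlx MODEL_REPO PROMPT -> one-off generation through the mlx-lm CLI
# (run_mlx is an illustrative wrapper; MODEL_REPO is whichever repo you picked)
run_mlx() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_mlx MODEL_REPO PROMPT" >&2
    return 2
  fi
  mlx_lm.generate --model "$1" --prompt "$2" --max-tokens 256
}

# Example call, once you have chosen a repo:
#   run_mlx "$MODEL_REPO" "Summarize unified memory in one sentence."
```

The first run downloads the model from Hugging Face, so expect a wait proportional to the sizes listed in the table above.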

Explore Mac compatibility for all models on our Mac hardware pages.

Choosing the Right Quantization

Qwen 3.5 handles quantization well across the board. Here is a quick guide:

VRAM Budget	Recommended Quant	Quality Impact
Very tight (4 GB or less)	Q4_K_M	Minimal loss for chat and general tasks
Normal (4-12 GB)	Q5_K_M	Good balance of quality and size
Comfortable (12-24 GB)	Q6_K	Near-lossless for most tasks
Generous (24 GB and above)	Q8_0	Effectively identical to full precision

For coding and structured output tasks, prefer Q5_K_M or higher. For casual conversation and summarization, Q4_K_M is perfectly adequate.
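Another way to apply this is to pick the highest-precision quant whose weights still fit your card after setting memory aside for KV cache and overhead. The sketch below takes a VRAM budget plus one row of the VRAM table (Q4, Q5, Q6, Q8 sizes). The name `best_quant` and the default 2 GB reserve are assumptions for illustration.

```shell
# best_quant BUDGET_GB Q4_GB Q5_GB Q6_GB Q8_GB [RESERVE_GB]
# Prints the highest-precision quant whose weights plus reserve fit the budget.
# Assumption: RESERVE_GB defaults to 2 (upper end of the 1-2 GB overhead range).
best_quant() {
  awk -v b="$1" -v q4="$2" -v q5="$3" -v q6="$4" -v q8="$5" -v r="${6:-2}" 'BEGIN {
    if      (q8 + r <= b) print "Q8_0"
    else if (q6 + r <= b) print "Q6_K"
    else if (q5 + r <= b) print "Q5_K_M"
    else if (q4 + r <= b) print "Q4_K_M"
    else                  print "none"
  }'
}

best_quant 16 5.5 6.5 7.4 9.6        # 9B on a 16 GB card   -> Q8_0
best_quant 24 21.4 25.2 28.7 37.5    # 35B-A3B on 24 GB     -> Q4_K_M
best_quant 16 16.5 19.4 22.1 28.9    # 27B on a 16 GB card  -> none
```

A result of `none` means even Q4_K_M does not fit with the reserve, so a smaller model or partial offload to system RAM is the realistic option.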

Read our quantization guide for a deeper look at how quant levels affect output quality.

Getting Started

Auto-detect your hardware: Use Hardware Detection to see which Qwen 3.5 variant fits your GPU or Mac automatically
Check your fit: Use the VRAM calculator to see which Qwen 3.5 variant matches your hardware
Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
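If you prefer to check memory from a terminal first, the sketch below reads total VRAM from `nvidia-smi` on NVIDIA systems or total unified memory from `sysctl` on macOS. The function names are made up for this example, and the result is in GiB, which reads slightly lower than the marketing GB figure for the same card.

```shell
# mib_to_gb: read a MiB figure on stdin, print it in GiB with one decimal
mib_to_gb() {
  awk '{ printf "%.1f\n", $1 / 1024 }'
}

# local_memory_gb: total VRAM of the first NVIDIA GPU, or unified memory on macOS
local_memory_gb() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1 | mib_to_gb
  elif [ "$(uname -s)" = "Darwin" ]; then
    # hw.memsize is reported in bytes
    sysctl -n hw.memsize | awk '{ printf "%.1f\n", $1 / 1024 / 1024 / 1024 }'
  else
    echo "no NVIDIA GPU or macOS unified memory detected" >&2
    return 1
  fi
}

# Example with a sample reading of 24564 MiB:
echo 24564 | mib_to_gb   # prints 24.0
```

Run `local_memory_gb` and compare the number with the VRAM table to see which variant and quant are within reach.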

Run your model:

# Fast and light
ollama run qwen3.5:9b

# Best efficiency on 24GB GPU (MoE)
ollama run qwen3.5:35b-a3b

# High-quality dense
ollama run qwen3.5:27b

Browse all Qwen models on WillItRunAI.
