Will It Run AI
qwen, vram, gpu, hardware, apple-silicon

Qwen 3.5 VRAM Requirements — 9B, 27B, 35B-A3B, 122B-A10B

Exact VRAM for Qwen 3.5 9B, 27B, 35B-A3B, and 122B-A10B at Q4, Q5, Q6, Q8, and FP16. Includes RTX 4090, Apple Silicon, and MLX vs GGUF guidance.

If you are searching for Qwen 3.5 9B, 27B, 35B-A3B, or 122B-A10B VRAM requirements, this is the dedicated guide.

Quick answers for Qwen 3.5

  • Qwen 3.5 9B: ~5.5 GB at Q4_K_M, ~9.6 GB at Q8_0
  • Qwen 3.5 27B: ~16.5 GB at Q4_K_M, ~28.9 GB at Q8_0
  • Qwen 3.5 35B-A3B: ~21.4 GB at Q4_K_M, ~37.5 GB at Q8_0
  • Qwen 3.5 122B-A10B: ~74.4 GB at Q4_K_M, ~130.5 GB at Q8_0

Qwen 3.5 is the latest generation of open-weight models from Alibaba Cloud. Building on the strong foundation of Qwen 3, the 3.5 family pushes quality and efficiency further with improved training data, better instruction following, and refined Mixture of Experts (MoE) routing. The lineup spans from a compact 4B dense model to the massive 397B-A17B MoE flagship, covering every hardware tier from budget laptops to multi-GPU data center rigs.

This guide lists estimated VRAM requirements for every Qwen 3.5 variant at all major quantization levels, along with hardware recommendations and Apple Silicon guidance.

If you need the original Qwen 3 lineup instead, read Qwen 3 VRAM Requirements.

Qwen 3.5 Model Family Overview

Alibaba's Qwen 3.5 release includes both dense and MoE architectures:

| Model | Type | Total Params | Active Params | Best For |
| --- | --- | --- | --- | --- |
| Qwen3.5-4B | Dense | 4B | 4B | Edge devices, lightweight assistants |
| Qwen3.5-9B | Dense | 9B | 9B | Strong all-rounder, great value |
| Qwen3.5-27B | Dense | 27B | 27B | High-quality dense reasoning |
| Qwen3.5-35B-A3B | MoE | 35B | 3B | Best single-GPU efficiency |
| Qwen3.5-122B-A10B | MoE | 122B | 10B | Professional workstation, Mac Pro |
| Qwen3.5-397B-A17B | MoE | 397B | 17B | Frontier-class, multi-GPU only |

The MoE variants are the standout story. Qwen 3.5 35B-A3B activates only 3B parameters per token despite having 35B total, giving you dense-27B-class quality at 3B-class inference speed. The 122B-A10B and 397B-A17B follow the same principle at larger scale.
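The split between loaded and active parameters can be made concrete with a little arithmetic. The sketch below uses only the total and active parameter counts from the overview table; the function name `active_fraction` is illustrative and not part of any library.

```python
# Total vs. active parameters (in billions), taken from the overview table.
MOE_MODELS = {
    "Qwen3.5-35B-A3B": (35, 3),
    "Qwen3.5-122B-A10B": (122, 10),
    "Qwen3.5-397B-A17B": (397, 17),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the loaded weights that take part in computing each token."""
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    pct = 100 * active_fraction(total_b, active_b)
    # Memory scales with total_b; per-token compute scales with active_b.
    print(f"{name}: loads {total_b}B, computes with {active_b}B per token ({pct:.1f}%)")
```

Memory has to hold every expert, so VRAM follows the total count, while per-token compute follows the much smaller active count.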

VRAM Requirements by Model and Quantization

These numbers represent model weight sizes calculated from calibrated per-parameter rates against real GGUF files. Add 1-2 GB for KV cache and runtime overhead at default context lengths.

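| Model | Q4_K_M | Q5_K_M | Q6_K | Q8_0 | F16 |
| --- | --- | --- | --- | --- | --- |
| Qwen3.5-4B | 2.4 GB | 2.9 GB | 3.3 GB | 4.3 GB | 8.2 GB |
| Qwen3.5-9B | 5.5 GB | 6.5 GB | 7.4 GB | 9.6 GB | 18.5 GB |
| Qwen3.5-27B | 16.5 GB | 19.4 GB | 22.1 GB | 28.9 GB | 55.4 GB |
| Qwen3.5-35B-A3B (MoE) | 21.4 GB | 25.2 GB | 28.7 GB | 37.5 GB | 71.8 GB |
| Qwen3.5-122B-A10B (MoE) | 74.4 GB | 87.8 GB | 100.0 GB | 130.5 GB | 250.1 GB |
| Qwen3.5-397B-A17B (MoE) | 242.2 GB | 285.8 GB | 325.5 GB | 424.8 GB | 813.9 GB |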
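To show how a per-parameter rate turns into the figures above, here is a minimal sketch. The rates in `GB_PER_BILLION` are not published values: they are back-derived from the table by dividing each size by the parameter count (for example, 16.5 GB / 27B is about 0.61), so treat them as approximations that reproduce the table to within about 0.1 GB.

```python
# GB of weights per billion parameters, back-derived from the table above.
# These are approximations inferred from the listed sizes, not official rates.
GB_PER_BILLION = {
    "Q4_K_M": 0.61,
    "Q5_K_M": 0.72,
    "Q6_K": 0.82,
    "Q8_0": 1.07,
    "F16": 2.05,
}


def estimate_weights_gb(params_billion: float, quant: str) -> float:
    """Estimate weight size in GB. Excludes KV cache and runtime overhead."""
    return params_billion * GB_PER_BILLION[quant]


for quant in GB_PER_BILLION:
    print(f"Qwen3.5-27B {quant}: ~{estimate_weights_gb(27, quant):.1f} GB")
```

For MoE models, pass the total parameter count (35, 122, or 397), since every expert must be resident in memory.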

Key takeaway: The 35B-A3B MoE needs 21.4 GB at Q4, so it fits on a 24 GB RTX 4090 with about 2.6 GB left for KV cache and runtime overhead, enough for a moderate context window. All 35B parameters must be loaded into memory, but inference runs at roughly the speed of a 3B dense model because only 3B parameters are active per token.
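A quick way to sanity-check a fit is to subtract the weights and an overhead allowance from the card's VRAM. This is a sketch only: the 2 GB default is the upper end of the 1-2 GB allowance mentioned earlier, and real overhead grows with context length.

```python
def headroom_gb(weights_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> float:
    """VRAM left after the weights plus an assumed KV-cache/runtime allowance."""
    return vram_gb - weights_gb - overhead_gb


def fits(weights_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True when weights and the assumed overhead fit within the card's VRAM."""
    return headroom_gb(weights_gb, vram_gb, overhead_gb) >= 0


# Weight sizes come from the table above; 24 GB is the RTX 4090.
print("35B-A3B Q4_K_M on 24 GB:", fits(21.4, 24))
print("35B-A3B Q5_K_M on 24 GB:", fits(25.2, 24))
print("27B Q4_K_M headroom on 24 GB:", round(headroom_gb(16.5, 24), 1), "GB")
```

Lowering `overhead_gb` models a short context window; raising it models long-context use.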

Best GPU for Each Qwen 3.5 Model

Qwen3.5-4B — Runs Almost Anywhere

At 2.4 GB (Q4), the 4B fits on virtually any modern GPU. Even a GTX 1060 6GB or Intel Arc A380 handles it at Q8. Great for always-on assistants, edge deployments, or quick prototyping.

Recommended hardware:

  • Any GPU with 4 GB or more VRAM (RTX 3060, RTX 4060, GTX 1060)
  • Mac M-series with 8 GB unified memory
  • CPU-only inference with 8 GB or more system RAM

Quick start:

ollama run qwen3.5:4b

Qwen3.5-9B — Best Under 12GB

The 9B dense model is the sweet spot of the Qwen 3.5 lineup. It delivers strong instruction-following, multilingual, and coding performance while staying under 10 GB at Q8.

Quick start:

ollama run qwen3.5:9b

Qwen3.5-27B — The 24GB GPU Sweet Spot

The 27B is the largest dense Qwen 3.5 model that fits on a single consumer GPU at Q4. At 16.5 GB it leaves about 7.5 GB of headroom on a 24 GB card, enough for standard context lengths. Quality is a clear step up from the 9B, particularly for complex reasoning and long-form generation.

Quick start:

ollama run qwen3.5:27b

Check compatibility: Can Qwen 3.5 27B run on RTX 4090?

Qwen3.5-35B-A3B (MoE) — Best Efficiency on 24GB

This is the model to watch. Despite having 35B total parameters, the MoE architecture activates only 3B per token. At Q4 you load 21.4 GB into VRAM, which fits on a 24 GB GPU, but per-token compute is close to that of a 3B dense model, while quality approaches that of a much larger dense network.

Quick start:

ollama run qwen3.5:35b-a3b

Qwen3.5-122B-A10B (MoE) — Workstation and Apple Silicon Territory

With 74.4 GB at Q4, the 122B-A10B requires either professional GPUs or high-memory Apple Silicon. The payoff is substantial: 10B active parameters deliver quality that competes with frontier dense models, while inference speed remains practical.

Qwen3.5-397B-A17B (MoE) — Multi-GPU or Ultra-High-Memory Mac

The flagship model needs 242 GB at Q4. This is firmly in multi-GPU or Apple Silicon Ultra territory. With 17B active parameters per token, the 397B-A17B delivers frontier-class quality while remaining feasible on high-end local hardware.

Recommended hardware:

  • H100 80GB x 4 GPUs — fits at Q4 with tensor parallelism
  • A100 80GB x 4 GPUs — fits at Q4
  • MI300X 192GB x 2 GPUs — fits at Q4
  • Apple M3 Ultra 512GB (custom config) — fits at Q4 with unified memory
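The GPU counts in this list follow from dividing the Q4 weight size by per-device memory and rounding up. The sketch below assumes the weights shard evenly across devices and ignores per-GPU overhead unless you pass one; `min_gpus` is an illustrative helper, not a library function.

```python
import math


def min_gpus(weights_gb: float, per_gpu_gb: float, overhead_gb: float = 0.0) -> int:
    """Smallest device count whose combined memory holds weights plus overhead.

    Assumes even sharding across devices, which is a simplification.
    """
    return math.ceil((weights_gb + overhead_gb) / per_gpu_gb)


Q4_397B_GB = 242.2  # Qwen3.5-397B-A17B at Q4_K_M, from the VRAM table

for label, per_gpu_gb in [("80 GB GPU", 80), ("192 GB GPU", 192), ("512 GB Mac", 512)]:
    print(f"{label}: at least {min_gpus(Q4_397B_GB, per_gpu_gb)} device(s)")
```

Three 80 GB cards provide 240 GB, just short of 242.2 GB, which is why the list starts at four.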

Apple Silicon Guide — Qwen 3.5 on Mac

Apple Silicon's unified memory architecture makes it one of the best platforms for running large models locally. Here is what each Mac configuration can handle:

| Mac Configuration | Best Qwen 3.5 Model | Quantization | Notes |
| --- | --- | --- | --- |
| M4 16GB | Qwen3.5-4B | Q8_0 (4.3 GB) | Comfortable with headroom |
| M4 16GB | Qwen3.5-9B | Q4_K_M (5.5 GB) | Leaves ~10 GB for OS and context |
| M4 Pro 24GB | Qwen3.5-9B | Q8_0 (9.6 GB) | Comfortable fit |
| M4 Pro 24GB | Qwen3.5-27B | Q4_K_M (16.5 GB) | Tight but functional |
| M4 Max 64GB | Qwen3.5-27B | F16 (55.4 GB) | Full precision, but tight: about 8.6 GB left for macOS and context |
| M4 Max 64GB | Qwen3.5-35B-A3B | Q4_K_M (21.4 GB) | Comfortable with large context |
| M4 Max 128GB | Qwen3.5-122B-A10B | Q4_K_M (74.4 GB) | Fits with headroom for context |
| M2 Ultra 192GB | Qwen3.5-122B-A10B | Q5_K_M (87.8 GB) | Generous headroom |
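Because unified memory is shared with macOS and other apps, it can help to look at what share of memory each pairing consumes. The sketch below uses the memory sizes and weight sizes from the table; the `USABLE_FRACTION` of 0.75 is a hypothetical budget chosen for illustration, not a measured system limit.

```python
# Hypothetical share of unified memory budgeted for model weights.
# This is an illustrative assumption, not a documented macOS limit.
USABLE_FRACTION = 0.75

# (unified memory in GB, model and quant, weights in GB), from the table above.
ROWS = [
    (16, "Qwen3.5-4B Q8_0", 4.3),
    (16, "Qwen3.5-9B Q4_K_M", 5.5),
    (24, "Qwen3.5-9B Q8_0", 9.6),
    (24, "Qwen3.5-27B Q4_K_M", 16.5),
    (64, "Qwen3.5-27B F16", 55.4),
    (64, "Qwen3.5-35B-A3B Q4_K_M", 21.4),
    (128, "Qwen3.5-122B-A10B Q4_K_M", 74.4),
    (192, "Qwen3.5-122B-A10B Q5_K_M", 87.8),
]


def over_budget(unified_gb: float, weights_gb: float) -> bool:
    """True when the weights exceed the assumed usable share of memory."""
    return weights_gb > USABLE_FRACTION * unified_gb


for unified_gb, label, weights_gb in ROWS:
    share = 100 * weights_gb / unified_gb
    note = "over assumed budget" if over_budget(unified_gb, weights_gb) else "ok"
    print(f"{unified_gb:>3} GB | {label:<26} | {share:4.0f}% of memory | {note}")
```

Under this assumption only the 27B at F16 on 64 GB is flagged, which matches it being the tightest row in the table.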

MLX vs GGUF on Mac

Mac users have two main options for running Qwen 3.5 locally:

  • MLX (via mlx-lm or LM Studio): Apple's native ML framework. Optimized for Apple Silicon with minimal overhead. Best performance and memory efficiency on Mac. Look for mlx-community/Qwen3.5-* models on Hugging Face.
  • GGUF (via llama.cpp or Ollama): Cross-platform format. Slightly higher memory overhead on Mac than MLX but broader compatibility. Works well through Ollama's simple CLI.

For most Mac users, MLX is the better choice — it takes full advantage of the unified memory architecture and Metal GPU acceleration. GGUF via Ollama is the easier setup if you just want to get started quickly.

Explore Mac compatibility for all models on our Mac hardware pages.

Choosing the Right Quantization

Qwen 3.5 handles quantization well across the board. Here is a quick guide:

| VRAM Budget | Recommended Quant | Quality Impact |
| --- | --- | --- |
| Very tight (4 GB or less) | Q4_K_M | Minimal loss for chat and general tasks |
| Normal (4-12 GB) | Q5_K_M | Good balance of quality and size |
| Comfortable (12-24 GB) | Q6_K | Near-lossless for most tasks |
| Generous (24 GB and above) | Q8_0 | Effectively identical to full precision |

For coding and structured output tasks, prefer Q5_K_M or higher. For casual conversation and summarization, Q4_K_M is perfectly adequate.
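The same choice can be made per model by walking the quantization levels from highest to lowest precision and taking the first that fits. This is a sketch under stated assumptions: the weight sizes are copied from the VRAM table for three models, the 2 GB overhead is an assumed allowance, and `best_quant` is an illustrative helper.

```python
# Weight sizes in GB, copied from the VRAM table for three of the models.
WEIGHTS_GB = {
    "Qwen3.5-9B": {"Q4_K_M": 5.5, "Q5_K_M": 6.5, "Q6_K": 7.4, "Q8_0": 9.6, "F16": 18.5},
    "Qwen3.5-27B": {"Q4_K_M": 16.5, "Q5_K_M": 19.4, "Q6_K": 22.1, "Q8_0": 28.9, "F16": 55.4},
    "Qwen3.5-35B-A3B": {"Q4_K_M": 21.4, "Q5_K_M": 25.2, "Q6_K": 28.7, "Q8_0": 37.5, "F16": 71.8},
}

# Highest precision first, so the first fit is the best available quality.
QUALITY_ORDER = ["F16", "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M"]


def best_quant(model: str, vram_gb: float, overhead_gb: float = 2.0):
    """Return the highest-precision quant that fits, or None if none do."""
    for quant in QUALITY_ORDER:
        if WEIGHTS_GB[model][quant] + overhead_gb <= vram_gb:
            return quant
    return None


for model, vram_gb in [("Qwen3.5-9B", 12), ("Qwen3.5-27B", 24), ("Qwen3.5-35B-A3B", 24)]:
    print(f"{model} on {vram_gb} GB: {best_quant(model, vram_gb)}")
```

A result of `None` means even Q4_K_M does not fit with the assumed overhead, so the model would need a larger card or partial CPU offload.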

Read our quantization guide for a deeper look at how quant levels affect output quality.

Getting Started

  1. Auto-detect your hardware: Use Hardware Detection to see which Qwen 3.5 variant fits your GPU or Mac automatically
  2. Check your fit: Use the VRAM calculator to see which Qwen 3.5 variant matches your hardware
  3. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  4. Run your model:
    # Fast and light
    ollama run qwen3.5:9b
    
    # Best efficiency on 24GB GPU (MoE)
    ollama run qwen3.5:35b-a3b
    
    # High-quality dense
    ollama run qwen3.5:27b
    
  5. Browse all Qwen models on WillItRunAI.

Frequently Asked Questions

How much VRAM does Qwen 3.5 27B need?

Qwen 3.5 27B needs ~16.5 GB at Q4_K_M, ~19.4 GB at Q5_K_M, ~22.1 GB at Q6_K, and ~28.9 GB at Q8_0. An RTX 4090 (24 GB) runs it comfortably at Q4, and a 32 GB RTX 5090 handles Q5 with headroom.

Can RTX 4090 run Qwen 3.5?

Yes. The RTX 4090 (24 GB) runs Qwen 3.5 4B and 9B at Q8, Qwen 3.5 27B at Q4, and Qwen 3.5 35B-A3B (MoE) at Q4. It is one of the best consumer GPUs for the Qwen 3.5 family.

Can I run Qwen 3.5 on Mac?

Yes. Apple Silicon Macs with unified memory handle Qwen 3.5 well. An M4 with 16 GB runs the 4B and 9B variants. An M4 Pro 24 GB handles 27B at Q4. An M4 Max 128 GB fits the 122B-A10B MoE at Q4.

What's the best Qwen 3.5 model for 16GB VRAM?

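Qwen 3.5 9B at Q8_0 (~9.6 GB) gives the best quality within 16 GB. Qwen 3.5 27B at Q4_K_M (~16.5 GB) does not fit: the weights alone exceed 16 GB before any KV cache is allocated, so running it on a 16 GB card means offloading some layers to system RAM, which slows inference.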