Qwen 3.6 VRAM & Hardware Requirements — 35B-A3B MoE GPU Guide (2026)
Qwen 3.6 35B-A3B MoE: Q4_K_M ~21 GB, fits RTX 4090 24GB or Mac M4 Pro. Q8 ~37 GB needs 48 GB class. GPU and Mac buyer guide for 1M-context MoE.
This page covers Qwen 3.6 VRAM requirements and Qwen 3.6 hardware requirements for the 35B-A3B MoE variant across every quantization level (Q4_K_M, Q5_K_M, Q6_K, Q8_0, FP16), plus a buyer's guide to which GPU or Mac to actually pick and the 1M-context considerations for this MoE.
It is intended as the reference for qwen3.6-35b-a3b VRAM requirements and qwen3.6-35b-a3b hardware requirements. Looking for the dense 27B instead? See Qwen3.6-27B VRAM Requirements.
New: the dense Qwen3.6-27B released April 22, 2026 — smaller (16.8 GB Q4 vs 21 GB), better on coding benchmarks, and includes vision. See Qwen3.6-27B VRAM Requirements for the dedicated guide.
Also in the Qwen 3.6 family: Qwen 3.6 27B (dense, coding-focused), aimed at 16 GB-class GPUs (16.8 GB at Q4_K_M). For the original Qwen 3 and Qwen 3.5 families, see Qwen 3 / 3.5 GPU Requirements.
Quick answers
- qwen3.6-35b-a3b VRAM (Q4_K_M): ~21 GB — fits on a 24 GB GPU.
- qwen3.6-35b-a3b VRAM (Q8_0): ~37 GB — needs 48 GB class or Mac M4 Max 64GB+.
- qwen3.6-35b-a3b release date: Open weights released April 16, 2026 on Hugging Face + ModelScope.
- qwen3.6-35b-a3b hardware requirements: 24 GB VRAM or 32 GB unified memory for useful Q4; 48 GB for Q8 with long context.
- Qwen3.6-27B (dense) VRAM: 16.8 GB at Q4_K_M — see dedicated page.
Qwen 3.6 family status (April 23, 2026)
| Variant | Status | VRAM Q4 | Best for |
|---|---|---|---|
| Qwen 3.6 Plus Preview | API only (March 30) | — | Cloud inference |
| Qwen3.6-35B-A3B MoE | Open weights April 16 | ~21 GB | Fast MoE chat |
| Qwen3.6-27B dense | Open weights April 22 | 16.8 GB | Coding, reasoning, vision |
Alibaba released Qwen 3.6 Plus via API on March 30-31, 2026, with a headline feature: a 1 million token native context window. Open-weight variants followed: the 35B-A3B MoE on April 16, then the surprise release of a 27B dense variant on April 22 that beats the previous-gen Qwen3.5-397B-A17B flagship on coding while fitting in 16.8 GB at Q4_K_M.
Confirmed architecture (Qwen3.6-35B-A3B MoE)
| Feature | Qwen3.6-35B-A3B |
|---|---|
| Total parameters | 35 billion |
| Active per token | 3 billion (A3B) |
| Architecture | Mixture of Experts |
| Context window | 262K native / 1M via YaRN |
| License | Apache 2.0 |
| Release | April 16, 2026 on Hugging Face + ModelScope |
| GGUF repository | unsloth/Qwen3.6-35B-A3B-GGUF |
qwen3.6-35b-a3b exact VRAM table
| Quant | qwen3.6-35b-a3b VRAM | Fits on |
|---|---|---|
| Q4_K_M | ~21.4 GB | RTX 4090 24GB, RTX 5090 32GB, Mac M4 Pro 24GB |
| Q5_K_M | ~24.2 GB | RTX 5090 32GB, Mac M4 Max 36GB+ |
| Q6_K | ~28.4 GB | RTX 5090 32GB, Mac M4 Max 36GB+ |
| Q8_0 | ~37.5 GB | Dual RTX 4090, H100 80GB, Mac M4 Max 64GB |
| FP16 | ~71.8 GB | H100 80GB, Mac Studio M3 Ultra 96GB |
These numbers assume a short default context of 4K-32K tokens. At the full 1M-token context, the KV cache can add another 20-40 GB on top, so plan hardware accordingly.
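As a rough sanity check on the table above, weight size is approximately total parameters × bits per weight / 8. The sketch below works backwards from the listed sizes to the effective bits per weight each one implies. It assumes a flat 35 billion parameters and 1 GB = 10^9 bytes; the function names are illustrative and not part of any library.

```python
# Rough sanity check: weight size in GB ~= params (billions) * bits per weight / 8.
# Assumptions: exactly 35B total parameters, 1 GB = 1e9 bytes.
TOTAL_PARAMS_B = 35.0  # from the architecture table

# Sizes in GB copied from the VRAM table above.
QUANT_SIZE_GB = {
    "Q4_K_M": 21.4,
    "Q5_K_M": 24.2,
    "Q6_K": 28.4,
    "Q8_0": 37.5,
    "FP16": 71.8,
}


def implied_bits_per_weight(size_gb: float, params_b: float = TOTAL_PARAMS_B) -> float:
    """Effective bits per weight implied by a given file size."""
    return size_gb * 8.0 / params_b


def estimated_size_gb(bits_per_weight: float, params_b: float = TOTAL_PARAMS_B) -> float:
    """Approximate weight size in GB for a given bits-per-weight figure."""
    return params_b * bits_per_weight / 8.0


if __name__ == "__main__":
    for quant, size in QUANT_SIZE_GB.items():
        bpw = implied_bits_per_weight(size)
        print(f"{quant:7s} {size:5.1f} GB -> {bpw:.2f} bits/weight")
```

The FP16 row works out to slightly more than 16 bits per weight under these assumptions, which suggests the listed sizes include something beyond a flat 35B parameters (for example a somewhat higher true parameter count or file overhead). Treat the output as an estimate, not a specification.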
qwen3.6-35b-a3b on RTX 3090 / RTX 4080 / RTX 5090
Common GPU pairings:
- qwen3.6-35b-a3b on RTX 3090 24GB: fits Q4_K_M tightly; Q5_K_M (~24.2 GB) exceeds 24 GB on weights alone, so it needs partial CPU offload rather than just a shorter context.
- qwen3.6-35b-a3b on RTX 4080 16GB: does NOT fit at Q4 — use Qwen3.6-27B instead (16.8 GB Q4).
- qwen3.6-35b-a3b on RTX 5090 32GB: comfortable at Q6_K with long context; ideal hardware.
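The fit checks above can be sketched as a small helper that compares weight size plus KV cache plus a runtime reserve against available memory. The quant sizes come from the VRAM table; the 1 GB default reserve and the function names are assumptions made for illustration, and real overhead depends on the runtime and context length.

```python
from typing import Optional

# Weight sizes in GB from the VRAM table, ordered smallest to largest.
QUANT_SIZE_GB = {
    "Q4_K_M": 21.4,
    "Q5_K_M": 24.2,
    "Q6_K": 28.4,
    "Q8_0": 37.5,
    "FP16": 71.8,
}


def fits(quant: str, vram_gb: float, kv_cache_gb: float = 0.0,
         reserve_gb: float = 1.0) -> bool:
    """True if weights + KV cache + reserve fit in vram_gb.

    reserve_gb is an assumed allowance for runtime buffers, not a measured value.
    """
    return QUANT_SIZE_GB[quant] + kv_cache_gb + reserve_gb <= vram_gb


def largest_quant_that_fits(vram_gb: float, kv_cache_gb: float = 0.0,
                            reserve_gb: float = 1.0) -> Optional[str]:
    """Return the largest quant that fits, or None if even Q4_K_M does not."""
    best = None
    for quant in QUANT_SIZE_GB:  # dicts preserve insertion order (small -> large)
        if fits(quant, vram_gb, kv_cache_gb, reserve_gb):
            best = quant
    return best


if __name__ == "__main__":
    for name, vram in [("RTX 4080 16GB", 16), ("RTX 3090 24GB", 24),
                       ("RTX 5090 32GB", 32)]:
        print(name, "->", largest_quant_that_fits(vram))
    # H100 80GB with the upper end of the 1M-context KV estimate (40 GB).
    print("H100 80GB + 40 GB KV ->", largest_quant_that_fits(80, kv_cache_gb=40))
```

Passing a larger `kv_cache_gb` models long-context use; with the 20-40 GB range quoted for 1M tokens, a 24 GB or 32 GB card no longer fits any quant fully in VRAM.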
What hardware should I buy for Qwen 3.6 35B-A3B?
This is a buyer-focused breakdown of which GPU or Mac to actually purchase, grouped by budget tier.
| Tier | Hardware | VRAM / RAM | Fits | 2026 price |
|---|---|---|---|---|
| Minimum (used) | RTX 3090 24GB | 24 GB | Q4_K_M tight | $700-900 used |
| Minimum (new) | Mac M4 Pro 24GB | 24 GB unified | Q4_K_M tight | $1,999+ |
| Sweet spot | RTX 4090 24GB | 24 GB | Q4_K_M comfortable | $1,600-1,900 |
| Sweet spot | Mac M4 Max 36GB | 36 GB unified | Q4/Q5 comfortable | $3,199+ |
| Top value | RTX 5090 32GB | 32 GB | Q5/Q6 comfortable | $1,999-2,499 |
| Workstation | Mac M4 Max 64GB | 64 GB unified | Q6/Q8 + long context | $4,299+ |
| Datacenter | H100 80GB | 80 GB | Q8_0 + 1M context | $25,000+ |
Do NOT buy for the 35B-A3B class: RTX 4060 Ti 16GB, RTX 4070 Ti Super 16GB, RTX 5060 Ti 16GB, RTX 5070 Ti 16GB, or any card with less memory. 16 GB is not enough even at Q4. At that VRAM tier, run the dense Qwen3.6-27B (16.8 GB Q4) or Qwen3.5-27B instead.
Buyer decision tree
- Budget < $1,500: Used RTX 3090 24GB + 32 GB system RAM. Q4_K_M at ~60 tok/s.
- Budget $1,500-2,500 (Windows/Linux): New RTX 4090 24GB or RTX 5090 32GB. Q4-Q6 comfortably.
- Budget $2,500-4,500 (Mac preference): Mac M4 Max 36-64 GB. Unified memory wins for long context.
- Budget > $10,000 / team workstation: Dual RTX 5090 or Mac Studio M3 Ultra 96GB. Headroom for Q8 and concurrent models.
Expected tokens-per-second (Q4_K_M)
MoE sparsity means only about 3B parameters are active per token, so the 35B-A3B generates much faster than a 35B dense model would, even though all 35B of weights must stay loaded:
| Hardware | qwen3.6-35b-a3b Q4 | Notes |
|---|---|---|
| RTX 3090 24GB | ~55-65 tok/s | Cheapest full-quality path |
| RTX 4090 24GB | ~70 tok/s | Sweet spot |
| RTX 5090 32GB | ~90 tok/s | Best consumer tier |
| Mac M4 Pro 24GB | ~35 tok/s | Good for interactive use |
| Mac M4 Max 64GB | ~42 tok/s | Room for Q5 + long context |
| H100 80GB | ~110 tok/s | Datacenter-tier |
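To compare hardware on price per token/sec, the prices from the buyer table can be divided by the throughput figures above. The sketch below does that arithmetic. It makes some simplifying assumptions: price ranges are averaged to a midpoint, prices listed as "$X+" use X as a lower bound, and the RTX 3090 uses 60 tok/s as the midpoint of its 55-65 range. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# name -> (low price USD, high price USD, tok/s at Q4_K_M)
# Prices and throughput are copied from the tables in this guide.
HARDWARE: Dict[str, Tuple[float, float, float]] = {
    "RTX 3090 24GB (used)": (700, 900, 60),   # 55-65 tok/s, midpoint assumed
    "RTX 4090 24GB": (1600, 1900, 70),
    "RTX 5090 32GB": (1999, 2499, 90),
    "Mac M4 Pro 24GB": (1999, 1999, 35),      # "$1,999+" treated as a floor
    "Mac M4 Max 64GB": (4299, 4299, 42),      # "$4,299+" treated as a floor
    "H100 80GB": (25000, 25000, 110),         # "$25,000+" treated as a floor
}


def usd_per_tok_s(low: float, high: float, tok_s: float) -> float:
    """Midpoint price divided by throughput, in USD per token/sec."""
    return (low + high) / 2.0 / tok_s


def rank_by_price_per_tok_s(
    hardware: Dict[str, Tuple[float, float, float]]
) -> List[Tuple[str, float]]:
    """Return (name, USD per tok/s) pairs sorted from cheapest to most expensive."""
    scored = [(name, usd_per_tok_s(*spec)) for name, spec in hardware.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for name, cost in rank_by_price_per_tok_s(HARDWARE):
        print(f"{name:22s} ${cost:7.2f} per tok/s")
```

With these inputs the used RTX 3090 is the cheapest per tok/s and the H100 the most expensive, while the RTX 4090 and RTX 5090 land within a few cents of each other. The metric ignores memory headroom, so it says nothing about which quants or context lengths each option can hold.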
Should I pick 27B dense or 35B-A3B MoE?
| Aspect | Qwen3.6-27B dense | Qwen3.6-35B-A3B MoE |
|---|---|---|
| VRAM Q4 | 16.8 GB | 21 GB |
| Coding (SWE-bench) | 77.2% | ~72% |
| Throughput at 24GB | Slower | Faster (MoE sparsity) |
| Vision / multimodal | ✅ | Text-only |
| 1M context support | ✅ (YaRN) | ✅ (262K native, 1M via YaRN) |
- Pick 27B dense if you code, need vision, or run on a 16 GB GPU.
- Pick 35B-A3B MoE if you want fastest tok/s on 24GB+, or agentic long-context workflows.
Full deep dive on the 27B: Qwen3.6-27B VRAM Requirements.
Related guides
- Qwen3.6-27B VRAM Requirements (dense, coding-king) — the new dense 27B
- Qwen3.6-35B-A3B Release Date — timeline detail
- Qwen 3.6 vs Gemma 4 — 27B Head-to-Head
- Qwen 3.5 35B-A3B VRAM Requirements — previous-gen sibling
- Qwen 3.5 Complete Guide — exact numbers for every Qwen 3.5 variant
- Qwen 3 / 3.5 Family GPU Requirements — original family overview
- VRAM Calculator — check any model against your hardware