How much VRAM does Qwen 3.6 need?

The Qwen3.6-35B-A3B MoE needs ~21 GB at Q4_K_M (the same class as Qwen3.5-35B-A3B, which weighs 19.6 GB at Q4). Plan on 24 GB VRAM or 32 GB unified memory for useful context lengths. At Q8_0 it needs ~37 GB. The smaller dense Qwen3.6-27B needs ~16.8 GB at Q4; see its dedicated page.
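As a rough cross-check, quantized weight size is about total parameters × bits per weight ÷ 8. The sketch below assumes average bits-per-weight values for each quant type (Q8_0 is 8.5 bits; the K-quant "M" mixes vary by model, so those values are approximations, not published constants) and uses the 35B total parameter count. It lands close to the ~21 GB (Q4_K_M) and ~37 GB (Q8_0) figures.

```python
# Assumed average bits per weight. Q8_0 stores 32 int8 values plus one
# 16-bit scale per block (34 bytes / 32 weights = 8.5 bits). The K-quant
# "M" mixes vary by model, so their values here are rough approximations.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weights_gb(total_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in decimal GB (1e9 bytes)."""
    return total_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


# 35e9 = total parameters: every expert must be resident, not just the 3B active.
for quant in BITS_PER_WEIGHT:
    print(f"{quant:7s} ~{weights_gb(35e9, quant):5.1f} GB")
```

This covers the weights only; the KV cache and runtime buffers come on top at inference time.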

What are the hardware requirements for Qwen 3.6 35B-A3B?

At Q4_K_M (~21 GB) you need a 24 GB+ GPU (RTX 3090, RTX 4090, RTX 5090) or a Mac with 24-36 GB unified memory (M4 Pro 24GB tight, M4 Max 36GB comfortable). Q6_K comfortably fits an RTX 5090 32GB or a Mac M4 Max 36GB+. Q8_0 needs 48 GB class hardware (dual RTX 4090, H100 80GB, or Mac M4 Max 64GB).

What GPU should I buy to run Qwen 3.6 35B-A3B?

Cheapest path: a used RTX 3090 24GB (~$700-900) runs Q4_K_M at ~55-65 tok/s. Best new-GPU value: RTX 5090 32GB for Q6_K with headroom. Best all-rounder: Mac M4 Max 36-64GB for long-context unified memory. Avoid 16 GB cards (RTX 4060 Ti 16GB, RTX 5070 Ti 16GB): 16 GB is not enough at Q4, so run the dense Qwen3.6-27B instead.

Can I run Qwen 3.6 35B-A3B on 16 GB VRAM?

Not practically — Q4_K_M is ~21 GB, so 16 GB cards force a slow CPU/RAM offload (~5-10 tok/s). On 16 GB, run the dense Qwen3.6-27B (~16.8 GB Q4) instead, or step down to Qwen3.5-27B / Qwen3.5-9B.

How does Qwen 3.6 35B-A3B VRAM compare to Qwen 3.5 35B-A3B?

Same parameter count (35B total, 3B active), so VRAM is essentially identical: ~19.6 GB at Q4 for Qwen 3.5 vs ~21 GB at Q4 for Qwen 3.6 at default context. Any GPU that runs Qwen3.5-35B-A3B today runs Qwen3.6-35B-A3B. The difference is the 1M-token context window: at long context the KV cache can add tens of GB.

Is Qwen 3.6 available for local inference?

Yes. Open weights for Qwen3.6-35B-A3B shipped April 16, 2026 (Apache 2.0) on Hugging Face + ModelScope, with Unsloth/bartowski GGUFs following within 24-48h. The dense Qwen3.6-27B followed April 22, 2026.

What is new in Qwen 3.6?

The headline feature is a 1 million token context window, roughly four times Qwen 3.5's 262K limit. This makes it particularly strong for long-document analysis, codebase understanding, and multi-turn agentic workflows.

Published April 20, 2026 (updated May 20, 2026)
Tags: qwen, alibaba, vram, gpu-requirements, hardware-requirements, qwen-3-6, qwen3.6-35b-a3b, moe

Qwen 3.6 VRAM & Hardware Requirements — 35B-A3B MoE GPU Guide (2026)

Qwen 3.6 35B-A3B MoE: Q4_K_M ~21 GB, fits RTX 4090 24GB or Mac M4 Pro. Q8 ~37 GB needs 48 GB class. GPU and Mac buyer guide for 1M-context MoE.

This page covers VRAM and hardware requirements for the Qwen 3.6 35B-A3B MoE at every quantization level (Q4_K_M, Q5_K_M, Q6_K, Q8_0, FP16), plus a buyer's guide to which GPU or Mac to pick and what the 1M-token context means for memory.

Looking for the dense 27B instead? Jump to Qwen3.6-27B VRAM Requirements.

New: the dense Qwen3.6-27B released April 22, 2026 — smaller (16.8 GB Q4 vs 21 GB), better on coding benchmarks, and includes vision. See Qwen3.6-27B VRAM Requirements for the dedicated guide.

Also in the Qwen 3.6 family: Qwen 3.6 27B (dense, coding-focused), the recommended pick for 16 GB GPUs. For the original Qwen 3 and Qwen 3.5 families, see Qwen 3 / 3.5 GPU Requirements.

Quick answers

qwen3.6-35b-a3b VRAM (Q4_K_M): ~21 GB — fits on a 24 GB GPU.
qwen3.6-35b-a3b VRAM (Q8_0): ~37 GB — needs 48 GB class or Mac M4 Max 64GB+.
qwen3.6-35b-a3b release date: Open weights released April 16, 2026 on Hugging Face + ModelScope.
qwen3.6-35b-a3b hardware requirements: 24 GB VRAM or 32 GB unified memory for useful Q4; 48 GB for Q8 with long context.
Qwen3.6-27B (dense) VRAM: 16.8 GB at Q4_K_M — see dedicated page.

Qwen 3.6 family status (April 23, 2026)

Variant	Status	VRAM Q4	Best for
Qwen 3.6 Plus Preview	API only (March 30)	—	Cloud inference
Qwen3.6-35B-A3B MoE	Open weights April 16	~21 GB	Fast MoE chat
Qwen3.6-27B dense	Open weights April 22	16.8 GB	Coding, reasoning, vision

Alibaba released Qwen 3.6 Plus via API on March 30-31, 2026, with a headline feature: a 1 million token native context window. Open-weight variants followed: 35B-A3B MoE on April 16, and the surprise release of a 27B dense variant on April 22 that beats the previous-gen Qwen3.5-397B-A17B flagship on coding while fitting in 16.8 GB.

Confirmed architecture (Qwen3.6-35B-A3B MoE)

Feature	Qwen3.6-35B-A3B
Total parameters	35 billion
Active per token	3 billion (A3B)
Architecture	Mixture of Experts
Context window	262K native / 1M via YaRN
License	Apache 2.0
Release	April 16, 2026 on Hugging Face + ModelScope
GGUF (community)	unsloth/Qwen3.6-35B-A3B-GGUF

qwen3.6-35b-a3b approximate VRAM table

Quant	qwen3.6-35b-a3b VRAM	Fits on
Q4_K_M	~21.4 GB	RTX 4090 24GB, RTX 5090 32GB, Mac M4 Pro 24GB
Q5_K_M	~24.2 GB	RTX 5090 32GB, Mac M4 Max 36GB+
Q6_K	~28.4 GB	RTX 5090 32GB, Mac M4 Max 36GB+
Q8_0	~37.5 GB	Dual RTX 4090, H100 80GB, Mac M4 Max 64GB
FP16	~71.8 GB	H100 80GB, Mac M4 Max 128GB

These numbers assume default 4K-32K context. With the full 1M-token context, KV cache can add 20-40 GB on top — plan hardware accordingly.
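KV-cache memory grows linearly with context length, which is why long contexts dominate the budget. A minimal estimator follows. The layer count, KV-head count and head dimension in the example are hypothetical placeholders, not confirmed Qwen3.6-35B-A3B values; read the real ones from the model's config.json. If an architecture mixes in layer types that keep a fixed-size state instead of a per-token cache, count only the layers that store per-token keys and values.

```python
def kv_cache_gb(
    context_tokens: int,
    attn_layers: int,
    kv_heads: int,
    head_dim: int,
    bytes_per_elem: float = 2.0,  # FP16/BF16 cache; ~1.0 for an 8-bit cache
) -> float:
    """Approximate KV-cache size in decimal GB.

    Every layer that keeps a per-token cache stores one key and one value
    vector per KV head per token, hence the leading factor of 2.
    """
    per_token_bytes = 2 * attn_layers * kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * context_tokens / 1e9


# HYPOTHETICAL shape: 10 caching layers, 2 KV heads, head_dim 256.
for ctx in (32_768, 262_144, 1_000_000):
    print(f"{ctx:>9,} tokens -> {kv_cache_gb(ctx, 10, 2, 256):.1f} GB")
```

Halving `bytes_per_elem` (for example with an 8-bit KV cache, if your runtime supports one) halves the result.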

qwen3.6-35b-a3b on RTX 3090 / RTX 4080 / RTX 5090

Per-GPU verdicts:

qwen3.6-35b-a3b on RTX 3090 24GB: fits Q4_K_M tightly, Q5 requires trimming context.
qwen3.6-35b-a3b on RTX 4080 16GB: does NOT fit at Q4 — use Qwen3.6-27B instead (16.8 GB Q4).
qwen3.6-35b-a3b on RTX 5090 32GB: comfortable at Q6_K with long context; ideal hardware.
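The verdicts above reduce to one comparison: weights plus KV cache plus runtime overhead against the card's memory. In this sketch the 1.0 GB overhead allowance and the 0.7 GB KV figure for a 32K context are assumptions. GPU memory sizes are binary (24 GB of VRAM is 24 × 2^30 bytes), so the card capacity is converted to the decimal GB used in the tables.

```python
GIB = 2**30  # VRAM sizes are binary: a "24GB" card holds 24 GiB


def fits(card_gib: float, weights_gb: float, kv_gb: float,
         overhead_gb: float = 1.0) -> bool:
    """Rough check: weights + KV cache + runtime overhead vs. card memory.

    weights_gb, kv_gb and overhead_gb are decimal GB (1e9 bytes).
    overhead_gb is an ASSUMED allowance for the driver context and
    compute buffers; real overhead depends on the runtime and batch size.
    """
    capacity_gb = card_gib * GIB / 1e9
    return weights_gb + kv_gb + overhead_gb <= capacity_gb


Q4_K_M_GB = 21.4   # from the VRAM table
Q6_K_GB = 28.4     # from the VRAM table
KV_32K_GB = 0.7    # HYPOTHETICAL KV-cache size at a 32K context

print("RTX 3090, Q4_K_M:", fits(24, Q4_K_M_GB, KV_32K_GB))
print("RTX 4080, Q4_K_M:", fits(16, Q4_K_M_GB, KV_32K_GB))
print("RTX 5090, Q6_K:  ", fits(32, Q6_K_GB, KV_32K_GB))
```

With these inputs the RTX 3090 has roughly 2.7 GB of headroom at Q4_K_M, which matches the "fits tightly" verdict; a longer context quickly eats that margin.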

What hardware should I buy for Qwen 3.6 35B-A3B?

This is a buyer-focused breakdown of which GPU or Mac to purchase, grouped by tier from the cheapest workable option up to datacenter hardware.

Tier	Hardware	VRAM / RAM	Fits	2026 price
Minimum (used)	RTX 3090 24GB	24 GB	Q4_K_M tight	$700-900 used
Minimum (new)	Mac M4 Pro 24GB	24 GB unified	Q4_K_M tight	$1,999+
Sweet spot	RTX 4090 24GB	24 GB	Q4_K_M comfortable	$1,600-1,900
Sweet spot	Mac M4 Max 36GB	36 GB unified	Q4/Q5 comfortable	$3,199+
Top value	RTX 5090 32GB	32 GB	Q5/Q6 comfortable	$1,999-2,499
Workstation	Mac M4 Max 64GB	64 GB unified	Q6/Q8 + long context	$4,299+
Datacenter	H100 80GB	80 GB	Q8_0 + 1M context	$25,000+

Do NOT buy for the 35B-A3B class: RTX 4060 Ti 16GB, RTX 4070 Ti Super 16GB, RTX 5060 Ti 16GB, RTX 5070 Ti 16GB. 16 GB is not enough even at Q4. At that VRAM tier, run the dense Qwen3.6-27B (16.8 GB Q4) or Qwen3.5-27B instead.

Buyer decision tree

Budget < $1,500: Used RTX 3090 24GB + 32 GB system RAM. Q4_K_M at ~60 tok/s.
Budget $1,500-2,500 (Windows/Linux): New RTX 4090 24GB or RTX 5090 32GB. Q4-Q6 comfortably.
Budget $2,500-4,500 (Mac preference): Mac M4 Max 36-64 GB. Unified memory wins for long context.
Budget > $10,000 / team workstation: Dual RTX 5090 or Mac Studio M3 Ultra 96GB. Headroom for Q8 and concurrent models.

Expected tokens-per-second (Q4_K_M)

MoE sparsity means only ~3B parameters are read per generated token, so the 35B-A3B decodes far faster than a 35B dense model would, even though all 35B of weights must stay loaded:

Hardware	qwen3.6-35b-a3b Q4	Notes
RTX 3090 24GB	~55-65 tok/s	Cheapest full-quality path
RTX 4090 24GB	~70 tok/s	Sweet spot
RTX 5090 32GB	~90 tok/s	Best consumer tier
Mac M4 Pro 24GB	~35 tok/s	Good for interactive use
Mac M4 Max 64GB	~42 tok/s	Room for Q5 + long context
H100 80GB	~110 tok/s	Datacenter-tier
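A back-of-envelope way to see the MoE advantage: decoding is largely memory-bandwidth-bound, so an upper bound on tokens/sec is bandwidth divided by the bytes of weights read per token. The sketch uses the RTX 3090's published 936 GB/s and an assumed 4.85 bits per weight for Q4_K_M. It ignores KV-cache reads, kernel overhead and expert routing, so real speeds (as in the table) sit well below these ceilings. The useful part is the ratio: reading ~3B instead of 35B parameters per token raises the ceiling by 35/3 ≈ 11.7×.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_read: float,
                         bits_per_weight: float) -> float:
    """Upper bound on decode speed if each token streams the weights it
    uses from memory exactly once (ignores KV reads and compute overhead)."""
    bytes_per_token = params_read * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BW_RTX_3090 = 936.0  # GB/s, published memory bandwidth
BPW_Q4_K_M = 4.85    # assumed average bits per weight

moe = decode_ceiling_tok_s(BW_RTX_3090, 3e9, BPW_Q4_K_M)     # ~3B active params
dense = decode_ceiling_tok_s(BW_RTX_3090, 35e9, BPW_Q4_K_M)  # hypothetical dense 35B
print(f"MoE ceiling ~{moe:.0f} tok/s, dense ceiling ~{dense:.0f} tok/s, "
      f"ratio {moe / dense:.1f}x")
```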

Should I pick 27B dense or 35B-A3B MoE?

Aspect	Qwen3.6-27B dense	Qwen3.6-35B-A3B MoE
VRAM Q4	16.8 GB	21 GB
Coding (SWE-bench)	77.2%	~72%
Throughput at 24GB	Slower	Faster (MoE sparsity)
Vision / multimodal	✅	Text-only
1M context support	✅ (YaRN)	✅ (262K native, 1M via YaRN)

Pick 27B dense if you code, need vision, or run on a 16 GB GPU.
Pick 35B-A3B MoE if you want the fastest tok/s on 24 GB+ hardware or run agentic long-context workflows.
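The rule of thumb can be written as a small function. Treating anything under 24 GB of VRAM as the "16 GB GPU" case, and letting coding or vision needs take priority over raw speed, are assumptions made for this sketch.

```python
def pick_qwen36_variant(vram_gb: float, coding: bool = False,
                        needs_vision: bool = False) -> str:
    """Return the suggested Qwen 3.6 variant for a given setup."""
    # Assumption: below 24 GB the MoE (~21 GB at Q4_K_M) is not practical.
    if vram_gb < 24:
        return "Qwen3.6-27B (dense)"
    # Coding and vision favor the dense model per the comparison table.
    if coding or needs_vision:
        return "Qwen3.6-27B (dense)"
    # 24 GB+ with throughput or long-context agentic use: the MoE.
    return "Qwen3.6-35B-A3B (MoE)"


print(pick_qwen36_variant(16))               # small card
print(pick_qwen36_variant(32, coding=True))  # coding workload
print(pick_qwen36_variant(24))               # speed-focused chat
```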

Full deep dive on the 27B: Qwen3.6-27B VRAM Requirements.

Related guides

Qwen3.6-27B VRAM Requirements (dense, coding-king) — the new dense 27B
Qwen3.6-35B-A3B Release Date — timeline detail
Qwen 3.6 vs Gemma 4 — 27B Head-to-Head
Qwen 3.5 35B-A3B VRAM Requirements — previous-gen sibling
Qwen 3.5 Complete Guide — exact numbers for every Qwen 3.5 variant
Qwen 3 / 3.5 Family GPU Requirements — original family overview
VRAM Calculator — check any model against your hardware