Will It Run AI

Qwen 3.6 VRAM & Hardware Requirements — 35B-A3B MoE GPU Guide (2026)

Qwen 3.6 35B-A3B MoE: Q4_K_M ~21 GB, fits RTX 4090 24GB or Mac M4 Pro. Q8 ~37 GB needs 48 GB class. GPU and Mac buyer guide for 1M-context MoE.

This page covers Qwen 3.6 VRAM requirements and Qwen 3.6 hardware requirements for the 35B-A3B MoE variant across every quantization level (Q4_K_M, Q5_K_M, Q6_K, Q8_0, FP16), plus a buyer's guide to which GPU or Mac to actually pick and the 1M-context considerations for this MoE.

If you search for qwen 3.6 vram requirements, qwen3.6-35b-a3b vram requirements, or qwen3.6-35b-a3b hardware requirements, this is the canonical reference. Looking for the dense 27B instead? Jump to Qwen3.6-27B VRAM Requirements.

New: the dense Qwen3.6-27B released April 22, 2026 — smaller (16.8 GB Q4 vs 21 GB), better on coding benchmarks, and includes vision. See Qwen3.6-27B VRAM Requirements for the dedicated guide.

Also in the Qwen 3.6 family: Qwen 3.6 27B (dense, coding-focused), aimed at 16 GB GPUs. For the original Qwen 3 and Qwen 3.5 families, see Qwen 3 / 3.5 GPU Requirements.

Quick answers

  • qwen3.6-35b-a3b VRAM (Q4_K_M): ~21 GB — fits on a 24 GB GPU.
  • qwen3.6-35b-a3b VRAM (Q8_0): ~37 GB — needs 48 GB class or Mac M4 Max 64GB+.
  • qwen3.6-35b-a3b release date: Open weights released April 16, 2026 on Hugging Face + ModelScope.
  • qwen3.6-35b-a3b hardware requirements: 24 GB VRAM or 32 GB unified memory for useful Q4; 48 GB for Q8 with long context.
  • Qwen3.6-27B (dense) VRAM: 16.8 GB at Q4_K_M — see dedicated page.

Qwen 3.6 family status (April 23, 2026)

| Variant | Status | VRAM Q4 | Best for |
|---|---|---|---|
| Qwen 3.6 Plus Preview | API only (March 30) | N/A | Cloud inference |
| Qwen3.6-35B-A3B MoE | Open weights April 16 | ~21 GB | Fast MoE chat |
| Qwen3.6-27B dense | Open weights April 22 | 16.8 GB | Coding, reasoning, vision |

Alibaba released Qwen 3.6 Plus via API on March 30-31, 2026, with a headline feature: a 1 million token native context window. Open-weight variants followed: 35B-A3B MoE on April 16, and the surprise release of a 27B dense variant on April 22 that beats the previous-gen Qwen3.5-397B-A17B flagship on coding while fitting in 16.8 GB.

Confirmed architecture (Qwen3.6-35B-A3B MoE)

| Feature | Qwen3.6-35B-A3B |
|---|---|
| Total parameters | 35 billion |
| Active per token | 3 billion (A3B) |
| Architecture | Mixture of Experts |
| Context window | 262K native / 1M via YaRN |
| License | Apache 2.0 |
| Release | April 16, 2026 on Hugging Face + ModelScope |
| Official GGUF | unsloth/Qwen3.6-35B-A3B-GGUF |

qwen3.6-35b-a3b exact VRAM table

| Quant | qwen3.6-35b-a3b VRAM | Fits on |
|---|---|---|
| Q4_K_M | ~21.4 GB | RTX 4090 24GB, RTX 5090 32GB, Mac M4 Pro 24GB |
| Q5_K_M | ~24.2 GB | RTX 5090 32GB, Mac M4 Max 36GB+ |
| Q6_K | ~28.4 GB | RTX 5090 32GB, Mac M4 Max 36GB+ |
| Q8_0 | ~37.5 GB | Dual RTX 4090, H100 80GB, Mac M4 Max 64GB |
| FP16 | ~71.8 GB | H100 80GB, Mac Studio M3 Ultra 96GB+ |

These numbers assume default 4K-32K context. With the full 1M-token context, KV cache can add 20-40 GB on top — plan hardware accordingly.
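To see why long context dominates the memory budget, the standard KV-cache formula for a grouped-query-attention transformer can be sketched as below. The layer count, KV-head count, and head dimension used here are hypothetical placeholders, not the published Qwen3.6-35B-A3B configuration; read the real values from the model's config before trusting any number. Models with hybrid or sliding-window attention, or a quantized KV cache, will need less than this simple formula suggests.

```python
def kv_cache_bytes(n_tokens, n_attn_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes needed to cache keys and values for n_tokens.

    The leading 2 accounts for storing both K and V.
    bytes_per_elem=2 corresponds to an fp16/bf16 cache; 1 would model an 8-bit cache.
    """
    per_token = 2 * n_attn_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens


# HYPOTHETICAL values for illustration only; NOT the real model config.
HYPOTHETICAL_CONFIG = dict(n_attn_layers=48, n_kv_heads=4, head_dim=128)

for ctx in (32_768, 262_144, 1_000_000):
    gb = kv_cache_bytes(ctx, **HYPOTHETICAL_CONFIG) / 1e9
    print(f"{ctx:>9,} tokens: {gb:6.1f} GB KV cache (fp16, hypothetical config)")
```

The cache grows linearly with context length, so whatever the per-token cost turns out to be for the real config, going from 32K to 1M tokens multiplies it by roughly 30.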

qwen3.6-35b-a3b on RTX 3090 / RTX 4080 / RTX 5090

How the 35B-A3B fits on common cards:

  • qwen3.6-35b-a3b on RTX 3090 24GB: fits Q4_K_M tightly; Q5_K_M (~24.2 GB) exceeds 24 GB and needs partial CPU offload.
  • qwen3.6-35b-a3b on RTX 4080 16GB: does NOT fit at Q4. Use Qwen3.6-27B instead (16.8 GB Q4).
  • qwen3.6-35b-a3b on RTX 5090 32GB: comfortable at Q6_K with long context; ideal hardware.
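The per-card answers above follow from comparing each quant's size with the card's VRAM minus some headroom. A minimal sketch of that check, using the sizes from the VRAM table on this page; the 1.5 GB `overhead_gb` default is an assumed allowance for runtime buffers and a short context, not a measured figure.

```python
# Weight sizes in GB, copied from the VRAM table on this page.
QUANT_SIZES_GB = {
    "Q4_K_M": 21.4,
    "Q5_K_M": 24.2,
    "Q6_K": 28.4,
    "Q8_0": 37.5,
    "FP16": 71.8,
}


def quants_that_fit(vram_gb, overhead_gb=1.5):
    """Return the quants whose weights fit in vram_gb after reserving overhead_gb.

    overhead_gb is an ASSUMED allowance for buffers and a short context window.
    """
    budget = vram_gb - overhead_gb
    return [name for name, size in QUANT_SIZES_GB.items() if size <= budget]


for card, vram in [("RTX 4080 16GB", 16), ("RTX 3090 24GB", 24), ("RTX 5090 32GB", 32)]:
    fits = quants_that_fit(vram) or ["none (needs CPU offload)"]
    print(f"{card}: {', '.join(fits)}")
```

Raise `overhead_gb` when planning for long context, since the KV cache comes out of the same budget.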

What hardware should I buy for Qwen 3.6 35B-A3B?

This is a buyer-focused breakdown of which GPU or Mac to actually purchase, organized by budget tier.

| Tier | Hardware | VRAM / RAM | Fits | 2026 price |
|---|---|---|---|---|
| Minimum (used) | RTX 3090 24GB | 24 GB | Q4_K_M tight | $700-900 used |
| Minimum (new) | Mac M4 Pro 24GB | 24 GB unified | Q4_K_M tight | $1,999+ |
| Sweet spot | RTX 4090 24GB | 24 GB | Q4_K_M comfortable | $1,600-1,900 |
| Sweet spot | Mac M4 Max 36GB | 36 GB unified | Q4/Q5 comfortable | $3,199+ |
| Top value | RTX 5090 32GB | 32 GB | Q5/Q6 comfortable | $1,999-2,499 |
| Workstation | Mac M4 Max 64GB | 64 GB unified | Q6/Q8 + long context | $4,299+ |
| Datacenter | H100 80GB | 80 GB | Q8_0 + 1M context | $25,000+ |

Do NOT buy for the 35B-A3B class: RTX 4060 Ti 16GB, RTX 4070 Ti Super 16GB, RTX 5060 Ti 16GB, RTX 5070 Ti 16GB. 16 GB is not enough even at Q4. At that VRAM tier run the dense Qwen3.6-27B (16.8 GB Q4) or Qwen3.5-27B instead.

Buyer decision tree

  • Budget < $1,500: Used RTX 3090 24GB + 32 GB system RAM. Q4_K_M at ~60 tok/s.
  • Budget $1,500-2,500 (Windows/Linux): New RTX 4090 24GB or RTX 5090 32GB. Q4-Q6 comfortably.
  • Budget $2,500-4,500 (Mac preference): Mac M4 Max 36-64 GB. Unified memory wins for long context.
  • Budget > $10,000 / team workstation: Dual RTX 5090 or Mac Studio M3 Ultra 96GB. Headroom for Q8 and concurrent models.

Expected tokens-per-second (Q4_K_M)

MoE sparsity means only about 3B of the 35B parameters are active per token, so per-token compute is close to that of a 3B dense model even though all 35B weights must sit in memory:

| Hardware | qwen3.6-35b-a3b Q4 | Notes |
|---|---|---|
| RTX 3090 24GB | ~55-65 tok/s | Cheapest full-quality path |
| RTX 4090 24GB | ~70 tok/s | Sweet spot |
| RTX 5090 32GB | ~90 tok/s | Best consumer tier |
| Mac M4 Pro 24GB | ~35 tok/s | Good for interactive use |
| Mac M4 Max 64GB | ~42 tok/s | Room for Q5 + long context |
| H100 80GB | ~110 tok/s | Datacenter-tier |
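Combining the prices from the buyer table with the throughput figures above gives a rough dollars-per-tok/s ranking. The sketch below uses only numbers from this page; taking range midpoints, and treating "+" prices as their listed floor, are simplifying assumptions.

```python
# (price_low, price_high, tok_s_low, tok_s_high), copied from the tables on this page.
# Prices listed as "$X+" are entered as X for both bounds (an assumption).
HARDWARE = {
    "RTX 3090 24GB (used)": (700, 900, 55, 65),
    "RTX 4090 24GB": (1600, 1900, 70, 70),
    "RTX 5090 32GB": (1999, 2499, 90, 90),
    "Mac M4 Pro 24GB": (1999, 1999, 35, 35),
    "Mac M4 Max 64GB": (4299, 4299, 42, 42),
    "H100 80GB": (25000, 25000, 110, 110),
}


def dollars_per_tok_s(price_low, price_high, tps_low, tps_high):
    """Midpoint price divided by midpoint Q4_K_M throughput."""
    mid_price = (price_low + price_high) / 2
    mid_tps = (tps_low + tps_high) / 2
    return mid_price / mid_tps


# Cheapest cost per unit of throughput first.
ranked = sorted(HARDWARE.items(), key=lambda item: dollars_per_tok_s(*item[1]))
for name, specs in ranked:
    print(f"{name:<22} ${dollars_per_tok_s(*specs):7.2f} per tok/s")
```

This metric ignores VRAM headroom, so it favors 24 GB cards that only fit Q4_K_M; weigh it against the quant and context length you actually need.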

Should I pick 27B dense or 35B-A3B MoE?

| Aspect | Qwen3.6-27B dense | Qwen3.6-35B-A3B MoE |
|---|---|---|
| VRAM Q4 | 16.8 GB | 21 GB |
| Coding (SWE-bench) | 77.2% | ~72% |
| Throughput at 24GB | Slower | Faster (MoE sparsity) |
| Vision / multimodal | ✅ | Text-only |
| 1M context support | ✅ (YaRN) | ✅ (262K native, 1M via YaRN) |

  • Pick 27B dense if you code, need vision, or run on a 16 GB GPU.
  • Pick 35B-A3B MoE if you want fastest tok/s on 24GB+, or agentic long-context workflows.

Full deep dive on the 27B: Qwen3.6-27B VRAM Requirements.

Frequently Asked Questions

How much VRAM does Qwen 3.6 need?

The Qwen3.6-35B-A3B MoE needs ~21 GB at Q4_K_M (same class as qwen3.5-35b-a3b which weighs 19.6 GB at Q4). Plan on 24 GB VRAM or 32 GB unified memory for useful context lengths. At Q8_0 it needs ~37 GB. The smaller dense Qwen3.6-27B needs ~16.8 GB at Q4 — see its dedicated page.
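As a sanity check on these figures, file size divided by parameter count gives the effective bits per weight for each quant. The sketch below assumes the nominal 35B total parameters and treats 1 GB as 10^9 bytes; both are simplifications, so the results are approximate.

```python
TOTAL_PARAMS = 35e9  # nominal total parameter count (assumption: exactly 35B)

# Sizes in GB, copied from the VRAM table on this page.
QUANT_SIZES_GB = {"Q4_K_M": 21.4, "Q5_K_M": 24.2, "Q6_K": 28.4, "Q8_0": 37.5, "FP16": 71.8}


def implied_bits_per_weight(size_gb, params=TOTAL_PARAMS):
    """Effective bits stored per parameter, treating 1 GB as 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


def estimated_size_gb(bits_per_weight, params=TOTAL_PARAMS):
    """Inverse of the above: approximate size in GB for a given bits-per-weight."""
    return params * bits_per_weight / 8 / 1e9


for quant, size in QUANT_SIZES_GB.items():
    print(f"{quant:<7} {size:5.1f} GB -> {implied_bits_per_weight(size):5.2f} bits/weight")
```

The K-quants come out above their nominal bit width because they store per-block scales and keep some tensors at higher precision.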

What are the hardware requirements for Qwen 3.6 35B-A3B?

At Q4_K_M (~21 GB) you need a 24 GB GPU (RTX 3090, RTX 4090, RTX 5090) or a Mac with 24-32 GB unified memory (M4 Pro 24GB tight, M4 Max 36GB comfortable). Q6_K comfortably fits RTX 5090 32GB or Mac M4 Max 36GB+. Q8_0 needs 48 GB class hardware (dual RTX 4090, H100 80GB, or Mac M4 Max 64GB).

What GPU should I buy to run Qwen 3.6 35B-A3B?

Cheapest path: a used RTX 3090 24GB (~$700-900) runs Q4_K_M at ~55-65 tok/s. Best new-GPU value: RTX 5090 32GB for Q6_K with headroom. Best all-rounder: Mac M4 Max 36-64GB for long-context unified memory. Avoid 16 GB cards (RTX 4060 Ti 16GB, RTX 5070 Ti 16GB): 16 GB is not enough at Q4, so run the dense Qwen3.6-27B instead.

Can I run Qwen 3.6 35B-A3B on 16 GB VRAM?

Not practically — Q4_K_M is ~21 GB, so 16 GB cards force a slow CPU/RAM offload (~5-10 tok/s). On 16 GB, run the dense Qwen3.6-27B (~16.8 GB Q4) instead, or step down to Qwen3.5-27B / Qwen3.5-9B.

How does Qwen 3.6 35B-A3B VRAM compare to Qwen 3.5 35B-A3B?

Same parameter count (35B total, 3B active), so VRAM is close: ~19.6 GB Q4 for Qwen 3.5 vs ~21 GB Q4 for Qwen 3.6 at default context. Any GPU that runs Qwen3.5-35B-A3B with a couple of GB to spare runs Qwen3.6-35B-A3B. The practical difference is 1M-token context support (262K native, 1M via YaRN per the architecture table above); at long context the KV cache can add tens of GB.

Is Qwen 3.6 available for local inference?

Yes. Open weights for Qwen3.6-35B-A3B shipped April 16, 2026 (Apache 2.0) on Hugging Face + ModelScope, with Unsloth/bartowski GGUFs following within 24-48h. The dense Qwen3.6-27B followed April 22, 2026.

What is new in Qwen 3.6?

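The headline feature is a 1 million token context window, roughly four times Qwen 3.5's 262K limit. Qwen 3.6 Plus offers it natively through the API, while the open-weight 35B-A3B lists 262K native context extended to 1M via YaRN. This makes the family particularly strong for long-document analysis, codebase understanding, and multi-turn agentic workflows.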