
Qwen 3.6 35B A3B

Name: Qwen 3.6 35B A3B
Rating: 94 (64 reviews)
Author: Alibaba

Frontier

Downloads: 6.7M
Likes: 2.4K
Released: Apr 2026
Context: 262K tokens
License: Apache 2.0
Quality: 98 (Exceptional)

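Qwen 3.6 35B A3B (35B parameters) needs approximately 28.5 GB of VRAM at Q4_K_M quantization. That figure covers about 21.3 GB of quantized weights plus the KV cache, runtime overhead, and a small headroom margin. Because it is a Mixture of Experts (MoE) model, only about 3B parameters are active per token, which keeps per-token compute and memory traffic low and makes decoding fast. All 35B parameters still have to be held in memory, however, so the memory footprint follows the total parameter count, not the active count. For the best balance of quality and speed, we recommend hardware with at least 33 GB of VRAM.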
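A rough sketch of why memory follows the total parameter count while per-token work follows the active count: quantized weight size is approximately parameters × bits per weight ÷ 8. The 4.87 effective bits per weight is an assumption, back-calculated from the 21.3 GB Q4_K_M weight figure and a nominal 35B parameters; real GGUF files mix quant types across tensors.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9   # every expert must be held in memory
ACTIVE_PARAMS = 3e9   # parameters used to compute each token
BPW_Q4_K_M = 4.87     # assumed effective bits/weight (21.3 GB * 8 / 35e9)

resident_gb = weight_gb(TOTAL_PARAMS, BPW_Q4_K_M)
per_token_gb = weight_gb(ACTIVE_PARAMS, BPW_Q4_K_M)

print(f"weights held in memory: {resident_gb:.1f} GB")
print(f"weights read per token: {per_token_gb:.2f} GB")
```

Under these assumptions the full model occupies about 21 GB, but each generated token touches only about 1.8 GB of weights, roughly a twelfth of the total.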

Get started


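Copy-paste command to run Qwen 3.6 35B A3B on your machine with llama.cpp in Docker. It assumes an NVIDIA GPU with the NVIDIA Container Toolkit installed; -ngl 99 offloads all layers to the GPU, and -c sets the context length.

Run

docker run --rm -it --gpus all ghcr.io/ggml-org/llama.cpp:full-cuda \
  --run \
  --hf-repo "Qwen/Qwen3.6-35B-A3B" \
  --hf-file "Qwen3.6-35B-A3B-Q4_K_M.gguf" \
  -c 4096 -ngl 99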

Quick specs

Parameters: 35B total (3B active)

Architecture: Mixture of Experts (MoE)

Context: 262K tokens

Modality: text + vision

Min RAM: 13.7 GB (Q2_K, weights only)

Rec. RAM: 21.3 GB (Q4_K_M, weights only)

License: Apache 2.0

Family: Qwen

Capabilities: ✓ Vision ✓ Code ✓ Chat ✓ Reasoning

About this model

Qwen 3.6 35B A3B is the first open-weight Qwen 3.6 model, a multimodal MoE release focused on stronger agentic coding, long-context reasoning, and more stable repository-scale workflows.

• 35B total parameters, with only 3B active per token
• 262K native context, with preserve-thinking support
• Multimodal open-weight model tuned for coding and agent workflows


Inference speed

Qwen 3.6 35B A3B inference speed: tokens per second by GPU and Mac

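Estimated decode speed (tokens/sec) for Qwen 3.6 35B A3B at Q4_K_M across popular GPUs and Apple Silicon, using the fastest local runtime for each device. The fastest is the RTX 5090 32GB at ~153 tok/s. Decode speed is bound by memory bandwidth, so cards that hold the whole model in VRAM run far faster than ones that offload layers to system RAM. Because only ~3B parameters are active per token, each decoded token reads only a small fraction of the weights, which is why this model decodes quickly for its size.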
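Because decoding is bandwidth bound, a common back-of-envelope ceiling is memory bandwidth divided by the bytes of weights read per token. The sketch below assumes 3B active parameters at ~4.87 effective bits per weight and uses roughly 1,792 GB/s for the RTX 5090; the 64 GB/s offload figure is purely hypothetical. It ignores KV cache reads, attention compute, and kernel overhead, so it yields an upper bound well above the estimates in the table, not a prediction.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params: float,
                         bits_per_weight: float) -> float:
    """Upper bound on decode speed if each token only re-reads the active weights."""
    gb_per_token = active_params * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / gb_per_token


# Assumptions: 3B active params at ~4.87 bits/weight; RTX 5090 at ~1,792 GB/s.
rtx_5090 = decode_ceiling_tok_s(1792, 3e9, 4.87)
print(f"RTX 5090 theoretical ceiling: ~{rtx_5090:.0f} tok/s")

# Hypothetical: if the weights instead stream over a 64 GB/s path (system RAM),
# that slower link sets the ceiling.
offloaded = decode_ceiling_tok_s(64, 3e9, 4.87)
print(f"Weights served at 64 GB/s: ~{offloaded:.0f} tok/s")
```

The gap between the two ceilings is the reason the table separates cards that fit the model from those that offload.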

GPU / Mac	Memory	Quant	Speed (tok/s)	Fits?
RTX 5090 32GB	32 GB	Q4_K_M	152.6	Offloads
Mac Studio M3 Ultra 256GB	256 GB	Q4_K_M	70.8	Fits
Mac Studio M2 Ultra 128GB	128 GB	Q4_K_M	59.0	Fits
Mac Studio M1 Ultra 128GB	128 GB	Q4_K_M	55.9	Fits
MacBook Pro M4 Max 128GB	128 GB	Q4_K_M	43.7	Fits
MacBook Pro M4 Max 64GB	64 GB	Q4_K_M	43.7	Fits
RTX 4090 24GB	24 GB	Q4_K_M	34.1	Too big
RX 7900 XTX 24GB	24 GB	Q4_K_M	30.8	Too big
MacBook Pro M3 Max 64GB	64 GB	Q4_K_M	30.5	Fits
RTX 3090 24GB	24 GB	Q4_K_M	29.2	Too big
MacBook Pro M1 Max 64GB	64 GB	Q4_K_M	28.0	Fits
MacBook Pro M4 Pro 48GB	48 GB	Q4_K_M	26.7	Tight
RTX 4080 Super 16GB	16 GB	Q4_K_M	12.2	Too big
RTX 4070 12GB	12 GB	Q4_K_M	5.5	Too big
RTX 3060 12GB	12 GB	Q4_K_M	3.4	Too big
RTX 4060 8GB	8 GB	Q4_K_M	2.9	Too big

Estimates for single-stream decoding at Q4_K_M; real tokens/sec varies with prompt length, context, batch size, and runtime build. Prompt processing (prefill) is faster than the decode figures shown here.

Compare

How does Qwen 3.6 35B A3B compare?

Qwen 3.6 35B A3B vs Gemma 4 31B
Qwen 3.6 35B A3B vs Qwen 3.6 27B

Quick picks

Best budget

Mac mini M4 64GB: ~$1,099, ~11 tok/s

Best overall

NVIDIA A100 40GB: ~$10,000, ~126 tok/s

Best hardware

Top picks for Qwen 3.6 35B A3B

RTX PRO 5000 Blackwell 48GB

Run this model

Qwen 3.6 35B A3B on NVIDIA A100 40GB
Qwen 3.6 35B A3B on RTX 6000 Ada 48GB
Qwen 3.6 35B A3B on RTX PRO 5000 Blackwell 48GB

Quantization

Qwen 3.6 35B A3B quantization: VRAM and quality by quant level

How much VRAM Qwen 3.6 35B A3B (35B) needs at each GGUF quant level, and whether it fits on a 24 GB card (RTX 4090 / 3090). The recommended Q4_K_M uses ~21.3 GB for weights, about 43% less VRAM than Q8_0 (37.5 GB), at a small quality cost.

Quant	Bits	VRAM (weights)	Quality	Fits 24 GB?
Q2_K	2	13.7 GB	Low	Tight
Q3_K_S	3	17.2 GB	Low	Offloads
NVFP4	4	19.6 GB	Medium	Heavy offload
Q4_K_M (recommended)	4	21.3 GB	Medium	Heavy offload
Q5_K_M	5	25.2 GB	High	Too big
Q6_K	6	28.7 GB	High	Too big
Q8_0	8	37.5 GB	Very High	Too big
F16	16	71.8 GB	Maximum	Too big

VRAM shown is for the quantized weights only; add ~1–3 GB of runtime overhead plus the KV cache for your context length. Lower quants trade quality for memory: Q4_K_M is the usual sweet spot, and Q2/Q3 are worth using only when Q4_K_M will not fit your hardware.
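To estimate the KV cache term, a standard formula for attention layers is 2 (keys and values) × layers × KV heads × head dimension × bytes per element × context tokens; only layers that actually keep a KV cache count. The model shape below (40 layers, 4 KV heads, head dimension 128) is hypothetical and not Qwen 3.6's published configuration. The 2.4 GB runtime default is taken from this page's memory breakdown, and 21.3 GB is the Q4_K_M weight size from the table.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_ctx: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB: one K and one V vector per layer, KV head, and token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx
    return total_bytes / 1e9


def total_vram_gb(weights_gb: float, kv_gb: float, runtime_gb: float = 2.4) -> float:
    """Weights + KV cache + runtime overhead, in GB."""
    return weights_gb + kv_gb + runtime_gb


# Hypothetical attention shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 40, 4, 128

for ctx in (4096, 32768, 262144):
    kv = kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, ctx)  # FP16 cache (2 bytes/elem)
    need = total_vram_gb(21.3, kv)                      # Q4_K_M weights
    print(f"ctx={ctx:>6}: KV {kv:5.2f} GB, total ~{need:5.1f} GB")
```

The KV term grows linearly with context, so at the full 262K window it can rival the weights themselves; lowering the context length (-c) shrinks it.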

Quality benchmarks

Qwen 3.6 35B A3B benchmark scores

Benchmark verified

Coding

SWE-bench Verified: 73.4%

HumanEval+: not reported

Aider Polyglot: not reported

LiveCodeBench: 80.4%

Reasoning

MMLU-Pro: 85.2%

GPQA Diamond: 86.0%

MATH-500: not reported

ARC Challenge: not reported

Source: official · 2026-04-15

Hardware compatibility

Fit estimates across all hardware


Memory breakdown


Weights: 21.3 GB

KV cache: 4.1 GB

Runtime: 2.4 GB

Headroom: 0.6 GB
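The four figures in the memory breakdown sum to about 28.4 GB, in line with the ~28.5 GB quoted for Q4_K_M. A minimal fit check against a card's memory, using those figures as given, might look like this; the helper names are illustrative, and real runtimes also reserve some memory of their own.

```python
# Figures from the memory breakdown (GB).
BREAKDOWN = {"weights": 21.3, "kv_cache": 4.1, "runtime": 2.4, "headroom": 0.6}


def required_gb(parts: dict[str, float]) -> float:
    """Total memory the configuration needs, rounded to 0.1 GB."""
    return round(sum(parts.values()), 1)


def fits(device_gb: float, parts: dict[str, float]) -> bool:
    """True if the whole configuration fits in device memory without offloading."""
    return required_gb(parts) <= device_gb


need = required_gb(BREAKDOWN)
for name, mem in [("RTX 4090 24GB", 24), ("NVIDIA A100 40GB", 40)]:
    verdict = "fits" if fits(mem, BREAKDOWN) else "needs offload"
    print(f"{name}: {verdict} (needs {need} GB, has {mem} GB)")
```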

Frequently asked questions

FAQ — Qwen 3.6 35B A3B

How much VRAM does Qwen 3.6 35B A3B need?

Qwen 3.6 35B A3B (35B parameters) requires approximately 28.5 GB of VRAM at Q4_K_M quantization, counting weights, KV cache, and runtime overhead. Lower quantizations such as Q3_K_S or Q2_K use less memory but may reduce quality.

Can I run Qwen 3.6 35B A3B on a Mac mini M4 64GB?

Yes, the Mac mini M4 64GB can run Qwen 3.6 35B A3B with a compatibility score of 92/100. It provides 64 GB of memory and achieves approximately 11.0 tokens per second.

What is the best quantization for Qwen 3.6 35B A3B?

The recommended quantization for Qwen 3.6 35B A3B is Q4_K_M, which offers the best balance between model quality and memory efficiency. Higher-bit quantizations such as Q5_K_M, Q6_K, and Q8_0 preserve more quality but require more VRAM.

What hardware is recommended for Qwen 3.6 35B A3B?

The top recommended hardware for Qwen 3.6 35B A3B: NVIDIA A100 40GB (score: 99/100), RTX 6000 Ada 48GB (score: 99/100), RTX PRO 5000 Blackwell 48GB (score: 99/100). These provide the best combination of memory, bandwidth, and compute for running this model locally.

Is Qwen 3.6 35B A3B good for chat?

Yes. Qwen 3.6 35B A3B is well suited to chat as well as coding, reasoning, vision, and agentic tasks, and it was designed with these use cases in mind.