Alibaba
Qwen 3.5 397B A17B (397B parameters) requires approximately 246.5 GB of VRAM with Q4_K_M quantization. As a Mixture of Experts model, it activates only 17B parameters per token, so it needs far less compute per token than a dense model of the same size, but all 397B parameters must still be held in memory. For the best balance of quality and speed, we recommend hardware with at least 284 GB of VRAM.
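As a rough check on these figures, the sketch below works out how many bits per parameter the 246.5 GB estimate implies, and how much of that memory corresponds to the 17B active parameters. It assumes 1 GB = 10^9 bytes and treats the whole estimate as weights, which overstates the weights slightly if the figure includes runtime overhead. The function names are illustrative only.

```python
TOTAL_PARAMS = 397e9   # total parameters, from the page
ACTIVE_PARAMS = 17e9   # active parameters per token, from the page
Q4_K_M_GB = 246.5      # page's Q4_K_M estimate; treated as all weights (assumption)


def effective_bits_per_weight(memory_gb: float, params: float) -> float:
    """Bits per parameter implied by a memory figure, taking 1 GB = 1e9 bytes."""
    return memory_gb * 1e9 * 8 / params


def active_weights_gb(memory_gb: float, total: float, active: float) -> float:
    """Portion of the resident weights used per token, scaled by active/total."""
    return memory_gb * active / total


bits = effective_bits_per_weight(Q4_K_M_GB, TOTAL_PARAMS)
per_token = active_weights_gb(Q4_K_M_GB, TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"{bits:.2f} bits/weight, {per_token:.1f} GB of weights used per token")
```

Under these assumptions the estimate works out to roughly 5 bits per parameter, and only about 10.6 GB of the resident weights are used for any single token.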
Get started
Copy-paste commands to run Qwen 3.5 397B A17B on your machine.
Run
docker run --rm -it ghcr.io/ggerganov/llama.cpp:full \
--hf-repo "Qwen/Qwen3.5-397B-A17B-Instruct" \
--hf-file "Qwen3.5-397B-A17B-Instruct-Q4_K_M.gguf" \
-c 4096 -ngl 99
Quantization options
| Quant | Bits | VRAM | Quality |
|---|---|---|---|
| Q2_K | 2 | 154.8 GB | Low |
| Q3_K_S | 3 | 194.5 GB | Low |
| NVFP4 | 4 | 222.3 GB | Medium |
| Q4_K_M | 4 | 242.2 GB | Medium |
| Q5_K_M | 5 | 285.8 GB | High |
| Q6_K | 6 | 325.5 GB | High |
| Q8_0 | 8 | 424.8 GB | Very High |
| F16 | 16 | 813.8 GB | Maximum |
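To pick a quantization for a given memory budget, the table can be turned into a small lookup. This is a minimal sketch, not the page's own logic: the VRAM values are copied from the table, while `best_fit` and its `headroom_gb` parameter (memory reserved for context and runtime overhead) are hypothetical names introduced here.

```python
# (quant, bits, VRAM estimate in GB), copied from the table above
QUANTS = [
    ("Q2_K", 2, 154.8),
    ("Q3_K_S", 3, 194.5),
    ("NVFP4", 4, 222.3),
    ("Q4_K_M", 4, 242.2),
    ("Q5_K_M", 5, 285.8),
    ("Q6_K", 6, 325.5),
    ("Q8_0", 8, 424.8),
    ("F16", 16, 813.8),
]


def best_fit(vram_gb: float, headroom_gb: float = 0.0):
    """Return the largest quant whose estimate plus headroom fits, or None."""
    fitting = [q for q in QUANTS if q[2] + headroom_gb <= vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[2])[0]


print(best_fit(256))                   # Q4_K_M
print(best_fit(288))                   # Q5_K_M
print(best_fit(288, headroom_gb=8.0))  # Q4_K_M
print(best_fit(128))                   # None
```

Reserving even a few gigabytes of headroom can change the answer near a boundary, as the 288 GB case shows.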
Quality benchmarks
Benchmark categories: Coding, Reasoning, General
Source: official · 2025-06-25
Frequently asked questions
Qwen 3.5 397B A17B (397B parameters) requires approximately 246.5 GB of VRAM with Q4_K_M quantization. Lower quantizations such as Q3_K_S or Q2_K use less memory but may reduce quality.
The recommended quantization for Qwen 3.5 397B A17B is Q4_K_M, which offers the best balance between model quality and memory efficiency. Higher quantizations preserve more quality but require more VRAM.
The top recommended hardware for Qwen 3.5 397B A17B is the AMD Instinct MI350X 288GB (score: 97/100), followed by the AMD Instinct MI325X 256GB (score: 84/100). These provide the best combination of memory, bandwidth, and compute for running this model locally.
Qwen 3.5 397B A17B is well suited for chat, coding, reasoning, and agentic tasks. It was designed with these use cases in mind.
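For the two recommended accelerators named above, the memory left over after loading the Q4_K_M weights can be estimated by simple subtraction. The sketch below uses the capacities and the 246.5 GB figure from this page; `headroom_gb` is a hypothetical helper, and the result is only an upper bound on what remains for context and runtime buffers.

```python
Q4_K_M_GB = 246.5  # page's Q4_K_M estimate
DEVICES_GB = {     # memory capacities as listed on the page
    "AMD Instinct MI350X": 288,
    "AMD Instinct MI325X": 256,
}


def headroom_gb(capacity_gb: float, model_gb: float) -> float:
    """Memory left after loading the model; negative means it does not fit."""
    return capacity_gb - model_gb


for name, capacity in DEVICES_GB.items():
    print(f"{name}: {headroom_gb(capacity, Q4_K_M_GB):.1f} GB free")
```

The larger margin on the 288 GB device leaves more room for long contexts, which may be part of why it scores higher, though the page does not state how scores are computed.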