Best Coding LLMs for Apple Silicon 24GB — Ranked 2026
Top local coding LLMs for 24GB Apple Silicon (M4 Pro, M3 Pro): Qwen3 Coder 30B, Qwen3.5-35B-A3B, DeepSeek Coder V2.5 ranked by SWE-bench and tok/s.
Best local coding LLMs for 24GB Apple Silicon in 2026 — ranked picks for M4 Pro, M4 Max 36GB, and M3 Pro, with tok/s estimates, recommended quantization, and integration notes for Cursor / Continue.dev / VSCode.
For the ranked model list against your specific hardware, see:
Top coding picks at 24GB unified memory
| Rank | Model | VRAM Q4 | tok/s (M4 Pro) | Best for |
|---|---|---|---|---|
| 1 | Qwen 3 Coder 30B-A3B | ~17 GB | ~30-35 | Overall coding champion; MoE sparsity keeps inference fast |
| 2 | Qwen 3.5 35B-A3B | ~21 GB | ~30 | Tight but strong general+coding MoE |
| 3 | Qwen 3 Coder 14B | ~8 GB | ~55 | Fastest respectable coding model; perfect for Cursor-style flows |
| 4 | Qwen 3.5 27B | ~16 GB | ~35 | Dense alternative; more predictable latency |
| 5 | DeepSeek Coder V2.5 Lite | ~14 GB | ~40 | Different style, strong on Python/TS |
| 6 | Qwen 3 14B | ~8 GB | ~50 | Not fine-tuned for code but fast and capable |
| 7 | Gemma 3 9B | ~6 GB | ~60 | Lightweight fallback; good for quick Q&A |
Why Qwen 3 Coder 30B-A3B wins
The MoE architecture (30B total, 3B active per token) gives it the knowledge breadth of a 30B dense model while running at the speed of a 3B dense model. On a 24GB M4 Pro Mac you get:
- ~17 GB loaded into unified memory
- ~7 GB headroom for KV cache and macOS/apps
- ~30-35 tok/s sustained (active-cooled Pro)
- Full 262K context without extra memory pressure
For repo-level refactors and agentic workflows (where the model generates multiple tool-calls per turn), this combination is unmatched at 24GB.
When to pick Qwen 3.5 35B-A3B instead
If you want the general-purpose MoE (chat + coding + reasoning), Qwen 3.5 35B-A3B edges out Qwen 3 Coder 30B-A3B on non-code tasks. Coding performance is very close. The cost is ~4 GB more VRAM — on 24GB Macs this means fewer open apps during sessions.
When open weights ship, Qwen3.6-35B-A3B will inherit this slot with the added 1M-context capability for agentic coding.
Quantization: why you want Q5_K_M for code
Code is syntax-sensitive. A missing bracket or quote character due to aggressive quantization destroys the output. Q4_K_M is acceptable for chat-style coding assistance but we have seen reliable quality gains moving to Q5_K_M or Q6_K:
| Quant | 30B-A3B VRAM | Code quality delta vs FP16 |
|---|---|---|
| Q4_K_M | ~17 GB | -3 to -5% (occasional syntax slips) |
| Q5_K_M | ~20 GB | -1 to -2% (effectively identical for most tasks) |
| Q6_K | ~24 GB | < -1% (near-lossless; won't fit 30B-A3B on 24GB Mac) |
| Q8_0 | ~32 GB | No measurable delta (requires 32GB+ Mac) |
On a 24GB Mac, stick with Q4_K_M for the 30B-A3B class. If you have a 36GB+ Mac, step up to Q5 or Q6.
Integration with coding tools
All of the picks above expose an OpenAI-compatible endpoint via Ollama or LM Studio, so any tool that speaks OpenAI works.
Ollama (recommended):
ollama pull qwen3-coder:30b-a3b
ollama run qwen3-coder:30b-a3b
# endpoint: http://localhost:11434/v1
LM Studio: Search Qwen3-Coder-30B-A3B-Instruct-GGUF, pick Q4_K_M, start server.
Cursor:
- Settings → Models → Add custom model
- Base URL:
http://localhost:11434/v1 - Model:
qwen3-coder:30b-a3b
Continue.dev (VSCode):
{
"models": [
{
"title": "Qwen 3 Coder 30B-A3B (local)",
"provider": "ollama",
"model": "qwen3-coder:30b-a3b"
}
]
}
MLX vs GGUF on Apple Silicon
- MLX (Apple's native ML framework) delivers ~15-25% faster tok/s than llama.cpp GGUF on M-series chips.
- GGUF is more mature, has wider tool support (Ollama, LM Studio, Continue.dev out of the box), and the ecosystem is larger.
- Recommendation for 2026: Start with GGUF via Ollama for ease of use. If you hit bandwidth limits and want the extra tok/s, switch to MLX with mlx-community models — see our Qwen 3.5 MLX guide.
What about coding on smaller Macs (16 GB)?
If you have a 16 GB Mac, the coding LLM roster is different — see Best AI models for a 16GB Mac for the tailored list. Short version: Qwen 3 Coder 14B at Q4_K_M or Gemma 4 E4B at Q8 are the daily drivers.