Qwen

Qwen 3.5 9B
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 3.5 9B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #1
S · Runs · MEASURED
Fit status
Runs well
Fit: Runs well with a safe context of 32K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 2.2 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 122.0 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
CodeGeeX

CodeGeeX 4 9B
Current · Released Jul 2024 · Hugging Face · Ollama
Why it wins
CodeGeeX 4 9B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #2
A · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 116K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 0.6 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 114.6 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Gemma

Gemma 4 E4B
Frontier · Released Apr 2026 · Hugging Face · Ollama · LM Studio
Why it wins
Gemma 4 E4B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #3
A · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 63K.
Runtime support: native via GGUF on cuda-local.
Weights: 4.9 GB
KV cache: 1.3 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 110.2 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Codestral

Codestral Mamba 7B
Current · Released Jul 2024 · Hugging Face · Ollama
Why it wins
Codestral Mamba 7B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #4
A · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 184K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.3 GB
KV cache: 0.5 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 107.2 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Yi

Yi Coder 9B
Current · Released Sep 2024 · Hugging Face · Ollama · LM Studio
Why it wins
Yi Coder 9B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #5
B · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 48K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 1.5 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 106.6 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Granite

Granite 4.1 8B
Current · Released Apr 2026 · Hugging Face · Ollama
Why it wins
Granite 4.1 8B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #6
A · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 33K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.9 GB
KV cache: 2.4 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 102.3 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen

Qwen 2.5 Coder 7B
Current · Released Sep 2024 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 2.5 Coder 7B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #7
A · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 105K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.3 GB
KV cache: 0.9 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 101.0 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen

Qwen 3 8B
Frontier · Released Apr 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 3 8B is viable for Coding, but is not the most specialized choice. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #8
S · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 37K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.9 GB
KV cache: 2.2 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 99.6 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Nemotron

Nemotron Nano 9B v2
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Nemotron Nano 9B v2 is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It should run, but memory headroom will be limited. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Tight · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Good · Bottleneck: Balanced
Rank #9
A · Tight · EST.
Fit status
Tight fit
Fit: Tight fit with a safe context of 29K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 2.4 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 99.4 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen

Qwen 3.5 4B
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why it wins
Qwen 3.5 4B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Rank #10
S · Runs · EST.
Fit status
Runs well
Fit: Runs well with a safe context of 48K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 3.3 GB
KV cache: 2.2 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 93.6 combines workload match, catalog recency, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
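
To compare memory use across the ten cards, the weights and KV cache figures can be added per model. The sketch below is illustrative only: the page does not say how it derives fit status, so treating the footprint as weights plus KV cache (and ignoring runtime overhead) is an assumption, and `CARDS`, `footprint_gb`, and `rank_by_footprint` are hypothetical names. The numbers are copied from the cards above.

```python
# (model, weights_gb, kv_cache_gb) as listed on each card.
CARDS = [
    ("Qwen 3.5 9B", 5.5, 2.2),
    ("CodeGeeX 4 9B", 5.5, 0.6),
    ("Gemma 4 E4B", 4.9, 1.3),
    ("Codestral Mamba 7B", 4.3, 0.5),
    ("Yi Coder 9B", 5.5, 1.5),
    ("Granite 4.1 8B", 4.9, 2.4),
    ("Qwen 2.5 Coder 7B", 4.3, 0.9),
    ("Qwen 3 8B", 4.9, 2.2),
    ("Nemotron Nano 9B v2", 5.5, 2.4),
    ("Qwen 3.5 4B", 3.3, 2.2),
]


def footprint_gb(weights_gb, kv_cache_gb):
    """Assumed footprint: weights plus KV cache, rounded to 0.1 GB.

    Runtime overhead (buffers, activations) is not listed on the page
    and is deliberately left out.
    """
    return round(weights_gb + kv_cache_gb, 1)


def rank_by_footprint(cards):
    """Return (model, footprint_gb) pairs, smallest footprint first."""
    totals = [(name, footprint_gb(w, kv)) for name, w, kv in cards]
    return sorted(totals, key=lambda item: item[1])


if __name__ == "__main__":
    for name, total in rank_by_footprint(CARDS):
        print(f"{name:<22} {total:.1f} GB")
```

Under this assumption the smallest sum is Codestral Mamba 7B at 4.8 GB and the largest is Nemotron Nano 9B v2 at 7.9 GB, which is also the only card marked Tight fit.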