Qwen

Qwen 3.5 9B
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why recommended
Qwen 3.5 9B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 32K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 2.2 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 122.0 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
CodeGeeX

CodeGeeX 4 9B
Current · Released Jul 2024 · Hugging Face · Ollama
Why recommended
CodeGeeX 4 9B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 116K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 0.6 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 114.6 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Gemma

Gemma 4 E4B
Frontier · Released Apr 2026 · Hugging Face · Ollama · LM Studio
Why recommended
Gemma 4 E4B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 63K.
Runtime support: native via GGUF on cuda-local.
Weights: 4.9 GB
KV cache: 1.3 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 110.2 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Codestral

Codestral Mamba 7B
Current · Released Jul 2024 · Hugging Face · Ollama
Why recommended
Codestral Mamba 7B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 184K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.3 GB
KV cache: 0.5 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 107.2 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Yi

Yi Coder 9B
Current · Released Sep 2024 · Hugging Face · Ollama · LM Studio
Why recommended
Yi Coder 9B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 48K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 1.5 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 106.6 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Granite

Granite 4.1 8B
Current · Released Apr 2026 · Hugging Face · Ollama
Why recommended
Granite 4.1 8B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 33K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.9 GB
KV cache: 2.4 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 102.3 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen

Qwen 2.5 Coder 7B
Current · Released Sep 2024 · Hugging Face · Ollama · LM Studio
Why recommended
Qwen 2.5 Coder 7B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 105K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.3 GB
KV cache: 0.9 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 101.0 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen

Qwen 3 8B
Frontier · Released Apr 2025 · Hugging Face · Ollama · LM Studio
Why recommended
Qwen 3 8B is viable for Coding, but is not the most specialized choice. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 37K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 4.9 GB
KV cache: 2.2 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 99.6 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Nemotron

Nemotron Nano 9B v2
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why recommended
Nemotron Nano 9B v2 is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It should run, but memory headroom will be limited. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Tight · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Good · Bottleneck: Balanced
Fit: Tight fit with a safe context of 29K.
Runtime support: native via GGUF on cuda-local.
Weights: 5.5 GB
KV cache: 2.4 GB
Backend: cuda-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has limited but sufficient memory headroom and acceptable estimated speed for the selected workload.
Score 99.4 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
Qwen

Qwen 3.5 4B
Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio
Why recommended
Qwen 3.5 4B is a specialized fit for Coding. It is a recent-generation family, which helps on current local SOTA workloads. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.
Capacity: Roomy · Bandwidth: Medium · Stack: Standard
Interactive: Good · Light API: Great · Bottleneck: Balanced
Fit: Runs well with a safe context of 48K.
Runtime support: native via GGUF on cpu-gpu-local.
Weights: 3.3 GB
KV cache: 2.2 GB
Backend: cpu-gpu-local
Current limits
This setup is broadly balanced for this model.
No major red flags
This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.
Score 93.6 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
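
Each card above lists weights and KV cache separately. To compare the ten recommendations by combined memory footprint, the two figures can be added and sorted. The following is a minimal sketch, not the recommender's own logic: the `Card` class, `footprint_gb`, and `by_footprint` are hypothetical names introduced here, and it assumes the footprint is simply weights plus KV cache, ignoring any runtime overhead. The numbers are copied from the cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One recommendation card (hypothetical structure, values from the cards)."""
    name: str
    weights_gb: float
    kv_cache_gb: float
    safe_context_k: int

    @property
    def footprint_gb(self) -> float:
        # Assumption: footprint = weights + KV cache, with no runtime overhead.
        return round(self.weights_gb + self.kv_cache_gb, 1)


CARDS = [
    Card("Qwen 3.5 9B", 5.5, 2.2, 32),
    Card("CodeGeeX 4 9B", 5.5, 0.6, 116),
    Card("Gemma 4 E4B", 4.9, 1.3, 63),
    Card("Codestral Mamba 7B", 4.3, 0.5, 184),
    Card("Yi Coder 9B", 5.5, 1.5, 48),
    Card("Granite 4.1 8B", 4.9, 2.4, 33),
    Card("Qwen 2.5 Coder 7B", 4.3, 0.9, 105),
    Card("Qwen 3 8B", 4.9, 2.2, 37),
    Card("Nemotron Nano 9B v2", 5.5, 2.4, 29),
    Card("Qwen 3.5 4B", 3.3, 2.2, 48),
]


def by_footprint(cards):
    """Return cards sorted from smallest to largest combined footprint."""
    return sorted(cards, key=lambda c: (c.footprint_gb, c.name))


if __name__ == "__main__":
    for card in by_footprint(CARDS):
        print(f"{card.name:<22}{card.footprint_gb:>5.1f} GB"
              f"  (safe context {card.safe_context_k}K)")
```

Under this assumption, Codestral Mamba 7B has the smallest combined footprint (4.8 GB) and Nemotron Nano 9B v2 the largest (7.9 GB), which agrees with Nemotron being the only card marked Tight.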