Will It Run AI · Calculator

Tell us what you own and what you want to do. We rank the local models that are viable for you.

Start with your hardware and your workload, and you get a shortlist based on fit, speed, and runtime support, instead of guessing from generic model lists or benchmark screenshots.


Live catalog snapshot: 196 hardware profiles, 380 models, 24 runtimes. This keeps the calculator in sync with the current catalog rather than with a static benchmark list.

Being evaluated

Hardware: RTX 4070 12GB · Workload: Coding · Runtime: llama.cpp · Operating mode: Balanced

Inputs

Choose the hardware, runtime, and workload you want to test.

Use the detected hardware if it is correct; otherwise change it, then rerun the ranking to compare realistic local AI options.


Balanced for general local use. Keeps the ranking neutral across personal and serving workflows.

Update the hardware or workload and recalculate to refresh the ranking.

1. Fit

Memory fit and headroom decide whether a model is realistic on the selected hardware.

2. Workload

The score rewards models that match the selected task and penalizes outdated families when newer specialized versions exist.

3. Speed

Decode throughput and TTFT (time to first token) keep the shortlist practical for real use, not just theoretically possible.
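To make the fit criterion concrete, here is a minimal sketch of a memory budget check. It is not the calculator's actual logic: the function name `check_fit`, the 1.0 GB runtime overhead, the 15% headroom threshold, and the labels "Tight" and "Does not fit" are assumptions for illustration. Only the "Runs well" label and the sample figures (12 GB card, 5.5 GB weights, 2.2 GB KV cache) come from this page.

```python
from dataclasses import dataclass


@dataclass
class FitResult:
    used_gb: float
    headroom_gb: float
    label: str


def check_fit(vram_gb: float, weights_gb: float, kv_cache_gb: float,
              overhead_gb: float = 1.0) -> FitResult:
    """Classify memory fit from a simple budget: weights + KV cache + overhead.

    overhead_gb is an assumed allowance for runtime buffers; the real
    calculator may account for it differently.
    """
    used = weights_gb + kv_cache_gb + overhead_gb
    headroom = vram_gb - used
    # Thresholds below are illustrative assumptions, not published cutoffs.
    if headroom < 0:
        label = "Does not fit"
    elif headroom / vram_gb < 0.15:
        label = "Tight"
    else:
        label = "Runs well"
    return FitResult(round(used, 2), round(headroom, 2), label)


# Figures listed on this page for Qwen 3.5 9B on an RTX 4070 12GB.
result = check_fit(vram_gb=12.0, weights_gb=5.5, kv_cache_gb=2.2)
print(result)
```

The point of the sketch is that headroom, not just "weights fit in VRAM", drives the label: the KV cache grows with context length, so a model that barely fits at load time can stop fitting once the context fills up.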

Qwen

Qwen 3.5 9B

Frontier · Released Jun 2025 · Hugging Face · Ollama · LM Studio

Why recommended

This model is a direct match for coding. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Capacity: Roomy · Bandwidth: Medium · Stack: Standard

Interactive: Good · Light API: Great · Bottleneck: Balanced

Rank #1 · Grade S · Runs · Estimated

Score: 130.7

Fit status: Runs well with a safe context of 32K.

Runtime support: unknown via n/a on unknown.

Runtime: llama.cpp · Artifact: n/a · Quantization: Q4_K_M · Decode: 71.5 tok/s

Safe context: 32K · Official context: 131K · Support: n/a · TTFT: 2708 ms

Weights: 5.5 GB · KV cache: 2.2 GB · Backend: unknown
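As a rough sanity check on the decode estimate, a common rule of thumb is that single-stream decoding of a dense model is memory-bound: each generated token requires reading roughly all of the weights once, so tokens per second cannot exceed memory bandwidth divided by weight size. The sketch below applies that rule to the figures listed for this model. The 504 GB/s peak bandwidth for the RTX 4070 is an assumption taken from the card's published specification, not from this page, and the helper names are hypothetical. How the calculator actually derives its estimate is not stated here.

```python
def decode_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Tokens/s ceiling if every token requires reading all weights once."""
    return bandwidth_gb_s / weights_gb


def implied_bandwidth(decode_tok_s: float, weights_gb: float) -> float:
    """Effective bandwidth (GB/s) implied by an observed or estimated decode rate."""
    return decode_tok_s * weights_gb


# Assumption: RTX 4070 peak memory bandwidth from the vendor spec sheet.
ASSUMED_BANDWIDTH_GB_S = 504.0

# Figures from this page: 5.5 GB of Q4_K_M weights, 71.5 tok/s estimated decode.
ceiling = decode_upper_bound(ASSUMED_BANDWIDTH_GB_S, weights_gb=5.5)
effective = implied_bandwidth(decode_tok_s=71.5, weights_gb=5.5)
efficiency = effective / ASSUMED_BANDWIDTH_GB_S

print(f"ceiling ~{ceiling:.1f} tok/s, implied bandwidth {effective:.2f} GB/s, "
      f"efficiency {efficiency:.0%}")
```

Under these assumptions the listed decode rate sits below the theoretical ceiling, which is what you would expect from an estimate that leaves room for kernel and KV-cache overhead.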

Current limits

This setup is broadly balanced for this model.

No major red flags

This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.

Best next improvements

The score of 130.7 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.
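One plausible way to combine those eight factors is a weighted sum of normalized component scores. The sketch below shows that shape only. The weights, the 0 to 1 normalization, and the name `combined_score` are all hypothetical; the page does not publish the real formula, and this example is not meant to reproduce the 130.7 figure.

```python
# Hypothetical weights for illustration; the actual formula is not published.
WEIGHTS = {
    "workload_match": 30.0,
    "catalog_recency": 20.0,
    "fit_confidence": 20.0,
    "context_coverage": 10.0,
    "artifact_choice": 5.0,
    "memory_utilization": 10.0,
    "throughput": 20.0,
    "latency": 15.0,
}


def combined_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * components[name] for name in WEIGHTS), 1)


# Made-up component values for a model that matches the task and fits well.
example = {
    "workload_match": 1.0,
    "catalog_recency": 0.9,
    "fit_confidence": 1.0,
    "context_coverage": 0.5,
    "artifact_choice": 0.0,
    "memory_utilization": 0.8,
    "throughput": 0.7,
    "latency": 0.6,
}
print(combined_score(example))
```

A weighted sum like this explains why a score can exceed 100: it is a ranking signal whose scale depends on the weights, not a percentage.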

Gemma

Gemma 4 E4B

Frontier · Released Apr 2026 · Hugging Face · Ollama · LM Studio

Why recommended

This model is a direct match for coding. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Capacity: Roomy · Bandwidth: Medium · Stack: Standard

Interactive: Good · Light API: Great · Bottleneck: Balanced

Rank #2 · Grade A · Runs · Estimated

Score: 112.1

Fit status: Runs well with a safe context of 63K.

Runtime support: unknown via n/a on unknown.

Runtime: llama.cpp · Artifact: n/a · Quantization: Q4_K_M · Decode: 55.7 tok/s

Safe context: 63K · Official context: 128K · Support: n/a · TTFT: 3474 ms

Weights: 4.9 GB · KV cache: 1.3 GB · Backend: unknown

Current limits

This setup is broadly balanced for this model.

No major red flags

This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.

Best next improvements

The score of 112.1 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.

CodeGeeX

CodeGeeX 4 9B

Current · Released Jul 2024 · Hugging Face · Ollama

Why recommended

This model is still usable for coding, but it is not the most specialized pick. It sits in the middle of the current model mix. It fits natively with comfortable headroom. Known channels: huggingface, ollama.

Capacity: Roomy · Bandwidth: Medium · Stack: Standard

Interactive: Good · Light API: Great · Bottleneck: Balanced

Rank #3 · Grade A · Runs · Estimated

Score: 108.4

Fit status: Runs well with a safe context of 116K.

Runtime support: unknown via n/a on unknown.

Runtime: llama.cpp · Artifact: n/a · Quantization: Q4_K_M · Decode: 69.3 tok/s

Safe context: 116K · Official context: 131K · Support: n/a · TTFT: 2794 ms

Weights: 5.5 GB · KV cache: 0.6 GB · Backend: unknown

Current limits

This setup is broadly balanced for this model.

No major red flags

This recommendation has enough memory headroom and acceptable estimated speed for the selected workload.

Best next improvements

The score of 108.4 combines workload match, catalog recency, fit confidence, context coverage, artifact choice, memory utilization, throughput, and latency.

All 380 models

Full compatibility grid for RTX 4070 12GB

246 models fit · 9 excellent · 38 great

Grade · Model · Params · Tasks · Q4 VRAM · Decode · Context · Memory · Fit