Browse AI Models

380 models available

4B · 131K ctx · 2.4 GB · frontier

dense · Top tier

Qwen3.5 4B offers a sweet spot between capability and efficiency, handling coding and general tasks well on modest hardware.

Z.ai GLM-5.2

753.3B (40B active) · 200K ctx · 459.5 GB · frontier

MoE · Top tier

GLM-5.2 is Z.ai's flagship MoE model for long-horizon agentic tasks, with a native 1M-token context, flexible coding effort levels, and an improved DeepSeek Sparse Attention (DSA) architecture over GLM-5.1. 753B total parameters with ~40B activated per token (256 routed experts, 8 active, 1 shared).
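The total and active figures in this entry can be related with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes that "8 active" refers to routed experts selected per token, with the shared expert always on.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of all parameters used for one token (both values in billions)."""
    return active_params_b / total_params_b


def routed_expert_share(routed_experts: int, active_routed: int) -> float:
    """Share of routed experts selected per token (assumes top-k routing)."""
    return active_routed / routed_experts


# Figures taken from the GLM-5.2 entry above.
param_share = active_fraction(753.3, 40.0)   # roughly 5.3% of parameters per token
expert_share = routed_expert_share(256, 8)   # 8 of 256 routed experts

print(f"active parameter share: {param_share:.1%}")
print(f"routed experts selected: {expert_share:.1%}")
```

The parameter share comes out higher than the routed-expert share, which is what one would expect if attention layers, embeddings, and the shared expert run for every token regardless of routing.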

Alibaba Qwen 3 8B

8B · 131K ctx · 4.9 GB · frontier

dense · Top tier

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Alibaba Qwen3-Coder-Next

80B (3B active) · 256K ctx · 48.8 GB · frontier

MoE · Top tier

Today, we're announcing Qwen3-Coder-Next, an open-weight language model designed specifically for coding agents and local development.

Microsoft Phi-4-reasoning-plus 14B

14.7B · 33K ctx · 9 GB · frontier

dense · Top tier

Important: To fully take advantage of the model's capabilities, inference must use `temperature=0.8`, `top_k=50`, `top_p=0.95`, and `do_sample=True`. For more complex queries, set `max_new_tokens=32768` to allow for longer chain-of-thought (CoT).
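One way to keep these settings together is a small helper that returns them as keyword arguments. This is a minimal sketch, not the model card's own code: the helper name and the `default_max_new_tokens` value are hypothetical, since the note only gives a token budget for complex queries.

```python
# Sampling settings quoted in the note above.
RECOMMENDED_SAMPLING = {
    "temperature": 0.8,
    "top_k": 50,
    "top_p": 0.95,
    "do_sample": True,
}


def generation_kwargs(complex_query: bool = False,
                      default_max_new_tokens: int = 4096) -> dict:
    """Build keyword arguments for a generate()-style call.

    default_max_new_tokens is an assumed placeholder; only the 32768 budget
    for complex queries comes from the note.
    """
    kwargs = dict(RECOMMENDED_SAMPLING)  # copy so the constant stays unchanged
    kwargs["max_new_tokens"] = 32768 if complex_query else default_max_new_tokens
    return kwargs


print(generation_kwargs(complex_query=True))
```

With Hugging Face Transformers, a dictionary like this would typically be unpacked into the call, for example `model.generate(**inputs, **generation_kwargs(complex_query=True))`, assuming `model` and `inputs` have been prepared elsewhere.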

Mistral Devstral Small 1.1

24B · 131K ctx · 14.6 GB · current

dense · Top tier

Devstral is an agentic LLM for software engineering tasks built under a collaboration between Mistral AI and All Hands AI 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model on this benchmark.

Z.ai GLM-5.1

754B (40B active) · 200K ctx · 459.9 GB · frontier

MoE · Top tier

GLM-5.1 is Z.ai's next-generation flagship MoE model for agentic engineering, with significantly stronger coding capabilities than GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and sustains optimization over hundreds of rounds and thousands of tool calls on long-horizon agentic tasks.

Mistral AI Pixtral Large 124B

124B · 131K ctx · 75.6 GB · frontier

dense · Top tier

Pixtral-Large-Instruct-2411 is a 124B multimodal model built on top of Mistral Large 2, i.e., Mistral-Large-Instruct-2407. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. In particular, the model is able to understand documents, charts, and natural images while maintaining the leading text-only understanding of Mistral Large 2.

Z.ai GLM-5

744B (40B active) · 200K ctx · 453.8 GB · frontier

MoE · Top tier

Use GLM-5 API services on the Z.ai API Platform.

DeepSeek DeepSeek V3.2

671B (37B active) · 128K ctx · 409.3 GB · frontier

MoE · Top tier

DeepSeek V3.2 is a 671B MoE model with 37B active parameters per token, using DeepSeek Sparse Attention and Multi-head Latent Attention. 128K context window. MIT licensed. Requires multi-GPU or high-memory Macs for local inference.
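To see why a model of this size needs multi-GPU or high-memory hardware, the listed download sizes can be turned into a rough fit check. The figures on this page appear consistent with about 0.61 GB per billion total parameters (roughly 4.9 bits per weight). That ratio is inferred from the listed numbers, not stated by the catalog, and the function names and the `overhead_gb` parameter below are hypothetical.

```python
# Assumed ratio, inferred from the sizes listed on this page (e.g. 671B -> 409.3 GB).
GB_PER_BILLION_PARAMS = 0.61


def estimated_size_gb(total_params_b: float) -> float:
    """Approximate weight size in GB from the total parameter count in billions."""
    return total_params_b * GB_PER_BILLION_PARAMS


def fits_in_memory(total_params_b: float, memory_gb: float,
                   overhead_gb: float = 0.0) -> bool:
    """Rough check: weights plus an optional overhead allowance (KV cache,
    activations) must fit in the available memory."""
    return estimated_size_gb(total_params_b) + overhead_gb <= memory_gb


# An MoE model must keep all experts resident, so the check uses the
# total parameter count (671B), not the 37B active per token.
print(round(estimated_size_gb(671), 1))
print(fits_in_memory(671, memory_gb=24.0))
print(fits_in_memory(30, memory_gb=24.0))
```

The check is deliberately coarse: real memory use also depends on context length, the runtime, and the quantization format, so `overhead_gb` should be set generously when planning hardware.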

OpenAI GPT-OSS 20B

21B (3.6B active) · 128K ctx · 12.8 GB · frontier

MoE · High

GPT-OSS 20B is OpenAI's first open-weight model, a 21B-parameter mixture-of-experts model with 3.6B active parameters per token. Features configurable reasoning effort (low/medium/high), full chain-of-thought visibility, and agentic capabilities including function calling. Runs on devices with 16GB of memory using MXFP4 quantization.

Alibaba Qwen 3 235B A22B

235B (22B active) · 131K ctx · 143.4 GB · frontier

MoE · High

We introduce the updated version of the Qwen3-235B-A22B non-thinking mode, named Qwen3-235B-A22B-Instruct-2507.

Alibaba Qwen3-Coder 480B A35B Instruct

480B (35B active) · 256K ctx · 292.8 GB · frontier

MoE · High

Today, we're announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we're excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct.

NVIDIA Nemotron Cascade 2 30B A3B

30B (3B active) · 262K ctx · 18.3 GB · frontier

MoE · High

NVIDIA Nemotron Cascade 2 is a 30B MoE model with 3B active parameters, using a Mamba-2 + Transformer hybrid architecture. Gold medal at IMO 2025 and IOI 2025. 92% AIME 2025, 87% LiveCodeBench. Fits on a single RTX 4090.

Microsoft Phi-4 Mini Reasoning 4B

3.8B · 131K ctx · 2.3 GB · frontier

dense · High

Phi-4 Mini Reasoning is Microsoft's compact reasoning model with chain-of-thought capabilities at just 3.8B parameters.

Google Gemma 4 31B

30.7B · 256K ctx · 18.7 GB · frontier

dense · High

Gemma 4 31B is the largest and most capable open Gemma model. Dense architecture with 30.7B parameters. 256K context window. Achieves 2150 Codeforces ELO and 89.2% AIME 2026. Apache 2.0 licensed.

Jina AI Jina Embeddings v3

0.57B · 8K ctx · 0.3 GB · current

dense · High

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

MiniMax MiniMax M2.7

230B (10B active) · 205K ctx · 140.3 GB · frontier

MoE · High

Self-evolving agent model with 230B total / 10B active MoE architecture. SOTA on SWE-Pro (56.2%) and Terminal Bench 2 (57%). Runs locally on 128GB Mac with Dynamic 4-bit GGUF.

BAAI BGE M3

0.57B · 8K ctx · 0.3 GB · current

dense · High

For more details, please refer to our GitHub repo: https://github.com/FlagOpen/FlagEmbedding

Mistral Leanstral 119B A6B

119B (6.5B active) · 256K ctx · 72.6 GB · current

MoE · High

Leanstral is Mistral's open-weight proof and code agent for Lean 4 workflows, built on the Mistral Small 4 family with multimodal input, tool use, and long-context support.

DeepSeek DeepSeek Coder V2 236B

236B (21B active) · 131K ctx · 144 GB · current

MoE · High

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2, while maintaining comparable performance in general language tasks.

DeepSeek DeepSeek R1 671B

671B (37B active) · 131K ctx · 409.3 GB · frontier

MoE · High

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

DeepSeek DeepSeek V3.1 671B

671B (37B active) · 131K ctx · 409.3 GB · frontier

MoE · High

DeepSeek V3.1 (V3-0324) is a major update to the DeepSeek V3 family, with substantial improvements in instruction following, coding, creative writing, and agentic capabilities.

NVIDIA Nemotron Nano 8B

8B · 131K ctx · 4.9 GB · active

dense · High

Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. Achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled. Fits on a single RTX GPU for local deployment.