Browse AI Models

380 models available

32B · 131K ctx · 19.5 GB · frontier

dense · High

EXAONE 4.0 is LG AI Research's flagship language model. The 32B variant offers strong multilingual performance with particular strength in Korean and English tasks.

Google Gemma 4 26B A4B

25.2B (3.8B active) · 256K ctx · 15.4 GB · frontier

moe · High

Gemma 4 26B-A4B is Google's MoE model with 25.2B total parameters, 3.8B active per token (128 experts, 8 active). Matches much larger dense models at a fraction of the compute. 256K context. Apache 2.0.
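To make "a fraction of the compute" concrete, the figures in the description above can be turned into ratios. This is a minimal sketch, assuming per-token compute scales roughly with the active parameter count (a simplification); the function name `active_fraction` is hypothetical.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token (both values in billions)."""
    return active_b / total_b

# Figures from the Gemma 4 26B-A4B entry: 25.2B total, 3.8B active,
# 128 experts with 8 active per token.
param_share = active_fraction(25.2, 3.8)
expert_share = 8 / 128

print(f"active parameters: {param_share:.1%}")
print(f"active experts:    {expert_share:.2%}")
```

The parameter share comes out higher than the expert share. A plausible reason is that components outside the expert layers, such as attention and embeddings, run for every token, though the listing does not break this down.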

Mistral Ministral 3 14B

14B · 262K ctx · 8.5 GB · frontier

multimodal · High

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.

Meta Llama 4 Maverick 17B 128E

400B (17B active) · 1.0M ctx · 244 GB · frontier

moe · High

Llama 4 Maverick is Meta's large MoE model with 17B active parameters and 128 experts (400B total). Delivers frontier-class performance on reasoning and coding while remaining deployable on a single node.

Meta Llama 3.3 70B

70B · 128K ctx · 42.7 GB · current

dense · High

Llama 3.3 70B is Meta's most capable single-GPU-class model, offering improved reasoning and instruction following over Llama 3.1 70B. Supports 128K context with enhanced multilingual and code capabilities.

Mistral Codestral 2 25.08

22B · 256K ctx · 13.4 GB · frontier

dense · High

Codestral 2 is Mistral AI's latest code-focused model with enhanced performance on code generation, refactoring, and documentation across dozens of programming languages.

DeepSeek DeepSeek V2.5 236B

236B (21B active) · 131K ctx · 144 GB · current

moe · High

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two previous versions. For model details, see the DeepSeek-V2 page.

Mixedbread AI mxbai Embed Large

0.34B · 1K ctx · 0.2 GB · current

dense · High

The crispy sentence embedding family from Mixedbread.

Mistral Mistral Small 3.2 24B

24B · 131K ctx · 14.6 GB · current

vision · High

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Meta Llama 3.1 405B

405B · 131K ctx · 247.1 GB · frontier

dense · High

Llama 3.1 405B is Meta's largest open-weight model, competitive with GPT-4 class models across reasoning, coding, and multilingual tasks.

Alibaba Qwen 2.5 32B

32B · 131K ctx · 19.5 GB · current

dense · High

Qwen2.5 is the latest series of Qwen large language models. The Qwen2.5 release includes a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

Tsinghua/Zhipu CogVLM2 19B

19B · 8K ctx · 11.6 GB · current

dense · High


Liquid AI LFM2 24B

24B · 131K ctx · 14.6 GB · frontier

dense · High

LFM2 24B is Liquid AI's hybrid architecture model combining state-space and transformer layers for efficient long-context inference with strong reasoning capabilities.

Snowflake Snowflake Arctic Embed L

0.34B · 1K ctx · 0.2 GB · current

dense · High

Google Gemma 3 27B

27B · 131K ctx · 16.5 GB · current

dense · High

Gemma 3 27B is Google's flagship Gemma 3 model with 128K context and vision support. Delivers top-tier open model performance in reasoning, code, math, and multimodal understanding.

IBM Granite 4.1 30B

30B · 131K ctx · 18.3 GB · current

dense · High

Granite 4.1 30B is IBM's dense decoder-only model aimed at local workstations, trained on roughly 15T tokens with 128K context. Q4 fits on a 24 GB GPU with room for KV cache, making it a strong enterprise-friendly assistant for RAG and coding. Apache 2.0 licensed.
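The "fits on a 24 GB GPU" claim can be checked against the size figures in this listing. The sketch below is a rough estimate only, assuming the listed size is the quantized weight footprint and that 1 GB means 1e9 bytes (the listing states neither); the helper names are hypothetical.

```python
def bits_per_weight(size_gb: float, params_b: float) -> float:
    """Implied bits per parameter, treating 1 GB as 1e9 bytes (an assumption)."""
    return size_gb * 8 / params_b

def headroom_gb(size_gb: float, vram_gb: float) -> float:
    """VRAM left after loading the weights; it must cover KV cache and runtime overhead."""
    return vram_gb - size_gb

# Figures from the Granite 4.1 30B entry: 30B parameters, 18.3 GB listed size.
print(round(bits_per_weight(18.3, 30), 2))   # about 4.88 bits per weight
print(round(headroom_gb(18.3, 24.0), 1))     # about 5.7 GB left on a 24 GB card
```

How far the remaining headroom goes depends on context length and KV-cache precision, which this sketch does not model.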

InternLM InternVL2 8B

8B · 8K ctx · 4.9 GB · current

dense · High

We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of instruction-tuned models, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-8B model.

Moonshot AI Kimi Linear 48B A3B

48B · 1.0M ctx · 29.3 GB · current

linear · High

Kimi Linear is Moonshot AI's long-context efficient architecture release, using Kimi Delta Attention to cut KV-cache pressure and improve decoding throughput at very long sequence lengths.

Nomic AI Nomic Embed Text v1.5

0.14B · 8K ctx · 0.1 GB · current

dense · High

Exciting Update!: `nomic-embed-text-v1.5` is now multimodal! nomic-embed-vision-v1.5 is aligned to the embedding space of `nomic-embed-text-v1.5`, meaning any text embedding is multimodal!

Allen AI OLMo 2 32B

32B · 4K ctx · 19.5 GB · active

dense · High

OLMo 2 32B is Allen AI's fully open 32B-parameter language model, the largest in the OLMo 2 family. Trained on 6T tokens from the Dolma dataset, post-trained with Tülu 3 SFT, DPO, and RLVR. First fully open model to outperform GPT-3.5 and GPT-4o mini on academic benchmarks.

DeepSeek DeepSeek V3 671B

671B (37B active) · 131K ctx · 409.3 GB · current

moe · High

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.

Mistral Mistral Small 24B

24B · 33K ctx · 14.6 GB · legacy

dense · High

Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models. This model is an instruction-fine-tuned version of the base model Mistral-Small-24B-Base-2501.

Mistral Mistral Large 3

675B (41B active) · 256K ctx · 411.8 GB · frontier

moe · High

Mistral Large 3 is Mistral AI's MoE model with 675B total parameters, 41B active per token, and 256K context.

Microsoft Phi-4 14B

14B · 16K ctx · 8.5 GB · current

dense · High

Phi-4's training data is an extension of the data used for Phi-3 and includes a wide variety of sources.