Will It Run AI

Explore AI models

374 models available

Google · Gemma 4 26B A4B
25.2B (3.8B active) · 256K ctx · 15.4 GB · frontier
moe · High

Gemma 4 26B-A4B is Google's mixture-of-experts (MoE) model with 25.2B total parameters, of which 3.8B are active per token (8 of 128 experts). It is designed to match much larger dense models at a fraction of the compute, supports a 256K-token context window, and is released under Apache 2.0.
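To make "128 experts, 8 active" concrete, here is a minimal sketch of generic top-k expert routing: a router scores every expert, only the k best are run, and their outputs are mixed with softmax weights computed over the selected scores. This is a textbook scheme, not Gemma 4's actual router; the function names, the stand-in router scores, and the toy experts are all hypothetical.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Rank experts by router logit and keep the top k
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the chosen experts, so their weights sum to 1
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the chosen experts' outputs; unchosen experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

N_EXPERTS, TOP_K = 128, 8
calls = []  # records which (hypothetical) experts actually executed

def make_expert(i):
    def expert(x):
        calls.append(i)
        return x * (i + 1)  # toy expert: a scalar function
    return expert

experts = [make_expert(i) for i in range(N_EXPERTS)]
logits = [math.sin(i) for i in range(N_EXPERTS)]  # stand-in router scores for one token
y = moe_layer(1.0, experts, logits, TOP_K)
print(len(calls))  # 8
```

Only 8 of 128 expert functions run per token, which is why compute scales with active rather than total parameters. The active share listed for this model (3.8B / 25.2B, about 15%) is higher than 8/128 (about 6%), presumably because non-expert weights such as attention layers run for every token.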

Mistral · Ministral 3 14B
14B · 262K ctx · 8.5 GB · frontier
multimodal · High

Ministral 3 14B is the largest model in the Ministral 3 family, offering frontier capabilities and performance comparable to the larger Mistral Small 3.2 24B. It is an efficient language model with vision capabilities.

Meta · Llama 4 Maverick 17B 128E
400B (17B active) · 1.0M ctx · 244 GB · frontier
moe · High

Llama 4 Maverick is Meta's large MoE model with 17B active parameters and 128 experts (400B total). It delivers frontier-class performance on reasoning and coding while remaining deployable on a single node.
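Although only 17B parameters are active per token, every expert must stay resident in memory, so the footprint follows the total parameter count. The rough sketch below assumes the size column uses a flat ratio of about 0.61 GB per billion parameters, which is what the listed figures work out to (400B → 244 GB, 70B → 42.7 GB, 671B → 409.3 GB), i.e. roughly 4.9 bits per weight if GB means 10^9 bytes. The site does not state its formula; the ratio, the 10% headroom, and the 8 × 80 GB node are assumptions for illustration.

```python
# ASSUMPTION: ratio inferred from this listing's sizes (~4.9 bits/weight), not a documented constant
GB_PER_BILLION_PARAMS = 0.61

def weight_memory_gb(total_params_billion):
    """Approximate weight footprint; MoE models need all experts resident, so pass TOTAL params."""
    return total_params_billion * GB_PER_BILLION_PARAMS

def fits_in(total_params_billion, n_gpus, gb_per_gpu, headroom=0.10):
    """True if the weights fit in pooled GPU memory while keeping a fraction free
    (hypothetical 10% by default) for KV cache and activations."""
    budget = n_gpus * gb_per_gpu * (1.0 - headroom)
    return weight_memory_gb(total_params_billion) <= budget

print(round(weight_memory_gb(400), 1))                # 244.0, matching the Maverick card
print(fits_in(400, n_gpus=8, gb_per_gpu=80))          # hypothetical 8 x 80 GB node: True
print(fits_in(400, n_gpus=1, gb_per_gpu=80))          # a single 80 GB GPU: False
```

Passing the active count (17) instead of the total (400) would underestimate memory by more than 20×, which is the most common mistake when sizing MoE models.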

Meta · Llama 3.3 70B
70B · 128K ctx · 42.7 GB · current
dense · High

Llama 3.3 70B is Meta's most capable single-GPU-class model, offering improved reasoning and instruction following over Llama 3.1 70B. It supports a 128K-token context window with enhanced multilingual and code capabilities.

Mistral · Codestral 2 25.08
22B · 256K ctx · 13.4 GB · frontier
dense · High

Codestral 2 is Mistral AI's latest code-focused model with enhanced performance on code generation, refactoring, and documentation across dozens of programming languages.

DeepSeek · DeepSeek V2.5 236B
236B (21B active) · 131K ctx · 144 GB · current
moe · High

DeepSeek-V2.5 is an upgraded model that merges DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of both earlier versions. For architecture details, see the DeepSeek-V2 model page.

Mixedbread AI · mxbai Embed Large
0.34B · 1K ctx · 0.2 GB · current
dense · High

The crispy sentence embedding family from Mixedbread.

Mistral · Mistral Small 3.2 24B
24B · 131K ctx · 14.6 GB · current
vision · High

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Meta · Llama 3.1 405B
405B · 131K ctx · 247.1 GB · frontier
dense · High

Llama 3.1 405B is Meta's largest open-weight model, competitive with GPT-4 class models across reasoning, coding, and multilingual tasks.

Alibaba · Qwen 2.5 32B
32B · 131K ctx · 19.5 GB · current
dense · High

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings several improvements over Qwen2.

Tsinghua/Zhipu · CogVLM2 19B
19B · 8K ctx · 11.6 GB · current
dense · High

Liquid AI · LFM2 24B
24B · 131K ctx · 14.6 GB · frontier
dense · High

LFM2 24B is Liquid AI's hybrid architecture model combining state-space and transformer layers for efficient long-context inference with strong reasoning capabilities.

Snowflake · Snowflake Arctic Embed L
0.34B · 1K ctx · 0.2 GB · current
dense · High

Google · Gemma 3 27B
27B · 131K ctx · 16.5 GB · current
dense · High

Gemma 3 27B is Google's flagship Gemma 3 model, with a 128K-token context window and vision support. It delivers top-tier performance among open models in reasoning, code, math, and multimodal understanding.

IBM · Granite 4.1 30B
30B · 131K ctx · 18.3 GB · current
dense · High

Granite 4.1 30B is IBM's dense, decoder-only model aimed at serious local workstation use. It was trained on roughly 15T tokens and supports a 128K-token context window. At Q4 quantization it fits on a 24 GB GPU with room left for the KV cache, making it a strong enterprise-friendly assistant for RAG and coding. It is licensed under Apache 2.0.
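How much "room for KV cache" a 24 GB card leaves can be estimated with the usual formula: keys and values (factor 2) × layers × KV heads × head dimension × bytes per element, per token. The sketch below uses the listed 18.3 GB weight size, but the layer count, KV-head count, and head dimension are HYPOTHETICAL placeholders, not Granite 4.1's published configuration; it also ignores activations and runtime overhead.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache per token: keys + values for every layer and KV head (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def max_context_tokens(gpu_gb, weights_gb, per_token_bytes):
    """Tokens of KV cache that fit in the VRAM the weights leave free (ignores activations/overhead)."""
    free_bytes = (gpu_gb - weights_gb) * 1e9
    return int(free_bytes // per_token_bytes)

# HYPOTHETICAL architecture values, for illustration only
per_token = kv_cache_bytes_per_token(n_layers=64, n_kv_heads=8, head_dim=128)
print(per_token)                                 # 262144 bytes, i.e. 256 KiB per token
print(max_context_tokens(24, 18.3, per_token))   # roughly 21.7K tokens
```

Under this hypothetical configuration, filling the full 128K window at fp16 would take 32 GiB of KV cache on its own, so long contexts on a 24 GB card would call for a shorter window or a quantized KV cache.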

InternLM · InternVL2 8B
8B · 8K ctx · 4.9 GB · current
dense · High

We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of instruction-tuned models, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-8B model.

Moonshot AI · Kimi Linear 48B A3B
48B · 1.0M ctx · 29.3 GB · current
linear · High

Kimi Linear is Moonshot AI's long-context efficient architecture release, using Kimi Delta Attention to cut KV-cache pressure and improve decoding throughput at very long sequence lengths.
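Kimi Delta Attention builds on the gated delta rule used in linear-attention models: instead of caching every past key and value, each such layer keeps a fixed-size state matrix that is updated once per token, so its memory does not grow with sequence length. The sketch below is a plain gated delta rule with a single scalar forget gate `alpha`, a simplification of KDA's finer-grained per-channel gating; the function names, dimensions, and values are made up for illustration.

```python
def delta_rule_step(S, k, v, beta, alpha=1.0):
    """One recurrent update of a d_v x d_k memory matrix S (simplified gated delta rule)."""
    d_v, d_k = len(S), len(k)
    # Forget gate: decay the whole memory by a scalar alpha (KDA gates per channel instead)
    S = [[alpha * S[i][j] for j in range(d_k)] for i in range(d_v)]
    # What the memory currently returns for key k
    pred = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Delta rule: move the value stored under k toward v by step size beta
    return [[S[i][j] + beta * (v[i] - pred[i]) * k[j] for j in range(d_k)]
            for i in range(d_v)]

def read(S, q):
    """Layer output for query q is a matrix-vector product with the state."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]

S = [[0.0, 0.0] for _ in range(2)]  # d_v = d_k = 2; the state never grows with sequence length
k = [1.0, 0.0]                       # unit-norm key
S = delta_rule_step(S, k, [3.0, 5.0], beta=1.0)
print(read(S, k))  # [3.0, 5.0]
S = delta_rule_step(S, k, [7.0, 1.0], beta=1.0)
print(read(S, k))  # [7.0, 1.0]: the old value is overwritten rather than summed
```

The second update shows the "delta" in the name: the state subtracts what it already predicts for a key before writing the new value, which keeps a bounded-size memory from filling up with stale associations.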

Nomic AI · Nomic Embed Text v1.5
0.14B · 8K ctx · 0.1 GB · current
dense · High

Update: `nomic-embed-text-v1.5` is now multimodal. `nomic-embed-vision-v1.5` is aligned to the embedding space of `nomic-embed-text-v1.5`, so any text embedding can be compared directly with image embeddings.
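Because the text and vision models share one embedding space, a text query vector can be scored against image vectors with the same similarity function used for text-to-text search. A minimal sketch using cosine similarity; the three-dimensional vectors and file names are made up stand-ins for real embeddings, and no Nomic API is called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank(query_vec, candidates):
    """Sort (name, vector) pairs by similarity to the query, best match first."""
    return sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)

# Made-up vectors standing in for a text embedding and two image embeddings
text_query = [0.9, 0.1, 0.0]
images = [("cat.jpg", [0.8, 0.2, 0.1]), ("car.jpg", [0.0, 0.3, 0.9])]
print(rank(text_query, images)[0][0])  # cat.jpg
```

This only works because the spaces are aligned; vectors from two independently trained models would give meaningless similarities even if their dimensions matched.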

Allen AI · OLMo 2 32B
32B · 4K ctx · 19.5 GB · active
dense · High

OLMo 2 32B is Allen AI's fully open 32B-parameter language model, the largest in the OLMo 2 family. It was trained on 6T tokens from the Dolma dataset and post-trained with Tülu 3 SFT, DPO, and RLVR. Allen AI describes it as the first fully open model to outperform GPT-3.5 and GPT-4o mini on academic benchmarks.

DeepSeek · DeepSeek V3 671B
671B (37B active) · 131K ctx · 409.3 GB · current
moe · High

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
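As DeepSeek describes it, the auxiliary-loss-free strategy adds a per-expert bias to the routing scores only when choosing the top-k experts, keeps the gating weights based on the original scores, and after each training step lowers the bias of overloaded experts and raises it for underloaded ones by a fixed step γ. The toy simulation below follows that description; the uniform random affinities, the `skew` that makes expert 0 systematically preferred, the step size, and all sizes are made up.

```python
import random

def route(scores, bias, k):
    """Select top-k experts by score + bias; weight them by the UNBIASED scores."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}

def update_bias(bias, loads, gamma):
    """Lower the bias of overloaded experts, raise it for underloaded ones."""
    mean = sum(loads) / len(loads)
    new_bias = []
    for b, load in zip(bias, loads):
        if load > mean:
            b -= gamma
        elif load < mean:
            b += gamma
        new_bias.append(b)
    return new_bias

def simulate(steps, tokens_per_step, n_experts, k, gamma, skew, seed=0):
    """Return per-expert load summed over the second half of training."""
    rng = random.Random(seed)
    bias = [0.0] * n_experts
    totals = [0] * n_experts
    for step in range(steps):
        loads = [0] * n_experts
        for _ in range(tokens_per_step):
            # Hypothetical affinities: expert 0 is favoured by `skew`
            scores = [rng.random() + (skew if i == 0 else 0.0) for i in range(n_experts)]
            for i in route(scores, bias, k):
                loads[i] += 1
        bias = update_bias(bias, loads, gamma)
        if step >= steps // 2:
            totals = [t + l for t, l in zip(totals, loads)]
    return totals

def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

print(round(imbalance(simulate(300, 64, 8, 2, gamma=0.0, skew=1.0)), 2))   # 4.0: expert 0 wins every token
print(round(imbalance(simulate(300, 64, 8, 2, gamma=0.01, skew=1.0)), 2))  # much closer to 1.0
```

Because the bias only affects which experts are selected, balancing does not add a loss term that competes with the language-modeling objective, which is the motivation for the approach.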

Mistral · Mistral Small 24B
24B · 33K ctx · 14.6 GB · legacy
dense · High

Mistral Small 3 (2501) sets a new benchmark in the "small" large language model category (below 70B parameters): with 24B parameters, it achieves state-of-the-art capabilities comparable to larger models. This model is an instruction-fine-tuned version of the base model Mistral-Small-24B-Base-2501.

Mistral · Mistral Large 3
675B (41B active) · 256K ctx · 411.8 GB · frontier
moe · High

Mistral Large 3 is Mistral AI's mixture-of-experts model with 675B total parameters, of which 41B are active per token, and a 256K-token context window.

Microsoft · Phi-4 14B
14B · 16K ctx · 8.5 GB · current
dense · High

Our training data is an extension of the data used for Phi-3 and draws on a wide variety of sources.

BAAI · BGE Large EN v1.5
0.34B · 1K ctx · 0.2 GB · current
dense · High