Explore AI models

380 models available

32B · 131K ctx · 19.5 GB · frontier

dense · High

EXAONE 4.0 is LG AI Research's flagship language model. The 32B variant offers strong multilingual performance with particular strength in Korean and English tasks.

Google Gemma 4 26B A4B

25.2B (3.8B active) · 256K ctx · 15.4 GB · frontier

moe · High

Gemma 4 26B-A4B is Google's MoE model with 25.2B total parameters, 3.8B active per token (128 experts, 8 active). Matches much larger dense models at a fraction of the compute. 256K context. Apache 2.0.
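
As a rough illustration of the figures quoted above, the share of the model that is used per token can be computed directly from the listed totals. This is a minimal sketch: the helper name `active_fraction` is hypothetical, and the inputs are simply the parameter and expert counts from the description.

```python
def active_fraction(total: float, active: float) -> float:
    """Return the share of `total` that is active per token (0..1)."""
    if total <= 0 or not 0 < active <= total:
        raise ValueError("expected 0 < active <= total")
    return active / total


# Figures quoted for Gemma 4 26B-A4B in the listing above.
param_share = active_fraction(25.2, 3.8)  # billions of parameters
expert_share = active_fraction(128, 8)    # experts per token

print(f"parameters active per token: {param_share:.1%}")
print(f"experts active per token:    {expert_share:.2%}")
```

With these inputs the parameter share comes out near 15% and the expert share at 6.25%. The gap is presumably because some weights, such as attention and embedding layers, are used for every token regardless of routing.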

Mistral Ministral 3 14B

14B · 262K ctx · 8.5 GB · frontier

multimodal · High

The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.

Meta Llama 4 Maverick 17B 128E

400B (17B active) · 1.0M ctx · 244 GB · frontier

moe · High

Llama 4 Maverick is Meta's large MoE model with 17B active parameters and 128 experts (400B total). Delivers frontier-class performance on reasoning and coding while remaining deployable on a single node.

Meta Llama 3.3 70B

70B · 128K ctx · 42.7 GB · current

dense · High

Llama 3.3 70B is Meta's most capable single-GPU-class model, offering improved reasoning and instruction following over Llama 3.1 70B. Supports 128K context with enhanced multilingual and code capabilities.

Mistral Codestral 2 25.08

22B · 256K ctx · 13.4 GB · frontier

dense · High

Codestral 2 is Mistral AI's latest code-focused model with enhanced performance on code generation, refactoring, and documentation across dozens of programming languages.

DeepSeek DeepSeek V2.5 236B

236B (21B active) · 131K ctx · 144 GB · current

moe · High

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct, integrating the general and coding abilities of the two previous versions. For model details, see the DeepSeek-V2 page.

Mixedbread AI mxbai Embed Large

0.34B · 1K ctx · 0.2 GB · current

dense · High

The crispy sentence embedding family from Mixedbread.

Mistral Mistral Small 3.2 24B

24B · 131K ctx · 14.6 GB · current

vision · High

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Meta Llama 3.1 405B

405B · 131K ctx · 247.1 GB · frontier

dense · High

Llama 3.1 405B is Meta's largest open-weight model, competitive with GPT-4 class models across reasoning, coding, and multilingual tasks.

Alibaba Qwen 2.5 32B

32B · 131K ctx · 19.5 GB · current

dense · High

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

Tsinghua/Zhipu CogVLM2 19B

19B · 8K ctx · 11.6 GB · current

dense · High

Liquid AI LFM2 24B

24B · 131K ctx · 14.6 GB · frontier

dense · High

LFM2 24B is Liquid AI's hybrid architecture model combining state-space and transformer layers for efficient long-context inference with strong reasoning capabilities.

Snowflake Snowflake Arctic Embed L

0.34B · 1K ctx · 0.2 GB · current

dense · High

Google Gemma 3 27B

27B · 131K ctx · 16.5 GB · current

dense · High

Gemma 3 27B is Google's flagship Gemma 3 model with 128K context and vision support. Delivers top-tier open model performance in reasoning, code, math, and multimodal understanding.
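
The description says 128K context while the listing shows 131K ctx. The two figures agree if the listing rounds the raw token count to decimal thousands, which is an assumption about how this catalog formats its labels rather than something it states. A minimal sketch of that conversion, using a hypothetical helper `ctx_label`:

```python
def ctx_label(binary_k: int) -> str:
    """Convert a context size given in units of 1024 tokens
    into a label rounded to decimal thousands of tokens."""
    tokens = binary_k * 1024
    return f"{round(tokens / 1000)}K"


for size in (32, 128, 256):
    print(f"{size}K (x1024) = {size * 1024} tokens -> {ctx_label(size)} ctx")
```

Under this assumption, 128K becomes 131K, 256K becomes 262K, and 32K becomes 33K, which matches several of the labels used elsewhere in the listing.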

IBM Granite 4.1 30B

30B · 131K ctx · 18.3 GB · current

dense · High

Granite 4.1 30B is IBM's dense decoder-only model aimed at local workstations, trained on roughly 15T tokens with 128K context. At Q4 it fits on a 24 GB GPU with room for KV cache, making it a strong enterprise-friendly assistant for RAG and coding. Apache 2.0 licensed.
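
The fit claim above can be checked with simple arithmetic. The sketch below assumes about 0.61 GB of weights per billion parameters. That ratio is inferred from the sizes shown in this listing (for example 30B at 18.3 GB and 70B at 42.7 GB) and is not an official figure. The function names are hypothetical.

```python
# Assumption: ratio inferred from the sizes shown in this listing,
# e.g. 18.3 GB / 30B and 42.7 GB / 70B. Not an official figure.
GB_PER_BILLION_PARAMS = 0.61


def estimated_weights_gb(params_b: float) -> float:
    """Estimate quantized weight size in GB from billions of parameters."""
    return params_b * GB_PER_BILLION_PARAMS


def headroom_gb(params_b: float, vram_gb: float) -> float:
    """VRAM left for KV cache and activations after loading the weights.
    A negative result means the weights alone do not fit."""
    return vram_gb - estimated_weights_gb(params_b)


weights = estimated_weights_gb(30)
spare = headroom_gb(30, 24)
print(f"weights: {weights:.1f} GB, headroom on 24 GB: {spare:.1f} GB")
```

For the 30B model this gives roughly 18.3 GB of weights and about 5.7 GB of headroom on a 24 GB card. How much context that headroom supports depends on the KV-cache layout, which this sketch does not model.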

InternLM InternVL2 8B

8B · 8K ctx · 4.9 GB · current

dense · High

We are excited to announce the release of InternVL 2.0, the latest addition to the InternVL series of multimodal large language models. InternVL 2.0 features a variety of instruction-tuned models, ranging from 1 billion to 108 billion parameters. This repository contains the instruction-tuned InternVL2-8B model.

Moonshot AI Kimi Linear 48B A3B

48B · 1.0M ctx · 29.3 GB · current

linear · High

Kimi Linear is Moonshot AI's long-context efficient architecture release, using Kimi Delta Attention to cut KV-cache pressure and improve decoding throughput at very long sequence lengths.

Nomic AI Nomic Embed Text v1.5

0.14B · 8K ctx · 0.1 GB · current

dense · High

Update: `nomic-embed-text-v1.5` is now multimodal. `nomic-embed-vision-v1.5` is aligned to the embedding space of `nomic-embed-text-v1.5`, so text embeddings can be compared directly with image embeddings.

Allen AI OLMo 2 32B

32B · 4K ctx · 19.5 GB · active

dense · High

OLMo 2 32B is Allen AI's fully open 32B-parameter language model, the largest in the OLMo 2 family. It was trained on 6T tokens from the Dolma dataset and post-trained with Tülu 3 SFT, DPO, and RLVR. It is the first fully open model to outperform GPT-3.5 and GPT-4o mini on academic benchmarks.

DeepSeek DeepSeek V3 671B

671B (37B active) · 131K ctx · 409.3 GB · current

moe · High

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.

Mistral Mistral Small 24B

24B · 33K ctx · 14.6 GB · legacy

dense · High

Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Models category below 70B, with 24B parameters and state-of-the-art capabilities comparable to larger models. This model is an instruction-fine-tuned version of the base model: Mistral-Small-24B-Base-2501.

Mistral Mistral Large 3

675B (41B active) · 256K ctx · 411.8 GB · frontier

moe · High

Mistral Large 3 is Mistral AI's large MoE model with 675B total parameters and 41B active per token, with 256K context.

Microsoft Phi-4 14B

14B · 16K ctx · 8.5 GB · current

dense · High

Our training data is an extension of the data used for Phi-3 and includes a wide variety of sources.