Explorar modelos de IA
374 modelos disponibles
Compact coding model with solid code completion and generation for resource-constrained environments.
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 13 instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding. Links to other models can be found in the index at the bottom.
Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models. \ You can read more in the official blog post.
Devstral 7B is Mistral AI's specialized coding model optimized for software development tasks. Features strong code generation, completion, and understanding across multiple programming languages.
Granite-8B-Code-Instruct-4K is a 8B parameter model fine tuned from *Granite-8B-Code-Base-4K* on a combination of permissively licensed instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.
StarCoder 15B is BigCode's flagship code generation model trained on 1 trillion tokens from The Stack. Supports 80+ programming languages with 8K context and strong code completion capabilities.
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.
Granite 4.1 8B is IBM's sweet-spot dense decoder-only model, trained on roughly 15T tokens with 128K context. IBM reports the 8B instruct model matching or beating the previous Granite 4.0-H-Small 32B-A9B MoE in several comparisons. Apache 2.0 licensed for commercial RAG, coding, and assistant deployments.
The Pixtral-12B-2409 is a Multimodal Model of 12B parameters plus a 400M parameter vision encoder.
Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 7B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding. Links to other models can be found in the index at the bottom.
Model type: LLaVA is an open-source chatbot trained by fine-tuning LLM on multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. Base LLM: mistralai/Mistral-7B-Instruct-v0.2
StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.
Gemma 4 E2B is Google's smallest Gemma 4 model with 5.1B total parameters (2.3B effective via Per-Layer Embeddings). Supports text, image, audio, and video natively. Apache 2.0 licensed. Built on Gemini 3 technology.
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
Qwen3.5 2B delivers competitive quality at minimal VRAM cost, suitable for laptops and entry-level GPUs.
Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.
Gemma 3 4B is Google's efficient Gemma 3 model supporting vision and text. Ideal for on-device applications requiring multimodal understanding with fast inference speeds.
Llama 3.1 8B is Meta's efficient general-purpose model supporting 128K context and multilingual text generation. Optimized for dialogue, summarization, reasoning, and code generation tasks.
Falcon-40B-Instruct is a 40B parameters causal decoder-only model built by TII based on Falcon-40B and finetuned on a mixture of Baize. It is made available under the Apache 2.0 license.
InternLM has open-sourced a 7 billion parameter base model tailored for practical scenarios. The model has the following characteristics: - It leverages trillions of high-quality tokens for training to establish a powerful knowledge base. - It provides a versatile toolset for users to flexibly build their own workflows.
InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics: - It leverages trillions of high-quality tokens for training to establish a powerful knowledge base. - It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities. - It provides a versatile toolset for users to flexibly build their own workflows.
MPT-30B Instruct is MosaicML's large instruction-tuned model offering strong reasoning and generation quality. Features 8K context with ALiBi encoding and efficient inference optimizations.