Explore AI models

380 models available

67B · 4K ctx · 40.9 GB · legacy

dense · Legacy

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

Mistral Ministral 8B

8B · 131K ctx · 4.9 GB · current

dense · Legacy

We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.

Alibaba Qwen 2.5 Coder 0.5B

0.5B · 131K ctx · 0.3 GB · current

dense · Legacy

Ultra-lightweight coding assistant for edge deployment and code completion.

Alibaba Qwen 2.5 1.5B

1.5B · 131K ctx · 0.9 GB · current

dense · Legacy

Qwen 2.5 1.5B is a compact model suitable for mobile and edge devices with decent chat and instruction following.

Google Gemma 3 1B

1B · 33K ctx · 0.6 GB · current

dense · Legacy

Gemma 3 1B is Google's ultra-compact model from the Gemma 3 family. Optimized for mobile and edge inference with surprisingly capable text generation for its parameter count.

InternLM InternLM 20B

20B · 8K ctx · 12.2 GB · legacy

dense · Legacy

InternLM2.5 has open-sourced a 20 billion parameter base model and a chat model tailored for practical scenarios.

HuggingFace SmolLM3 3B

3B · 128K ctx · 1.8 GB · active

dense · Legacy

SmolLM3 is a fully open 3B-parameter language model with dual-mode reasoning, 128K context via YaRN extrapolation, and native support for 6 languages. Pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-trained with 140B reasoning tokens and Anchored Preference Optimization.

Alibaba Qwen 3 0.6B

0.6B · 33K ctx · 0.4 GB · frontier

dense · Legacy

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Cohere Aya Expanse 32B

32B · 8K ctx · 19.5 GB · current

dense · Legacy

Aya Expanse 32B is Cohere's massively multilingual model supporting 23 languages with strong instruction following.

Google Gemma 2 2B

2B · 8K ctx · 1.2 GB · current

dense · Legacy

Gemma 2 2B is Google's lightweight model designed for on-device and edge deployment. Delivers strong text generation and reasoning performance at minimal resource cost.

IBM Granite 3.1 8B

8B · 128K ctx · 4.9 GB · current

state-space · Legacy

Granite-3.1-8B-Instruct is an 8B-parameter long-context instruct model fine-tuned from Granite-3.1-8B-Base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets tailored for solving long-context problems. The model was developed using a diverse set of techniques with a structured chat format, including supervised fine-tuning, model alignment using reinforcement learning, and model merging.

Meta Llama 3.2 1B

1B · 128K ctx · 0.6 GB · legacy

dense · Legacy

Llama 3.2 1B is Meta's smallest text model designed for on-device inference. Optimized for multilingual text generation, summarization, and instruction following on resource-constrained hardware.

01.AI Yi 1.5 9B

9B · 4K ctx · 5.5 GB · current

dense · Legacy

Alibaba Qwen 2.5 Math 7B

7B · 4K ctx · 4.3 GB · current

dense · Legacy

Warning: Qwen2.5-Math mainly supports solving English and Chinese math problems through chain-of-thought (CoT) and tool-integrated reasoning (TIR). We do not recommend using this series of models for other tasks.

Alibaba Qwen 3.5 0.6B

0.6B · 131K ctx · 0.4 GB · frontier

dense · Legacy

Qwen3.5 is the latest generation of Alibaba's Qwen large language model family, bringing major improvements in reasoning, instruction following, and multilingual capability across both dense and MoE architectures.

Alibaba Qwen 2.5 0.5B

0.5B · 131K ctx · 0.3 GB · current

dense · Legacy

Qwen 2.5 0.5B is an ultra-lightweight model for edge and IoT deployment, offering basic chat capability at minimal resource cost.

OpenChat OpenChat 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Legacy

Advancing Open-source Language Models with Mixed-Quality Data

Cohere Aya Expanse 8B

8B · 8K ctx · 4.9 GB · current

dense · Legacy

Aya Expanse 8B is Cohere's multilingual model supporting 23 languages with strong cross-lingual transfer. Designed for global applications requiring high-quality generation across diverse languages.

01.AI Yi 34B Chat

34B · 200K ctx · 20.7 GB · legacy

dense · Legacy

BigCode StarCoder2 15B

15B · 16K ctx · 9.2 GB · current

dense · Legacy

- Project Website: bigcode-project.org
- Paper: Link
- Point of Contact: [email protected]
- Languages: 600+ programming languages

NVIDIA Nemotron Mini 4B

4B · 4K ctx · 2.4 GB · current

dense · Legacy

Nemotron-Mini-4B-Instruct is a model for generating responses for roleplaying, retrieval augmented generation, and function calling. It is a small language model (SLM) optimized through distillation, pruning and quantization for speed and on-device deployment. It is a fine-tuned version of nvidia/Minitron-4B-Base, which was pruned and distilled from Nemotron-4 15B using our LLM compression technique. This instruct model is optimized for roleplay, RAG QA, and function calling in English. It supports a context length of 4,096 tokens. This model is ready for commercial use.

Teknium OpenHermes 2.5 7B

7B · 8K ctx · 4.3 GB · current

dense · Legacy

In the tapestry of Greek mythology, Hermes reigns as the eloquent Messenger of the Gods, a deity who deftly bridges the realms through the art of communication. It is in homage to this divine mediator that I name this advanced LLM "Hermes," a system crafted to navigate the complex intricacies of human discourse with celestial finesse.

Cognitive Computations Dolphin 2.9 8B

8B · 33K ctx · 4.9 GB · legacy

dense · Legacy

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

LMSYS Vicuna 7B

7B · 4K ctx · 4.3 GB · legacy

dense · Legacy

Vicuna is a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT.