Browse AI Models

380 models available

13B · 33K ctx · 7.9 GB · current

dense · Mid

OLMo 2 13B is AI2's fully open research model with transparent training data and methodology. Designed for reproducible research with competitive performance on reasoning and general knowledge tasks.

Cohere Command R 35B

35B · 131K ctx · 21.3 GB · current

dense · Mid

Command R is Cohere's retrieval-augmented generation model optimized for enterprise use. Excels at long-context document processing, tool use, and grounded generation with citation support.

DeepSeek DeepSeek R1 Distill 70B

70B · 131K ctx · 42.7 GB · frontier

dense · Mid

DeepSeek R1 Distill 70B is a distilled reasoning model based on Llama 70B, offering strong chain-of-thought reasoning at a practical size.

Alibaba Qwen 2.5 7B

7B · 131K ctx · 4.3 GB · current

dense · Mid

Qwen2.5 is a series of Qwen large language models, released as base and instruction-tuned models ranging from 0.5 to 72 billion parameters.

Alibaba Qwen 2.5 Coder 3B

3B · 131K ctx · 1.8 GB · current

dense · Mid

Compact coding model with solid code completion and generation for resource-constrained environments.

DeepSeek DeepSeek R1 Distill 32B

32B · 33K ctx · 19.5 GB · frontier

dense · Mid

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Meta CodeLlama 13B Instruct

13B · 16K ctx · 7.9 GB · legacy

dense · Mid

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the 13B instruct-tuned version in the Hugging Face Transformers format, designed for general code synthesis and understanding.

Mistral AI Codestral Mamba 7B

7B · 262K ctx · 4.3 GB · current

state-space · Mid

Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models.

Mistral AI Devstral 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Mid

Devstral 7B is Mistral AI's specialized coding model optimized for software development tasks. Features strong code generation, completion, and understanding across multiple programming languages.

IBM Granite Code 8B

8B · 8K ctx · 4.9 GB · current

dense · Mid

Granite-8B-Code-Instruct-4K is an 8B-parameter model fine-tuned from Granite-8B-Code-Base-4K on a combination of permissively licensed instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.

BigCode StarCoder 15B

15B · 8K ctx · 9.2 GB · legacy

dense · Mid

StarCoder 15B is BigCode's flagship code generation model trained on 1 trillion tokens from The Stack. Supports 80+ programming languages with 8K context and strong code completion capabilities.

DeepSeek DeepSeek R1 Distill 14B

14B · 33K ctx · 8.5 GB · frontier

dense · Mid

IBM Granite 4.1 8B

8B · 131K ctx · 4.9 GB · current

dense · Mid

Granite 4.1 8B is IBM's sweet-spot dense decoder-only model, trained on roughly 15T tokens with 128K context. IBM reports the 8B instruct model matching or beating the previous Granite 4.0-H-Small 32B-A9B MoE in several comparisons. Apache 2.0 licensed for commercial RAG, coding, and assistant deployments.

Liquid AI LFM2.5 8B A1B

8.5B (1.5B active) · 128K ctx · 5.2 GB · frontier

moe · Mid

LFM2.5-8B-A1B is Liquid AI's on-device MoE assistant: 8.3B total parameters with only 1.5B activated per token (32 experts, 4 active). Its hybrid convolution + attention backbone is optimized for fast, low-memory edge inference on consumer hardware.
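The "32 experts, 4 active" figure describes sparse routing: for each token, a router scores every expert and only the top few run. The sketch below shows generic top-k gating in plain Python. It is an illustration of the general technique under stated assumptions, not Liquid AI's router; the function name `route_token`, the random logits, and the softmax over the selected experts are all hypothetical choices made for the example.

```python
import math
import random

NUM_EXPERTS = 32  # total experts, taken from the description above
TOP_K = 4         # experts activated per token, taken from the description above


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper: real MoE routers may normalize differently.
    """
    # Indices of all experts, sorted by router score (highest first), cut to k.
    top = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Subtract the peak score before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Made-up router scores for a single token (seeded so the run is repeatable).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

for expert_index, weight in assignment:
    print(f"expert {expert_index:2d} weight {weight:.3f}")
```

Only the four selected experts would run for this token, which is why the active parameter count is much smaller than the total.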

Mistral AI Pixtral 12B

12B · 131K ctx · 7.3 GB · current

dense · Mid

Pixtral-12B-2409 is a multimodal model with 12B parameters plus a 400M-parameter vision encoder.

Meta CodeLlama 7B Instruct

7B · 16K ctx · 4.3 GB · legacy

dense · Mid

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the 7B instruct-tuned version in the Hugging Face Transformers format, designed for general code synthesis and understanding.

LLaVA LLaVA 1.6 13B

13B · 4K ctx · 7.9 GB · current

dense · Mid

LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: mistralai/Mistral-7B-Instruct-v0.2.

BigCode StarCoder 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Mid

StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.

Google Gemma 4 E2B

5.1B · 128K ctx · 3.1 GB · frontier

dense · Mid

Gemma 4 E2B is Google's smallest Gemma 4 model with 5.1B total parameters (2.3B effective via Per-Layer Embeddings). Supports text, image, audio, and video natively. Apache 2.0 licensed. Built on Gemini 3 technology.

Mistral AI Ministral 3 3B

3B · 262K ctx · 1.8 GB · frontier

multimodal · Mid

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Alibaba Qwen 3.5 2B

2B · 131K ctx · 1.2 GB · frontier

dense · Mid

Qwen3.5 2B delivers competitive quality at minimal VRAM cost, suitable for laptops and entry-level GPUs.

NVIDIA Nemotron 70B

70B · 131K ctx · 42.7 GB · current

dense · Mid

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

Zhipu GLM-4 9B

9B · 128K ctx · 5.5 GB · current

dense · Mid

As of 2024/11/25, we recommend using glm-4-9b-chat-hf with `transformers>=4.46.0` to reduce compatibility issues caused by later transformers upgrades.
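The version floor in the note above can be checked before loading the model. The following is a minimal sketch using only the Python standard library; the helper names (`parse_version`, `meets_minimum`, `installed_transformers_ok`) are hypothetical and not part of transformers. It also assumes a simplified comparison that reads only the leading digits of each version component, so a pre-release such as `4.46.0rc1` is treated the same as `4.46.0`.

```python
from importlib import metadata

MIN_TRANSFORMERS = (4, 46, 0)  # minimum version named in the note above


def parse_version(text):
    """Turn '4.46.0' or '4.47.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Return True when the installed version string is at least the minimum."""
    version = parse_version(installed)
    # Pad short versions so that '4.46' compares equal to (4, 46, 0).
    padded = version + (0,) * (len(minimum) - len(version))
    return padded >= minimum


def installed_transformers_ok():
    """Check the locally installed transformers package; False if it is absent."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("transformers >= 4.46.0:", installed_transformers_ok())
```

For stricter handling of pre-release and local version tags, a dedicated version parser would be a better fit than this simplified tuple comparison.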

Google Gemma 3 4B

4B · 128K ctx · 2.4 GB · current

dense · Mid

Gemma 3 4B is Google's efficient Gemma 3 model supporting vision and text. Ideal for on-device applications requiring multimodal understanding with fast inference speeds.