Will It Run AI

Browse AI Models

374 models available

Qwen 2.5 Coder 3B (Alibaba)
3B | 131K ctx | 1.8 GB | current
dense | Mid

Compact coding model with solid code completion and generation for resource-constrained environments.

DeepSeek R1 Distill 32B (DeepSeek)
32B | 33K ctx | 19.5 GB | frontier
dense | Mid

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

CodeLlama 13B Instruct (Meta)
13B | 16K ctx | 7.9 GB | legacy
dense | Mid

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the 13B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding.

Codestral Mamba 7B (Mistral AI)
7B | 262K ctx | 4.3 GB | current
state-space | Mid

Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models. You can read more in the official blog post.

DevStral 7B (Mistral AI)
7B | 8K ctx | 4.3 GB | legacy
dense | Mid

Devstral 7B is Mistral AI's specialized coding model, optimized for software development tasks. It offers strong code generation, completion, and understanding across multiple programming languages.

Granite Code 8B (IBM)
8B | 8K ctx | 4.9 GB | current
dense | Mid

Granite-8B-Code-Instruct-4K is an 8B-parameter model fine-tuned from *Granite-8B-Code-Base-4K* on a combination of permissively licensed instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.

StarCoder 15B (BigCode)
15B | 8K ctx | 9.2 GB | legacy
dense | Mid

StarCoder 15B is BigCode's flagship code generation model, trained on 1 trillion tokens from The Stack. It supports 80+ programming languages with an 8K context window and strong code completion capabilities.

DeepSeek R1 Distill 14B (DeepSeek)
14B | 33K ctx | 8.5 GB | frontier
dense | Mid

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Granite 4.1 8B (IBM)
8B | 131K ctx | 4.9 GB | current
dense | Mid

Granite 4.1 8B is IBM's sweet-spot dense decoder-only model, trained on roughly 15T tokens with 128K context. IBM reports the 8B instruct model matching or beating the previous Granite 4.0-H-Small 32B-A9B MoE in several comparisons. Apache 2.0 licensed for commercial RAG, coding, and assistant deployments.

Pixtral 12B (Mistral AI)
12B | 131K ctx | 7.3 GB | current
dense | Mid

Pixtral-12B-2409 is a multimodal model with 12B parameters plus a 400M-parameter vision encoder.

CodeLlama 7B Instruct (Meta)
7B | 16K ctx | 4.3 GB | legacy
dense | Mid

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the 7B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding.

LLaVA 1.6 13B (LLaVA)
13B | 4K ctx | 7.9 GB | current
dense | Mid

Model type: LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: lmsys/vicuna-13b-v1.5

StarCoder 7B (BigCode)
7B | 8K ctx | 4.3 GB | legacy
dense | Mid

StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.
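Fill-in-the-middle means the model sees the code before and after a gap and generates what belongs in between. A minimal sketch of building such a prompt, assuming the prefix-suffix-middle (PSM) layout and the `<fim_prefix>`, `<fim_suffix>`, `<fim_middle>` sentinel tokens used by StarCoder tokenizers; check the tokenizer of the exact checkpoint you download before relying on these strings. `build_fim_prompt` and `splice_completion` are hypothetical helper names.

```python
# Assumed StarCoder-style FIM sentinel tokens (verify against your checkpoint's tokenizer).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Prefix-suffix-middle prompt: the model continues with the missing middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(prefix: str, suffix: str, middle: str) -> str:
    """Reassemble the file once the model has generated the middle span."""
    return prefix + middle + suffix


if __name__ == "__main__":
    before = "def add(a, b):\n    "
    after = "\n\nprint(add(2, 3))\n"
    print(build_fim_prompt(before, after))
    # Pretend the model produced "return a + b" for the gap.
    print(splice_completion(before, after, "return a + b"))
```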

Gemma 4 E2B (Google)
5.1B | 128K ctx | 3.1 GB | frontier
dense | Mid

Gemma 4 E2B is Google's smallest Gemma 4 model with 5.1B total parameters (2.3B effective via Per-Layer Embeddings). Supports text, image, audio, and video natively. Apache 2.0 licensed. Built on Gemini 3 technology.

Ministral 3 3B (Mistral)
3B | 262K ctx | 1.8 GB | frontier
multimodal | Mid

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Qwen 3.5 2B (Alibaba)
2B | 131K ctx | 1.2 GB | frontier
dense | Mid

Qwen3.5 2B delivers competitive quality at minimal VRAM cost, suitable for laptops and entry-level GPUs.
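The catalog lists a parameter count and a size for every model but does not say which quantization the size assumes. Dividing size by parameter count gives about 4.8 to 4.9 bits per weight across the entries (treating 1 GB as 10^9 bytes), which suggests roughly 4-bit-class quantized files rather than 16-bit weights. A minimal sketch of that arithmetic plus a hypothetical fit check; the fixed 1 GB `overhead_gb` allowance for KV cache and runtime buffers is an assumption, not a figure from this page.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    params_billion: float  # parameter count column, e.g. 2 for "2B"
    size_gb: float         # size column, assumed to be decimal GB (1e9 bytes)

    def bits_per_weight(self) -> float:
        # Total bits in the file divided by the number of parameters.
        return self.size_gb * 1e9 * 8 / (self.params_billion * 1e9)


def fits(entry: CatalogEntry, vram_gb: float, overhead_gb: float = 1.0) -> bool:
    # Hypothetical rule of thumb: weights plus a fixed allowance must fit in VRAM.
    return entry.size_gb + overhead_gb <= vram_gb


entries = [
    CatalogEntry("Qwen 3.5 2B", 2, 1.2),
    CatalogEntry("DeepSeek R1 Distill 32B", 32, 19.5),
    CatalogEntry("Nemotron 70B", 70, 42.7),
]

for e in entries:
    print(f"{e.name}: {e.bits_per_weight():.2f} bits/weight, "
          f"fits in 8 GB: {fits(e, 8.0)}")
```

Real memory use also depends on context length and runtime, so treat the overhead term as a placeholder to tune.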

Nemotron 70B (NVIDIA)
70B | 131K ctx | 42.7 GB | current
dense | Mid

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries.

GLM-4 9B (Zhipu)
9B | 128K ctx | 5.5 GB | current
dense | Mid

2024/11/25: Starting with `transformers>=4.46.0`, we recommend using glm-4-9b-chat-hf to reduce compatibility issues caused by future transformers upgrades.
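Following that recommendation means confirming the installed `transformers` release is at least 4.46.0 before loading glm-4-9b-chat-hf. A stdlib-only sketch, assuming plain `X.Y.Z`-style version strings; it ignores pre-release and dev tags (so `4.46.0rc1` would pass), and `release_tuple` and `meets_minimum` are hypothetical helpers, not part of any library.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (4, 46, 0)  # from the note: transformers>=4.46.0


def release_tuple(v: str) -> tuple[int, ...]:
    # Keep only the leading numeric release segment; rc/dev tags are ignored.
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(p) for p in m.group(0).split("."))


def meets_minimum(v: str, minimum: tuple[int, ...] = MINIMUM) -> bool:
    rel = release_tuple(v)
    width = max(len(rel), len(minimum))

    def pad(t: tuple[int, ...]) -> tuple[int, ...]:
        # Pad with zeros so that "4.46" compares equal to (4, 46, 0).
        return t + (0,) * (width - len(t))

    return pad(rel) >= pad(minimum)


if __name__ == "__main__":
    try:
        installed = version("transformers")
        status = "OK" if meets_minimum(installed) else "too old for glm-4-9b-chat-hf"
        print(installed, status)
    except PackageNotFoundError:
        print("transformers is not installed")
```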

Gemma 3 4B (Google)
4B | 128K ctx | 2.4 GB | current
dense | Mid

Gemma 3 4B is Google's efficient Gemma 3 model supporting vision and text. Ideal for on-device applications requiring multimodal understanding with fast inference speeds.

Llama 3.1 8B (Meta)
8B | 128K ctx | 4.9 GB | legacy
dense | Mid

Llama 3.1 8B is Meta's efficient general-purpose model supporting 128K context and multilingual text generation. Optimized for dialogue, summarization, reasoning, and code generation tasks.
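The 4.9 GB figure covers the weights; a long prompt also needs a key/value (KV) cache that grows linearly with context length. A worked sketch, assuming Llama 3.1 8B's published attention shape (32 layers, 8 grouped-query KV heads, head dimension 128), a 16-bit cache, and a single sequence; `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_tokens of one sequence."""
    # Every layer stores a K and a V vector of n_kv_heads * head_dim values per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# Assumed Llama 3.1 8B attention shape.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, 1)
full_context = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, 131_072)

print(f"{per_token / 1024:.0f} KiB per token")               # 128 KiB per token
print(f"{full_context / 2**30:.1f} GiB for 131,072 tokens")  # 16.0 GiB for 131,072 tokens
```

So filling the full 128K window needs roughly 16 GiB of cache on top of the weights unless the runtime quantizes or truncates the cache.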

Falcon 40B Instruct (TII)
40B | 8K ctx | 24.4 GB | legacy
dense | Mid

Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and fine-tuned on a data mixture that includes Baize. It is made available under the Apache 2.0 license.

InternLM 7B (InternLM)
7B | 8K ctx | 4.3 GB | legacy
dense | Mid

InternLM has open-sourced a 7-billion-parameter base model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It provides a versatile toolset for users to flexibly build their own workflows.

InternLM Chat 7B (InternLM)
7B | 8K ctx | 4.3 GB | legacy
dense | Mid

InternLM has open-sourced a 7-billion-parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8K context window, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.

MPT-30B-Instruct (MosaicML)
30B | 8K ctx | 18.3 GB | legacy
dense | Mid

MPT-30B Instruct is MosaicML's large instruction-tuned model offering strong reasoning and generation quality. Features 8K context with ALiBi encoding and efficient inference optimizations.
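ALiBi (Attention with Linear Biases) replaces positional embeddings with a per-head penalty proportional to the distance between query and key positions, added to the attention scores before the softmax. A minimal sketch of the scheme as described in the ALiBi paper, limited to power-of-two head counts; it is not MosaicML's implementation, and `alibi_slopes` and `alibi_bias` are hypothetical names.

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for a power-of-two head count."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2 ** (-8 * h / n_heads) for h in range(1, n_heads + 1)]


def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Causal bias matrix for one head: bias[i][j] = -slope * (i - j) for j <= i."""
    neg_inf = float("-inf")  # future positions are masked out
    return [[-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    print(slopes)
    # The steepest head penalizes distant keys most strongly.
    for row in alibi_bias(slopes[0], 4):
        print(row)
```

Because the penalty depends only on relative distance, the same formula applies to sequences longer than those seen in training, which is the usual motivation for ALiBi.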