Will It Run AI

Browse AI Models

374 models available

Nous Research: Nous Dolphin 13B
13B | 16K ctx | 7.9 GB | legacy
dense | Mid

Dolphin 13B is a general-purpose uncensored model fine-tuned for broad capabilities including coding, reasoning, and creative writing without alignment restrictions.

Nous Research: Nous Hermes 1.0
9B | 16K ctx | 5.5 GB | legacy
dense | Mid

Nous Hermes is a fine-tuned model optimized for instruction following and helpful dialogue. Trained on curated datasets emphasizing quality responses, reasoning, and user alignment.

Allen AI: OLMo 2 7B
7B | 4K ctx | 4.3 GB | current
dense | Mid

OLMo 2 7B is Allen AI's fully open language model with open data, code, and weights.

Instinct AI: Solar 7B
7B | 8K ctx | 4.3 GB | legacy
dense | Mid

Solar 7B is Upstage's efficient language model built on a depth-upscaled architecture. Offers strong instruction following and reasoning performance optimized for single-GPU inference.

LMSYS: Vicuna 13B
13B | 4K ctx | 7.9 GB | legacy
dense | Mid

Vicuna is a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT.

WizardLM: WizardLM 13B
13B | 8K ctx | 7.9 GB | legacy
dense | Mid

Project Repo: https://github.com/nlpxucan/WizardLM

WizardLM: WizardMath 7B
7B | 4K ctx | 4.3 GB | legacy
dense | Mid

Microsoft: Phi 4 Mini 4B
4B | 128K ctx | 2.4 GB | frontier
dense | Budget

Phi-4-mini-instruct is a lightweight open model built on synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K-token context length. It underwent an enhancement process incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.

Alibaba: Qwen 2.5 Coder 7B
7B | 131K ctx | 4.3 GB | current
dense | Budget

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers, and it improves upon CodeQwen1.5.

Alibaba: Qwen 3 1.7B
1.7B | 33K ctx | 1 GB | frontier
dense | Budget

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Google: Gemma 2 27B
27B | 8K ctx | 16.5 GB | current
dense | Budget

Gemma 2 27B is Google's largest Gemma 2 model, offering state-of-the-art performance among open models of similar size. Built on Gemini technology with strong reasoning, code, and multilingual capabilities.

Alibaba: Qwen 2.5 3B
3B | 131K ctx | 1.8 GB | current
dense | Budget

Qwen 2.5 3B provides a good balance of capability and efficiency, suitable for laptops and entry-level GPUs.

Alibaba: Qwen 2.5 Coder 1.5B
1.5B | 33K ctx | 0.9 GB | active
dense | Budget

Qwen 2.5 Coder 1.5B is Alibaba's compact code-specific language model from the Qwen2.5 Coder series. Trained on 5.5T tokens including source code, text-code grounding, and synthetic data. Features improvements in code generation, reasoning, and fixing while maintaining general and math capabilities.

LLaVA: LLaVA 1.5 7B
7B | 4K ctx | 4.3 GB | legacy
dense | Budget

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.

Cohere: Command R+ 104B
104B | 131K ctx | 63.4 GB | current
dense | Budget

Command R+ is Cohere's most capable open-weight model for enterprise RAG workloads. Offers superior long-context reasoning, multi-step tool use, and grounded generation with citations across 10 languages.

IBM: Granite 4.1 3B
3B | 131K ctx | 1.8 GB | current
dense | Budget

Granite 4.1 3B is IBM's smallest Granite 4.1 dense decoder-only model, trained on roughly 15T tokens with a 128K context. It is Apache 2.0 licensed and tuned for fast, commercially friendly RAG, coding, and assistant workloads on small GPUs.

Microsoft: Phi 3 Mini 3.8B
3.8B | 128K ctx | 2.3 GB | current
dense | Budget

Phi-3-Mini-128K-Instruct is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) each supports.

DeepSeek: DeepSeek R1 Distill 7B
7B | 33K ctx | 4.3 GB | active
dense | Budget

DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning, scoring 92.8% on MATH-500 and 49.1% on GPQA Diamond, while being far more efficient than the full 671B model.

DeepSeek: DeepSeek R1 Distill 8B
8B | 33K ctx | 4.9 GB | frontier
dense | Budget

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Baichuan: Baichuan 13B
13B | 8K ctx | 7.9 GB | legacy
dense | Budget

Baichuan-13B-Chat is the aligned version in the Baichuan-13B model series; the pretrained model is available as Baichuan-13B-Base.

Baichuan: Baichuan 7B
7B | 8K ctx | 4.3 GB | legacy
dense | Budget

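Baichuan-7B is an open-source, large-scale pretrained model developed by Baichuan Intelligence. Based on the Transformer architecture, it is a 7-billion-parameter model trained on approximately 1.2 trillion tokens. It supports both Chinese and English and has a context window of 4096 tokens. It achieves the best results among models of its size on the standard Chinese and English benchmarks (C-EVAL/MMLU).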

Cerebras: Cerebras-GPT 13B
13B | 131K ctx | 7.9 GB | legacy
dense | Budget

TII: Falcon 7B Instruct
7B | 8K ctx | 4.3 GB | legacy
dense | Budget

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.

IBM: Granite Code 3B
3B | 8K ctx | 1.8 GB | current
dense | Budget

Granite Code 3B is IBM's compact code generation model for enterprise use.
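
The sizes in this listing track the parameter counts closely: every entry above works out to roughly 0.61 GB per billion parameters (for example, 13B at 7.9 GB and 104B at 63.4 GB). The page does not state how sizes are computed, so the ratio below is an assumption inferred from the listed numbers, and the function names are hypothetical. A minimal sketch for estimating whether a model's weights would fit in a given amount of memory:

```python
# Assumed ratio inferred from the sizes listed on this page;
# it is not a formula documented by the site.
GB_PER_BILLION_PARAMS = 0.61


def estimate_size_gb(params_billion: float) -> float:
    """Estimate the listed size in GB from a parameter count in billions."""
    return round(params_billion * GB_PER_BILLION_PARAMS, 1)


def weights_fit(params_billion: float, available_gb: float) -> bool:
    """Check weights only; ignores KV cache and runtime overhead."""
    return estimate_size_gb(params_billion) <= available_gb


# Compare the estimate against a few sizes copied from the listing.
listed_gb = {13: 7.9, 9: 5.5, 7: 4.3, 4: 2.4, 27: 16.5, 3: 1.8, 104: 63.4}
for params, size in listed_gb.items():
    print(f"{params}B: estimated {estimate_size_gb(params)} GB, listed {size} GB")
```

Because the check covers weights only, treat a positive result as a lower bound: long context windows add memory for the KV cache on top of the listed size.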