Browse AI Models

380 models available


8B · 128K ctx · 4.9 GB · legacy

dense · Mid

Llama 3.1 8B is Meta's efficient general-purpose model supporting 128K context and multilingual text generation. Optimized for dialogue, summarization, reasoning, and code generation tasks.

TII Falcon 40B Instruct

40B · 8K ctx · 24.4 GB · legacy

dense · Mid

Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII. It is based on Falcon-40B and fine-tuned on a mixture of Baize data, and it is made available under the Apache 2.0 license.

InternLM InternLM 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Mid

InternLM has open-sourced a 7 billion parameter base model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It provides a versatile toolset for users to flexibly build their own workflows.

InternLM InternLM Chat 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Mid

InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8K context window length, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.

MosaicML MPT-30B-Instruct

30B · 8K ctx · 18.3 GB · legacy

dense · Mid

MPT-30B Instruct is MosaicML's large instruction-tuned model offering strong reasoning and generation quality. Features 8K context with ALiBi encoding and efficient inference optimizations.

Nous Research Nous Dolphin 13B

13B · 16K ctx · 7.9 GB · legacy

dense · Mid

Dolphin 13B is a general-purpose uncensored model fine-tuned for broad capabilities including coding, reasoning, and creative writing without alignment restrictions.

Nous Research Nous Hermes 1.0

9B · 16K ctx · 5.5 GB · legacy

dense · Mid

Nous Hermes is a fine-tuned model optimized for instruction following and helpful dialogue. Trained on curated datasets emphasizing quality responses, reasoning, and user alignment.

Allen AI OLMo 2 7B

7B · 4K ctx · 4.3 GB · current

dense · Mid

OLMo 2 7B is Allen AI's fully open language model with open data, code, and weights.

Instinct AI Solar 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Mid

Solar 7B is Upstage's efficient language model built on a depth-upscaled architecture. Offers strong instruction following and reasoning performance optimized for single-GPU inference.

LMSYS Vicuna 13B

13B · 4K ctx · 7.9 GB · legacy

dense · Mid

Vicuna is a chat assistant trained by fine-tuning Llama 2 on user-shared conversations collected from ShareGPT.

WizardLM WizardLM 13B

13B · 8K ctx · 7.9 GB · legacy

dense · Mid

Project Repo: https://github.com/nlpxucan/WizardLM

WizardLM WizardMath 7B

7B · 4K ctx · 4.3 GB · legacy

dense · Mid


Microsoft Phi 4 Mini 4B

4B · 128K ctx · 2.4 GB · frontier

dense · Budget

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model belongs to the Phi-4 model family and supports a 128K-token context length. It underwent an enhancement process that combined supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures.

Alibaba Qwen 2.5 Coder 7B

7B · 131K ctx · 4.3 GB · current

dense · Budget

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). The series currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers, and it brings a number of improvements over CodeQwen1.5.

Alibaba Qwen 3 1.7B

1.7B · 33K ctx · 1 GB · frontier

dense · Budget

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Google Gemma 2 27B

27B · 8K ctx · 16.5 GB · current

dense · Budget

Gemma 2 27B is Google's largest Gemma 2 model, offering state-of-the-art performance among open models of similar size. Built on Gemini technology with strong reasoning, code, and multilingual capabilities.

Alibaba Qwen 2.5 3B

3B · 131K ctx · 1.8 GB · current

dense · Budget

Qwen 2.5 3B provides a good balance of capability and efficiency, suitable for laptops and entry-level GPUs.

Alibaba Qwen 2.5 Coder 1.5B

1.5B · 33K ctx · 0.9 GB · active

dense · Budget

Qwen 2.5 Coder 1.5B is Alibaba's compact code-specific language model from the Qwen2.5 Coder series. Trained on 5.5T tokens including source code, text-code grounding, and synthetic data. Features improvements in code generation, code reasoning, and code fixing while maintaining general and math capabilities.

LLaVA LLaVA 1.5 7B

7B · 4K ctx · 4.3 GB · legacy

dense · Budget

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Cohere Command R+ 104B

104B · 131K ctx · 63.4 GB · current

dense · Budget

Command R+ is Cohere's most capable open-weight model for enterprise RAG workloads. Offers superior long-context reasoning, multi-step tool use, and grounded generation with citations across 10 languages.

IBM Granite 4.1 3B

3B · 131K ctx · 1.8 GB · current

dense · Budget

Granite 4.1 3B is IBM's smallest Granite 4.1 dense decoder-only model, trained on roughly 15T tokens with 128K context. Apache 2.0 licensed and tuned for fast, commercially friendly RAG, coding, and assistant workloads on small GPUs.

Microsoft Phi 3 Mini 3.8B

3.8B · 128K ctx · 2.3 GB · current

dense · Budget

Phi-3-Mini-128K-Instruct is a 3.8-billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. These datasets include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. The model belongs to the Phi-3 family; the Mini version comes in two variants, 4K and 128K, named for the context length (in tokens) each supports.

DeepSeek DeepSeek R1 Distill 7B

7B · 33K ctx · 4.3 GB · active

dense · Budget

DeepSeek R1 Distill Qwen 7B is a 7B-parameter reasoning model distilled from the larger DeepSeek-R1. Based on Qwen2.5-Math-7B and fine-tuned on 800K samples from DeepSeek-R1, it delivers strong reasoning with 92.8% on MATH-500 and 49.1 on GPQA Diamond while being far more efficient than the full 671B model.

DeepSeek DeepSeek R1 Distill 8B

8B · 33K ctx · 4.9 GB · frontier

dense · Budget

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.
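
A note on the size column: across the entries above, the listed download size tracks parameter count closely, at roughly 0.61 GB per billion parameters. The sketch below derives that ratio from a few (parameters, size) pairs copied from this listing and uses it to estimate the size of another model. The function names are hypothetical. The interpretation is also an assumption: if GB means 10^9 bytes, the ratio works out to just under 5 bits per weight, which would be consistent with 4-bit-class quantized files, but the listing itself does not say how the sizes are produced.

```python
# (parameters in billions, listed size in GB), copied from entries in this listing.
LISTED = {
    "Llama 3.1 8B": (8.0, 4.9),
    "Falcon 40B Instruct": (40.0, 24.4),
    "MPT-30B-Instruct": (30.0, 18.3),
    "Vicuna 13B": (13.0, 7.9),
    "Gemma 2 27B": (27.0, 16.5),
    "Command R+ 104B": (104.0, 63.4),
}


def gb_per_billion(entries):
    """Average listed size (GB) per billion parameters (hypothetical helper)."""
    ratios = [size_gb / params_b for params_b, size_gb in entries.values()]
    return sum(ratios) / len(ratios)


def estimate_size_gb(params_b, ratio):
    """Estimate a listed size from a parameter count, rounded like the listing."""
    return round(params_b * ratio, 1)


ratio = gb_per_billion(LISTED)
bits_per_weight = ratio * 8  # assumes 1 GB = 10**9 bytes

print(f"{ratio:.3f} GB per billion parameters")
print(f"about {bits_per_weight:.1f} bits per weight")
print(f"estimated size of a 7B model: {estimate_size_gb(7.0, ratio)} GB")
```

With these six entries the estimate for a 7B model reproduces the 4.3 GB shown for the 7B models in the listing. Treat the ratio as a rule of thumb for this catalog only; other quantization levels or file formats would give different sizes.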