Browse AI Models

380 models available


BAAI BGE Large EN v1.5

0.34B · 1K ctx · 0.2 GB · current

dense · High

Google Gemma 4 12B

12B · 262K ctx · 7.3 GB · frontier

dense · High

Gemma 4 12B is Google's dense multimodal (text, image, and audio) model in the Gemma 4 family, offering strong reasoning and code generation with a 256K context window. Balances quality against practical single-GPU resource requirements.
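The card's "262K ctx" label and the description's "256K context window" plausibly describe the same limit, assuming the description counts in binary thousands (1K = 1,024 tokens) while the card rounds the raw token count to decimal thousands. A minimal sketch of that arithmetic follows; the helper name `card_label` is hypothetical, and the rounding rule is an assumption about how this listing formats its numbers, not something the page states.

```python
def card_label(binary_k: int) -> str:
    """Convert a context size given in binary K (1K = 1,024 tokens)
    into a decimal-rounded label in the style used on the cards."""
    tokens = binary_k * 1024
    if tokens >= 1_000_000:
        # Large windows are shown in millions with one decimal place.
        return f"{tokens / 1_000_000:.1f}M ctx"
    # Smaller windows are shown in whole decimal thousands.
    return f"{round(tokens / 1000)}K ctx"

for binary_k in (8, 32, 128, 256, 10 * 1024):
    print(binary_k, "->", card_label(binary_k))
```

Under the same assumption, a 32K window appears as "33K ctx", a 128K window as "131K ctx", and a 10M window as "10.5M ctx", which matches the labels on several other cards in this listing.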

Alibaba Qwen 3 4B

4B · 33K ctx · 2.4 GB · current

dense · High

Qwen3-4B-Instruct-2507 is the updated version of the Qwen3-4B non-thinking mode.

Mistral AI Mistral Small 3.1 24B

24B · 131K ctx · 14.6 GB · frontier

dense · High

Mistral Small 3.1 is an updated version of Mistral Small with improved instruction following and vision capabilities.

Meta Llama 3.1 70B

70B · 128K ctx · 42.7 GB · legacy

dense · High

Llama 3.1 70B is Meta's high-capability open model with a 128K context window. It excels at complex reasoning, multilingual tasks, code generation, and tool use, with quality competitive with leading proprietary models.

Alibaba Qwen 2.5 72B

72B · 131K ctx · 43.9 GB · current

dense · High

Qwen2.5 is the latest series of Qwen large language models. It includes a number of base and instruction-tuned language models ranging from 0.5 to 72 billion parameters.

Alibaba Qwen 2.5 14B

14B · 131K ctx · 8.5 GB · current

dense · High

OpenBMB MiniCPM-V 2.6 8B

8B · 2K ctx · 4.9 GB · current

dense · High

MiniCPM-V 2.6 is OpenBMB's compact multimodal model supporting image and video understanding alongside text. Delivers strong visual reasoning and OCR capabilities at 8B parameter scale.

Mistral AI Ministral 3 8B

8B · 262K ctx · 4.9 GB · frontier

multimodal · High

A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

DeepReinforce Ornith 1.0 35B A3B

35.1B (3B active) · 262K ctx · 21.4 GB · frontier

moe · High

Ornith-1.0-35B is DeepReinforce's lightweight self-improving coding agent, built on Qwen 3.5. A 35B-total MoE (256 experts, 8 active, ~3B activated per token) with a hybrid linear + full-attention backbone, designed for efficient single-GPU agentic deployment with a 262K context.
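The figures on this card allow a quick back-of-the-envelope check of what "3B active" means in practice. The sketch below uses only numbers quoted on the card (35.1B total, about 3B active, 256 experts with 8 active, 21.4 GB). Two things are assumptions: that 1 GB means 10^9 bytes, and that the listed size is dominated by the total parameter count, so that dividing one by the other gives an average bits-per-weight figure. The listing does not state which quantization the size refers to.

```python
# Values copied from the card; variable names are illustrative only.
total_params = 35.1e9    # total parameters
active_params = 3e9      # approximate parameters activated per token
experts_total = 256
experts_active = 8
size_bytes = 21.4e9      # listed size, assuming 1 GB = 10**9 bytes

# Share of all weights that take part in computing one token.
active_fraction = active_params / total_params
# Share of experts the router selects per token.
expert_fraction = experts_active / experts_total
# Average storage per weight implied by the listed size (assumption:
# the file holds essentially all 35.1B parameters and little else).
bits_per_weight = size_bytes * 8 / total_params

print(f"active parameters per token: {active_fraction:.1%}")
print(f"experts routed per token:    {expert_fraction:.1%}")
print(f"implied bits per weight:     {bits_per_weight:.2f}")
```

The active-parameter share comes out higher than the routed-expert share, which is expected for MoE models in general: attention layers, embeddings, and any shared components run for every token regardless of routing. Memory requirements still follow the total parameter count, because all experts must be resident even though only a few run per token.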

Google Gemma 3 12B

12B · 131K ctx · 7.3 GB · current

dense · High

Gemma 3 12B is Google's mid-range Gemma 3 model with vision capabilities. Offers strong reasoning, code generation, and image understanding balanced with practical resource requirements.

IBM Granite Code 20B

20B · 8K ctx · 12.2 GB · current

dense · High

Granite-20B-Code-Instruct-8K is a 20B-parameter model fine-tuned from Granite-20B-Code-Base-8K on a combination of permissively licensed instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.

Alibaba Qwen 2.5 VL 7B

7B · 33K ctx · 4.3 GB · current

dense · High

Model card metadata: license Apache 2.0; language English; pipeline tag image-text-to-text; tags multimodal; library transformers.

Defog SQLCoder 7B

7B · 8K ctx · 4.3 GB · current

dense · High

The model weights were updated at 7 AM UTC on Feb 7, 2024. The new weights yield a much more performant model, particularly for joins.

NVIDIA Nemotron Nano 9B v2

9B · 131K ctx · 5.5 GB · frontier

dense · Mid

Nemotron Nano 9B v2 is an updated version of NVIDIA's compact reasoning model with improved instruction following, coding, and math capabilities.

Google DiffusionGemma 26B A4B

25.8B (4B active) · 262K ctx · 15.7 GB · frontier

moe · Mid

DiffusionGemma 26B A4B is Google's block-diffusion language model in the Gemma family: instead of left-to-right autoregression it denoises blocks of tokens in parallel. 25.8B total parameters with ~4B activated per token (128 experts, 8 active) and multimodal image-text input.

DeepSeek DeepSeek Coder V2 16B

16B (2.4B active) · 131K ctx · 9.8 GB · current

moe · Mid

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo on code-specific tasks. DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. This continued pre-training substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks.

Tsinghua/Zhipu CodeGeeX 4 9B

9B · 131K ctx · 5.5 GB · current

dense · Mid

We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on GLM-4-9B, which significantly enhances its code generation capabilities. A single CodeGeeX4-ALL-9B model supports code completion and generation, code interpreter, web search, function calling, and repository-level code Q&A, covering a range of software development scenarios. CodeGeeX4-ALL-9B achieves highly competitive performance on public benchmarks such as BigCodeBench and NaturalCodeBench.

Meta Llama 4 Scout 17B 16E

109B (17B active) · 10.5M ctx · 66.5 GB · frontier

moe · Mid

Llama 4 Scout is Meta's efficient Mixture-of-Experts model with 17B active parameters across 16 experts. Supports a 10M token context window and natively handles text, images, and video inputs.

Mistral AI Magistral 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Mid

Magistral 7B is Mistral AI's reasoning-focused model designed for complex analytical and mathematical tasks. Features chain-of-thought capabilities for step-by-step problem solving.

Alibaba Qwen 2.5 Coder 32B

32B · 131K ctx · 19.5 GB · current

dense · Mid

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers.

Google Gemma 4 E4B

8B · 128K ctx · 4.9 GB · frontier

dense · Mid

Gemma 4 E4B is Google's mid-range on-device model with 8B total parameters (4.5B effective). Default Gemma 4 model on Ollama. Supports text and image. Apache 2.0 licensed.

Sentence Transformers All MiniLM L6 v2

0.02B · 0K ctx · 0 GB · current

dense · Mid

This is a sentence-transformers model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
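Semantic search over such embeddings usually comes down to ranking documents by cosine similarity to a query vector. The sketch below shows that ranking step in pure Python. It uses made-up 4-dimensional vectors in place of real 384-dimensional model output; the vectors, document names, and the helper `cosine` are all hypothetical, and no model is loaded.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for sentence embeddings (real ones would have 384 values).
docs = {
    "gpu setup guide": [0.9, 0.1, 0.0, 0.2],
    "pasta recipe": [0.0, 0.8, 0.6, 0.1],
    "cuda install notes": [0.8, 0.0, 0.1, 0.3],
}
query = [0.85, 0.05, 0.05, 0.25]

# Rank document names from most to least similar to the query.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
print(ranked)
```

With a real model the vectors would come from an encoder rather than being typed by hand; one common route is the `encode` method of `SentenceTransformer` in the sentence-transformers library. The ranking logic stays the same.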

IBM Granite Code 34B

34B · 8K ctx · 20.7 GB · current

dense · Mid

Granite Code 34B is IBM's largest code generation model, strong across 100+ programming languages.