Browse AI Models
374 models available
Mistral Small 3.1 is an updated version of Mistral Small with improved instruction following and vision capabilities.
Llama 3.1 70B is Meta's high-capability open model with a 128K context window. Excels at complex reasoning, multilingual tasks, code generation, and tool use, delivering quality competitive with leading proprietary models.
Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
MiniCPM-V 2.6 is OpenBMB's compact multimodal model supporting image and video understanding alongside text. Delivers strong visual reasoning and OCR capabilities at 8B parameter scale.
A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.
Gemma 3 12B is Google's mid-range Gemma 3 model with vision capabilities. Offers strong reasoning, code generation, and image understanding balanced with practical resource requirements.
Granite-20B-Code-Instruct-8K is a 20B-parameter model fine-tuned from Granite-20B-Code-Base-8K on a combination of permissively licensed instruction data to enhance its instruction-following capabilities, including logical reasoning and problem-solving skills.
license: apache-2.0
language:
- en
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
The model weights were updated at 7 AM UTC on Feb 7, 2024. The new weights yield a much more performant model, particularly for joins.
Nemotron Nano 9B v2 is an updated version of NVIDIA's compact reasoning model with improved instruction following, coding, and math capabilities.
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo on code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks.
We introduce CodeGeeX4-ALL-9B, the open-source version of the latest CodeGeeX4 model series. It is a multilingual code generation model continually trained on GLM-4-9B, which significantly enhances its code generation capabilities. A single CodeGeeX4-ALL-9B model supports a comprehensive set of functions, including code completion and generation, code interpreter, web search, function calling, and repository-level code Q&A, covering a wide range of software development scenarios. CodeGeeX4-ALL-9B achieves highly competitive performance on public benchmarks such as BigCodeBench and NaturalCodeBench.
Llama 4 Scout is Meta's efficient Mixture-of-Experts model with 17B active parameters across 16 experts. Supports a 10M-token context window and natively handles text, images, and video inputs.
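The phrase "17B active parameters across 16 experts" refers to Mixture-of-Experts routing: a small router scores every expert for each token, and only the highest-scoring experts are actually computed. The sketch below shows generic top-k routing with toy sizes. It is not Meta's implementation; the value of `TOP_K`, the hidden size, and the random toy weights are all illustrative assumptions, and shared components such as attention layers are left out.

```python
import math
import random

NUM_EXPERTS = 16  # expert count stated in the description above
TOP_K = 1         # assumption: experts activated per token (illustrative only)
HIDDEN = 8        # toy hidden size; real models are far larger

random.seed(0)

def random_matrix(rows, cols):
    """Toy weights drawn from a small Gaussian."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Each hypothetical "expert" is a toy linear map (HIDDEN x HIDDEN).
experts = [random_matrix(HIDDEN, HIDDEN) for _ in range(NUM_EXPERTS)]
# Router: one row per expert; the expert's score is dot(row, token).
router = random_matrix(NUM_EXPERTS, HIDDEN)

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token):
    """Route one token to its top-k experts and mix their outputs."""
    logits = matvec(router, token)
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    weights = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * HIDDEN
    for w, i in zip(weights, top):
        # Only the selected experts' weights are used for this token.
        expert_out = matvec(experts[i], token)
        out = [o + w * e for o, e in zip(out, expert_out)]
    return out, top

token = [random.gauss(0, 1) for _ in range(HIDDEN)]
output, chosen = moe_layer(token)
print("experts used:", chosen)
print("fraction of experts computed:", len(chosen) / NUM_EXPERTS)
```

Because unselected experts are skipped, the per-token compute scales with the active parameters rather than the total parameter count.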
Magistral 7B is Mistral AI's reasoning-focused model designed for complex analytical and mathematical tasks. Features chain-of-thought capabilities for step-by-step problem solving.
Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:
Gemma 4 E4B is Google's mid-range on-device model with 8B total parameters (4.5B effective). It is the default Gemma 4 model on Ollama, supports text and image input, and is licensed under Apache 2.0.
This is a sentence-transformers model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
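Semantic search with such an embedding model typically ranks candidate sentences by the cosine similarity between their vectors and the query's vector. The sketch below shows only that ranking step. In practice each vector would be a 384-dimensional output of the model (for example from the sentence-transformers `encode` method); here tiny hand-written 4-dimensional vectors stand in as hypothetical placeholders so the example runs without downloading anything.

```python
import math

# Hypothetical placeholder embeddings. Real ones would come from the
# embedding model and have 384 dimensions.
corpus = {
    "A man is eating food.": [0.9, 0.1, 0.0, 0.1],
    "A cheetah is running behind its prey.": [0.1, 0.9, 0.2, 0.0],
    "The new movie is awesome.": [0.0, 0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, corpus, top_k=2):
    """Return the top_k corpus sentences ranked by similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(text, score) for score, text in scored[:top_k]]

# Hypothetical embedding of the query "Someone is having lunch."
query = [0.8, 0.2, 0.1, 0.0]
for text, score in semantic_search(query, corpus):
    print(f"{score:.3f}  {text}")
```

Clustering works on the same vectors: sentences whose embeddings have high pairwise cosine similarity end up grouped together.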
Granite Code 34B is IBM's largest code generation model, with strong performance across 100+ programming languages.
OLMo 2 13B is AI2's fully open research model with transparent training data and methodology. Designed for reproducible research with competitive performance on reasoning and general knowledge tasks.
Command R is Cohere's retrieval-augmented generation model optimized for enterprise use. Excels at long-context document processing, tool use, and grounded generation with citation support.
DeepSeek R1 Distill 70B is a distilled reasoning model based on Llama 70B, offering strong chain-of-thought reasoning at a practical size.