Browse AI Models

374 models available

Alibaba · Qwen 2.5 Coder 3B
3B · 131K ctx · 1.8 GB · current · dense · Medium

Compact coding model with solid code completion and generation for resource-constrained environments.

DeepSeek · DeepSeek R1 Distill 32B
32B · 33K ctx · 19.5 GB · frontier · dense · Medium

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

Meta · CodeLlama 13B Instruct
13B · 16K ctx · 7.9 GB · legacy · dense · Medium

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 13B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding.

Mistral AI · Codestral Mamba 7B
7B · 262K ctx · 4.3 GB · current · state-space · Medium

Codestral Mamba is an open code model based on the Mamba2 architecture. It performs on par with state-of-the-art Transformer-based code models. You can read more in the official blog post.

DevStral AI · DevStral 7B
7B · 8K ctx · 4.3 GB · legacy · dense · Medium

Devstral 7B is Mistral AI's specialized coding model optimized for software development tasks. Features strong code generation, completion, and understanding across multiple programming languages.

IBM · Granite Code 8B
8B · 8K ctx · 4.9 GB · current · dense · Medium

Granite-8B-Code-Instruct-4K is an 8B-parameter model fine-tuned from Granite-8B-Code-Base-4K on a combination of permissively licensed instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.

BigCode · StarCoder 15B
15B · 8K ctx · 9.2 GB · legacy · dense · Medium

StarCoder 15B is BigCode's flagship code generation model trained on 1 trillion tokens from The Stack. Supports 80+ programming languages with 8K context and strong code completion capabilities.

DeepSeek · DeepSeek R1 Distill 14B
14B · 33K ctx · 8.5 GB · frontier · dense · Medium

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

IBM · Granite 4.1 8B
8B · 131K ctx · 4.9 GB · current · dense · Medium

Granite 4.1 8B is IBM's sweet-spot dense decoder-only model, trained on roughly 15T tokens with 128K context. IBM reports the 8B instruct model matching or beating the previous Granite 4.0-H-Small 32B-A9B MoE in several comparisons. Apache 2.0 licensed for commercial RAG, coding, and assistant deployments.

Mistral AI · Pixtral 12B
12B · 131K ctx · 7.3 GB · current · dense · Medium

Pixtral-12B-2409 is a multimodal model with 12B parameters plus a 400M-parameter vision encoder.

Meta · CodeLlama 7B Instruct
7B · 16K ctx · 4.3 GB · legacy · dense · Medium

Code Llama is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 34 billion parameters. This is the repository for the 7B instruct-tuned version in the Hugging Face Transformers format. This model is designed for general code synthesis and understanding.

LLaVA · LLaVA 1.6 13B
13B · 4K ctx · 7.9 GB · current · dense · Medium

Model type: LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: mistralai/Mistral-7B-Instruct-v0.2

BigCode · StarCoder 7B
7B · 8K ctx · 4.3 GB · legacy · dense · Medium

StarCoder 7B is BigCode's code generation model trained on The Stack v1. Supports over 80 programming languages with fill-in-the-middle capability and 8K context window.

Google · Gemma 4 E2B
5.1B · 128K ctx · 3.1 GB · frontier · dense · Medium

Gemma 4 E2B is Google's smallest Gemma 4 model with 5.1B total parameters (2.3B effective via Per-Layer Embeddings). Supports text, image, audio, and video natively. Apache 2.0 licensed. Built on Gemini 3 technology.

Mistral · Ministral 3 3B
3B · 262K ctx · 1.8 GB · frontier · multimodal · Medium

The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.

Alibaba · Qwen 3.5 2B
2B · 131K ctx · 1.2 GB · frontier · dense · Medium

Qwen3.5 2B delivers competitive quality at minimal VRAM cost, suitable for laptops and entry-level GPUs.

NVIDIA · Nemotron 70B
70B · 131K ctx · 42.7 GB · current · dense · Medium

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

Zhipu · GLM-4 9B
9B · 128K ctx · 5.5 GB · current · dense · Medium

2024/11/25: Starting with `transformers>=4.46.0`, we recommend using glm-4-9b-chat-hf to reduce compatibility issues caused by later transformers upgrades.

Google · Gemma 3 4B
4B · 128K ctx · 2.4 GB · current · dense · Medium

Gemma 3 4B is Google's efficient Gemma 3 model supporting vision and text. Ideal for on-device applications requiring multimodal understanding with fast inference speeds.

Meta · Llama 3.1 8B
8B · 128K ctx · 4.9 GB · legacy · dense · Medium

Llama 3.1 8B is Meta's efficient general-purpose model supporting 128K context and multilingual text generation. Optimized for dialogue, summarization, reasoning, and code generation tasks.

TII · Falcon 40B Instruct
40B · 8K ctx · 24.4 GB · legacy · dense · Medium

Falcon-40B-Instruct is a 40B-parameter causal decoder-only model built by TII, based on Falcon-40B and fine-tuned on a mixture of Baize. It is made available under the Apache 2.0 license.

InternLM · InternLM 7B
7B · 8K ctx · 4.3 GB · legacy · dense · Medium

InternLM has open-sourced a 7 billion parameter base model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It provides a versatile toolset for users to flexibly build their own workflows.

InternLM · InternLM Chat 7B
7B · 8K ctx · 4.3 GB · legacy · dense · Medium

InternLM has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios. The model has the following characteristics:
- It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
- It supports an 8k context window length, enabling longer input sequences and stronger reasoning capabilities.
- It provides a versatile toolset for users to flexibly build their own workflows.

MosaicML · MPT-30B-Instruct
30B · 8K ctx · 18.3 GB · legacy · dense · Medium

MPT-30B Instruct is MosaicML's large instruction-tuned model offering strong reasoning and generation quality. Features 8K context with ALiBi encoding and efficient inference optimizations.