Browse AI models

380 models available

Baichuan Baichuan 13B

13B · 8K ctx · 7.9 GB · legacy

dense · Budget

Baichuan-13B-Chat is the aligned version of the Baichuan-13B series; the pretrained base model is available as Baichuan-13B-Base.

Baichuan Baichuan 7B

7B · 8K ctx · 4.3 GB · legacy

dense · Budget

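Baichuan-7B is an open-source, large-scale pretrained model developed by Baichuan Intelligent Technology. Built on the Transformer architecture, it has 7 billion parameters, was trained on roughly 1.2 trillion tokens, supports both Chinese and English, and has a context window of 4,096 tokens. It achieves the best results among models of its size on the standard authoritative Chinese and English benchmarks (C-EVAL/MMLU).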

Cerebras Cerebras-GPT 13B

13B · 131K ctx · 7.9 GB · legacy

dense · Budget

TII Falcon 7B Instruct

7B · 8K ctx · 4.3 GB · legacy

dense · Budget

Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII, based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. It is made available under the Apache 2.0 license.

IBM Granite Code 3B

3B · 8K ctx · 1.8 GB · current

dense · Budget

Granite Code 3B is IBM's compact code generation model for enterprise use.

MosaicML MPT-7B-Instruct

7B · 8K ctx · 4.3 GB · legacy

dense · Budget

MPT-7B Instruct is MosaicML's instruction-tuned model with a commercially permissive license. Supports 65K context with ALiBi positional encoding for efficient long-document processing.
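The entry names ALiBi but does not describe it. As described in the original ALiBi paper, the method drops learned position embeddings and instead adds a fixed, head-specific linear penalty to each attention score, proportional to the distance between query and key. The sketch below computes those biases for a causal model. It assumes the number of heads is a power of two, the case where the paper's basic slope formula applies directly. It is not MosaicML's implementation, and `alibi_slopes` and `alibi_bias` are hypothetical helper names.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8) for n heads.

    Assumes num_heads is a power of two (the basic case in the ALiBi paper).
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("num_heads must be a positive power of two")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> List[List[List[float]]]:
    """Bias tensor indexed [head][query][key], added to raw attention scores.

    For a key position j <= query position i the bias is slope * (j - i),
    i.e. a penalty that grows linearly with distance. Future keys (j > i)
    get -inf, which doubles as the causal mask.
    """
    slopes = alibi_slopes(num_heads)
    neg_inf = float("-inf")
    return [
        [
            [m * (j - i) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in slopes
    ]


# Usage: 8 heads, 4 tokens. Head 0 has slope 0.5, so query 3 sees
# biases -1.5, -1.0, -0.5 and 0.0 on keys 0..3.
bias = alibi_bias(num_heads=8, seq_len=4)
print(bias[0][3])
```

The penalty depends only on relative distance and needs no learned table. The same function therefore yields biases for sequences longer than any seen during training, which is why ALiBi is associated with long-context extrapolation.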

Cognitive Computations Samantha 7B

7B · 4K ctx · 4.3 GB · legacy

dense · Budget

Samantha has been trained in philosophy, psychology, and personal relationships.

Microsoft Phi 3.5 Mini 4B

4B · 128K ctx · 2.4 GB · legacy

dense · Budget

Phi-3.5-mini is a lightweight, state-of-the-art open model built on the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K-token context length. It underwent a rigorous enhancement process that incorporated supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures.
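The entry lists direct preference optimization (DPO) without further detail. The sketch below implements the standard DPO objective; it is not Microsoft's training code. For a prompt x with a preferred response y_w and a rejected response y_l, the per-pair loss is -log σ(β · [(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). Here π is the model being trained, π_ref is a frozen reference model, and β is a temperature. The function name `dpo_loss` and the default β = 0.1 are illustrative assumptions.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the Phi-3.5 card
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z); this form avoids overflow for large |z|.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy still equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) once the policy raises the preferred response's likelihood relative to the reference more than it raises the rejected one's. No separate reward model is needed, which is the main practical difference from PPO.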

Mistral Mixtral 8x7B

47B (13B active) · 33K ctx · 28.7 GB · current

moe · Budget

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

Google Gemma 2 9B

9B · 8K ctx · 5.5 GB · current

dense · Budget

Gemma 2 9B is Google's mid-size open model built on Gemini research. Features improved reasoning and safety with a novel architecture optimized for efficient inference on consumer hardware.

Meta Llama 3.2 11B Vision

11B · 16K ctx · 6.7 GB · legacy

vision · Budget

Llama 3.2 11B Vision is Meta's multimodal model that processes both text and images. Supports visual question answering, image captioning, and document understanding alongside standard text generation.

Alibaba Qwen 2.5 Coder 14B

14B · 131K ctx · 8.5 GB · current

dense · Budget

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). As of now, Qwen2.5-Coder covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers. Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:

Mistral Mixtral 8x22B

141B (39B active) · 66K ctx · 86 GB · current

moe · Budget
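The two Mixtral entries show that, for mixture-of-experts (MoE) models, the listed file size tracks total parameters rather than active ones. Across this catalog, sizes work out to roughly 0.61 GB per billion parameters (for example, 7B → 4.3 GB and 72B → 43.9 GB). That is about 0.61 bytes, or roughly 5 bits, per weight, which suggests the files use a ~5-bit quantization. This is an inference from the listed numbers; the catalog does not state it. The sketch below applies that empirical ratio to both total and active parameter counts. The constant `GB_PER_BILLION_PARAMS` and the helper `estimated_size_gb` are assumptions fitted to the listed figures.

```python
# Empirical ratio fitted to this catalog's listings (e.g. 7B -> 4.3 GB,
# 72B -> 43.9 GB). An assumption, not a published quantization spec.
GB_PER_BILLION_PARAMS = 0.61


def estimated_size_gb(params_billion: float) -> float:
    """Rough on-disk size in GB for a given parameter count (in billions)."""
    return round(params_billion * GB_PER_BILLION_PARAMS, 1)


# (name, total parameters in billions, active parameters per token in billions)
moe_models = [
    ("Mixtral 8x7B", 47, 13),
    ("Mixtral 8x22B", 141, 39),
]

for name, total_b, active_b in moe_models:
    print(
        f"{name}: total-param estimate {estimated_size_gb(total_b)} GB, "
        f"active-param estimate {estimated_size_gb(active_b)} GB"
    )
```

The total-parameter estimates (28.7 GB and 86.0 GB) match the listed sizes. The active-parameter estimates are far too small: every expert has to be stored, and for ordinary local inference usually kept in memory, even though only a subset runs for each token.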

Alibaba Qwen 2.5 Math 72B

72B · 4K ctx · 43.9 GB · frontier

dense · Budget

Warning: 🚨 Qwen2.5-Math mainly supports solving English and Chinese math problems through chain-of-thought (CoT) and tool-integrated reasoning (TIR). We do not recommend using this series of models for other tasks.

01.AI Yi 1.5 34B

34B · 4K ctx · 20.7 GB · current

dense · Budget

Meta Llama 3.2 3B

3B · 128K ctx · 1.8 GB · legacy

dense · Budget

Llama 3.2 3B is Meta's compact multilingual text model optimized for edge and mobile deployment. Supports summarization, instruction following, and text generation with strong performance for its size class.

Mistral Mistral Nemo 12B

12B · 128K ctx · 7.3 GB · current

dense · Budget

The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models of smaller or similar size.

Mistral Mistral 7B Instruct v0.3

7B · 8K ctx · 4.3 GB · legacy

dense · Budget

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.

01.AI Yi Coder 9B

9B · 131K ctx · 5.5 GB · current

dense · Budget

Liquid AI LFM2.5 350M

0.35B · 128K ctx · 0.2 GB · active

dense · Budget

LFM2.5-350M is Liquid AI's ultra-compact 354M-parameter model for on-device and embedded deployment, using the hybrid convolution + attention LFM2 backbone with a 128K context window.

TinyLlama TinyLlama 1.1B

1.1B · 4K ctx · 0.7 GB · legacy

dense · Budget

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
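The stated target implies a concrete throughput requirement. As a back-of-the-envelope check (our arithmetic, not a figure from the TinyLlama project), 3 trillion tokens in 90 days on 16 GPUs requires about 24,000 tokens per second per GPU. The sketch below also uses the common 6·N·D approximation for training FLOPs (N parameters, D tokens). That approximation ignores attention cost and is an assumption here. Under it, the target corresponds to about 159 TFLOPS sustained per GPU, roughly half of the A100's 312 TFLOPS dense BF16 peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures stated in the TinyLlama entry.
params = 1.1e9   # N: model parameters
tokens = 3e12    # D: training tokens
days = 90
num_gpus = 16

# Assumption: NVIDIA's published dense BF16 Tensor Core peak for the A100.
a100_peak_flops = 312e12

gpu_seconds = days * SECONDS_PER_DAY * num_gpus
tokens_per_sec_per_gpu = tokens / gpu_seconds

# Assumption: training cost ~ 6 * N * D FLOPs (forward + backward, no attention term).
total_flops = 6 * params * tokens
flops_per_gpu = total_flops / gpu_seconds
utilization = flops_per_gpu / a100_peak_flops

print(f"{tokens_per_sec_per_gpu:,.0f} tokens/s per GPU")
print(f"{flops_per_gpu / 1e12:.0f} TFLOPS per GPU ({utilization:.0%} of peak)")
```

Sustaining about half of peak hardware throughput for three months is demanding. That is consistent with the entry's emphasis on "proper optimization".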

Microsoft Phi 3 Medium 14B

14B · 128K ctx · 8.5 GB · current

dense · Legacy

Phi-3-Medium-128K-Instruct is a 14B-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with a focus on high-quality, reasoning-dense properties. The model belongs to the Phi-3 family. The Medium version comes in two variants, 4K and 128K, named for the context length (in tokens) each supports.

DeepSeek DeepSeek R1 1.5B

1.5B · 33K ctx · 0.9 GB · active

dense · Legacy

DeepSeek R1 Distill Qwen 1.5B is a compact reasoning model distilled from DeepSeek-R1, based on Qwen2.5-Math-1.5B. Fine-tuned on 800K curated samples, it achieves 83.9% on MATH-500 and supports chain-of-thought reasoning on resource-constrained devices.

Mistral AI Codestral 22B

22B · 33K ctx · 13.4 GB · current

dense · Legacy