Cosmos Diffusion 7B

Name: Cosmos Diffusion 7B
Author: NVIDIA

Stable

by NVIDIA

7B diffusion model from NVIDIA's Cosmos platform for physical AI and world modeling. Generates physically plausible videos from text descriptions. Part of NVIDIA's Physical AI initiative.

7B DiT — NVIDIA Physical AI
Physically plausible world simulation
Part of Cosmos platform for robotics and simulation
Text-to-world generation

HuggingFace Paper Documentation

8K downloads233 likes

Your hardware

Detecting...

Parameters7B

Max Resolution1024×576

Max Frames57

FPS24

ArchitectureDIT

Licensenvidia-open-model-license

Image Quality Benchmarks

Measured quality metrics for Cosmos Diffusion 7B outputs.

Human Preference Score76%

How often humans prefer this model's output (0-100%)

Aesthetic Score7.0

Visual quality and composition rating (5-9 scale)

This model requires 26+ GB VRAM for basic video generation. A GPU with 24GB+ VRAM is recommended.

VRAM by Scenario

VRAM estimates at FP16 and FP8 precision. FP8 uses ~40% less memory with minimal quality loss. Grade shows how well each GPU handles the generation workload.

FP16 (full precision)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	25.9 GB	B	F	F	F
768×512 · 25 frames	26.3 GB	B	F	F	F
768×512 · 100 frames	27.2 GB	B	F	F	F
1280×720 · 25 frames	27.5 GB	B	F	F	F

FP8 (quantized — ~40% less VRAM)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	13.7 GB	S	B	F	S
768×512 · 25 frames	14.0 GB	S	D	F	A
768×512 · 100 frames	14.9 GB	S	D	F	A
1280×720 · 25 frames	15.2 GB	S	D	F	A

Optimization Tips

Turbo / LCM distillation

Use distilled scheduler at 4-8 steps for faster iteration

Run with Python

Run with Python (diffusers)

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "nvidia/Cosmos-1.0-Diffusion-7B-Text2World",
    torch_dtype=torch.float16
)
pipe.to("cuda")

frames = pipe(
    prompt="your prompt here",
    num_inference_steps=35,
    guidance_scale=7.5,
    num_frames=57,
).frames[0]
# Save frames or export as video

Get started

Setup instructions for running Cosmos Diffusion 7B locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 26.3 GBAvailable: 24.0 GB

Weights14.0 GB

VAE0.2 GB

Text Encoder9.4 GB

Activations0.9 GB

Overhead0.5 GB

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB~4m 38s

RTX 3060 12GB~9m 3s

RTX 4060 8GB~13m 35s

MacBook Pro M4 Pro 24GB~37m 18s

Sample Outputs

Available Formats & Downloads

Download Cosmos Diffusion 7B in different precisions. Lower precision = less VRAM but slight quality loss.

Formato	Precisão	Tamanho	Provedor
safetensorsRecomendado	BF16	14.0 GB	official	Baixar

Related Workflows

Browse Workflows →

CogVideoX 5B5B · THUDM Mochi 1 Preview10B · Genmo Wan2.2 TI2V 5B5B · Wan-AI

Frequently asked questions

FAQ — Cosmos Diffusion 7B

How much VRAM does Cosmos Diffusion 7B need for video?

Cosmos Diffusion 7B (7B parameters) requires approximately 26.3 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run Cosmos Diffusion 7B on RTX 4090?

Cosmos Diffusion 7B can run on the RTX 4090 with sequential offloading, though video generation will be significantly slower than native fit.

How long does it take to generate a video with Cosmos Diffusion 7B?

On a reference GPU (RTX 4090 24GB), Cosmos Diffusion 7B generates a 25-frame video at 768×512 in approximately ~4m 38s at FP16 with 30 inference steps. Faster GPUs with higher memory bandwidth will reduce generation time.

What resolution and frame count does Cosmos Diffusion 7B support?

Cosmos Diffusion 7B supports up to 1024×576 resolution and 57 frames per generation at 24 FPS. Higher resolutions and frame counts require proportionally more VRAM.

About Cosmos Diffusion 7B

Use cases

video-generationtext-to-videoworld-modelphysical-ai

Recommended runtimes

diffusers