Mochi 1 Preview

Name: Mochi 1 Preview
Author: Genmo

Stable

A 10B-parameter video generation model from Genmo that uses the AsymmDiT architecture with a T5-XXL text encoder. It generates 848x480 videos at 30fps with strong motion quality. Apache 2.0 licensed.

10B AsymmDiT — strong motion quality
848x480 at 30fps
Apache 2.0 — fully open for commercial use
~22GB VRAM with model offloading

11K downloads · 1K likes

Parameters: 10B

Max Resolution: 848×480

Max Frames: 84

FPS: 30

Architecture: 3D-DIT

License: Apache 2.0

Image Quality Benchmarks

Measured quality metrics for Mochi 1 Preview outputs.

Human Preference Score: 78%

How often humans prefer this model's output (0-100%)

Aesthetic Score: 7.0

Visual quality and composition rating (5-9 scale)

At FP16, this model requires about 38 GB of VRAM for basic video generation (25 frames at 768×512). With FP8 quantization, a GPU with 24 GB or more of VRAM is recommended.

VRAM by Scenario

VRAM estimates at FP16 and FP8 precision. FP8 uses ~40% less memory (about 33-43% across the scenarios below) with minimal quality loss. The grade shows how well each GPU handles the generation workload.

FP16 (unquantized)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	35.8 GB	F	F	F	F
768×512 · 25 frames	37.9 GB	F	F	F	F
768×512 · 100 frames	44.2 GB	F	F	F	F
1280×720 · 25 frames	46.4 GB	F	F	F	F

FP8 (quantized, ~40% less VRAM)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	20.4 GB	A	F	F	D
768×512 · 25 frames	22.5 GB	B	F	F	D
768×512 · 100 frames	28.8 GB	D	F	F	F
1280×720 · 25 frames	30.9 GB	D	F	F	F
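
As a rough illustration of how the two tables relate, the sketch below copies the VRAM figures above into a dictionary, computes the FP8 saving for each scenario, and runs a simple fits-or-not check against a 24 GB card. The `fits` helper and its 1 GB headroom default are assumptions made for illustration; they are not how the grades in the tables are assigned.

```python
# VRAM estimates in GB as (FP16, FP8), copied from the two tables above.
VRAM_GB = {
    "512x512 / 25 frames": (35.8, 20.4),
    "768x512 / 25 frames": (37.9, 22.5),
    "768x512 / 100 frames": (44.2, 28.8),
    "1280x720 / 25 frames": (46.4, 30.9),
}


def fp8_saving_pct(fp16_gb, fp8_gb):
    """Percentage of VRAM saved by FP8 relative to FP16."""
    return 100.0 * (fp16_gb - fp8_gb) / fp16_gb


def fits(required_gb, gpu_gb, headroom_gb=1.0):
    """Hypothetical check: True if the estimate plus headroom fits on the GPU."""
    return required_gb + headroom_gb <= gpu_gb


for scenario, (fp16_gb, fp8_gb) in VRAM_GB.items():
    saving = fp8_saving_pct(fp16_gb, fp8_gb)
    print(f"{scenario}: FP8 saves {saving:.0f}%, "
          f"fits in 24 GB at FP8: {fits(fp8_gb, 24.0)}")
```

With these figures the absolute difference is about 15.4 GB in every row, so the relative saving shrinks as the scenario grows.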

Optimization Tips

Turbo / LCM distillation

Use distilled scheduler at 4-8 steps for faster iteration

Run with Python (diffusers)

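import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Load the pipeline in half precision (FP16).
pipe = MochiPipeline.from_pretrained(
    "genmo/mochi-1-preview",
    torch_dtype=torch.float16
)
# Move all components to the GPU. If VRAM is limited, call
# pipe.enable_model_cpu_offload() instead of pipe.to("cuda").
pipe.to("cuda")

frames = pipe(
    prompt="your prompt here",
    num_inference_steps=64,
    guidance_scale=4.5,
    num_frames=84,
).frames[0]

# Export the generated frames as a 30 fps video file.
export_to_video(frames, "mochi.mp4", fps=30)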
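
The page lists a ~22GB VRAM figure "with model offloading". A minimal sketch of what that could look like with diffusers follows, reusing the sampling settings from the snippet above. The function names `generation_kwargs` and `generate_clip` and the output file name are hypothetical; `enable_model_cpu_offload`, `enable_vae_tiling`, and `export_to_video` are existing diffusers calls, but the actual memory use on a given GPU is not verified here.

```python
def generation_kwargs(prompt):
    """Sampling settings taken from the snippet above."""
    return {
        "prompt": prompt,
        "num_inference_steps": 64,
        "guidance_scale": 4.5,
        "num_frames": 84,
    }


def generate_clip(prompt, out_path="mochi.mp4", fps=30):
    """Generate a clip with CPU offloading and write it to out_path."""
    # Imported lazily so generation_kwargs() works without torch installed.
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video

    pipe = MochiPipeline.from_pretrained(
        "genmo/mochi-1-preview", torch_dtype=torch.float16
    )
    # Keep only the active sub-model on the GPU (replaces pipe.to("cuda")).
    pipe.enable_model_cpu_offload()
    # Decode the video latents in tiles to lower peak VAE memory.
    pipe.enable_vae_tiling()

    frames = pipe(**generation_kwargs(prompt)).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate_clip("your prompt here")` downloads the weights on first use and requires the `accelerate` package for offloading. Offloading trades speed for memory, so generation is likely slower than with the whole pipeline on the GPU.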

Get started

Setup instructions for running Mochi 1 Preview locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.
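
Before launching ComfyUI, it can help to confirm that the checkpoint actually landed in the folder from step 2. The sketch below is one way to do that with the standard library; `find_checkpoints` is a hypothetical helper, and it assumes the checkpoint is a `.safetensors` file and that the script runs from the directory containing `ComfyUI/`.

```python
from pathlib import Path


def find_checkpoints(comfyui_root):
    """Return sorted names of .safetensors files in <root>/models/checkpoints/."""
    ckpt_dir = Path(comfyui_root) / "models" / "checkpoints"
    if not ckpt_dir.is_dir():
        return []
    return sorted(p.name for p in ckpt_dir.glob("*.safetensors"))


found = find_checkpoints("ComfyUI")
print(found if found else "No .safetensors checkpoints found")
```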

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 37.9 GB · Available: 24.0 GB

Weights: 20.0 GB

VAE: 0.2 GB

Text Encoder: 9.4 GB

Activations: 6.0 GB

Overhead: 0.5 GB
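
The arithmetic behind this breakdown can be checked with a few lines. The sketch below sums the listed components and compares the quoted requirement with the available VRAM; the variable names are illustrative. Note that the itemized components add up to 36.1 GB, about 1.8 GB below the 37.9 GB "Required" figure, and the page does not say what accounts for the difference.

```python
# Component sizes in GB, copied from the breakdown above.
components_gb = {
    "weights": 20.0,
    "vae": 0.2,
    "text_encoder": 9.4,
    "activations": 6.0,
    "overhead": 0.5,
}
required_gb = 37.9   # "Required" figure quoted above
available_gb = 24.0  # RTX 4090 24GB

itemized_gb = round(sum(components_gb.values()), 1)
shortfall_gb = round(required_gb - available_gb, 1)

print(f"Itemized total: {itemized_gb} GB")
print(f"Shortfall vs available VRAM: {shortfall_gb} GB")
```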

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB: ~2m 45s

RTX 3060 12GB: ~10m 25s

RTX 4060 8GB: ~15m 40s

MacBook Pro M4 Pro 24GB: ~22m 18s
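
These estimates are for 30 steps, while the Python example on this page uses 64. A back-of-the-envelope way to rescale them is sketched below. It assumes generation time grows linearly with the step count, which ignores fixed costs such as text encoding and VAE decoding, so treat the results as rough guesses rather than measurements. The helper names are hypothetical.

```python
# Estimated times at 30 steps, copied from the list above, in seconds.
SECONDS_AT_30_STEPS = {
    "RTX 4090 24GB": 2 * 60 + 45,
    "RTX 3060 12GB": 10 * 60 + 25,
    "RTX 4060 8GB": 15 * 60 + 40,
    "MacBook Pro M4 Pro 24GB": 22 * 60 + 18,
}


def estimate_seconds(gpu, steps, baseline_steps=30):
    """Scale the 30-step estimate linearly with step count (an assumption)."""
    return SECONDS_AT_30_STEPS[gpu] * steps / baseline_steps


def fmt(seconds):
    """Format a duration in seconds as 'Xm YYs'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


for steps in (8, 30, 64):
    estimate = estimate_seconds("RTX 4090 24GB", steps)
    print(f"{steps} steps on RTX 4090 24GB: ~{fmt(estimate)}")
```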

Available Formats & Downloads

Download Mochi 1 Preview in different precisions. Lower precision = less VRAM but slight quality loss.

Format	Precision	Size	Provider
safetensors (recommended)	FP16	20.0 GB	official

Related Workflows

Wan2.2 TI2V 5B (5B · Wan-AI)
CogVideoX 5B (5B · THUDM)
HunyuanVideo 1.5 (8.3B · Tencent)

Frequently asked questions

How much VRAM does Mochi 1 Preview need for video?

Mochi 1 Preview (10B parameters) requires approximately 37.9 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run Mochi 1 Preview on RTX 4090?

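Not at FP16: Mochi 1 Preview exceeds the RTX 4090's 24 GB of VRAM for video generation at that precision. At FP8, the 25-frame scenarios at 512×512 (20.4 GB) and 768×512 (22.5 GB) fit. Otherwise, consider reducing resolution or frame count, or using a GPU with more VRAM.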

How long does it take to generate a video with Mochi 1 Preview?

On a reference GPU (RTX 4090 24GB), Mochi 1 Preview generates a 25-frame video at 768×512 in approximately 2m 45s at FP16 with 30 inference steps. Faster GPUs with higher memory bandwidth will reduce generation time.

What resolution and frame count does Mochi 1 Preview support?

Mochi 1 Preview supports up to 848×480 resolution and 84 frames per generation at 30 FPS. Within those limits, higher resolutions and frame counts require more VRAM.
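
To make these limits concrete, here is a small sketch that checks a requested size against the stated maximums and converts a frame count into playback time. The constants come from the answer above; `check_request` and `clip_seconds` are hypothetical helpers, not part of any library.

```python
MAX_WIDTH, MAX_HEIGHT = 848, 480  # maximum resolution stated above
MAX_FRAMES = 84                   # maximum frames per generation
FPS = 30                          # output frame rate


def check_request(width, height, num_frames):
    """Return a list of problems; an empty list means the request is within limits."""
    problems = []
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        problems.append(f"{width}x{height} exceeds {MAX_WIDTH}x{MAX_HEIGHT}")
    if num_frames > MAX_FRAMES:
        problems.append(f"{num_frames} frames exceeds {MAX_FRAMES}")
    return problems


def clip_seconds(num_frames, fps=FPS):
    """Playback length of a clip in seconds."""
    return num_frames / fps


print(check_request(848, 480, 84))
print(check_request(1280, 720, 100))
print(f"{clip_seconds(84):.1f} s")
```

A full 84-frame generation therefore plays for 2.8 seconds at 30 FPS.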

About Mochi 1 Preview

Use cases

video-generation, text-to-video, cinematic

Recommended runtimes

comfyui, diffusers