Will It Run AI

CogVideoX 2B

Stable

by THUDM

Lightweight 2B video generation model from Tsinghua University. Most accessible CogVideoX variant, runs on 8GB+ VRAM with quantization. Generates 6-second clips at 8fps. Apache 2.0 licensed.

  • Only 2B params — most accessible CogVideoX
  • Runs on 8GB+ VRAM with quantization
  • 6-second clips at 8fps
  • Apache 2.0 — fully open for commercial use

Parameters: 2B
Max Resolution: 720×480
Max Frames: 49
FPS: 8
Architecture: 3D-DIT
License: apache-2.0
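
The 6-second clip length quoted above follows from the maximum frame count and the frame rate. A quick check, using only the two values listed on this page:

```python
# Clip duration implied by the spec values listed above.
MAX_FRAMES = 49
FPS = 8

duration_s = MAX_FRAMES / FPS
print(f"{MAX_FRAMES} frames at {FPS} fps = {duration_s:.3f} s")
```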

Image Quality Benchmarks

Measured quality metrics for CogVideoX 2B outputs.

Human Preference Score: 65%

How often humans prefer this model's output (0-100%)

Aesthetic Score: 6.2

Visual quality and composition rating (5-9 scale)

At FP16, this model requires 21+ GB VRAM for basic video generation (25 frames at 768×512). A GPU with 24GB+ VRAM is recommended for unquantized use.

VRAM by Scenario

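VRAM estimates at FP16 and FP8 precision. In the tables below, FP8 uses about 7 GB less memory than FP16 in each scenario (roughly 24-37% less) with minimal quality loss. The grade, from S (best) to F (worst), shows how well each GPU handles the generation workload.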

FP16 (half precision, unquantized)

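| Scenario | VRAM | RTX 4090 24GB | RTX 3060 12GB | RTX 4060 8GB | MacBook Pro M4 Pro 24GB |
|---|---|---|---|---|---|
| 512×512 · 25 frames | 19.0 GB | S | F | F | B |
| 768×512 · 25 frames | 21.1 GB | A | F | F | D |
| 768×512 · 100 frames | 27.4 GB | B | F | F | F |
| 1280×720 · 25 frames | 29.6 GB | D | F | F | F |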

FP8 (quantized, about 7 GB less VRAM)

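| Scenario | VRAM | RTX 4090 24GB | RTX 3060 12GB | RTX 4060 8GB | MacBook Pro M4 Pro 24GB |
|---|---|---|---|---|---|
| 512×512 · 25 frames | 12.0 GB | S | B | F | S |
| 768×512 · 25 frames | 14.1 GB | S | D | F | A |
| 768×512 · 100 frames | 20.4 GB | A | F | F | D |
| 1280×720 · 25 frames | 22.5 GB | B | F | F | D |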
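
To see how much FP8 saves in each scenario, the two tables can be compared row by row. The sketch below uses only the VRAM figures listed above; the helper name `fp8_saving` is made up for illustration.

```python
# VRAM figures (GB) copied from the FP16 and FP8 tables above: (FP16, FP8).
vram_gb = {
    "512x512, 25 frames": (19.0, 12.0),
    "768x512, 25 frames": (21.1, 14.1),
    "768x512, 100 frames": (27.4, 20.4),
    "1280x720, 25 frames": (29.6, 22.5),
}


def fp8_saving(fp16_gb, fp8_gb):
    """Return (absolute saving in GB, relative saving in percent)."""
    saved = fp16_gb - fp8_gb
    return saved, 100.0 * saved / fp16_gb


for scenario, (fp16_gb, fp8_gb) in vram_gb.items():
    saved, pct = fp8_saving(fp16_gb, fp8_gb)
    print(f"{scenario}: {saved:.1f} GB saved ({pct:.0f}%)")
```

The absolute saving stays near 7 GB in every row, so the relative saving shrinks as the total requirement grows.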

Optimization Tips

Turbo / LCM distillation

Use distilled scheduler at 4-8 steps for faster iteration

Run with Python (diffusers)
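from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
import torch

# Load the 2B checkpoint in half precision
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Generate 49 frames; .frames[0] is the first (and only) video in the batch
frames = pipe(
    prompt="your prompt here",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]

# Export the frames as an 8 fps video file
export_to_video(frames, "output.mp4", fps=8)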
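
On GPUs with less memory, diffusers offers a few switches that lower peak VRAM at the cost of speed. The following is a minimal sketch, not a measured configuration: the helper name `load_low_vram_pipeline` is hypothetical, it assumes `diffusers`, `torch`, and `accelerate` are installed, and this page does not state how much memory each option saves.

```python
def load_low_vram_pipeline(model_id: str = "THUDM/CogVideoX-2b"):
    """Build a CogVideoX pipeline with diffusers' memory-saving options enabled.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the function can be defined without the libraries present.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

    # Keep each sub-model on the CPU and move it to the GPU only while it runs.
    # Do not also call pipe.to("cuda") when offloading is enabled.
    pipe.enable_model_cpu_offload()

    # Decode the latent video in slices and tiles to lower peak VAE memory.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

The returned pipeline is called the same way as in the example above, for instance `load_low_vram_pipeline()(prompt="your prompt here", num_frames=49)`.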

Get started

Setup instructions for running CogVideoX 2B locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 21.1 GB
Available: 24.0 GB

Weights: 4.0 GB
VAE: 0.2 GB
Text Encoder: 9.4 GB
Activations: 6.0 GB
Overhead: 0.5 GB
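
The breakdown can be checked with a few lines of arithmetic. This sketch uses only the numbers listed above and reports the itemized total, the headroom on the RTX 4090, and the largest component. Note that the itemized rows add up to 20.1 GB, 1.0 GB below the stated 21.1 GB requirement; the page does not say what accounts for the difference.

```python
# Component sizes (GB) copied from the breakdown above.
components_gb = {
    "Weights": 4.0,
    "VAE": 0.2,
    "Text Encoder": 9.4,
    "Activations": 6.0,
    "Overhead": 0.5,
}
REQUIRED_GB = 21.1
AVAILABLE_GB = 24.0

itemized_gb = sum(components_gb.values())
headroom_gb = AVAILABLE_GB - REQUIRED_GB
largest = max(components_gb, key=components_gb.get)

print(f"Itemized total: {itemized_gb:.1f} GB")
print(f"Headroom on the RTX 4090: {headroom_gb:.1f} GB")
print(f"Largest component: {largest} ({components_gb[largest]:.1f} GB)")
```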

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB: ~1m 28s
RTX 3060 12GB: ~5m 28s
RTX 4060 8GB: ~8m 15s
MacBook Pro M4 Pro 24GB: ~35m 8s
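
For comparison, the estimates can be converted to seconds per step and to a slowdown relative to the RTX 4090. This is a rough sketch using only the times listed above; the per-step figure assumes the total time divides evenly across the 30 steps, which ignores fixed costs such as text encoding and VAE decoding.

```python
import re

# Estimated times copied from the list above (25 frames, 768x512, 30 steps, FP16).
times = {
    "RTX 4090 24GB": "1m 28s",
    "RTX 3060 12GB": "5m 28s",
    "RTX 4060 8GB": "8m 15s",
    "MacBook Pro M4 Pro 24GB": "35m 8s",
}
STEPS = 30


def to_seconds(text):
    """Parse a duration such as '1m 28s' into seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds(times["RTX 4090 24GB"])
for gpu, text in times.items():
    total = to_seconds(text)
    print(f"{gpu}: {total / STEPS:.1f} s/step, {total / baseline:.1f}x the RTX 4090")
```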

Available Formats & Downloads

Download CogVideoX 2B in different precisions. Lower precision = less VRAM but slight quality loss.

| Format | Precision | Size | Provider |
|---|---|---|---|
| safetensors | FP16 | 4.3 GB | official |

LoRA Ecosystem

Limited

Few LoRAs available for CogVideoX-2B.

Frequently asked questions

How much VRAM does CogVideoX 2B need for video?

CogVideoX 2B (2B parameters) requires approximately 21.1 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run CogVideoX 2B on RTX 4090?

Yes. The RTX 4090 (24 GB VRAM) can run CogVideoX 2B at FP16: the 21.1 GB needed for 25 frames at 768×512 fits within its memory. Expected generation time is about 1m 28s for a 25-frame clip.

How long does it take to generate a video with CogVideoX 2B?

On the reference GPU (RTX 4090 24GB), CogVideoX 2B generates a 25-frame video at 768×512 in approximately 1m 28s at FP16 with 30 inference steps. Faster GPUs with higher memory bandwidth will reduce generation time.

What resolution and frame count does CogVideoX 2B support?

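CogVideoX 2B supports up to 720×480 resolution and 49 frames per generation at 8 FPS. Higher resolutions and frame counts require more VRAM; in the FP16 estimates above, going from 25 to 100 frames at 768×512 raises the requirement from 21.1 GB to 27.4 GB.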
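
A request can be checked against these limits before starting a long generation. The helper below is a hypothetical sketch, not part of any library, and uses only the maximum resolution and frame count listed on this page.

```python
# Limits taken from the spec values listed on this page.
MAX_WIDTH, MAX_HEIGHT = 720, 480
MAX_FRAMES = 49


def within_limits(width, height, num_frames):
    """Return True if the request fits the listed maximum resolution and frame count."""
    return (
        width <= MAX_WIDTH
        and height <= MAX_HEIGHT
        and 1 <= num_frames <= MAX_FRAMES
    )


print(within_limits(720, 480, 49))
print(within_limits(1280, 720, 25))
```

By this check, the 768×512 and 1280×720 scenarios in the VRAM tables fall outside the listed limits.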

About CogVideoX 2B

Use cases
video-generation, text-to-video, accessible
Recommended runtimes
comfyui, diffusers