CogVideoX 2B

Author: THUDM

Stable

Lightweight 2B video generation model from Tsinghua University. Most accessible CogVideoX variant, runs on 8GB+ VRAM with quantization. Generates 6-second clips at 8fps. Apache 2.0 licensed.

Only 2B params — most accessible CogVideoX
Runs on 8GB+ VRAM with quantization
6-second clips at 8fps
Apache 2.0 — fully open for commercial use

17K downloads · 353 likes

Parameters: 2B

Max Resolution: 720×480

Max Frames: 49

FPS: 8

Architecture: 3D-DIT

License: apache-2.0
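
The clip length quoted for this model follows from the frame count and frame rate in the specs. A minimal sketch of that arithmetic; the function name `clip_duration_seconds` is made up for illustration and is not part of any library:

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip, in seconds, at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Values taken from the spec list: 49 frames at 8 FPS.
duration = clip_duration_seconds(49, 8)
print(f"{duration:.3f} s")
```

Since 49 = 6 × 8 + 1, the maximum clip is six seconds of frames plus one extra frame, which matches the "6-second clips at 8fps" description.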

Image Quality Benchmarks

Measured quality metrics for CogVideoX 2B outputs.

Human Preference Score: 65%

How often humans prefer this model's output (0-100%)

Aesthetic Score: 6.2

Visual quality and composition rating (5-9 scale)

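At FP16, this model needs roughly 19-21 GB of VRAM for a 25-frame clip at 512×512 to 768×512, so a GPU with 24 GB or more is recommended. FP8 quantization lowers this to about 12-14 GB for the same scenarios.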

VRAM by Scenario

VRAM estimates at FP16 and FP8 precision. In the scenarios below, FP8 needs about 7 GB less than FP16 (roughly 24-37% less, depending on the scenario) with minimal quality loss. Grade shows how well each GPU handles the generation workload.

FP16 (half precision)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	19.0 GB	S	F	F	B
768×512 · 25 frames	21.1 GB	A	F	F	D
768×512 · 100 frames	27.4 GB	B	F	F	F
1280×720 · 25 frames	29.6 GB	D	F	F	F

FP8 (quantized, about 7 GB less VRAM)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	12.0 GB	S	B	F	S
768×512 · 25 frames	14.1 GB	S	D	F	A
768×512 · 100 frames	20.4 GB	A	F	F	D
1280×720 · 25 frames	22.5 GB	B	F	F	D
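
To see how much FP8 saves in each scenario, the two tables can be compared directly. This is a small sketch using only the VRAM figures listed above; the helper name `fp8_savings` is hypothetical:

```python
# (FP16 GB, FP8 GB) per scenario, copied from the two tables above.
VRAM_GB = {
    "512×512 · 25 frames": (19.0, 12.0),
    "768×512 · 25 frames": (21.1, 14.1),
    "768×512 · 100 frames": (27.4, 20.4),
    "1280×720 · 25 frames": (29.6, 22.5),
}


def fp8_savings(fp16_gb: float, fp8_gb: float) -> tuple[float, float]:
    """Return (absolute saving in GB, relative saving in percent)."""
    saved = fp16_gb - fp8_gb
    return round(saved, 1), round(100 * saved / fp16_gb, 1)


for scenario, (fp16_gb, fp8_gb) in VRAM_GB.items():
    saved_gb, saved_pct = fp8_savings(fp16_gb, fp8_gb)
    print(f"{scenario}: saves {saved_gb} GB ({saved_pct}%)")
```

The absolute saving stays close to 7 GB in every row, so the relative saving shrinks as the total requirement grows.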

Optimization Tips

Turbo / LCM distillation

Use a distilled scheduler at 4-8 steps for faster iteration.

Run with Python (diffusers)

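from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
import torch

# Load the 2B checkpoint in half precision
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16
)
pipe.to("cuda")

# Generate 49 frames (about 6 seconds at 8 fps)
frames = pipe(
    prompt="your prompt here",
    num_inference_steps=50,
    guidance_scale=6.0,
    num_frames=49,
).frames[0]

# Write the frames to an MP4 file at 8 fps
export_to_video(frames, "output.mp4", fps=8)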
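
For GPUs with less memory than the FP16 figures on this page, diffusers offers switches that trade speed for lower peak VRAM. The sketch below is one possible setup, not the FP8 quantization this page refers to: it uses CPU offloading and sliced/tiled VAE decoding instead. It assumes the `accelerate` package is installed, and the function name `build_low_vram_pipeline` is made up for illustration. Actual savings are not measured here.

```python
def build_low_vram_pipeline(model_id: str = "THUDM/CogVideoX-2b"):
    """Load CogVideoX with diffusers' memory-saving options enabled."""
    # Imported lazily so the function can be defined without a GPU stack.
    import torch
    from diffusers import CogVideoXPipeline

    pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

    # Keep each sub-model on the GPU only while it runs.
    # This replaces pipe.to("cuda"); do not call both.
    pipe.enable_model_cpu_offload()

    # Decode latents in slices and tiles to reduce peak VAE memory.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe
```

The returned pipeline is called the same way as in the example above; expect longer generation times in exchange for the lower memory footprint.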

Get started

Setup instructions for running CogVideoX 2B locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 21.1 GB · Available: 24.0 GB

Weights: 4.0 GB

VAE: 0.2 GB

Text Encoder: 9.4 GB

Activations: 6.0 GB

Overhead: 0.5 GB
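
The breakdown can be checked with a few lines of arithmetic. This sketch uses only the numbers listed above; the variable names are illustrative:

```python
# Component sizes (GB) as listed in the breakdown above.
breakdown_gb = {
    "Weights": 4.0,
    "VAE": 0.2,
    "Text Encoder": 9.4,
    "Activations": 6.0,
    "Overhead": 0.5,
}
required_gb = 21.1   # "Required" figure from this section
available_gb = 24.0  # RTX 4090 VRAM

listed_total = round(sum(breakdown_gb.values()), 1)
largest = max(breakdown_gb, key=breakdown_gb.get)
headroom = round(available_gb - required_gb, 1)
unitemized = round(required_gb - listed_total, 1)

print(f"Listed components: {listed_total} GB, largest: {largest}")
print(f"Headroom on a 24 GB card: {headroom} GB")
print(f"Not itemized: {unitemized} GB")
```

The text encoder, not the 2B video model itself, is the largest single item. The listed components sum to 20.1 GB, about 1 GB short of the stated 21.1 GB requirement; the page does not say what accounts for the difference.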

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB: ~1m 28s

RTX 3060 12GB: ~5m 28s

RTX 4060 8GB: ~8m 15s

MacBook Pro M4 Pro 24GB: ~35m 8s
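
To compare devices, it helps to convert these estimates to seconds and express them per inference step and relative to the RTX 4090. A rough sketch based only on the figures above (30 steps); `to_seconds` is a hypothetical helper:

```python
import re

STEPS = 30  # inference steps used for the estimates above

ESTIMATES = {
    "RTX 4090 24GB": "~1m 28s",
    "RTX 3060 12GB": "~5m 28s",
    "RTX 4060 8GB": "~8m 15s",
    "MacBook Pro M4 Pro 24GB": "~35m 8s",
}


def to_seconds(text: str) -> int:
    """Convert a duration such as '~1m 28s' to whole seconds."""
    match = re.fullmatch(r"~?(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 60 * minutes + seconds


baseline = to_seconds(ESTIMATES["RTX 4090 24GB"])
for device, text in ESTIMATES.items():
    total = to_seconds(text)
    print(f"{device}: {total} s total, {total / STEPS:.1f} s/step, "
          f"{total / baseline:.1f}x the RTX 4090 time")
```

Per-step time scales roughly linearly with the step count, so halving the steps should roughly halve these estimates.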

Available Formats & Downloads

Download CogVideoX 2B in the available precisions. Lower precision uses less VRAM at a slight cost in quality.

Format	Precision	Size	Provider
safetensors	FP16	4.3 GB	official

LoRA Ecosystem

Limited

Few LoRAs available for CogVideoX-2B.

Related Workflows

Wan Video 2.1 1.3B (1.3B · Alibaba)
FramePack I2V (13B · lllyasviel)
CogVideoX 5B (5B · THUDM)

Frequently asked questions

How much VRAM does CogVideoX 2B need for video?

CogVideoX 2B (2B parameters) requires approximately 21.1 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run CogVideoX 2B on RTX 4090?

Yes, the RTX 4090 (24 GB VRAM) can run CogVideoX 2B at FP16. Expected generation time is about 1m 28s for a 25-frame clip at 768×512 with 30 inference steps.

How long does it take to generate a video with CogVideoX 2B?

On a reference GPU (RTX 4090 24GB), CogVideoX 2B generates a 25-frame video at 768×512 in approximately 1m 28s at FP16 with 30 inference steps. Faster GPUs with higher memory bandwidth will reduce generation time.

What resolution and frame count does CogVideoX 2B support?

CogVideoX 2B supports up to 720×480 resolution and 49 frames per generation at 8 FPS, which is about 6 seconds of video. Higher resolutions and frame counts require more VRAM.

About CogVideoX 2B

Use cases

video-generation, text-to-video, accessible

Recommended runtimes

comfyui, diffusers