Video Generation Models

Check if your GPU can run LTX Video, CogVideoX and other video generation models locally. Video models are VRAM-hungry — most require 16GB+ for basic generation.

Not sure which model to pick? Browse by workflow — choose by what you want to create.
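As a rough illustration of the check described above, the sketch below filters a handful of the models listed on this page by the VRAM estimate shown on each card. The `MODELS` table and the `models_that_fit` helper are hypothetical names introduced here for illustration. The figures are copied from the cards and are estimates only, so offloading or quantized variants may lower the real requirement.

```python
# Approximate VRAM figures (GB) copied from the cards on this page.
# Only a subset of the catalog is included to keep the sketch short.
MODELS = {
    "Wan Video 2.2 14B": 47.0,
    "LTX Video 13B": 35.6,
    "LTX Video 2B": 13.6,
    "CogVideoX 5B": 19.6,
    "CogVideoX 2B": 13.6,
    "AnimateDiff v1.5.3": 1.2,
}


def models_that_fit(vram_gb, models=MODELS):
    """Return names of models whose listed estimate fits in vram_gb, largest first."""
    fitting = [(need, name) for name, need in models.items() if need <= vram_gb]
    # Sort by descending VRAM need; ties keep their order from the table.
    return [name for need, name in sorted(fitting, key=lambda item: -item[0])]


print(models_that_fit(16))
```

Listing the largest fitting model first is a design choice for this sketch: it surfaces the most capable option a given card can hold according to the listed estimates.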

Wan Video 2.2 14B (Alibaba)
14B params · up to 1280×720 · ~47 GB VRAM · 81 frames max
video-generation · text-to-video · image-to-video
Top tier

Updated version of Wan Video 2.1. Improved motion quality, better temporal coherence, and higher visual fidelity. 14B parameter 3D DiT architecture with Apache 2.0 license. Supports text-to-video and text+image-to-video generation.

16 fps · 3D-DIT
LTX Video 13B (Lightricks)
13B params · up to 1280×720 · ~35.6 GB VRAM · 257 frames max
video-generation · text-to-video · image-to-video
Top tier

Highest quality LTX Video model at 13B parameters. Available in dev (best quality), distilled (faster), and FP8 (lower VRAM) variants. Produces high-fidelity video with strong temporal coherence.

24 fps · 3D-DIT
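
The FP8 variant mentioned above saves memory largely because each weight occupies one byte instead of two. A back-of-the-envelope sketch follows, assuming the weight footprint is simply parameter count times bytes per parameter. `weight_gb` is a hypothetical helper, and the result covers the transformer weights only (no activations, text encoder, or VAE), so it will come out lower than a full-pipeline estimate such as the one on the card.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_gb(params_billion, precision):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return float(params_billion * BYTES_PER_PARAM[precision])


for precision in ("fp16", "fp8"):
    print(precision, weight_gb(13, precision))
```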
Wan Video 2.1 14B (Alibaba)
14B params · up to 1280×720 · ~47 GB VRAM · 81 frames max
video-generation · text-to-video · image-to-video
Top tier

State-of-the-art open-source video generation model from Alibaba. 14B parameter 3D DiT with exceptional motion quality, temporal coherence, and visual fidelity. Supports text-to-video and image-to-video.

16 fps · 3D-DIT
LTX Video 2B (Lightricks)
2B params · up to 1280×720 · ~13.6 GB VRAM · 161 frames max
video-generation · text-to-video · image-to-video
High

Lightweight 2B video generation model from Lightricks (v0.9.8). Available in dev, distilled, and FP8 variants. The distilled version generates video faster than real-time at lower resolutions. For best quality, use the 13B variant instead.

24 fps · 3D-DIT
MAGI-1 (Sand AI)
24B params · up to 1280×720 · ~57.6 GB VRAM · 120 frames max
video-generation · streaming-video · cinematic
High

24B autoregressive diffusion model for streaming video generation. Produces high-quality cinematic video with strong temporal coherence. Requires 80GB+ VRAM for full inference. Apache 2.0 licensed.

24 fps · 3D-DIT
HunyuanVideo (Tencent)
13B params · up to 1280×720 · ~40.2 GB VRAM · 129 frames max
video-generation · text-to-video
High

Large-scale video generation model from Tencent. 13B parameter 3D DiT with Hunyuan-Large MLLM text encoder (~7B, not T5-based). Strong motion quality and visual fidelity up to 720p.

24 fps · 3D-DIT
Helios 14B (BestWishYsh)
14B params · up to 1280×720 · ~37.6 GB VRAM · 240 frames max
video-generation · real-time-video · long-video
High

14B distilled video model achieving 19.5 FPS real-time generation. Based on Wan2.1-T2V-14B with pyramid distillation for minute-scale coherent video. Apache 2.0 licensed.

24 fps · 3D-DIT
HunyuanVideo 1.5 (Tencent)
8.3B params · up to 1280×720 · ~30.8 GB VRAM · 129 frames max
video-generation · text-to-video · image-to-video
High

Consumer-oriented successor to the 13B HunyuanVideo from Tencent. 8.3B parameter 3D DiT supporting both text-to-video (T2V) and image-to-video (I2V). The step-distilled variant generates a 480p clip in ~75 s on an RTX 4090; minimum ~14GB VRAM in FP16 with offloading.

24 fps · 3D-DIT
SkyReels V2 14B (Skywork)
14B params · up to 1280×720 · ~37.6 GB VRAM · 121 frames max
video-generation · long-video · infinite-length
High

14B diffusion-forcing video model supporting infinite-length video generation at 720p. Uses autoregressive diffusion-forcing architecture for seamless long-form video without quality degradation.

24 fps · 3D-DIT
Mochi 1 Preview (Genmo)
10B params · up to 848×480 · ~29.6 GB VRAM · 84 frames max
video-generation · text-to-video · cinematic
High

10B parameter video generation model from Genmo using AsymmDiT architecture with T5-XXL text encoder. Generates 848x480 videos at 30fps with strong motion quality. Apache 2.0 licensed.

30 fps · 3D-DIT
Wan2.2 TI2V 5B (Wan-AI)
5B params · up to 832×480 · ~19.6 GB VRAM · 81 frames max
video-generation · text-image-to-video · accessible
High

5B text+image-to-video model from the Wan 2.2 family. Runs on consumer GPUs with 8GB+ VRAM. Takes text and reference image as input to generate coherent video clips.

16 fps · 3D-DIT
CogVideoX 5B (THUDM)
5B params · up to 720×480 · ~19.6 GB VRAM · 49 frames max
video-generation · text-to-video
High

Open-source video generation model from Tsinghua University. 3D full-attention transformer with expert adaptive LayerNorm. Generates 6-second clips at 8fps.

8 fps · 3D-DIT
Cosmos Diffusion 7B (NVIDIA)
7B params · up to 1024×576 · ~23.6 GB VRAM · 57 frames max
video-generation · text-to-video · world-model
High

7B diffusion model from NVIDIA's Cosmos platform for physical AI and world modeling. Generates physically plausible videos from text descriptions. Part of NVIDIA's Physical AI initiative.

24 fps · DIT
FramePack I2V (lllyasviel)
13B params · up to 1280×720 · ~40.2 GB VRAM · 129 frames max
video-generation · image-to-video · low-vram
High

Viral low-VRAM video generation model based on HunyuanVideo architecture. Uses a novel next-frame prediction approach that inverts the diffusion process to pack future frames into the noise of the current frame, enabling video generation with only 6GB VRAM. Image-to-video with strong motion quality.

30 fps · 3D-DIT
Wan Video 2.1 1.3B (Alibaba)
1.3B params · up to 832×480 · ~21.6 GB VRAM · 81 frames max
video-generation · text-to-video · fast-generation
High

Lightweight video generation model from Alibaba. Only 1.3B params — runs on consumer GPUs with 8GB+ VRAM. Good quality for its size, excellent for rapid iteration.

16 fps · 3D-DIT
CogVideoX 2B (THUDM)
2B params · up to 720×480 · ~13.6 GB VRAM · 49 frames max
video-generation · text-to-video · accessible
Mid

Lightweight 2B video generation model from Tsinghua University. Most accessible CogVideoX variant, runs on 8GB+ VRAM with quantization. Generates 6-second clips at 8fps. Apache 2.0 licensed.

8 fps · 3D-DIT
AnimateDiff v1.5.3 (guoyww)
0.4B params · up to 512×512 · ~1.2 GB VRAM · 16 frames max
video-generation · animation · motion
Mid

Motion adapter module that plugs into any SD 1.5 checkpoint to generate short animated clips. Only 0.4B extra parameters on top of the SD 1.5 base model. Generates 16 frames at 8fps (2-second clips).

8 fps · UNET
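
The clip lengths quoted on the cards follow from dividing the maximum frame count by the frame rate. A small sketch of that arithmetic, using figures from the cards above; `clip_seconds` is a hypothetical helper name introduced for this example.

```python
def clip_seconds(frames, fps):
    """Length of a clip in seconds for a given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


# Frame counts and frame rates taken from the cards on this page.
print(clip_seconds(16, 8))   # AnimateDiff v1.5.3: 2.0 seconds
print(clip_seconds(49, 8))   # CogVideoX: 6.125 seconds, listed as 6-second clips
print(clip_seconds(81, 16))  # Wan Video 2.1 / 2.2: 5.0625 seconds
```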