Video Generation Models
Check whether your GPU can run LTX Video, CogVideoX, and other video generation models locally. Video models are VRAM-hungry: most require 16GB+ for basic generation, although several of the smaller or quantized models listed below run in 6GB to 8GB.
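As a rough way to see where figures like these come from, the sketch below estimates the VRAM needed just to hold a model's weights at a given precision. The function name and the precision table are illustrative assumptions, not part of any model's tooling. The estimate ignores the text encoder, VAE, activations, and framework overhead, so real requirements are higher, while offloading or quantization can bring them lower.

```python
# Bytes per parameter for common weight precisions (assumed storage sizes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0}


def weight_vram_gib(params_billions: float, precision: str = "fp16") -> float:
    """Approximate VRAM in GiB needed to hold the model weights alone."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 2**30


if __name__ == "__main__":
    # Parameter counts that appear in the listing below.
    for size in (2, 5, 13, 14):
        fp16 = weight_vram_gib(size, "fp16")
        fp8 = weight_vram_gib(size, "fp8")
        print(f"{size}B  fp16: {fp16:5.1f} GiB   fp8: {fp8:5.1f} GiB")
```

By this estimate a 14B model needs about 26 GiB for FP16 weights alone, which is why FP8 variants and CPU offloading matter on consumer cards.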
Updated version of Wan Video 2.1. Improved motion quality, better temporal coherence, and higher visual fidelity. 14B-parameter 3D DiT architecture with Apache 2.0 license. Supports text-to-video and text+image-to-video generation.
Highest-quality LTX Video model at 13B parameters. Available in dev (best quality), distilled (faster), and FP8 (lower VRAM) variants. Produces high-fidelity video with strong temporal coherence.
State-of-the-art open-source video generation model from Alibaba. 14B-parameter 3D DiT with exceptional motion quality, temporal coherence, and visual fidelity. Supports text-to-video and image-to-video.
Lightweight 2B video generation model from Lightricks (v0.9.8). Available in dev, distilled, and FP8 variants. The distilled version generates video faster than real-time at lower resolutions. For best quality, use the 13B variant instead.
24B autoregressive diffusion model for streaming video generation. Produces high-quality cinematic video with strong temporal coherence. Requires 80GB+ VRAM for full inference. Apache 2.0 licensed.
Large-scale video generation model from Tencent. 13B-parameter 3D DiT with Hunyuan-Large MLLM text encoder (~7B, not T5-based). Strong motion quality and visual fidelity up to 720p.
Helios 14B (BestWishYsh): 14B distilled video model achieving 19.5 FPS real-time generation. Based on Wan2.1-T2V-14B with pyramid distillation for minute-scale coherent video. Apache 2.0 licensed.
Consumer-oriented successor to HunyuanVideo 13B from Tencent. 8.3B-parameter 3D DiT supporting both text-to-video (T2V) and image-to-video (I2V). The step-distilled variant generates 480p video in ~75s on an RTX 4090; minimum ~14GB VRAM in FP16 with offloading.
SkyReels V2 14B (Skywork): 14B diffusion-forcing video model supporting infinite-length video generation at 720p. Uses autoregressive diffusion-forcing architecture for seamless long-form video without quality degradation.
10B-parameter video generation model from Genmo using AsymmDiT architecture with T5-XXL text encoder. Generates 848x480 videos at 30fps with strong motion quality. Apache 2.0 licensed.
Wan2.2 TI2V 5B (Wan-AI): 5B text+image-to-video model from the Wan 2.2 family. Runs on consumer GPUs with 8GB+ VRAM. Takes text and a reference image as input to generate coherent video clips.
CogVideoX 5B (THUDM): Open-source video generation model from Tsinghua University. 3D full-attention transformer with expert adaptive LayerNorm. Generates 6-second clips at 8fps.
7B diffusion model from NVIDIA's Cosmos platform for physical AI and world modeling. Generates physically plausible videos from text descriptions. Part of NVIDIA's Physical AI initiative.
FramePack I2V (lllyasviel): Low-VRAM image-to-video model based on the HunyuanVideo architecture. Uses a next-frame prediction approach that compresses the context of earlier frames to a fixed length, so the generation workload does not grow with video length. This enables video generation with only 6GB VRAM while keeping strong motion quality.
Lightweight video generation model from Alibaba. At only 1.3B parameters, it runs on consumer GPUs with 8GB+ VRAM. Good quality for its size, excellent for rapid iteration.
CogVideoX 2B (THUDM): Lightweight 2B video generation model from Tsinghua University. Most accessible CogVideoX variant, runs on 8GB+ VRAM with quantization. Generates 6-second clips at 8fps. Apache 2.0 licensed.
AnimateDiff v1.5.3 (guoyww): Motion adapter module that plugs into any SD 1.5 checkpoint to generate short animated clips. Only 0.4B extra parameters on top of the SD 1.5 base model. Generates 16 frames at 8fps (2-second clips).
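For entries that state a minimum VRAM figure, the check against your own GPU can be sketched as a simple lookup. The table below copies only the figures quoted in the listing above. The dictionary, the function name, and the descriptive labels used for entries whose model name is not shown are illustrative assumptions. Treat the result as a first filter, since resolution, clip length, and offloading settings all shift the real requirement.

```python
# Minimum VRAM figures (GB) as stated in the listing above.
# Labels for entries without a visible model name are descriptive placeholders.
STATED_MIN_VRAM_GB = {
    "FramePack I2V": 6,
    "Wan2.2 TI2V 5B": 8,
    "CogVideoX 2B (with quantization)": 8,
    "1.3B model from Alibaba": 8,
    "8.3B HunyuanVideo successor (FP16 with offload)": 14,
    "24B autoregressive diffusion model (full inference)": 80,
}


def models_that_fit(gpu_vram_gb: float) -> list[str]:
    """Return the listed models whose stated minimum fits in the given VRAM."""
    return [
        name
        for name, required_gb in STATED_MIN_VRAM_GB.items()
        if required_gb <= gpu_vram_gb
    ]


if __name__ == "__main__":
    for vram in (6, 8, 16, 24):
        print(f"{vram}GB GPU: {', '.join(models_that_fit(vram))}")
```

Models without a stated minimum are left out of the table rather than guessed at.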