Wan2.2 TI2V 5B

Name: Wan2.2 TI2V 5B
Author: Wan-AI

Frontier

A 5B-parameter text+image-to-video model from the Wan 2.2 family. Runs on consumer GPUs with 8GB+ VRAM. Takes a text prompt and a reference image as input and generates coherent video clips.

5B params — runs on 8GB+ VRAM
Text + image to video generation
Apache 2.0 — fully open for commercial use
Part of the Wan 2.2 family

52K downloads · 115 likes

Parameters: 5B
Max Resolution: 832×480
Max Frames: 81
FPS: 16
Architecture: 3D-DIT
License: apache-2.0

Image Quality Benchmarks

Measured quality metrics for Wan2.2 TI2V 5B outputs.

Human Preference Score: 78%
How often humans prefer this model's output (0-100%)

Aesthetic Score: 7.2
Visual quality and composition rating (5-9 scale)

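At FP16, this model needs roughly 25-27 GB of VRAM for basic video generation (25 frames at 512×512 or 768×512), which is more than a 24 GB card provides. With FP8 quantization the same scenarios need about 15-17 GB, so a GPU with 24GB+ VRAM is recommended.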

VRAM by Scenario

VRAM estimates at FP16 and FP8 precision. FP8 uses up to ~40% less memory (about 10 GB less in each scenario listed) with minimal quality loss. Grade shows how well each GPU handles the generation workload.

FP16 (full precision)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	25.3 GB	B	F	F	F
768×512 · 25 frames	27.4 GB	B	F	F	F
768×512 · 100 frames	33.7 GB	F	F	F	F
1280×720 · 25 frames	35.9 GB	F	F	F	F

FP8 (quantized, up to ~40% less VRAM)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	15.1 GB	S	D	F	A
768×512 · 25 frames	17.2 GB	S	F	F	B
768×512 · 100 frames	23.5 GB	B	F	F	D
1280×720 · 25 frames	25.7 GB	B	F	F	F
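
The two tables can be cross-checked against the "~40% less memory" figure with a little arithmetic. The sketch below uses only the VRAM numbers listed above; the names `SCENARIOS` and `fp8_savings` are illustrative and not part of any tool.

```python
# VRAM figures (GB) copied from the FP16 and FP8 tables above.
SCENARIOS = {
    "512x512 / 25 frames": (25.3, 15.1),
    "768x512 / 25 frames": (27.4, 17.2),
    "768x512 / 100 frames": (33.7, 23.5),
    "1280x720 / 25 frames": (35.9, 25.7),
}


def fp8_savings(fp16_gb, fp8_gb):
    """Return (absolute saving in GB, relative saving in percent)."""
    saved = fp16_gb - fp8_gb
    return round(saved, 1), round(100 * saved / fp16_gb, 1)


if __name__ == "__main__":
    for name, (fp16_gb, fp8_gb) in SCENARIOS.items():
        saved_gb, saved_pct = fp8_savings(fp16_gb, fp8_gb)
        print(f"{name}: saves {saved_gb} GB ({saved_pct}%)")
```

With these figures the saving is a constant 10.2 GB, so the relative reduction falls from about 40% in the smallest scenario to about 28% at 1280×720.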

Optimization Tips

Turbo / LCM distillation

Use distilled scheduler at 4-8 steps for faster iteration

Run with Python (diffusers)

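import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Load the Diffusers-format checkpoint in half precision.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Generate 81 frames; .frames[0] is the first (and only) video in the batch.
frames = pipe(
    prompt="your prompt here",
    num_inference_steps=50,
    guidance_scale=5.0,
    num_frames=81,
).frames[0]

# Write the frames to an MP4 at the model's 16 FPS.
export_to_video(frames, "output.mp4", fps=16)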

Get started

Setup instructions for running Wan2.2 TI2V 5B locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 27.4 GB · Available: 24.0 GB

Weights: 10.0 GB
VAE: 0.2 GB
Text Encoder: 9.4 GB
Activations: 6.0 GB
Overhead: 0.5 GB
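
As a sanity check, the components listed above can be summed and compared with the headline figures. This is a minimal sketch using only the numbers on this page; the names `BREAKDOWN_GB` and `summarize` are illustrative.

```python
# Component sizes (GB) from the memory breakdown above:
# 25 frames at 768x512, FP16, RTX 4090 24GB.
BREAKDOWN_GB = {
    "weights": 10.0,
    "vae": 0.2,
    "text_encoder": 9.4,
    "activations": 6.0,
    "overhead": 0.5,
}
REQUIRED_GB = 27.4   # headline requirement
AVAILABLE_GB = 24.0  # VRAM on the RTX 4090


def summarize(breakdown, required_gb, available_gb):
    """Sum the listed components and compare against the headline numbers."""
    listed = round(sum(breakdown.values()), 1)
    return {
        "listed_total_gb": listed,
        "unaccounted_gb": round(required_gb - listed, 1),
        "shortfall_gb": round(required_gb - available_gb, 1),
    }


if __name__ == "__main__":
    print(summarize(BREAKDOWN_GB, REQUIRED_GB, AVAILABLE_GB))
```

The listed components add up to 26.1 GB, which is 1.3 GB less than the 27.4 GB requirement; the page does not say what accounts for the difference. Either way the requirement exceeds the 24.0 GB available by 3.4 GB, and the text encoder alone is nearly as large as the model weights.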

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB: ~4m 23s
RTX 3060 12GB: ~7m 52s
RTX 4060 8GB: ~11m 53s
MacBook Pro M4 Pro 24GB: ~16m 55s
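
To compare these estimates, it can help to convert them to seconds per step and to a ratio against the RTX 4090. The sketch below uses only the times listed above; `to_seconds` is an illustrative helper, and attributing the whole estimate to the 30 denoising steps is a simplifying assumption.

```python
import re

# Estimated times from the list above (25 frames, 768x512, 30 steps, FP16).
ESTIMATES = {
    "RTX 4090 24GB": "4m 23s",
    "RTX 3060 12GB": "7m 52s",
    "RTX 4060 8GB": "11m 53s",
    "MacBook Pro M4 Pro 24GB": "16m 55s",
}
STEPS = 30


def to_seconds(text):
    """Parse a duration such as '4m 23s' into seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


if __name__ == "__main__":
    baseline = to_seconds(ESTIMATES["RTX 4090 24GB"])
    for gpu, text in ESTIMATES.items():
        total = to_seconds(text)
        print(f"{gpu}: {total / STEPS:.1f} s/step, "
              f"{total / baseline:.2f}x the RTX 4090 time")
```

For example, the RTX 4090 estimate is 263 seconds, or roughly 8.8 seconds per step under that assumption.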

Available Formats & Downloads

Download Wan2.2 TI2V 5B in different precisions. Lower precision = less VRAM but slight quality loss.

Format	Precision	Size	Provider
safetensors (Recommended)	BF16	10.5 GB	official

LoRA Ecosystem

Limited

Early LoRA ecosystem. Compatible with some Wan 2.1 LoRAs.

Related Workflows

Mochi 1 Preview · 10B · Genmo
CogVideoX 5B · 5B · THUDM
HunyuanVideo 1.5 · 8.3B · Tencent

Frequently asked questions

How much VRAM does Wan2.2 TI2V 5B need for video?

Wan2.2 TI2V 5B (5B parameters) requires approximately 27.4 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run Wan2.2 TI2V 5B on RTX 4090?

Wan2.2 TI2V 5B can run on the RTX 4090 with sequential offloading, though video generation will be significantly slower than native fit.
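
A minimal sketch of how that choice could be made with diffusers. `enable_sequential_cpu_offload()` is an existing diffusers pipeline method; the helper name `place_pipeline` and its simple threshold check are assumptions for illustration, with the VRAM figures taken from this page.

```python
def place_pipeline(pipe, required_gb, available_gb):
    """Keep the pipeline on the GPU if it fits, otherwise offload to CPU.

    `pipe` is expected to be a diffusers pipeline, for example a
    WanPipeline. The comparison is a hypothetical heuristic,
    not an official recommendation.
    """
    if required_gb <= available_gb:
        pipe.to("cuda")  # native fit: everything stays in VRAM
        return "gpu"
    # Streams submodules to the GPU as they are needed:
    # lowest VRAM use, but the slowest option.
    pipe.enable_sequential_cpu_offload()
    return "sequential_offload"


# Usage, assuming `pipe` was created with WanPipeline.from_pretrained(...)
# and has NOT already been moved with pipe.to("cuda"):
#   place_pipeline(pipe, required_gb=27.4, available_gb=24.0)
```

diffusers also provides `enable_model_cpu_offload()`, which moves whole components rather than individual submodules and is usually faster than sequential offloading at the cost of higher peak VRAM.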

How long does it take to generate a video with Wan2.2 TI2V 5B?

On a reference GPU (RTX 4090 24GB), Wan2.2 TI2V 5B generates a 25-frame video at 768×512 in approximately 4m 23s at FP16 with 30 inference steps. Faster GPUs with higher memory bandwidth will reduce generation time.

What resolution and frame count does Wan2.2 TI2V 5B support?

Wan2.2 TI2V 5B supports up to 832×480 resolution and 81 frames per generation at 16 FPS. Higher resolutions and frame counts require proportionally more VRAM.
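
The frame count and frame rate together determine clip length. A small sketch of that arithmetic, using the 81-frame and 16 FPS figures from this page; `clip_seconds` is an illustrative helper name.

```python
MAX_FRAMES = 81  # from the spec list on this page
FPS = 16


def clip_seconds(num_frames, fps=FPS):
    """Playback length of a clip in seconds."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    for frames in (25, MAX_FRAMES):
        print(f"{frames} frames at {FPS} FPS = {clip_seconds(frames):.2f} s")
```

At 16 FPS, the 81-frame maximum corresponds to a clip of about 5 seconds, and the 25-frame benchmark scenario to about 1.6 seconds.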

About Wan2.2 TI2V 5B

Use cases

video-generation, text-image-to-video, accessible

Recommended runtimes

comfyui, diffusers