Wan Video 2.1 1.3B

Name: Wan Video 2.1 1.3B
Author: Alibaba

Frontier

Lightweight video generation model from Alibaba. With only 1.3B parameters, it runs on consumer GPUs with 8GB+ VRAM. Quality is good for its size, and the small footprint makes it well suited to rapid iteration.

Only 1.3B params — runs on 8GB VRAM
Same architecture as Wan 14B, distilled
Good quality-to-size ratio for video
Apache 2.0 license

15K downloads · 437 likes

Parameters: 1.3B

Max Resolution: 832×480

Max Frames: 81

FPS: 16

Architecture: 3D-DIT

License: apache-2.0

Image Quality Benchmarks

Measured quality metrics for Wan Video 2.1 1.3B outputs.

Human Preference Score: 68%

How often humans prefer this model's output (0-100%)

Aesthetic Score: 6.2

Visual quality and composition rating (5-9 scale)

VRAM by Scenario

VRAM estimates at FP16 and FP8 precision. FP8 uses ~40% less memory with minimal quality loss. Grade shows how well each GPU handles the generation workload.

FP16 (unquantized half precision)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	9.1 GB	S	S	B	S
768×512 · 25 frames	9.8 GB	S	A	D	S
768×512 · 100 frames	11.8 GB	S	B	F	F
1280×720 · 25 frames	12.6 GB	S	B	F	F

FP8 (quantized — ~40% less VRAM)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	16.2 GB	S	D	F	B
768×512 · 25 frames	18.3 GB	S	F	F	B
768×512 · 100 frames	24.6 GB	B	F	F	F
1280×720 · 25 frames	26.7 GB	B	F	F	F

Optimization Tips

Turbo / LCM distillation

Use distilled scheduler at 4-8 steps for faster iteration

Run with Python (diffusers)

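import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Load the diffusers-format weights. The "-Diffusers" repo suffix is assumed here,
# because from_pretrained expects a repository in diffusers layout.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Generate at the listed maximum resolution (832×480) and frame count (81).
frames = pipe(
    prompt="your prompt here",
    height=480,
    width=832,
    num_inference_steps=50,
    guidance_scale=5.0,
    num_frames=81,
).frames[0]

# Export the frames as a video at the model's listed 16 FPS.
export_to_video(frames, "output.mp4", fps=16)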
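When choosing a value for num_frames, it may help to know that the frame counts used on this page (25 and 81) both have the form 4k + 1. The sketch below assumes the video VAE compresses time by a factor of 4, so that only such counts are valid. That factor is an assumption not stated on this page, and nearest_valid_num_frames is a hypothetical helper, not part of diffusers.

```python
def nearest_valid_num_frames(requested: int, temporal_factor: int = 4) -> int:
    """Round down to the closest count of the form temporal_factor * k + 1.

    temporal_factor=4 is an assumed temporal compression ratio.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return (requested - 1) // temporal_factor * temporal_factor + 1


if __name__ == "__main__":
    for n in (25, 81, 100):
        print(n, "->", nearest_valid_num_frames(n))
```

Under that assumption, a request for 100 frames would be rounded down to 97, while 25 and 81 pass through unchanged.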

Get started

Setup instructions for running Wan Video 2.1 1.3B locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 9.8 GB · Available: 24.0 GB

Weights: 2.6 GB

VAE: 0.2 GB

Text Encoder: 18.8 GB

Activations: 6.0 GB

Overhead: 0.5 GB

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB: ~1m 13s

RTX 3060 12GB: ~4m 35s

RTX 4060 8GB: ~6m 55s

MacBook Pro M4 Pro 24GB: ~9m 53s
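To compare these estimates, the durations can be converted to seconds and divided by the 30 steps quoted above. The following is a rough sketch: parse_duration is a hypothetical helper, and the per-step figure is only an average, since the totals presumably also include text encoding and VAE decoding.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a string such as '~1m 13s' into whole seconds."""
    match = re.fullmatch(r"~?\s*(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


# Estimates copied from the list above (25 frames at 768x512, 30 steps, FP16).
TIMES = {
    "RTX 4090 24GB": "~1m 13s",
    "RTX 3060 12GB": "~4m 35s",
    "RTX 4060 8GB": "~6m 55s",
    "MacBook Pro M4 Pro 24GB": "~9m 53s",
}
STEPS = 30

baseline = parse_duration(TIMES["RTX 4090 24GB"])

if __name__ == "__main__":
    for gpu, text in TIMES.items():
        total = parse_duration(text)
        print(
            f"{gpu}: {total} s total, "
            f"{total / STEPS:.2f} s/step, "
            f"{total / baseline:.2f}x the RTX 4090 time"
        )
```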

Available Formats & Downloads

Download Wan Video 2.1 1.3B in different precisions. Lower precision uses less VRAM at the cost of a slight quality loss.

Format	Precision	Size	Provider
safetensors (recommended)	FP16	2.6 GB	official	Download
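The 2.6 GB figure for the FP16 file is consistent with simple arithmetic: 1.3B parameters at 2 bytes each. A minimal sketch of that calculation follows, assuming decimal gigabytes and ignoring file metadata. The FP32 and FP8 rows are illustrative only and are not listed as downloads here.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_size_gb(num_params: float, precision: str) -> float:
    """Approximate size of the raw weights in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_size_gb(1.3e9, precision):.1f} GB")
```

This covers the model weights only. The text encoder, VAE, and activations listed in the memory breakdown are separate.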

LoRA Ecosystem

Limited

Few LoRAs available for the 1.3B variant.

Related Workflows

FramePack I2V · 13B · lllyasviel
CogVideoX 2B · 2B · THUDM
CogVideoX 5B · 5B · THUDM

Frequently asked questions

How much VRAM does Wan Video 2.1 1.3B need for video?

Wan Video 2.1 1.3B (1.3B parameters) requires approximately 9.8 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run Wan Video 2.1 1.3B on RTX 4090?

Yes, the RTX 4090 (24 GB VRAM) can run Wan Video 2.1 1.3B at FP16. Expected generation time is around 1m 13s for a 25-frame clip at 768×512 with 30 inference steps.

How long does it take to generate a video with Wan Video 2.1 1.3B?

On a reference GPU (RTX 4090 24GB), Wan Video 2.1 1.3B generates a 25-frame video at 768×512 in approximately 1m 13s at FP16 with 30 inference steps. GPUs with more compute and higher memory bandwidth will reduce generation time.

What resolution and frame count does Wan Video 2.1 1.3B support?

Wan Video 2.1 1.3B supports up to 832×480 resolution and 81 frames per generation at 16 FPS. Higher resolutions and frame counts require proportionally more VRAM.
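As a quick illustration of what these limits mean in practice, clip length is just frame count divided by frame rate. The helper name below is hypothetical.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length of a clip, in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # Maximum-length clip: 81 frames at 16 FPS.
    print(clip_duration_seconds(81, 16))
    # The 25-frame scenario used in the benchmarks on this page.
    print(clip_duration_seconds(25, 16))
```

At 16 FPS, the 81-frame maximum corresponds to just over five seconds of video, and a 25-frame clip to roughly a second and a half.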

About Wan Video 2.1 1.3B

Use cases

video-generation, text-to-video, fast-generation

Recommended runtimes

comfyui, diffusers