FramePack I2V

Name: FramePack I2V
Author: lllyasviel

Frontier

Viral low-VRAM video generation model based on HunyuanVideo architecture. Uses a novel next-frame prediction approach that inverts the diffusion process to pack future frames into the noise of the current frame, enabling video generation with only 6GB VRAM. Image-to-video with strong motion quality.

Generates AI video with only 6GB VRAM
Based on HunyuanVideo architecture optimized for low VRAM
Novel next-frame prediction packs future frames into noise
Image-to-video with strong temporal coherence
Apache 2.0 licensed — fully open source

38K downloads · 124 likes

Parameters: 13B
Max Resolution: 1280×720
Max Frames: 129
FPS: 30
Architecture: 3D-DIT
License: apache-2.0

Image Quality Benchmarks

Measured quality metrics for FramePack I2V outputs.

Human Preference Score: 75% (how often humans prefer this model's output, 0-100%)

Aesthetic Score: 7.0 (visual quality and composition rating, 5-9 scale)

VRAM by Scenario

VRAM estimates at FP16 and FP8 precision. FP8 uses ~40% less memory with minimal quality loss. Grade shows how well each GPU handles the generation workload.

FP16 (full precision)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	6.4 GB	S	S	S	F
768×512 · 25 frames	6.7 GB	S	S	A	F
768×512 · 100 frames	7.6 GB	S	S	B	F
1280×720 · 25 frames	7.8 GB	S	S	B	F

FP8 (quantized — ~40% less VRAM)

Scenario	VRAM	RTX 4090 24GB	RTX 3060 12GB	RTX 4060 8GB	MacBook Pro M4 Pro 24GB
512×512 · 25 frames	25.9 GB	B	F	F	F
768×512 · 25 frames	28.0 GB	D	F	F	F
768×512 · 100 frames	34.3 GB	F	F	F	F
1280×720 · 25 frames	36.5 GB	F	F	F	F
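
The page does not publish the thresholds behind its letter grades, so the sketch below only computes how much memory would remain free on a card for each FP16 scenario. The function name `headroom_fraction` is made up for illustration, and the grades clearly depend on more than headroom (the 24 GB MacBook is graded F in every row).

```python
# VRAM figures (GB) copied from the FP16 table above.
FP16_VRAM_GB = {
    "512x512, 25 frames": 6.4,
    "768x512, 25 frames": 6.7,
    "768x512, 100 frames": 7.6,
    "1280x720, 25 frames": 7.8,
}


def headroom_fraction(required_gb: float, gpu_gb: float) -> float:
    """Fraction of GPU memory left free; a negative value means it does not fit."""
    return (gpu_gb - required_gb) / gpu_gb


if __name__ == "__main__":
    for scenario, need in FP16_VRAM_GB.items():
        # 8.0 GB corresponds to the RTX 4060 column in the table.
        print(f"{scenario}: {headroom_fraction(need, 8.0):.1%} free on an 8 GB card")
```

On the 8 GB card the free share shrinks as the scenarios get heavier, which is at least consistent with that column moving from S to A to B.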

Optimization Tips

Turbo / LCM distillation

Use distilled scheduler at 4-8 steps for faster iteration

Run with Python (diffusers)

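import torch
from diffusers import HunyuanVideoFramepackPipeline, HunyuanVideoFramepackTransformer3DModel
from diffusers.utils import export_to_video, load_image
from transformers import SiglipImageProcessor, SiglipVisionModel

# Assumes a diffusers release that ships the FramePack classes. The companion
# repos below follow the diffusers documentation example, not this page.
transformer = HunyuanVideoFramepackTransformer3DModel.from_pretrained(
    "lllyasviel/FramePackI2V_HY", torch_dtype=torch.bfloat16
)
feature_extractor = SiglipImageProcessor.from_pretrained(
    "lllyasviel/flux_redux_bfl", subfolder="feature_extractor"
)
image_encoder = SiglipVisionModel.from_pretrained(
    "lllyasviel/flux_redux_bfl", subfolder="image_encoder", torch_dtype=torch.float16
)
pipe = HunyuanVideoFramepackPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer, feature_extractor=feature_extractor,
    image_encoder=image_encoder, torch_dtype=torch.float16,
)
pipe.to("cuda")

frames = pipe(
    image=load_image("input.png"),  # image-to-video needs a start image (placeholder path)
    prompt="your prompt here",
    num_inference_steps=25,
    guidance_scale=7.5,
    num_frames=129,
).frames[0]
export_to_video(frames, "output.mp4", fps=30)  # 30 FPS, matching the spec above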

Get started

Setup instructions for running FramePack I2V locally

1. Download the model

Get the checkpoint from HuggingFace

2. Place in:

ComfyUI/models/checkpoints/

3. Launch ComfyUI

python main.py

Note: Video generation requires video output nodes. Install ComfyUI-VideoHelperSuite from the ComfyUI Manager for SaveAnimatedWEBP or VHS_VideoCombine nodes.
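
Steps 1 and 2 can also be scripted. The sketch below is one possible way to do it, assuming the `huggingface_hub` package is installed and that the checkpoint lives in the `lllyasviel/FramePackI2V_HY` repo used in the diffusers snippet. The helper names and the `FramePackI2V_HY` subfolder are hypothetical, and whether ComfyUI can load the files from that folder depends on the nodes you have installed.

```python
from pathlib import Path

REPO_ID = "lllyasviel/FramePackI2V_HY"  # repo id taken from the diffusers snippet


def checkpoint_dir(comfyui_root: str) -> Path:
    """Return the folder named in step 2, relative to a ComfyUI install."""
    return Path(comfyui_root) / "models" / "checkpoints"


def download_checkpoint(comfyui_root: str) -> Path:
    """Fetch the repo into the checkpoints folder (needs network access)."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(comfyui_root) / "FramePackI2V_HY"  # hypothetical subfolder
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


if __name__ == "__main__":
    print(checkpoint_dir("ComfyUI"))
```

Calling `download_checkpoint("ComfyUI")` pulls the full repo, so make sure the disk has room for the 26.0 GB FP16 weights first.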

Memory Breakdown

VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB

Required: 6.7 GB
Available: 24.0 GB

Weights: 26.0 GB
VAE: 0.2 GB
Text Encoder: 14.0 GB
Activations: 6.0 GB
Overhead: 0.5 GB
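
Adding up the components is a quick sanity check on this breakdown. The sketch below uses only the figures listed above. The total comes out well above the 6.7 GB "Required" value, so that value presumably assumes the components are not all resident on the GPU at once (for example, through offloading to system RAM). The page does not say which, so treat that reading as an assumption.

```python
# Component sizes (GB) copied from the breakdown above.
BREAKDOWN_GB = {
    "Weights": 26.0,
    "VAE": 0.2,
    "Text Encoder": 14.0,
    "Activations": 6.0,
    "Overhead": 0.5,
}
REQUIRED_GB = 6.7    # "Required" figure from the page
AVAILABLE_GB = 24.0  # RTX 4090

# Sum of every component if all were held in VRAM at the same time.
total_gb = round(sum(BREAKDOWN_GB.values()), 1)

if __name__ == "__main__":
    print(f"Sum of components: {total_gb} GB")
    print(f"Stated requirement: {REQUIRED_GB} GB")
    print(f"All components fit at once: {total_gb <= AVAILABLE_GB}")
```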

Estimated Generation Time

25 frames at 768×512, 30 steps, FP16.

RTX 4090 24GB: ~3m 3s
RTX 3060 12GB: ~11m 33s
RTX 4060 8GB: ~17m 25s
MacBook Pro M4 Pro 24GB: ~24m 45s
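
These times can be turned into a rough per-step cost to gauge what the 4-8 step distilled setting from the optimization tips might save. The sketch below assumes generation time scales linearly with step count, which ignores fixed costs such as text encoding and VAE decoding. Both helper names are made up for illustration.

```python
def parse_duration(text: str) -> int:
    """Convert a string such as '~3m 3s' to whole seconds."""
    minutes, seconds = text.lstrip("~").split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))


def estimate_seconds(reference: str, reference_steps: int, steps: int) -> float:
    """Scale a reference time linearly with step count (a simplifying assumption)."""
    return parse_duration(reference) / reference_steps * steps


if __name__ == "__main__":
    # Reference: RTX 4090, 25 frames at 768x512, 30 steps, FP16.
    for steps in (4, 8, 30):
        print(f"{steps} steps: ~{estimate_seconds('~3m 3s', 30, steps):.0f} s")
```

Under that assumption the RTX 4090 spends about 6 seconds per step, so an 8-step run would land somewhere near 50 seconds.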

Available Formats & Downloads

Download FramePack I2V in different precisions. Lower precision uses less VRAM at a slight cost in quality.

Format	Precision	Size	Provider
Official weights
safetensors (Recommended)	FP16	26.0 GB	official
Community conversions
safetensors (Community)	FP8	13.0 GB	community
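
The two download sizes follow from the parameter count multiplied by the bytes stored per parameter. A minimal sketch, assuming 1 GB = 10^9 bytes and ignoring file metadata (the function name is illustrative):

```python
def weight_size_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate weight file size in GB, taking 1 GB as 1e9 bytes."""
    return params_billion * bytes_per_param


if __name__ == "__main__":
    print(f"FP16: {weight_size_gb(13, 2):.1f} GB")  # 2 bytes per parameter
    print(f"FP8:  {weight_size_gb(13, 1):.1f} GB")  # 1 byte per parameter
```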

LoRA Ecosystem

Limited

Very new model; LoRA ecosystem is still emerging.

Related Workflows

Image-to-Video

CogVideoX 5B (5B, THUDM)
Wan Video 2.1 1.3B (1.3B, Alibaba)
Cosmos Diffusion 7B (7B, NVIDIA)

Frequently asked questions

How much VRAM does FramePack I2V need for video?

FramePack I2V (13B parameters) requires approximately 6.7 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.

Can I run FramePack I2V on RTX 4090?

Yes, the RTX 4090 (24 GB VRAM) can run FramePack I2V at FP16. Expected generation time is about 3m 3s for a 25-frame clip at 768×512 with 30 inference steps.

How long does it take to generate a video with FramePack I2V?

On the reference GPU (RTX 4090 24GB), FramePack I2V generates a 25-frame video at 768×512 in about 3m 3s at FP16 with 30 inference steps. GPUs with more compute and higher memory bandwidth reduce generation time.

What resolution and frame count does FramePack I2V support?

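FramePack I2V supports up to 1280×720 resolution and 129 frames per generation at 30 FPS. Higher resolutions and frame counts require more VRAM, though not in proportion: in the FP16 estimates above, going from 25 to 100 frames at 768×512 raises the requirement from 6.7 GB to 7.6 GB.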
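
Frame count and frame rate together determine clip length. A short sketch using the limits quoted above (the function name is illustrative):

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback duration in seconds for a clip of num_frames at fps."""
    return num_frames / fps


if __name__ == "__main__":
    print(f"129 frames at 30 FPS: {clip_seconds(129, 30):.1f} s")
    print(f"25 frames at 30 FPS: {clip_seconds(25, 30):.2f} s")
```

At the 129-frame maximum a single generation therefore covers a little over four seconds of video, and the 25-frame benchmark clips run for under one second.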

About FramePack I2V

Use cases

video-generation, image-to-video, low-vram

Recommended runtimes

comfyui, diffusers