by lllyasviel
A widely shared low-VRAM video generation model based on the HunyuanVideo architecture. It uses a next-frame prediction approach that inverts the diffusion process to pack future frames into the noise of the current frame, which enables video generation with only 6 GB of VRAM. It is an image-to-video model with strong motion quality.
Measured quality metrics for FramePack I2V outputs.
How often humans prefer this model's output (0-100%)
Visual quality and composition rating (5-9 scale)
VRAM estimates at FP16 and FP8 precision. FP8 uses ~40% less memory with minimal quality loss. Grade shows how well each GPU handles the generation workload.
| Scenario | VRAM | RTX 4090 24GB | RTX 3060 12GB | RTX 4060 8GB | MacBook Pro M4 Pro 24GB |
|---|---|---|---|---|---|
| 512×512 · 25 frames | 6.4 GB | S● | S● | S● | F● |
| 768×512 · 25 frames | 6.7 GB | S● | S● | A● | F● |
| 768×512 · 100 frames | 7.6 GB | S● | S● | B● | F● |
| 1280×720 · 25 frames | 7.8 GB | S● | S● | B● | F● |
| Scenario | VRAM | RTX 4090 24GB | RTX 3060 12GB | RTX 4060 8GB | MacBook Pro M4 Pro 24GB |
|---|---|---|---|---|---|
| 512×512 · 25 frames | 25.9 GB | B | F | F | F |
| 768×512 · 25 frames | 28.0 GB | D | F | F | F |
| 768×512 · 100 frames | 34.3 GB | F | F | F | F |
| 1280×720 · 25 frames | 36.5 GB | F | F | F | F |
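
The FP8 rule of thumb stated above (roughly 40% less memory than FP16) can be turned into a quick fit check. The sketch below is illustrative only: the helper names and the flat 0.6 scaling factor are assumptions derived from that one sentence, and treating the larger table figures as an FP16 baseline is also an assumption, since the tables do not label their precision.

```python
# Rough fit check based on the "~40% less memory at FP8" rule of thumb.
# FP8_FACTOR and both helper names are illustrative assumptions.
FP8_FACTOR = 0.6  # assumed FP8 footprint as a fraction of the FP16 footprint


def estimate_fp8_gb(fp16_gb: float) -> float:
    """Scale an FP16 VRAM estimate down by the assumed FP8 factor."""
    return round(fp16_gb * FP8_FACTOR, 1)


def fits(required_gb: float, gpu_gb: float) -> bool:
    """True if the estimate fits in the card's nominal VRAM (no headroom)."""
    return required_gb <= gpu_gb


# Figures taken from the larger of the two tables above.
for fp16_gb in (25.9, 28.0, 34.3, 36.5):
    fp8_gb = estimate_fp8_gb(fp16_gb)
    print(f"{fp16_gb} GB -> ~{fp8_gb} GB at FP8, fits 24 GB card: {fits(fp8_gb, 24)}")
```

The check compares against nominal VRAM only; in practice the operating system and other processes also hold some memory, so a tight fit may still fail.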
Turbo / LCM distillation
Use a distilled scheduler at 4-8 steps for faster iteration.

    from diffusers import DiffusionPipeline
    import torch

    pipe = DiffusionPipeline.from_pretrained(
        "lllyasviel/FramePackI2V_HY",
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    frames = pipe(
        prompt="your prompt here",
        num_inference_steps=25,
        guidance_scale=7.5,
        num_frames=129,
    ).frames[0]
    # Save frames or export as video

Get started
Setup instructions for running FramePack I2V locally
1. Download the model
Get the checkpoint from HuggingFace
2. Place in:
ComfyUI/models/checkpoints/
3. Launch ComfyUI
python main.py
VRAM allocation for 25 frames at 768×512 on RTX 4090 24GB
25 frames at 768×512, 30 steps, FP16.
Download FramePack I2V in different precisions. Lower precision uses less VRAM at the cost of a slight quality loss.
Very new model; LoRA ecosystem is still emerging.
Frequently asked questions
FramePack I2V (13B parameters) requires approximately 6.7 GB of VRAM at FP16 precision for generating 25 frames at 768×512. Video generation typically requires more VRAM than image generation due to temporal attention layers.
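For context on the parameter count quoted above, the storage needed for the weights alone follows directly from bytes per parameter. This is a minimal sketch: it covers weights only (not activations, the text encoder, or the VAE), and the function name is a hypothetical helper, not part of any library.

```python
# Back-of-the-envelope size of the model weights alone, by precision.
PARAMS = 13e9  # parameter count quoted in the answer above (13B)
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weights_gb(params: float, precision: str) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weights_gb(PARAMS, precision):.1f} GB")
```

A VRAM figure well below the FP16 weight size would imply that not all weights are resident on the GPU at once (for example through offloading); the page does not state how the lower figure is reached.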
Yes, the RTX 4090 (24 GB VRAM) can run FramePack I2V at FP16. Expected generation time is about 3m 3s for a 25-frame clip.
On a reference GPU (RTX 4090 24GB), FramePack I2V generates a 25-frame video at 768×512 in approximately 3m 3s at FP16 with 30 inference steps. Faster GPUs with higher memory bandwidth will reduce generation time.
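The quoted timing can be broken down into per-frame and per-step rates, which makes it easier to reason about shorter runs such as the 4-8 step distilled setting mentioned earlier. The sketch assumes runtime scales linearly with step count, which ignores fixed costs such as text encoding and VAE decoding; `scaled_time` is a hypothetical helper.

```python
# Break the quoted RTX 4090 timing (3m 3s, 25 frames, 30 steps) into rates.
TOTAL_SECONDS = 3 * 60 + 3  # 183 s
FRAMES = 25
STEPS = 30

seconds_per_frame = TOTAL_SECONDS / FRAMES
seconds_per_step = TOTAL_SECONDS / STEPS


def scaled_time(steps: int) -> float:
    """Estimated wall time in seconds, assuming linear scaling with steps."""
    return seconds_per_step * steps


print(f"{seconds_per_frame:.2f} s per frame, {seconds_per_step:.2f} s per step")
for steps in (4, 8):
    print(f"{steps} steps: ~{scaled_time(steps):.1f} s (linear-scaling assumption)")
```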
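FramePack I2V supports up to 1280×720 resolution and 129 frames per generation at 30 FPS. Higher resolutions and frame counts require more VRAM, though the estimates above grow far less than proportionally (for example, 6.7 GB at 25 frames versus 7.6 GB at 100 frames at 768×512).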
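To translate frame counts into clip length, divide by the frame rate. A small sketch, with `clip_seconds` as a hypothetical helper and 30 FPS taken from the figure above:

```python
def clip_seconds(num_frames: int, fps: float = 30.0) -> float:
    """Playback duration in seconds for a clip of num_frames at fps."""
    return num_frames / fps


print(round(clip_seconds(129), 2))  # maximum frame count quoted above
print(round(clip_seconds(25), 2))   # the 25-frame benchmark clip
```

At 30 FPS, the 129-frame maximum is a clip of about 4.3 seconds, while the 25-frame benchmark clip lasts under one second.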
See also