How much VRAM does HunyuanVideo 1.5 need?

HunyuanVideo 1.5 is an 8.3B model. At FP16 with the text encoder loaded on GPU, the full pipeline requires approximately 24–28 GB VRAM. With FP8 quantization it drops to roughly 14–16 GB at 480p (about 18–20 GB at 720p), which fits an RTX 4080 Super 16GB or RTX 4060 Ti 16GB at the lower resolution. With the text encoder offloaded to CPU RAM and FP8 on the transformer, peak GPU VRAM falls to approximately 8–12 GB on consumer hardware.

Can HunyuanVideo 1.5 run on a 12 GB GPU?

Yes, with the right configuration. Use FP8 quantization and offload the Hunyuan-Large MLLM text encoder to CPU RAM. With this setup, the RTX 4070 12GB and RTX 4070 Super 12GB can run HunyuanVideo 1.5 at 480p, generating 4–6 second clips. Expect slower generation than a 24 GB card. Plan for at least 24 GB of system RAM to hold the offloaded text encoder.

What is the difference between HunyuanVideo 1.5 and the original HunyuanVideo?

The original HunyuanVideo is a 13B 3D DiT model requiring 47–58 GB VRAM at FP16. HunyuanVideo 1.5 is the successor at 8.3B parameters, deliberately sized for consumer hardware. It also adds image-to-video (I2V) capability alongside text-to-video (T2V), uses an improved text encoder, and requires roughly half the VRAM at equivalent precision (24–28 GB versus 47–58 GB at FP16).

Does HunyuanVideo 1.5 support image-to-video?

Yes. HunyuanVideo 1.5 supports both text-to-video (T2V) and image-to-video (I2V) generation. The I2V mode takes a reference image plus a text prompt and generates a video clip that extends the provided image. The VRAM overhead for I2V is similar to T2V — the reference image adds negligible memory compared to the model weights.

Is HunyuanVideo 1.5 open source and commercially usable?

HunyuanVideo 1.5 is released under the Tencent Hunyuan Community License, which permits research and personal use but restricts commercial deployment. It is not Apache-2.0 or MIT licensed. Verify the current license terms on the HuggingFace model card (tencent/HunyuanVideo-1.5) before commercial use.

What is FP8 quantization for video generation?

FP8 quantization stores weights in 8 bits instead of 16, halving their memory footprint with minimal quality loss. For HunyuanVideo 1.5, FP8 cuts the transformer backbone from ~17 GB to ~9 GB. ComfyUI's native FP8 nodes and diffusers (which stores weights in PyTorch's torch.float8_e4m3fn dtype) both handle the conversion. The quality difference from FP16 is very small and typically imperceptible at 720p.
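
As a rough check on those figures, weight memory is parameter count times bytes per parameter. The sketch below assumes 1 GB = 10^9 bytes and counts weights only (no activations, VAE, or framework overhead). `weight_gb` is an illustrative helper, not part of any library.

```python
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    bytes_per_param = bits_per_param / 8
    return params_billion * 1e9 * bytes_per_param / 1e9


transformer_fp16 = weight_gb(8.3, 16)  # 8.3B parameters at 16 bits each
transformer_fp8 = weight_gb(8.3, 8)    # the same parameters at 8 bits each
print(f"{transformer_fp16:.1f} GB -> {transformer_fp8:.1f} GB")
```

The result (16.6 GB and 8.3 GB) is in line with the rounded ~17 GB and ~9 GB figures quoted above.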

May 20, 2026
video-generation, vram-requirements, hunyuanvideo, fp8, gguf, tencent, comfyui

HunyuanVideo 1.5 VRAM Requirements — FP16, FP8, and Practical GPU Guide (2026)

HunyuanVideo 1.5 (8.3B T2V+I2V) is far more consumer-friendly than the 13B original. VRAM table at FP16/FP8/GGUF, recommended GPUs from 12 GB to 48 GB.

HunyuanVideo 1.5 is Tencent's deliberate answer to the accessibility problem. The original HunyuanVideo (13B) was technically impressive but required 47–58 GB VRAM at FP16 — well beyond consumer reach without aggressive community workarounds. The 1.5 revision cuts the model to 8.3B parameters, adds image-to-video (I2V) capability, and was designed from the start to run on mainstream consumer GPUs.

The result is a video model that fits on an RTX 4080 Super at FP8, and on a 12 GB GPU with text encoder offloading. This guide covers the estimated HunyuanVideo 1.5 VRAM requirements at each precision level and which GPUs can run it.

Quick verdict: HunyuanVideo 1.5 is the most consumer-accessible model in the HunyuanVideo family. At FP8 it needs roughly 14–16 GB at 480p, which makes it practical on 16 GB GPUs, and ~8–12 GB with text encoder offloading opens it to 12 GB cards. For the original 13B model's requirements, see the video generation GPU guide 2026.

HunyuanVideo 1.5 Architecture

Feature	Value
Architecture	3D Diffusion Transformer (3D DiT)
Parameters	8.3B (VERIFIED — model card)
Text encoder	Hunyuan-Large MLLM (separate, ~7B on GPU or offloaded)
VAE	CausalVAE (temporal, ~0.1B)
Developer	Tencent
Released	2025-11-20
License	Tencent Hunyuan Community License (non-commercial research)
Modalities	Text-to-video (T2V) + Image-to-video (I2V)
Max frames	129
Max resolution	720p (1280×720)
HuggingFace	tencent/HunyuanVideo-1.5
Predecessor	HunyuanVideo (13B)

HunyuanVideo 1.5 keeps the same 3D DiT architecture as the original but shrinks the denoising backbone from 13B to 8.3B parameters. The Hunyuan-Large MLLM text encoder remains a large component — approximately 7B parameters — but it can be offloaded to CPU RAM without catastrophic speed loss, which is the key strategy for fitting the pipeline on 12–16 GB GPUs.

HunyuanVideo 1.5 VRAM Requirements

All numbers reflect total VRAM usage during generation: transformer weights + text encoder (if on GPU) + VAE + activations. Resolution is 720p (1280×720) unless noted.

Precision	Text encoder	VRAM (720p)	VRAM (480p)	Min GPU
FP16 (full)	On GPU	~28 GB	~24 GB	RTX 3090 24GB or RTX 4090 24GB (480p, tight)
FP8 (transformer)	On GPU	~18–20 GB	~14–16 GB	RTX 4080 Super 16GB (480p only), RTX 4090 24GB
FP8 (transformer)	CPU offload	~10–12 GB	~8–10 GB	RTX 4070 Super 12GB, RTX 4070 Ti Super 16GB
GGUF Q4 (transformer)	CPU offload	~9–11 GB	~7–9 GB	RTX 4070 12GB, RTX 4060 Ti 16GB

Note on text encoder: The Hunyuan-Large MLLM text encoder is approximately 7B parameters (~14 GB at FP16, ~7 GB at FP8). When "CPU offload" is used, this component lives in system RAM and is passed to GPU only during conditioning. This keeps peak GPU VRAM low but requires 24–32 GB system RAM and adds generation overhead (~10–20% slower per clip).

Spec source: Params (8.3B) and architecture are VERIFIED from the HuggingFace model card. VRAM figures are estimates derived from catalog data and community benchmarks — treat as reliable guidance with ±1–2 GB margin.
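
To check a given card against the table, a small lookup is enough. The sketch below copies the upper bound of each estimate from the table. `VRAM_TABLE`, `configs_that_fit`, and the optional `margin_gb` safety margin are illustrative names and assumptions, not part of any tool.

```python
# Upper-bound VRAM estimates in GB, copied from the table above.
# Columns: precision, text encoder placement, 720p estimate, 480p estimate.
VRAM_TABLE = [
    ("FP16", "on GPU", 28, 24),
    ("FP8", "on GPU", 20, 16),
    ("FP8", "CPU offload", 12, 10),
    ("GGUF Q4", "CPU offload", 11, 9),
]


def configs_that_fit(vram_gb, resolution="720p", margin_gb=0.0):
    """Return (precision, text encoder placement) pairs whose estimate fits."""
    column = {"720p": 2, "480p": 3}[resolution]
    return [
        (row[0], row[1])
        for row in VRAM_TABLE
        if row[column] + margin_gb <= vram_gb
    ]


print(configs_that_fit(16, "480p"))
print(configs_that_fit(12, "720p", margin_gb=2))
```

A 16 GB card at 480p matches every row except FP16. With a 2 GB margin, a 12 GB card matches nothing at 720p, which reflects how tight that tier is.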

Precision Comparison

Aspect	FP16	FP8	GGUF Q4
VRAM (T.E. on GPU, 720p)	~28 GB	~18–20 GB	~15–17 GB
VRAM (T.E. offloaded, 720p)	~18 GB	~10–12 GB	~9–11 GB
Quality vs FP16	Reference	Minimal loss	Slight softening
Text rendering	Excellent	Excellent	Good
ComfyUI support	Native	FP8 cast nodes	GGUF nodes
Speed penalty	Baseline	~10–15% slower	~20–35% slower

Recommendation: FP8 with CPU text encoder offload is the sweet spot for most consumer GPU users. The quality loss is negligible, and it unlocks HunyuanVideo 1.5 on 12 GB GPUs.

GPU Recommendations

12 GB — RTX 4070 12GB, RTX 4070 Super 12GB, RTX 3060 12GB

HunyuanVideo 1.5 is accessible at 12 GB with the right setup:

Use FP8 quantization on the transformer backbone
Offload the text encoder to CPU RAM (requires 24+ GB system RAM)
Expect 480p output; 720p is possible but very slow and pushes VRAM limits
Generation time: approximately 3–8 minutes per 4-second clip depending on step count

The RTX 4070 12GB and RTX 4070 Super 12GB both work well at this tier with the above configuration. The RTX 3060 12GB is viable but slower.

Verdict: Functional but requires patience. Best suited for quality-focused users who can accept longer generation times. If speed matters more, consider Wan Video 1.3B which runs faster at this tier.

16 GB — RTX 4060 Ti 16GB, RTX 4070 Ti Super 16GB, RTX 4080 Super 16GB

The comfortable tier for HunyuanVideo 1.5:

RTX 4060 Ti 16GB: FP8 with text encoder offloaded — ~12 GB peak. 480p–720p. Slower due to narrower memory bus.
RTX 4070 Ti Super 16GB: FP8 with the text encoder optionally on GPU at 480p (~15–16 GB peak); offload the text encoder for 720p. Strong generation speed.
RTX 4080 Super 16GB: FP8 pipeline fits fully on GPU at 480p. 720p may need text encoder on CPU. Best value at this tier for HunyuanVideo 1.5.

Verdict: 16 GB is the recommended minimum for comfortable HunyuanVideo 1.5 use. The RTX 4080 Super 16GB or RTX 4070 Ti Super 16GB offer the best performance-per-VRAM at this tier.

24 GB — RTX 4090, RTX 3090

The fastest consumer tier:

RTX 4090 24GB: FP16 at 480p with the text encoder on GPU (~24 GB, tight), or FP16 at 720p with the text encoder offloaded (~18 GB). Fastest generation, typically 45–90 seconds per 4-second clip at 50 steps.
RTX 3090 24GB: Same VRAM fit as the 4090, with slower compute. FP8 frees headroom for longer clips (more frames).

The 24 GB tier runs 720p at FP8 with the text encoder on GPU (~18–20 GB) and leaves headroom for longer frame counts. Full FP16 at 720p with the text encoder resident is estimated at ~28 GB, so on a 24 GB card it needs text encoder offloading.

Verdict: Best local experience. The RTX 4090 24GB is the recommended GPU for serious HunyuanVideo 1.5 workflows.

Apple Silicon Macs

Apple Silicon's unified memory pool is a natural fit for HunyuanVideo 1.5's memory profile:

Mac	Memory	Recommended config	Est. speed
M4 Pro 24GB	24 GB unified	FP16 (tight) or FP8	~120–180 sec/clip
M4 Pro 48GB	48 GB unified	FP16 comfortably	~90–120 sec/clip
M4 Max 36GB	36 GB unified	FP16	~80–100 sec/clip
M4 Max 48GB	48 GB unified	FP16 with headroom	~70–90 sec/clip

Apple Silicon is 2–4× slower per clip than equivalent NVIDIA hardware, but the large unified memory means no CPU offloading tricks are needed. The M4 Pro 24GB is the minimum practical Mac for HunyuanVideo 1.5; FP16 is tight there because macOS and the model share the same 24 GB pool.

Running HunyuanVideo 1.5 with diffusers

FP8 + CPU text encoder offload (12–16 GB GPUs)

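import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

# NOTE: class names and repo layout are taken as given here; check the model
# card for the pipeline classes your diffusers version expects for 1.5.
MODEL_ID = "tencent/HunyuanVideo-1.5"

# Load the transformer in BF16, then store its weights in FP8.
# Layerwise casting upcasts each layer to BF16 only while it computes.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    MODEL_ID,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)

pipe = HunyuanVideoPipeline.from_pretrained(
    MODEL_ID,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Keep idle components (including the text encoder) in CPU RAM and move each
# one to the GPU only while it runs.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()  # decode in tiles to lower peak VRAM

frames = pipe(
    prompt="A serene mountain lake at sunrise, mist rising off the water",
    num_inference_steps=50,
    num_frames=81,
    height=480,
    width=848,
    guidance_scale=6.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)  # fps=24 is an assumed value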

FP16 full (24 GB GPU)

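import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

pipe = HunyuanVideoPipeline.from_pretrained(
    "tencent/HunyuanVideo-1.5",
    torch_dtype=torch.bfloat16,  # 16-bit weights; same footprint as FP16
)

# The VRAM table estimates ~28 GB at 720p with every component resident,
# so on a 24 GB card move components to the GPU only while they run.
# With more than 28 GB of VRAM, pipe.to("cuda") keeps everything resident.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()  # decode in tiles to lower peak VRAM

frames = pipe(
    prompt="A serene mountain lake at sunrise, mist rising off the water",
    num_inference_steps=50,
    num_frames=129,
    height=720,
    width=1280,
    guidance_scale=6.0,
).frames[0]

export_to_video(frames, "output.mp4", fps=24)  # fps=24 is an assumed value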
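
Since the VRAM figures in this guide are estimates, it is worth measuring the peak on your own hardware. The sketch below wraps a generation call with PyTorch's peak-memory counters. `peak_vram_gib` and `bytes_to_gib` are illustrative helpers, and `pipe` in the usage comment is assumed to be a pipeline built as in the snippets above.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return num_bytes / 1024**3


def peak_vram_gib(run) -> float:
    """Call run() and return the peak VRAM PyTorch allocated during it, in GiB."""
    import torch  # imported lazily so bytes_to_gib works without PyTorch

    torch.cuda.reset_peak_memory_stats()
    run()
    torch.cuda.synchronize()  # make sure all queued GPU work has finished
    return bytes_to_gib(torch.cuda.max_memory_allocated())


# Usage, assuming `pipe` was built as in the examples above:
# peak = peak_vram_gib(lambda: pipe(prompt="test", num_frames=81, height=480, width=848))
# print(f"Peak allocated VRAM: {peak:.1f} GiB")
```

`torch.cuda.max_memory_allocated()` counts only tensors managed by PyTorch's allocator, so tools such as nvidia-smi will report a somewhat higher total for the process.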

HunyuanVideo 1.5 vs Original HunyuanVideo — Key Differences

Aspect	HunyuanVideo (original)	HunyuanVideo 1.5
Parameters	13B	8.3B
Modalities	T2V only	T2V + I2V
VRAM (FP16 full)	~47–58 GB	~24–28 GB
VRAM (FP8, T.E. offloaded)	~8 GB (with tiling)	~10–12 GB
Min consumer GPU	RTX 4090 (with tiling)	RTX 4070 12GB (FP8 + offload)
Max frames	129	129
Max resolution	720p	720p
License	Tencent Hunyuan Community	Tencent Hunyuan Community

HunyuanVideo 1.5 is the better choice for most consumer deployments. The original 13B model only becomes relevant if you have 48+ GB VRAM and want maximum quality at full precision.
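
The FP16 row of the table can be turned into a relative saving with one line of arithmetic. The sketch below compares the low and high ends of the two ranges; `reduction` is an illustrative helper.

```python
def reduction(old_gb: float, new_gb: float) -> float:
    """Fraction of VRAM saved when moving from old_gb to new_gb."""
    return 1 - new_gb / old_gb


low = reduction(47, 24)   # low ends of the two FP16 ranges
high = reduction(58, 28)  # high ends of the two FP16 ranges
print(f"{low:.0%} to {high:.0%}")  # prints "49% to 52%"
```

At full precision, HunyuanVideo 1.5 therefore needs about half the VRAM of the original.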

System RAM Requirements

When offloading the text encoder to CPU RAM, system memory matters as much as GPU VRAM:

24 GB system RAM: Minimum for text encoder offload. Leaves little room for the OS and other apps.
32 GB system RAM: Recommended. Comfortable with the text encoder loaded and ComfyUI running.
64 GB system RAM: Ideal for workflows running the full pipeline without any memory pressure.
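
The tiers above can be checked against the machine you are on. This is a minimal sketch using only the standard library. `ram_tier` and `installed_ram_gb` are illustrative helpers, and the RAM lookup relies on POSIX `os.sysconf`, so it returns `None` on platforms such as Windows.

```python
import os


def ram_tier(ram_gb: float) -> str:
    """Map system RAM in GB to the tiers listed above."""
    if ram_gb >= 64:
        return "ideal"
    if ram_gb >= 32:
        return "recommended"
    if ram_gb >= 24:
        return "minimum"
    return "below minimum"


def installed_ram_gb():
    """Total physical RAM in GB (1 GB = 1e9 bytes), or None if unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


ram = installed_ram_gb()
if ram is not None:
    print(f"{ram:.0f} GB system RAM: {ram_tier(ram)} for text encoder offload")
```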

Use a fast NVMe SSD for model storage — HunyuanVideo 1.5 weights are approximately 17 GB on disk at FP16.

Related Guides

Video Generation GPU Guide 2026 — full comparison across all open video models
Wan 2.2 VRAM Requirements — the other leading open video family
Best AI Video Generation Models (Local) — which model fits your use case
Diffusion Model Calculator — check HunyuanVideo 1.5 against your specific GPU
HunyuanVideo 1.5 model page — instant fit check for your hardware