HunyuanVideo 1.5 VRAM Requirements — FP16, FP8, and Practical GPU Guide (2026)
HunyuanVideo 1.5 (8.3B T2V+I2V) is far more consumer-friendly than the 13B original. VRAM table at FP16/FP8/GGUF, recommended GPUs from 12 GB to 48 GB.
HunyuanVideo 1.5 is Tencent's deliberate answer to the accessibility problem. The original HunyuanVideo (13B) was technically impressive but required 47–58 GB VRAM at FP16 — well beyond consumer reach without aggressive community workarounds. The 1.5 revision cuts the model to 8.3B parameters, adds image-to-video (I2V) capability, and was designed from the start to run on mainstream consumer GPUs.
The result is a video model that fits on an RTX 4080 Super at FP8, and on a 12 GB GPU with text encoder offloading. This guide covers the exact HunyuanVideo 1.5 VRAM requirements at every precision level and which GPUs can run it.
Quick verdict: HunyuanVideo 1.5 is the most consumer-accessible model in the HunyuanVideo family. At FP8 it needs roughly 14–16 GB at 480p with the text encoder on the GPU, which makes 16 GB cards practical, and roughly 8–12 GB with the text encoder offloaded, which opens it to 12 GB cards. For the original 13B model's requirements, see the video generation GPU guide 2026.
HunyuanVideo 1.5 Architecture
| Feature | Value |
|---|---|
| Architecture | 3D Diffusion Transformer (3D DiT) |
| Parameters | 8.3B (per the model card) |
| Text encoder | Hunyuan-Large MLLM (separate, ~7B on GPU or offloaded) |
| VAE | CausalVAE (temporal, ~0.1B) |
| Developer | Tencent |
| Released | 2025-11-20 |
| License | Tencent Hunyuan Community License (non-commercial research) |
| Modalities | Text-to-video (T2V) + Image-to-video (I2V) |
| Max frames | 129 |
| Max resolution | 720p (1280×720) |
| HuggingFace | tencent/HunyuanVideo-1.5 |
| Predecessor | HunyuanVideo (13B) |
HunyuanVideo 1.5 keeps the same 3D DiT architecture as the original but shrinks the denoising backbone from 13B to 8.3B parameters. The Hunyuan-Large MLLM text encoder remains a large component — approximately 7B parameters — but it can be offloaded to CPU RAM without catastrophic speed loss, which is the key strategy for fitting the pipeline on 12–16 GB GPUs.
HunyuanVideo 1.5 VRAM Requirements
All numbers reflect total VRAM usage during generation: transformer weights + text encoder (if on GPU) + VAE + activations. Resolution is 720p (1280×720) unless noted.
| Precision | Text encoder | VRAM (720p) | VRAM (480p) | Min GPU |
|---|---|---|---|---|
| FP16 (full) | On GPU | ~28 GB | ~24 GB | RTX 3090 24GB / RTX 4090 24GB (480p only, tight); 720p exceeds 24 GB |
| FP8 (transformer) | On GPU | ~18–20 GB | ~14–16 GB | RTX 4080 Super 16GB (480p only, tight), RTX 4090 24GB |
| FP8 (transformer) | CPU offload | ~10–12 GB | ~8–10 GB | RTX 4070 Super 12GB, RTX 4070 Ti Super 16GB |
| GGUF Q4 (transformer) | CPU offload | ~9–11 GB | ~7–9 GB | RTX 4070 12GB, RTX 4060 Ti 16GB |
Note on text encoder: The Hunyuan-Large MLLM text encoder is approximately 7B parameters (~14 GB at FP16, ~7 GB at FP8). When "CPU offload" is used, this component lives in system RAM and is passed to GPU only during conditioning. This keeps peak GPU VRAM low but requires 24–32 GB system RAM and adds generation overhead (~10–20% slower per clip).
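The weight figures in this note follow from simple arithmetic: parameter count times bytes per parameter. The sketch below reproduces them; the helper name `weight_memory_gb` is hypothetical, and it counts weights only (no activations or framework overhead), using 1 GB = 10^9 bytes.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate weight footprint in GB (weights only, 1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


# Text encoder (~7B parameters, per the note above).
print(weight_memory_gb(7.0, "fp16"))  # weights at FP16
print(weight_memory_gb(7.0, "fp8"))   # weights at FP8

# Transformer backbone (8.3B parameters).
print(weight_memory_gb(8.3, "fp16"))
```

Peak VRAM during generation is higher than these weight-only figures because activations scale with resolution and frame count.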
Spec source: The parameter count (8.3B) and architecture come from the HuggingFace model card. VRAM figures are estimates derived from catalog data and community benchmarks; treat them as guidance with a ±1–2 GB margin.
Precision Comparison
| Aspect | FP16 | FP8 | GGUF Q4 |
|---|---|---|---|
| VRAM (T.E. on GPU, 720p) | ~28 GB | ~18–20 GB | ~15–17 GB |
| VRAM (T.E. offloaded, 720p) | ~18 GB | ~10–12 GB | ~9–11 GB |
| Quality vs FP16 | Reference | Minimal loss | Slight softening |
| Text rendering | Excellent | Excellent | Good |
| ComfyUI support | Native | FP8 cast nodes | GGUF nodes |
| Speed penalty | Baseline | ~10–15% slower | ~20–35% slower |
Recommendation: FP8 with CPU text encoder offload is the sweet spot for most consumer GPU users. The quality loss is negligible, and it brings peak usage down to roughly 10–12 GB at 720p (8–10 GB at 480p), which unlocks HunyuanVideo 1.5 on 12 GB GPUs.
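To check a specific card against the estimates in this section, a small lookup is enough. The sketch below hard-codes the upper bound of each VRAM range from the requirements table; the names `CONFIGS` and `fitting_configs` are hypothetical, and the optional `margin_gb` argument is an illustrative way to apply the ±1–2 GB uncertainty.

```python
# Upper-bound VRAM estimates in GB from the requirements table: (720p, 480p).
CONFIGS = [
    ("FP16, text encoder on GPU", 28, 24),
    ("FP8, text encoder on GPU", 20, 16),
    ("FP8, text encoder offloaded", 12, 10),
    ("GGUF Q4, text encoder offloaded", 11, 9),
]


def fitting_configs(vram_gb, resolution="720p", margin_gb=0.0):
    """Return the configurations whose estimate plus margin fits in vram_gb."""
    if resolution not in ("720p", "480p"):
        raise ValueError("resolution must be '720p' or '480p'")
    column = 1 if resolution == "720p" else 2
    return [cfg[0] for cfg in CONFIGS if cfg[column] + margin_gb <= vram_gb]


# A 12 GB card at 720p, then the same card with a 2 GB safety margin.
print(fitting_configs(12, "720p"))
print(fitting_configs(12, "720p", margin_gb=2))
```

With no margin, a configuration that lands exactly on the card's capacity counts as fitting, so treat those cases as tight.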
GPU Recommendations
12 GB — RTX 4070 12GB, RTX 4070 Super 12GB, RTX 3060 12GB
HunyuanVideo 1.5 is accessible at 12 GB with the right setup:
- Use FP8 quantization on the transformer backbone
- Offload the text encoder to CPU RAM (requires 24+ GB system RAM)
- Expect 480p output; 720p is possible but very slow and pushes VRAM limits
- Generation time: approximately 3–8 minutes per 4-second clip depending on step count
The RTX 4070 12GB and RTX 4070 Super 12GB both work well at this tier with the above configuration. The RTX 3060 12GB is viable but slower.
Verdict: Functional but requires patience. Best suited for quality-focused users who can accept longer generation times. If speed matters more, consider Wan Video 1.3B which runs faster at this tier.
16 GB — RTX 4060 Ti 16GB, RTX 4070 Ti Super 16GB, RTX 4080 Super 16GB
The comfortable tier for HunyuanVideo 1.5:
- RTX 4060 Ti 16GB: FP8 with text encoder offloaded — ~12 GB peak. 480p–720p. Slower due to narrower memory bus.
- RTX 4070 Ti Super 16GB: FP8 with the text encoder optionally on GPU at 480p (~15–16 GB peak); offload the text encoder for 720p. Strong generation speed.
- RTX 4080 Super 16GB: FP8 pipeline fits fully on GPU at 480p. 720p needs the text encoder on CPU, since the on-GPU estimate is ~18–20 GB. Best value at this tier for HunyuanVideo 1.5.
Verdict: 16 GB is the recommended minimum for comfortable HunyuanVideo 1.5 use. The RTX 4080 Super 16GB or RTX 4070 Ti Super 16GB offer the best performance-per-VRAM at this tier.
24 GB — RTX 4090, RTX 3090
The no-compromise tier:
- RTX 4090 24GB: FP16 transformer at 720p. By the estimates above, keeping the text encoder on GPU at 720p needs ~28 GB, so on 24 GB either offload the text encoder (~18 GB peak) or run 480p (~24 GB, tight). Fastest generation, typically 45–90 seconds per 4-second clip at 50 steps.
- RTX 3090 24GB: Runs the same FP16 configurations as the 4090, with slower overall compute. The FP8 option frees VRAM headroom for longer clips (more frames).
The 24 GB tier runs HunyuanVideo 1.5 at FP16 without quantizing the transformer. With FP8, the whole pipeline, text encoder included, fits on GPU at 720p (~18–20 GB) and still leaves headroom for longer frame counts.
Verdict: Best local experience. The RTX 4090 24GB is the recommended GPU for serious HunyuanVideo 1.5 workflows.
Apple Silicon Macs
Apple Silicon's unified memory pool is a natural fit for HunyuanVideo 1.5's memory profile:
| Mac | Memory | Recommended config | Est. speed |
|---|---|---|---|
| M4 Pro 24GB | 24 GB unified | FP16 (tight) or FP8 | ~120–180 sec/clip |
| M4 Pro 48GB | 48 GB unified | FP16 comfortably | ~90–120 sec/clip |
| M4 Max 36GB | 36 GB unified | FP16 | ~80–100 sec/clip |
| M4 Max 48GB | 48 GB unified | FP16 with headroom | ~70–90 sec/clip |
Apple Silicon is 2–4× slower per clip than equivalent NVIDIA hardware, but the large unified memory means no CPU offloading tricks are needed. The M4 Pro 24GB is the minimum practical Mac for HunyuanVideo 1.5 at FP16.
Running HunyuanVideo 1.5 with diffusers
FP8 + CPU text encoder offload (12–16 GB GPUs)
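```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

# Class names and repo layout are as given in this guide; check them against
# your installed diffusers version and the model card before running.
MODEL_ID = "tencent/HunyuanVideo-1.5"

# Load the transformer in bfloat16, then store its weights in FP8.
# Layerwise casting upcasts each layer to bfloat16 only while it computes;
# loading directly with torch_dtype=torch.float8_e4m3fn would fail at matmul time.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    MODEL_ID,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)

pipe = HunyuanVideoPipeline.from_pretrained(
    MODEL_ID,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Keep idle components (including the text encoder) in CPU RAM and move each
# one to the GPU only while it runs.
pipe.enable_model_cpu_offload()

frames = pipe(
    prompt="A serene mountain lake at sunrise, mist rising off the water",
    num_inference_steps=50,
    num_frames=81,
    height=480,
    width=848,
    guidance_scale=6.0,
).frames[0]
```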
FP16 full (24 GB GPU)
```python
import torch
from diffusers import HunyuanVideoPipeline

# 16-bit weights throughout (bfloat16 here), with no quantization or offloading.
pipe = HunyuanVideoPipeline.from_pretrained(
    "tencent/HunyuanVideo-1.5",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # every component stays resident on the GPU

frames = pipe(
    prompt="A serene mountain lake at sunrise, mist rising off the water",
    num_inference_steps=50,
    num_frames=129,
    height=720,
    width=1280,
    guidance_scale=6.0,
).frames[0]
```
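Both examples use frame counts of the form 4k + 1 (81 and 129). A common reason for this in video diffusion pipelines is a causal VAE that compresses time by a factor of 4. That factor and the 24 fps playback rate used below are assumptions for illustration, not values stated in this guide, and the helper names are hypothetical.

```python
def is_valid_frame_count(num_frames: int, temporal_factor: int = 4) -> bool:
    """True if num_frames has the form temporal_factor * k + 1 (assumed constraint)."""
    return num_frames >= 1 and (num_frames - 1) % temporal_factor == 0


def clip_seconds(num_frames: int, fps: int = 24) -> float:
    """Clip duration in seconds at the assumed playback rate."""
    return num_frames / fps


# Check the frame counts used in the examples, plus one that breaks the pattern.
for n in (81, 129, 80):
    print(n, is_valid_frame_count(n), round(clip_seconds(n), 2))
```

If the pipeline rejects or silently rounds a frame count, picking the nearest 4k + 1 value is a reasonable first thing to try.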
HunyuanVideo 1.5 vs Original HunyuanVideo — Key Differences
| Aspect | HunyuanVideo (original) | HunyuanVideo 1.5 |
|---|---|---|
| Parameters | 13B | 8.3B |
| Modalities | T2V only | T2V + I2V |
| VRAM (FP16 full) | ~47–58 GB | ~24–28 GB |
| VRAM (FP8, T.E. offloaded) | ~8 GB (with tiling) | ~10–12 GB |
| Min consumer GPU | RTX 4090 (with tiling) | RTX 4070 12GB (FP8 + offload) |
| Max frames | 129 | 129 |
| Max resolution | 720p | 720p |
| License | Tencent Hunyuan Community | Tencent Hunyuan Community |
HunyuanVideo 1.5 is the better choice for most consumer deployments. The original 13B model only becomes relevant if you have 48+ GB VRAM and want maximum quality at full precision.
System RAM Requirements
When offloading the text encoder to CPU RAM, system memory matters as much as GPU VRAM:
- 24 GB system RAM: Minimum for text encoder offload. Leaves little room for the OS and other apps.
- 32 GB system RAM: Recommended. Comfortable with the text encoder loaded and ComfyUI running.
- 64 GB system RAM: Ideal for workflows running the full pipeline without any memory pressure.
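The tiers above map directly onto a small classifier, which can be handy in a setup script. This is a minimal sketch: the function name `ram_tier` is hypothetical, and it expects the nominal installed RAM in GB rather than the slightly lower figure an operating system may report.

```python
def ram_tier(installed_gb: float) -> str:
    """Classify nominal system RAM against the offloading tiers listed above."""
    if installed_gb >= 64:
        return "ideal"
    if installed_gb >= 32:
        return "recommended"
    if installed_gb >= 24:
        return "minimum"
    return "below minimum"


# Print the tier for a few common RAM sizes.
for gb in (16, 24, 32, 64):
    print(f"{gb} GB system RAM: {ram_tier(gb)}")
```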
Use a fast NVMe SSD for model storage. The 8.3B transformer alone is approximately 17 GB on disk at FP16 (8.3B parameters × 2 bytes), before the text encoder and VAE are counted.
Related Guides
- Video Generation GPU Guide 2026 — full comparison across all open video models
- Wan 2.2 VRAM Requirements — the other leading open video family
- Best AI Video Generation Models (Local) — which model fits your use case
- Diffusion Model Calculator — check HunyuanVideo 1.5 against your specific GPU
- HunyuanVideo 1.5 model page — instant fit check for your hardware