NVIDIA

RTX 5090 Laptop 24GB

Name: RTX 5090 Laptop 24GB
Brand: NVIDIA

RTX 50 Laptop · Laptop · Blackwell · MOBILE · CUDA

VRAM: 24 GB
Bandwidth: 896 GB/s
FP16 Compute: 52 TFLOPS
INT8 Inference: 832 TOPS



About this GPU for AI

The RTX 5090 Laptop is NVIDIA's Blackwell mobile flagship, with 24 GB of GDDR7 at 896 GB/s in a 95–150W TGP package. It is based on the GB203 die (not the desktop RTX 5090's GB202) and is rated at 52 TFLOPS FP16, 832 INT8 TOPS, and NVIDIA's headline 1,824 AI TOPS. 24 GB is enough to run 30B-class models at Q4 entirely in VRAM, but a 70B model at Q4 still does not fit without offloading, as the capability matrix below shows. Available from March 2025, it is a clear step up for portable AI inference from the 16 GB Ada laptop generation.
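As a rough check on what fits in 24 GB, weight memory can be approximated as parameters × bits per weight ÷ 8. The sketch below is an estimate, not a measurement. The bits-per-weight values (about 4.5 for a Q4-class quant, about 3.5 for a Q3-class quant) and the 2 GB allowance for runtime overhead are assumptions, and real model files vary with the quantization recipe and context length.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in VRAM."""
    return weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# Illustrative cases; bits-per-weight values are assumptions.
cases = [("8B Q4", 8, 4.5), ("30B Q4", 30, 4.5),
         ("70B Q3", 70, 3.5), ("70B Q4", 70, 4.5)]

for name, params, bits in cases:
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{name}: ~{weight_gb(params, bits):.1f} GB weights, {verdict} in 24 GB")
```

Under these assumptions the 8B and 30B cases fit and both 70B cases do not, which agrees with the "Won't fit" entry for Llama 3.1 70B Q4 in the capability matrix.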

Beyond LLMs

AI Capability Matrix

What AI tasks this GPU can handle — from text generation to image and video creation.

Capability	Status	Representative Model	Detail
LLM Chat (7B)	Runs natively	Llama 3.1 8B Q4	n/a
LLM Coding (30B)	Runs natively	Qwen 3 30B Q4	n/a
LLM Large (70B)	Won't fit	Llama 3.1 70B Q4	n/a
Image Gen (SDXL)	Runs natively	SDXL 1.0 FP16	~5.7s per image
Image Gen (Flux)	Runs with offload	Flux.1 Dev FP16	~25.7s per image
Image Gen (SD 3.5)	Runs natively	SD 3.5 Large FP16	~31.4s per image
Video Short (25f)	Runs natively	LTX Video 2B	~5s/frame
Video Long (100f)	Won't fit	Wan Video 14B	~14.6s/frame
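The per-frame figures in the video rows translate into total clip time by simple multiplication. A minimal sketch using the two video rows above; it assumes the per-frame time stays constant across the whole clip, which the table does not state.

```python
def clip_seconds(frames: int, seconds_per_frame: float) -> float:
    """Total generation time, assuming a constant per-frame cost."""
    return frames * seconds_per_frame


short = clip_seconds(25, 5.0)     # Video Short row: LTX Video 2B
long_ = clip_seconds(100, 14.6)   # Video Long row: Wan Video 14B (listed as not fitting)

print(f"25-frame clip:  ~{short:.0f} s")
print(f"100-frame clip: ~{long_ / 60:.1f} min")
```

The second figure is hypothetical for this GPU, since the matrix lists the 14B video model as not fitting in 24 GB.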

portable · thermally-limited · laptop · blackwell · large-vram · mobile-flagship

Specifications

Compute

FP16: 52 TFLOPS
INT8: 832 TOPS
Architecture: Blackwell

Memory

VRAM: 24 GB
Bandwidth: 896 GB/s

General

Family: RTX 50 Laptop
Segment: Laptop
Interconnect: MOBILE
Compute Platform: CUDA

Key Features

24 GB GDDR7 VRAM on a 256-bit bus
Blackwell GB203 die with 5th-gen Tensor Cores, FP4 and FP8 support
52 TFLOPS FP16 / 832 INT8 TOPS / 1,824 AI TOPS
896 GB/s memory bandwidth
95–150W configurable TGP
DLSS 4 with Multi-Frame Generation

For AI Workloads

Strengths

24 GB GDDR7 is the largest VRAM on a GeForce laptop GPU to date and fits 30B-class Q4 models without CPU offloading
896 GB/s bandwidth delivers fast decode for large quantized models in a portable chassis
5th-gen Tensor Cores with FP4 support enable newer low-precision quantization formats
Single-card inference for 24B to 30B models is a meaningful step up from 16 GB laptop GPUs; 70B at Q4 does not fit (see the AI Capability Matrix)
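The link between memory bandwidth and decode speed can be sketched with a simple ceiling: at batch size 1, each generated token has to read the active weights once, so tokens per second cannot exceed bandwidth divided by the bytes read per token. The weight sizes below are illustrative assumptions, and the estimate ignores KV cache reads, kernel overhead, and thermal limits, so measured speeds will be lower.

```python
BANDWIDTH_GB_S = 896.0  # memory bandwidth from the spec sheet


def decode_ceiling_tok_s(active_weight_gb: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on batch-1 decode speed for a bandwidth-bound model."""
    return bandwidth_gb_s / active_weight_gb


# Hypothetical weight footprints in GB, not taken from the page.
for weights_gb in (8.0, 16.0, 20.0):
    print(f"{weights_gb:>4.0f} GB read per token -> "
          f"<= {decode_ceiling_tok_s(weights_gb):.0f} tok/s")
```

For mixture-of-experts models only the active experts are read per token, which is one plausible reason a 30B model with about 3B active parameters can decode faster than a dense model of similar total size.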

Considerations

Based on GB203 die, not desktop GB202 — delivers approximately 35–40% of desktop RTX 5090 sustained compute
95–150W TGP means performance varies significantly between laptop models — verify TGP before purchasing
Laptops equipped with this GPU carry a significant premium ($2,899+ laptop price)
Thermal throttling under sustained long inference sessions limits effective throughput in compact chassis

Architecture

Blackwell

Blackwell is NVIDIA's fourth generation of RTX architecture, built on a custom TSMC 4N process. It introduces 5th-generation Tensor Cores with native FP4 support, which offers up to roughly double the peak Tensor throughput of FP8 on Ada Lovelace. Other additions include neural rendering features for AI-driven shading and the debut of GDDR7 memory in consumer GPUs.

AI Relevance

FP4 Tensor Cores target high tokens-per-watt efficiency for quantized inference. With native FP4, weights can be stored in 4 bits instead of 8, so roughly twice as many parameters fit in the same VRAM compared with FP8. The quality loss depends on the model and the quantization method.
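To make the FP4 idea concrete, here is a simplified sketch of block-scaled 4-bit floating-point quantization. It rounds each weight in a block to the nearest value on the E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) after dividing by a shared scale. This is an illustration only: the per-block float scale, the tiny block size, and the helper name `quantize_block` are assumptions, and hardware formats use fixed block sizes and compact scale encodings.

```python
# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(weights):
    """Round a block of weights to the FP4 grid with one shared scale.

    Returns the dequantized values and the scale that was used.
    """
    peak = max(abs(w) for w in weights)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = peak / FP4_GRID[-1] if peak > 0 else 1.0
    dequantized = []
    for w in weights:
        magnitude = min(FP4_GRID, key=lambda g: abs(abs(w) / scale - g))
        dequantized.append((magnitude if w >= 0 else -magnitude) * scale)
    return dequantized, scale


block = [0.12, -0.60, 0.33, 0.05]  # hypothetical weights
deq, scale = quantize_block(block)
errors = [abs(a - b) for a, b in zip(block, deq)]

print("dequantized:", [round(v, 3) for v in deq])
print("max abs error:", round(max(errors), 3))
```

Each weight is stored as a 4-bit code plus a share of the block scale, which is where the near-2× capacity gain over 8-bit storage comes from.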

Process: TSMC 4N
Platform: CUDA
Tensor Cores: Gen 5
Precisions: FP32, FP16, BF16, FP8, FP4, INT8, INT4

Buying advice

Should you buy RTX 5090 Laptop 24GB for local AI?

Excellent choice for local AI

Runs 26 of 50 top models well — a strong all-rounder for local inference.

VRAM: 24.0 GB

Best models for this GPU

Qwen3-Coder 30B A3B Instruct — 97/100, 114 tok/s, 23.4 GB needed
Qwen3-VL 30B A3B Instruct — 96/100, 118 tok/s, 23.1 GB needed
GPT-OSS 20B — 95/100, 145 tok/s, 18.6 GB needed

What will limit you first

This setup is broadly balanced for this model.

Very little memory headroom

You can run the model, but there is not much room left for longer context, bigger batches, extra apps, or future model updates.
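Longer context costs memory mainly through the KV cache, which grows linearly with the number of tokens. A minimal sketch of the usual estimate follows (2 tensors, keys and values, per layer per token). The model shape used here (40 layers, 8 KV heads, head dimension 128, FP16 cache) is a hypothetical configuration chosen for illustration, not one of the models listed on this page.

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GB (1 GB = 1e9 bytes) for a single sequence."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return tokens * per_token_bytes / 1e9


# Hypothetical model shape; real models differ.
LAYERS, KV_HEADS, HEAD_DIM = 40, 8, 128

for ctx in (8_192, 32_768, 81_920):
    size = kv_cache_gb(ctx, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"{ctx:>6} tokens -> ~{size:.2f} GB of KV cache")
```

With only a few GB free after loading weights, a cache of this size is what runs out first, which is why headroom matters more than minimum fit.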

Best upgrade path

Buy headroom, not only minimum fit

A slightly larger memory tier gives you safer context growth and makes the recommendation more future-proof.

Unlocks 1 additional model that does not fit on the current setup.

Want more headroom? MacBook Pro M4 Max 36GB (36.0 GB unified memory) is the next step up.

Recommendations by Workload

Chat

Qwen 3 14B

This model is a direct match for chat. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Decode 95.2 tok/s · 80K ctx · llama.cpp (est.)

13.1 GB / 24.0 GB VRAM

Coding

Devstral Small 2 24B Instruct

This model is a direct match for coding. It belongs to a current frontier family for local AI. It should run, but memory headroom will be limited. Known channels: huggingface, ollama, lm-studio.

Decode 55.3 tok/s · 40K ctx · llama.cpp (est.)

20.4 GB / 24.0 GB VRAM

Agentic Coding

Qwen 3.6 27B

This model is still usable for agentic coding, but it is not the most specialized pick. It belongs to a current frontier family for local AI. It should run, but memory headroom will be limited. Known channels: huggingface, lm-studio.

Decode 34.3 tok/s · 69K ctx · llama.cpp (est.)

21.7 GB / 24.0 GB VRAM

Reasoning

Qwen 3 14B

This model is a direct match for reasoning. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Decode 95.2 tok/s · 80K ctx · llama.cpp (est.)

14.3 GB / 24.0 GB VRAM

RAG

Granite 4.1 8B

This model is a direct match for RAG. It sits in the middle of the current model mix. It fits natively with comfortable headroom. Known channels: huggingface, ollama.

Decode 112.0 tok/s · 104K ctx · llama.cpp (est.)

13.1 GB / 24.0 GB VRAM
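The "comfortable headroom" and "limited headroom" labels above can be reproduced from the VRAM figures with a simple rule. The sketch below uses the usage numbers listed for each workload; the 20% free-memory threshold is an assumption chosen to match those labels, not a rule stated by the page.

```python
VRAM_GB = 24.0
COMFORT_THRESHOLD = 0.20  # assumed cutoff: at least 20% of VRAM left free

# VRAM usage per recommended model, as listed above.
picks = {
    "Chat: Qwen 3 14B": 13.1,
    "Coding: Devstral Small 2 24B Instruct": 20.4,
    "Agentic Coding: Qwen 3.6 27B": 21.7,
    "Reasoning: Qwen 3 14B": 14.3,
    "RAG: Granite 4.1 8B": 13.1,
}


def headroom(used_gb: float, total_gb: float = VRAM_GB):
    """Return (free GB, free fraction of total VRAM)."""
    free = total_gb - used_gb
    return free, free / total_gb


def label(used_gb: float) -> str:
    """Classify headroom using the assumed threshold."""
    _, fraction = headroom(used_gb)
    return "comfortable" if fraction >= COMFORT_THRESHOLD else "limited"


for name, used in picks.items():
    free, fraction = headroom(used)
    print(f"{name}: {free:.1f} GB free ({fraction:.0%}), {label(used)}")
```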

Full Model Compatibility

Model	Type	Max Resolution	Gen Time	Grade
SD Turbo	Image	512×512	700ms	S
Stable Diffusion 1.5	Image	512×768	~1.4s	S
Realistic Vision v5.1	Image	512×768	~1.4s	S
DreamShaper 8	Image	512×768	~1.4s	S
LCM DreamShaper v7	Image	512×768	400ms	S
PixArt-Sigma	Image	1024×1024	~5.7s	S
FramePack I2V	Video	256×256	~10.5s/frame	S
SDXL Turbo	Image	512×512	700ms	S
SDXL Lightning	Image	1024×1024	~2.1s	S
Stable Diffusion XL 1.0	Image	1024×1024	~5.7s	S
Playground v2.5	Image	1024×1024	~8.6s	S
RealVisXL v5.0	Image	1024×1024	~6.4s	S
DreamShaper XL	Image	1024×1024	~6.4s	S
Juggernaut XL v9	Image	1024×1024	~6.4s	S
Animagine XL 3.1	Image	1024×1024	~6.4s	S
Pony Diffusion V6 XL	Image	1024×1024	~6.4s	S
Animagine XL 4.0	Image	1024×1024	~6.4s	S
Illustrious XL	Image	1024×1024	~6.4s	S
Wan Video 2.1 1.3B	Video	256×256	~4.2s/frame	S
Stable Diffusion 3.5 Medium	Image	1024×1024	~10s	S
Flux.2 Klein 4B	Image	1024×1024	~1.7s	S
LTX Video 2B	Video	768×512	~5s/frame	S
Kolors	Image	1024×1024	~11.4s	S
Stable Cascade	Image	1024×1024	~14.3s	S
AuraFlow v0.3	Image	1536×1536	~25.7s	S
Stable Diffusion 3.5 Large	Image	1024×1024	~31.4s	S
Stable Diffusion 3.5 Large Turbo	Image	1024×1024	~5.7s	S
CogVideoX 2B	Video	720×480	~5s/frame	A
HunyuanVideo	Video	256×256	~10.5s/frame	A
Chroma	Image	256×256	~10.5s	A
Z-Image Turbo	Image	1536×1536	~5.9s	B
Flux.1 Dev	Image	256×256	~25.7s	B
Flux.1 Schnell	Image	256×256	~5s	B
LTX Video 13B	Video	256×256	~10.5s/frame	B
Flux.1 Kontext Dev	Image	256×256	~28.5s	B
AnimateDiff v1.5.3	Video	512×768	~2.6s/frame	B
Cosmos Diffusion 7B	Video	256×256	~15.8s/frame	B
CogVideoX 5B	Video	256×256	~15s/frame	B
Wan2.2 TI2V 5B	Video	256×256	~15s/frame	B
Flux.2 Klein 9B	Image	256×256	~5.2s	D
Flux.1 Fill Dev	Image	256×256	~24.3s	D
Mochi 1 Preview	Video	256×256	~9.4s/frame	F
HunyuanVideo 1.5	Video	256×256	~8.8s/frame	F
Helios 14B	Video	256×256	~10.8s/frame	F
SkyReels V2 14B	Video	256×256	~10.8s/frame	F
Wan Video 2.1 14B	Video	256×256	~10.8s/frame	F
Wan Video 2.2 14B	Video	256×256	~10.8s/frame	F
Qwen Image	Image	256×256	~9.6s	F
Qwen Image Edit	Image	256×256	~9.6s	F
Flux.2 Dev	Image	256×256	~4m 30s	F
MAGI-1	Video	256×256	~13.4s/frame	F
HunyuanImage 3.0	Image	256×256	~16.9s	F

