Will It Run AI


NVIDIA DGX Spark 128GB

DGX · Workstation · Grace Blackwell · UNIFIED · CUDA
Unified Memory: 128 GB
Bandwidth: 273 GB/s
TDP: 140W

Operating mode

Choose the operating mode for this hardware

Use this to bias workload recommendations toward responsiveness, background autonomy, lighter serving, or multi-GPU scale-out.

Current mode

Balanced

Balanced for general local use. Keeps the ranking neutral across personal and serving workflows.

About this GPU for AI

NVIDIA DGX Spark 128GB is a compact Grace Blackwell personal AI system with 128 GB of coherent unified memory and the NVIDIA CUDA software stack preinstalled. It is aimed at developers who want to prototype, fine-tune, and run much larger local models than fit on 24 GB or 48 GB consumer GPUs, but without jumping straight to a rack-scale server.


Beyond LLMs

AI Capability Matrix

What AI tasks this GPU can handle — from text generation to image and video creation.

Capability | Status | Representative Model
LLM Chat (7B) | Runs natively | Llama 3.1 8B Q4
LLM Coding (30B) | Runs natively | Qwen 3 30B Q4
LLM Large (70B) | Runs natively | Llama 3.1 70B Q4
Image Gen (SDXL) | Runs natively | SDXL 1.0 FP16
Image Gen (Flux) | Runs natively | Flux.1 Dev FP16
Image Gen (SD 3.5) | Runs natively | SD 3.5 Large FP16
Video Short (25f) | Runs natively | LTX Video 2B
Video Long (100f) | Runs natively | Wan Video 14B
Tags: unified-memory · high-memory · cuda · workstation · compact-ai-system

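Specifications

Compute
Architecture: Grace Blackwell
Memory
Unified memory: 128 GB
Bandwidth: 273 GB/s
Type: LPDDR5x
General
Series: DGX
Positioning: Workstation
Interconnect: UNIFIED
Compute platform: CUDA
TDP: 140W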

Key features

  • NVIDIA GB10 Grace Blackwell Superchip
  • 128 GB LPDDR5x coherent unified system memory
  • 273 GB/s memory bandwidth
  • CUDA software stack in a compact desktop system
  • ConnectX-7 networking with support for pairing two systems

AI Workloads

Strengths
  • Much larger single memory pool than typical desktop GPUs, making big-model local inference practical
  • CUDA ecosystem is easier to live with than oneAPI or many niche local-AI stacks
  • Compact, appliance-like system for serious local AI without a full multi-GPU tower
  • Good fit for prototyping, testing, and fine-tuning workflows that are memory-limited on 24-48 GB cards
Considerations
  • 273 GB/s unified-memory bandwidth is far below top discrete datacenter GPUs, so tokens per second can still lag once a model fits
  • Unified system memory is not the same thing as 128 GB of dedicated HBM or GDDR VRAM
  • Single-box convenience does not replace a true multi-GPU server when you need scale-out throughput
  • Best seen as a large-memory developer workstation, not as a cheap substitute for high-end inference infrastructure
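The bandwidth caveat in the list above can be made concrete with a back-of-envelope model: during decode, each generated token has to stream roughly the whole resident model from memory, so bandwidth divided by bytes read per token gives a ceiling on tokens per second. The following is a minimal sketch under that assumption. The function name is hypothetical, and using the listed memory footprint as "bytes read per token" is an illustrative simplification, not how this page computes its estimates.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, read_per_token_gb: float) -> float:
    """Rough upper bound on decode speed when memory bandwidth is the limiter."""
    if read_per_token_gb <= 0:
        raise ValueError("read_per_token_gb must be positive")
    return bandwidth_gb_s / read_per_token_gb


DGX_SPARK_BANDWIDTH_GB_S = 273.0  # bandwidth figure from this page's spec list

# Two dense models from the compatibility list, with their listed footprints (GB).
# Assumption: the whole footprint is read once per generated token.
for name, footprint_gb in [
    ("Command A 111B", 85.6),
    ("Devstral 2 123B Instruct", 94.4),
]:
    ceiling = decode_ceiling_tok_s(DGX_SPARK_BANDWIDTH_GB_S, footprint_gb)
    print(f"{name}: at most ~{ceiling:.1f} tok/s")
```

Under these assumptions the ceilings come out near 3.2 and 2.9 tok/s, in line with the page's own estimates of 2.6 and 2.4 tok/s for those two models. Mixture-of-experts models read only their active experts per token, so this simple ceiling understates what they can reach for a given footprint.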

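Buying advice

Should you buy NVIDIA DGX Spark 128GB for local AI?

An excellent choice for local AI

Runs 36 of the top 50 models well, making it an all-rounder for local inference.

128.0 GB

Unified memory

Best models for this GPU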

What will limit you first

The model fits in shared memory, but shared-memory bandwidth is now the real limiter.

Fit does not mean dedicated-VRAM speed

Unified or shared memory can make a model technically fit, but sustained tokens per second may still trail a discrete high-bandwidth GPU with less total memory.

Shared-memory contention still exists

The OS, browser, and inference runtime all compete for the same physical memory pool, so real-world headroom is less forgiving than raw capacity suggests.
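One way to reason about this headroom is to compare a model's required memory with the usable pool rather than the raw 128 GB. The sketch below uses the 108.8 GB usable figure quoted elsewhere on this page. The 0.85 "limited headroom" threshold and the function name are hypothetical, chosen only for illustration; the page does not state how it draws that line.

```python
TOTAL_MEMORY_GB = 128.0
USABLE_MEMORY_GB = 108.8        # usable figure quoted on this page
LIMITED_HEADROOM_RATIO = 0.85   # hypothetical threshold, not from the page


def classify_fit(
    required_gb: float,
    usable_gb: float = USABLE_MEMORY_GB,
    limited_ratio: float = LIMITED_HEADROOM_RATIO,
) -> str:
    """Classify a model by how much of the usable memory pool it needs."""
    if required_gb > usable_gb:
        return "does not fit"
    if required_gb / usable_gb >= limited_ratio:
        return "fits, limited headroom"
    return "fits, comfortable headroom"


# Memory figures taken from the recommendation and compatibility entries.
for name, required_gb in [
    ("Qwen3-Coder-Next", 64.2),
    ("Command A 111B", 85.6),
    ("Devstral 2 123B Instruct", 99.7),
    ("Qwen 3.5 397B A17B", 259.0),
]:
    print(f"{name}: {classify_fit(required_gb)}")
```

The usable pool here is 85% of the 128 GB total (108.8 / 128), which is the share left after the OS and runtime take theirs. Anything else running on the machine shrinks it further.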

Best upgrade path

Prioritize bandwidth, not only capacity

If this workload feels slow, the next useful step is often a GPU tier with materially faster memory bandwidth rather than only a small bump in capacity.

Unlocks 2 additional models that do not fit on the current setup.

Want more headroom? NVIDIA H200 141GB (141.0 GB VRAM) is the next upgrade option.

Recommendations by Workload

Chat

S

Qwen3-Coder-Next

This model is still usable for chat, but it is not the most specialized pick. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Decode 11.1 tok/s · 256K ctx · llama.cpp (est.)
63.5 GB / 108.8 GB unified memory

Coding

S

Qwen3-Coder-Next

This model is a direct match for coding. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Decode 11.1 tok/s · 256K ctx · llama.cpp (est.)
64.2 GB / 108.8 GB unified memory

Agentic Coding

S

Devstral 2 123B Instruct

This model is still usable for agentic coding, but it is not the most specialized pick. It belongs to a current frontier family for local AI. It should run, but memory headroom will be limited. Known channels: huggingface, lm-studio.

Decode 2.4 tok/s · 59K ctx · llama.cpp (est.)
99.7 GB / 108.8 GB unified memory

Reasoning

S

Command A 111B

This model is a direct match for reasoning. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Decode 2.6 tok/s · 111K ctx · llama.cpp (est.)
85.6 GB / 108.8 GB unified memory

RAG

S

Qwen 3.5 122B A10B

This model is a direct match for RAG. It belongs to a current frontier family for local AI. It should run, but memory headroom will be limited. Known channels: huggingface, lm-studio.

Decode 6.6 tok/s · 131K ctx · llama.cpp (est.)
93.3 GB / 108.8 GB unified memory

Full Model Compatibility

Vendor | Model | Grade | Score | Params | Memory | Speed | Context | Type
Alibaba | Qwen 3.5 122B A10B | S | 88 | 122B | 90.8 GB | 7 tok/s | 131K ctx | moe
Alibaba | Qwen3-Coder-Next | S | 88 | 80B | 64.2 GB | 11 tok/s | 256K ctx | moe
Alibaba | Qwen3-Coder 30B A3B Instruct | S | 87 | 30.5B | 34.0 GB | 25 tok/s | 256K ctx | moe
Alibaba | Qwen 3.6 35B A3B | S | 87 | 35B | 39.4 GB | 21 tok/s | 262K ctx | +1 moe
Alibaba | Qwen3-VL 30B A3B Instruct | S | 87 | 30B | 33.7 GB | 26 tok/s | 256K ctx | moe
Cohere | Command A 111B | S | 87 | 111B | 85.6 GB | 3 tok/s | 111K ctx | dense
Mistral | Mistral Small 4 119B | S | 86 | 119B | 91.9 GB | 7 tok/s | 66K ctx | moe
Alibaba | Qwen 3.5 35B A3B | S | 86 | 35B | 36.8 GB | 23 tok/s | 131K ctx | moe
Mistral | Devstral 2 123B Instruct | S | 86 | 123B | 94.4 GB | 2 tok/s | 59K ctx | dense
Alibaba | Qwen 2.5 VL 72B | S | 85 | 72B | 62.8 GB | 4 tok/s | 33K ctx | dense
Alibaba | Qwen 3 30B A3B | A | 85 | 30.5B | 34.0 GB | 25 tok/s | 131K ctx | moe
Alibaba | Qwen 3.5 9B | A | 84 | 9B | 21.6 GB | 32 tok/s | 131K ctx | dense
Alibaba | Qwen 3.5 27B | A | 84 | 27B | 33.6 GB | 11 tok/s | 131K ctx | dense
Alibaba | Qwen 3.5 4B | A | 84 | 4B | 18.6 GB | 56 tok/s | 131K ctx | dense
OpenAI | GPT-OSS 120B | A | 83 | 117B | 90.2 GB | 3 tok/s | 77K ctx | dense
Mistral | Magistral Small 2507 | A | 83 | 24B | 31.0 GB | 12 tok/s | 131K ctx | dense
Alibaba | Qwen 3 8B | A | 83 | 8B | 21.0 GB | 36 tok/s | 131K ctx | dense
Alibaba | Qwen 3 32B | A | 83 | 32B | 37.4 GB | 9 tok/s | 131K ctx | dense
Alibaba | Qwen 3.6 27B | A | 83 | 27B | 31.4 GB | 8 tok/s | 262K ctx | +1 dense
Mistral | Devstral Small 2 24B Instruct | A | 83 | 24B | 31.0 GB | 12 tok/s | 256K ctx | dense
OpenAI | GPT-OSS 20B | A | 83 | 21B | 29.2 GB | 31 tok/s | 128K ctx | moe
Alibaba | Qwen 3 14B | A | 83 | 14B | 24.9 GB | 21 tok/s | 131K ctx | dense
Mistral AI | Pixtral Large 124B | A | 82 | 124B | 95.0 GB | 2 tok/s | 57K ctx | dense
NVIDIA | Nemotron Cascade 2 30B A3B | A | 82 | 30B | 35.2 GB | 25 tok/s | 262K ctx | moe
Microsoft | Phi-4-reasoning-plus 14B | A | 82 | 14.7B | 26.0 GB | 20 tok/s | 33K ctx | dense
NVIDIA | Nemotron 3 Nano 30B | A | 82 | 30B | 34.7 GB | 10 tok/s | 131K ctx | dense
Mistral | Leanstral 119B A6B | A | 82 | 119B | 95.3 GB | 7 tok/s | 41K ctx | moe
Mistral | Devstral Small 1.1 | A | 81 | 24B | 31.0 GB | 12 tok/s | 131K ctx | dense
Microsoft | Phi-4 Mini Reasoning 4B | A | 81 | 3.8B | 17.7 GB | 53 tok/s | 131K ctx | dense
Google | Gemma 4 31B | A | 80 | 30.7B | 47.3 GB | 7 tok/s | 83K ctx | dense
Google | Gemma 4 26B A4B | A | 79 | 25.2B | 33.0 GB | 27 tok/s | 256K ctx | moe
NVIDIA | Nemotron Nano 8B | A | 78 | 8B | 20.8 GB | 36 tok/s | 131K ctx | dense
Mistral | Ministral 3 14B | A | 77 | 14B | 24.9 GB | 21 tok/s | 262K ctx | multimodal
LG AI | EXAONE 4.0 32B | A | 77 | 32B | 37.4 GB | 9 tok/s | 131K ctx | dense
Jina AI | Jina Embeddings v3 | A | 74 | 0.57B | 17.1 GB | 8 tok/s | 8K ctx | dense
BAAI | BGE M3 | A | 73 | 0.57B | 16.2 GB | 8 tok/s | 8K ctx | dense
Alibaba | Qwen 3.5 397B A17B | F | 0 | 397B | 259.0 GB | 2 tok/s | 4K ctx | moe
Moonshot AI | Kimi K2.5 | F | 0 | 1000B | 631.4 GB | 2 tok/s | 4K ctx | moe
Moonshot AI | Kimi K2.6 | F | 0 | 1000B | 631.4 GB | 2 tok/s | 4K ctx | +1 moe
DeepSeek | DeepSeek V4 Pro | F | 0 | 1600B | 877.8 GB | 2 tok/s | 4K ctx | moe
DeepSeek | DeepSeek V4 Flash | F | 0 | 284B | 173.3 GB | 2 tok/s | 4K ctx | moe
Z.ai | GLM-5.1 | F | 0 | 754B | 492.9 GB | 2 tok/s | 4K ctx | moe
Z.ai | GLM-5 | F | 0 | 744B | 486.8 GB | 2 tok/s | 4K ctx | moe
DeepSeek | DeepSeek V3.2 | F | 0 | 671B | 423.7 GB | 2 tok/s | 4K ctx | moe
Alibaba | Qwen 3 235B A22B | F | 0 | 235B | 160.2 GB | 2 tok/s | 4K ctx | moe
Alibaba | Qwen3-Coder 480B A35B Instruct | F | 0 | 480B | 309.6 GB | 2 tok/s | 4K ctx | moe
- | MiniMax M2.7 | F | 0 | 230B | 158.0 GB | 2 tok/s | 4K ctx | moe
DeepSeek | DeepSeek Coder V2 236B | F | 0 | 236B | 216.5 GB | 2 tok/s | 4K ctx | moe
DeepSeek | DeepSeek R1 671B | F | 0 | 671B | 482.8 GB | 2 tok/s | 4K ctx | moe
DeepSeek | DeepSeek V3.1 671B | F | 0 | 671B | 482.8 GB | 2 tok/s | 4K ctx | moe

Within reach

Models you can run after upgrading

High-quality models that need only a little more memory

Image & Video Generation

Diffusion Model Compatibility

51 of 52 models can generate images or video on your NVIDIA DGX Spark 128GB

Model | Type | Max Resolution | Gen Time | Grade
SD Turbo | Image | 512×512 | ~8.9s | S
Stable Diffusion 1.5 | Image | 512×768 | ~17.8s | S
Realistic Vision v5.1 | Image | 512×768 | ~17.8s | S
DreamShaper 8 | Image | 512×768 | ~17.8s | S
LCM DreamShaper v7 | Image | 512×768 | ~5.3s | S
PixArt-Sigma | Image | 1024×1024 | ~1m 11s | S
FramePack I2V | Video | 1280×720 | ~2m 11s/frame | S
SDXL Turbo | Image | 512×512 | ~8.9s | S
SDXL Lightning | Image | 1024×1024 | ~26.7s | S
Stable Diffusion XL 1.0 | Image | 1024×1024 | ~1m 11s | S
Playground v2.5 | Image | 1024×1024 | ~1m 47s | S
RealVisXL v5.0 | Image | 1024×1024 | ~1m 20s | S
DreamShaper XL | Image | 1024×1024 | ~1m 20s | S
Juggernaut XL v9 | Image | 1024×1024 | ~1m 20s | S
Animagine XL 3.1 | Image | 1024×1024 | ~1m 20s | S
Pony Diffusion V6 XL | Image | 1024×1024 | ~1m 20s | S
Animagine XL 4.0 | Image | 1024×1024 | ~1m 20s | S
Illustrious XL | Image | 1024×1024 | ~1m 20s | S
Wan Video 2.1 1.3B | Video | 480×832 | ~52s/frame | S
Stable Diffusion 3.5 Medium | Image | 1024×1024 | ~2m 5s | S
Flux.2 Klein 4B | Image | 1024×1024 | ~21.4s | S
LTX Video 2B | Video | 1280×720 | ~1m 2s/frame | S
Kolors | Image | 1024×1024 | ~2m 22s | S
Stable Cascade | Image | 1024×1024 | ~2m 58s | S
AuraFlow v0.3 | Image | 1536×1536 | ~5m 21s | S
Stable Diffusion 3.5 Large | Image | 1024×1024 | ~6m 32s | S
Stable Diffusion 3.5 Large Turbo | Image | 1024×1024 | ~1m 11s | S
CogVideoX 2B | Video | 720×480 | ~1m 2s/frame | S
HunyuanVideo | Video | 720×1280 | ~2m 11s/frame | S
Chroma | Image | 1024×1024 | ~1m 11s | S
Z-Image Turbo | Image | 1536×1536 | ~1m 14s | S
Flux.1 Dev | Image | 1024×1024 | ~5m 21s | S
Flux.1 Schnell | Image | 1024×1024 | ~1m 2s | S
LTX Video 13B | Video | 1280×720 | ~2m 11s/frame | S
Flux.1 Kontext Dev | Image | 1024×1024 | ~5m 56s | S
AnimateDiff v1.5.3 | Video | 512×768 | ~32.5s/frame | S
Cosmos Diffusion 7B | Video | 1024×576 | ~1m 42s/frame | S
CogVideoX 5B | Video | 720×480 | ~1m 29s/frame | S
Wan2.2 TI2V 5B | Video | 832×480 | ~1m 29s/frame | S
Flux.2 Klein 9B | Image | 1024×1024 | ~35.6s | S
Flux.1 Fill Dev | Image | 1024×1024 | ~5m 3s | S
Mochi 1 Preview | Video | 848×480 | ~1m 58s/frame | S
HunyuanVideo 1.5 | Video | 720×1280 | ~1m 49s/frame | S
Helios 14B | Video | 1280×720 | ~2m 15s/frame | S
SkyReels V2 14B | Video | 1280×720 | ~2m 15s/frame | S
Wan Video 2.1 14B | Video | 720×1280 | ~2m 15s/frame | S
Wan Video 2.2 14B | Video | 720×1280 | ~2m 15s/frame | S
Qwen Image | Image | 1024×1024 | ~2m 0s | S
Qwen Image Edit | Image | 1024×1024 | ~2m 0s | S
Flux.2 Dev | Image | 1024×1024 | ~56m 10s | S
MAGI-1 | Video | 1280×720 | ~2m 47s/frame | S
HunyuanImage 3.0 | Image | 256×256 | ~3m 31s | F

Image models estimated at 1024×1024 (28 steps, FP16). Video models estimated at 768×512 (25 frames, 30 steps, FP16). Actual performance varies with runtime and system load.
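Because video times in the table are quoted per frame, the wall-clock cost of a whole clip is easy to underestimate. The sketch below parses the table's duration strings and multiplies by the 25-frame clip length stated in the note above. The helper names are hypothetical, and treating total time as per-frame time multiplied by frame count is an assumption about how the per-frame figures were derived.

```python
import re

_DURATION = re.compile(r"(?:(\d+)m)?\s*(?:(\d+(?:\.\d+)?)s)?")


def parse_duration_s(text: str) -> float:
    """Parse table values such as '~8.9s', '~1m 2s', or '~2m 11s/frame' into seconds."""
    cleaned = text.strip().lstrip("~")
    if cleaned.endswith("/frame"):
        cleaned = cleaned[: -len("/frame")]
    match = _DURATION.fullmatch(cleaned)
    if match is None or (match.group(1) is None and match.group(2) is None):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2) or 0.0)
    return minutes * 60 + seconds


def clip_time_s(per_frame: str, frames: int = 25) -> float:
    """Estimated time for a whole clip, assuming cost scales linearly with frames."""
    return parse_duration_s(per_frame) * frames


# Example: LTX Video 2B is listed at ~1m 2s per frame.
total_s = clip_time_s("~1m 2s/frame")
print(f"LTX Video 2B, 25 frames: ~{total_s / 60:.0f} minutes")
```

Under that linear assumption, a 25-frame LTX Video 2B clip takes 1550 seconds, or about 26 minutes, and the 14B video models at ~2m 15s/frame come to nearly an hour per clip.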

Upgrade paths

Upgrade from NVIDIA DGX Spark 128GB

See what you unlock with more powerful hardware

Upgrade options

Frequently Asked Questions

What AI models can I run on NVIDIA DGX Spark 128GB?

NVIDIA DGX Spark 128GB (128 GB unified memory) can run these top models: Qwen 3.5 122B A10B (score: 88/100), Qwen3-Coder-Next (score: 88/100), Qwen3-Coder 30B A3B Instruct (score: 87/100). See the full compatibility list above.

How much unified memory does NVIDIA DGX Spark 128GB have for AI?

NVIDIA DGX Spark 128GB ships with 128 GB of unified memory, with roughly 108.8 GB realistically usable for AI inference after OS and runtime overhead.

Is NVIDIA DGX Spark 128GB good for running LLMs locally?

Yes, NVIDIA DGX Spark 128GB is excellent for running LLMs locally with top compatibility scores above 80/100.

What is the best model for NVIDIA DGX Spark 128GB for coding?

For coding on NVIDIA DGX Spark 128GB, we recommend Qwen3-Coder-Next. It reaches an estimated 11.1 tokens per second with a 256K context window. This model is a direct match for coding. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.

Should I upgrade from NVIDIA DGX Spark 128GB?

There are 4 upgrade paths from NVIDIA DGX Spark 128GB, including NVIDIA H200 141GB and NVIDIA B200 180GB. Upgrading would unlock larger models and faster inference speeds.

Can NVIDIA DGX Spark 128GB run Flux for image generation?

Yes, NVIDIA DGX Spark 128GB with 109 GB of usable memory can run Flux.1 Dev at FP16 natively. Flux is a 12B parameter diffusion transformer that produces high-quality images. You can also run the Schnell variant for faster generation.

What image and video AI models can I run on NVIDIA DGX Spark 128GB?

NVIDIA DGX Spark 128GB (128 GB unified memory) can handle various AI generation tasks beyond LLMs. For image generation, SDXL and Stable Diffusion 3.5 run well. Flux.1 Dev also runs natively for state-of-the-art image quality. For video, LTX Video 2B can generate short clips. Check the AI Capability Matrix above for detailed compatibility.

Is NVIDIA DGX Spark 128GB good for AI image generation?

NVIDIA DGX Spark 128GB is excellent for AI image generation. With 109 GB of usable memory, it runs all major diffusion models including Flux.1, SDXL, and Stable Diffusion 3.5 at full precision. Estimated generation times range from under ten seconds for turbo models to several minutes per image for Flux.1 Dev and Stable Diffusion 3.5 Large, and the system can also handle video generation models.

Can NVIDIA DGX Spark 128GB run Qwen 3.5 27B?

Yes, NVIDIA DGX Spark 128GB with 109 GB of usable memory can run Qwen 3.5 27B at Q8 (near-lossless, ~28.9 GB) or even FP16 (~55.4 GB) depending on your context needs. This setup provides an excellent experience with this model. Use Ollama or vLLM for best results.

What is the best quantization for AI models on NVIDIA DGX Spark 128GB?

With about 109 GB of usable unified memory on NVIDIA DGX Spark 128GB, use Q8_0 for most models: it is near-lossless and you have the memory for it. For 70B+ models, Q6_K offers excellent quality. Reserve Q4_K_M for 100B+ models or when you need maximum context length.
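The quantization guidance above follows from simple arithmetic: weight size is roughly parameter count times bits per weight. The sketch below assumes approximate GGUF bits-per-weight values (Q4_K_M in particular is a mixed-precision format, so 4.85 is only a typical average) and covers weights only; KV cache and runtime overhead come on top. The function names and the `reserve_gb` parameter are hypothetical.

```python
from typing import Optional

# Approximate bits per weight; Q4_K_M varies by model, so treat 4.85 as a rough average.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K_M": 4.85,
}


def weight_size_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8.0


def best_quant_that_fits(
    params_billions: float, budget_gb: float, reserve_gb: float = 0.0
) -> Optional[str]:
    """Highest-precision format whose weights plus a reserve fit in the budget."""
    for quant in ("FP16", "Q8_0", "Q6_K", "Q4_K_M"):
        if weight_size_gb(params_billions, quant) + reserve_gb <= budget_gb:
            return quant
    return None


# A 27B model: compare with the ~28.9 GB (Q8) and ~55.4 GB (FP16) quoted above.
for quant in ("Q8_0", "FP16"):
    print(f"27B at {quant}: ~{weight_size_gb(27, quant):.1f} GB of weights")

# A 123B model against the 108.8 GB usable pool, keeping 10 GB for context.
print(best_quant_that_fits(123, budget_gb=108.8, reserve_gb=10.0))
```

This check only answers whether a format fits. On a bandwidth-limited system a smaller quantization also decodes faster, so the highest precision that fits is not always the most pleasant one to use.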

For local LLMs on NVIDIA DGX Spark 128GB, does memory capacity matter more than bandwidth?

NVIDIA DGX Spark 128GB has enough memory for many local LLMs, but bandwidth still matters a lot for real speed. Once a model fits, a faster-memory GPU can feel significantly better than a slower setup with similar capacity.
