Mistral Small 4 VRAM Requirements - 119B Hardware Guide
Estimated VRAM for Mistral Small 4 119B at Q4_K_M, Q5_K_M, Q6_K, Q8_0, and FP16. See whether 80GB GPUs or high-memory Macs can run it locally.
If you are searching for Mistral Small 4 VRAM requirements, the short answer is: this is not a 24GB or 48GB consumer-GPU model.
Quick answers
- Q4_K_M: ~72.6 GB
- Q5_K_M: ~85.7 GB
- Q6_K: ~97.6 GB
- Q8_0: ~127.3 GB
- FP16: ~244.0 GB
Mistral Small 4 119B sits in a very different class from Mistral Small 24B. It is a high-end local model for 80GB GPUs, high-memory Macs, or larger serving setups.
Mistral Small 4 VRAM by Quantization
These numbers are weights-only estimates. Add more memory for KV cache, runtime overhead, and serving headroom.
| Quantization | VRAM (weights only) |
|---|---|
| Q4_K_M | 72.6 GB |
| Q5_K_M | 85.7 GB |
| Q6_K | 97.6 GB |
| Q8_0 | 127.3 GB |
| FP16 | 244.0 GB |
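For a rough sense of where figures like these come from, a weights-only estimate is simply parameter count times bits per weight, divided by eight. The sketch below assumes rounded bits-per-weight values for each GGUF quantization type; those values and the `weights_gb` helper are illustrative assumptions, not numbers taken from this guide.

```python
# Approximate bits per weight for common GGUF quantization types.
# ASSUMPTION: rounded, commonly quoted figures; real files vary by model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q6_K": 6.56,
    "Q8_0": 8.5,
    "FP16": 16.0,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Weights-only size in decimal GB: parameters x bits per weight / 8."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * bits / 8


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{weights_gb(119, quant):.1f} GB")
```

With 119 billion parameters this lands within a few percent of the table. The small differences are expected, since the exact parameter count and bits-per-weight behind the table are not stated here.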
Why Mistral Small 4 Feels Confusing
The name makes it sound close to "Mistral Small 24B". It is not.
Mistral Small 4 is a frontier-tier 119B model with a much larger memory footprint. Searches for "mistral small 4 vram requirements" tend to surface broad Mistral requirement pages, while the answer people need is actually simple:
- it is a serious model
- it needs serious memory
- it is not a clean fit for mainstream single-GPU local use
What Hardware Can Actually Run Mistral Small 4?
24GB and 48GB GPUs
This is the wrong tier.
- A single RTX 4090 24GB holds only about a third of the ~72.6 GB of Q4_K_M weights
- Even 48GB-class cards fall roughly 25 GB short of a clean Q4_K_M fit
You can force it with heavy CPU offloading, but that is not a setup worth recommending.
80GB GPU Tier
This is where Mistral Small 4 starts making sense.
- NVIDIA H100 80GB: realistic Q4_K_M single-GPU option, with roughly 7 GB left over for KV cache and runtime overhead
- NVIDIA A100 80GB: also viable at Q4_K_M, with the same tight margin
If your goal is to run Mistral Small 4 locally without turning the setup into an experiment, 80GB is the real entry point.
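The tier boundaries above come down to simple subtraction. As a minimal sketch, the snippet below takes the weights-only figures from the table in this guide and checks them against a device's memory. The helper names `headroom_gb` and `largest_fit`, and the `reserve_gb` parameter, are hypothetical and exist only for illustration.

```python
# Weights-only sizes in GB, copied from the table in this guide.
WEIGHTS_GB = {
    "Q4_K_M": 72.6,
    "Q5_K_M": 85.7,
    "Q6_K": 97.6,
    "Q8_0": 127.3,
    "FP16": 244.0,
}


def headroom_gb(device_gb: float, quant: str) -> float:
    """Memory left after loading weights; negative means the weights do not fit."""
    return device_gb - WEIGHTS_GB[quant]


def largest_fit(device_gb: float, reserve_gb: float = 0.0):
    """Largest quantization whose weights fit while leaving reserve_gb free.

    reserve_gb is a hypothetical allowance for KV cache and runtime overhead.
    Returns None when nothing fits.
    """
    fitting = [q for q, gb in WEIGHTS_GB.items() if gb + reserve_gb <= device_gb]
    return max(fitting, key=WEIGHTS_GB.get) if fitting else None


if __name__ == "__main__":
    for device in (24, 48, 80):
        print(device, "GB ->", largest_fit(device))
```

On an 80 GB card, Q4_K_M leaves about 7.4 GB of headroom, while 24 GB and 48 GB devices return `None` because no listed quantization fits. Usable memory on a real device is typically a little below the advertised capacity, so treat the margin as optimistic.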
High-Memory Apple Silicon
Apple Silicon can be relevant here because unified memory changes the fit story.
- 128GB Macs are the first plausible single-machine Apple tier for Q4_K_M
- 192GB Macs are much safer if you want headroom for context and runtime overhead
This is exactly the kind of model where Apple Silicon becomes attractive as a capacity-first local platform.
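Headroom matters because the KV cache grows linearly with context length. The sketch below uses the usual formula for a standard attention layout with separate key and value tensors per layer. The architecture values in the example call are placeholders chosen for illustration, not Mistral Small 4's real configuration, which this guide does not give, and the formula itself is an assumption about the attention design.

```python
def kv_cache_gb(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    bytes_per_value: int = 2,  # 2 bytes per value assumes an FP16 cache
) -> float:
    """KV cache size in decimal GB for a single sequence.

    Each layer stores two tensors (keys and values), each holding
    n_kv_heads * head_dim values per token.
    """
    total_bytes = (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    )
    return total_bytes / 1e9


if __name__ == "__main__":
    # HYPOTHETICAL architecture values, for illustration only.
    size = kv_cache_gb(n_layers=64, n_kv_heads=8, head_dim=128, context_tokens=32_768)
    print(f"~{size:.1f} GB of KV cache at 32K context")
```

Doubling the context doubles the cache, so a machine that barely fits the weights can run out of memory well before the model's maximum context is reached.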
Serving and Multi-GPU Setups
If you care about throughput rather than "can I technically load it once", this is a serving-grade model.
That means:
- multiple high-memory GPUs
- tensor parallel or distributed inference
- runtimes built for serving, not only personal chat
For that side of the problem, the right mental model is closer to workstation or lab infrastructure than enthusiast desktop hardware.
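As a rough sizing aid for multi-GPU setups, the sketch below estimates how many GPUs are needed when weights are split evenly across devices. The `gpus_needed` helper and its `usable_fraction` parameter (the share of each GPU reserved for weights, with the rest left for KV cache and runtime overhead) are assumptions for illustration; real runtimes add their own constraints on GPU counts.

```python
import math

# Weights-only sizes in GB, copied from the table in this guide.
WEIGHTS_GB = {
    "Q4_K_M": 72.6,
    "Q5_K_M": 85.7,
    "Q6_K": 97.6,
    "Q8_0": 127.3,
    "FP16": 244.0,
}


def gpus_needed(quant: str, gpu_gb: float = 80.0, usable_fraction: float = 0.9) -> int:
    """Minimum GPU count so each GPU's even share of the weights fits.

    usable_fraction is a HYPOTHETICAL share of each GPU available for weights.
    """
    usable_gb = gpu_gb * usable_fraction
    return math.ceil(WEIGHTS_GB[quant] / usable_gb)


if __name__ == "__main__":
    for quant in WEIGHTS_GB:
        print(f"{quant}: {gpus_needed(quant)} x 80GB")
```

With no reserve (`usable_fraction=1.0`), Q4_K_M fits on one 80 GB card, which matches the single-GPU tier described earlier. With the assumed 10% reserve it needs two, and FP16 needs four either way. Tensor-parallel runtimes often require the GPU count to divide the model's attention head count evenly, so round up to a supported size.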
Is Mistral Small 4 Worth It for Local AI?
Only if you are already in the right hardware tier.
It makes sense when:
- you already own 80GB+ GPU hardware
- you have a 128GB-192GB Apple Silicon machine and want a bigger local reasoning model
- you care about frontier-class local quality more than simple setup
It does not make sense when:
- you are trying to stretch a 24GB consumer GPU
- you want the easiest high-quality local Mistral experience
- you are really looking for the best model that still feels sane on mainstream hardware
In that mainstream tier, the right answer is still Mistral Small 24B, not Mistral Small 4.
Better Alternatives for Smaller Hardware
If you do not have 80GB-class memory, Mistral Small 24B is the more realistic Mistral choice.
Bottom Line
Mistral Small 4 119B is a real local model only if you already have serious hardware.
- Q4_K_M starts around 72.6 GB for weights alone
- 24GB consumer GPUs are out
- 80GB GPUs or high-memory Macs are the realistic starting point
If you are searching this because you want to know whether your normal desktop can run it, the practical answer is no. If you are already shopping in the 80GB or 128GB+ tier, then it becomes interesting.