LMSYS
Vicuna 7B (7B parameters) requires approximately 4.3 GB of VRAM with Q4_K_M quantization and approximately 14.3 GB at F16. For the best balance of quality and speed, we recommend hardware with at least 16 GB of VRAM.
Get started
Copy-paste commands to run Vicuna 7B on your machine.
Run
ollama run vicuna
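Once the model has been pulled with the command above, it can also be queried from a script. The following is a minimal sketch, assuming an Ollama server is running locally on its default port (11434) and that the model is available under the tag `vicuna`; the helper names `build_request` and `ask` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "vicuna") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "vicuna", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        # With "stream": False the server replies with a single JSON object
        # whose "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

With the server running, `ask("Explain quantization in one sentence.")` returns the model's reply as a string. Streaming is disabled here only to keep the response handling short.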
Quantization options
| Quant | Bits | VRAM | Quality |
|---|---|---|---|
| Q2_K | 2 | 2.7 GB | Low |
| Q3_K_S | 3 | 3.4 GB | Low |
| NVFP4 | 4 | 3.9 GB | Medium |
| Q4_K_M | 4 | 4.3 GB | Medium |
| Q5_K_M | 5 | 5.0 GB | High |
| Q6_K | 6 | 5.7 GB | High |
| Q8_0 | 8 | 7.5 GB | Very High |
| F16 | 16 | 14.3 GB | Maximum |
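To choose a row from the table for a given card, one simple approach is to take the largest quantization whose VRAM estimate fits the available memory. The sketch below copies the estimates from the table; the function name `best_fit` and the 1 GB `headroom_gb` default (reserved for context cache and runtime overhead) are assumptions made for illustration, not figures from this page.

```python
# (name, bits, estimated VRAM in GB, quality), copied from the table above.
QUANTS = [
    ("Q2_K", 2, 2.7, "Low"),
    ("Q3_K_S", 3, 3.4, "Low"),
    ("NVFP4", 4, 3.9, "Medium"),
    ("Q4_K_M", 4, 4.3, "Medium"),
    ("Q5_K_M", 5, 5.0, "High"),
    ("Q6_K", 6, 5.7, "High"),
    ("Q8_0", 8, 7.5, "Very High"),
    ("F16", 16, 14.3, "Maximum"),
]


def best_fit(vram_gb: float, headroom_gb: float = 1.0):
    """Return the largest quantization that fits in vram_gb minus headroom.

    headroom_gb is an assumed safety margin, not a measured value.
    Returns None when no quantization fits.
    """
    budget = vram_gb - headroom_gb
    fitting = [q for q in QUANTS if q[2] <= budget]
    # The largest VRAM estimate among the fitting rows is also the
    # highest-quality row in this table.
    return max(fitting, key=lambda q: q[2]) if fitting else None
```

For example, `best_fit(8.0)` leaves a 7.0 GB budget and selects Q6_K, while `best_fit(16.0)` selects F16. The sketch does not check whether a given runtime actually supports each format.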
Quality benchmarks
Categories: Reasoning, General. Source: community, 2023-07-29.
Frequently asked questions
Vicuna 7B (7B parameters) requires approximately 4.3 GB of VRAM with Q4_K_M quantization. Lower-bit quantizations such as Q2_K or Q3_K_S use less memory but may reduce quality.
The RX 7600 XT 16GB can run Vicuna 7B with a compatibility score of 52/100. It provides 16 GB of memory and achieves approximately 39.1 tokens per second.
The recommended quantization for Vicuna 7B is Q4_K_M, which offers the best balance between model quality and memory efficiency. Higher-bit quantizations such as Q5_K_M or Q8_0 preserve more quality but require more VRAM.
The top recommended hardware for Vicuna 7B is the RX 7900 XT 20GB (score: 57/100), the RTX A4500 20GB (score: 57/100), and the RTX 3090 24GB (score: 56/100). These provide the best combination of memory, bandwidth, and compute for running this model locally.
Vicuna 7B is well suited to chat and instruction-following tasks. It was designed with these use cases in mind.
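The throughput figure quoted above for the RX 7600 XT 16GB (approximately 39.1 tokens per second) can be turned into a rough response-time estimate. This is a back-of-the-envelope sketch: the function name `generation_seconds` is illustrative, and the calculation assumes a constant decode rate and ignores prompt processing time.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate the time to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# A 500-token reply at the quoted rate of 39.1 tokens per second.
estimate = generation_seconds(500, 39.1)
print(f"{estimate:.1f} s")
```

At that rate a 500-token reply takes roughly 13 seconds, which is a useful yardstick when comparing the cards listed above.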