gemma 3 27b it

Name: gemma 3 27b it
Rating: 49 (90 reviews)
Author: Unsloth

Limited data available — some specs may be incomplete or estimated.

gemma 3 27b it (27B parameters) requires approximately 21.4 GB of VRAM with Q4_K_M quantization: about 16.5 GB for the weights, plus KV cache, runtime overhead, and headroom. For the best balance of quality and speed, we recommend hardware with at least 25 GB of VRAM.

Quick specs

Parameters: 27B

Architecture: dense

Context: 0K tokens

Modality: text

Min RAM: 10.5 GB

Rec. RAM: 16.5 GB (Q4_K_M)

License: Unknown

Family: Gemma

✓ Chat

Inference speed

gemma 3 27b it inference speed — tokens per second by GPU & Mac

Estimated decode speed (tokens/sec) for gemma 3 27b it at Q4_K_M across popular GPUs and Apple Silicon, using the fastest local runtime per device. Fastest is RTX 5090 32GB at ~73 tok/s. Speed is memory-bandwidth bound, so cards that fit the whole model in VRAM run far faster than ones that offload to system RAM.

GPU / Mac	Memory	Quant	Speed (tok/s)	Fits?
RTX 5090 32GB	32 GB	Q4_K_M	72.9	Fits
RTX 4090 24GB	24 GB	Q4_K_M	46.5	Offloads
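
Because decode is memory-bandwidth bound, a rough ceiling on tokens per second is the memory bandwidth divided by the bytes read per token, which is approximately the size of the weights. The sketch below illustrates that relationship under stated assumptions: the bandwidth figure is a nominal spec that does not appear on this page, and the function names are hypothetical. It is not the method used to produce the table.

```python
def decode_upper_bound(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough ceiling on decode speed, assuming each generated token
    requires reading all weights from memory exactly once."""
    return bandwidth_gb_s / weights_gb


def implied_efficiency(measured_tok_s: float, bandwidth_gb_s: float,
                       weights_gb: float) -> float:
    """Fraction of the bandwidth ceiling that a measured speed achieves."""
    return measured_tok_s / decode_upper_bound(bandwidth_gb_s, weights_gb)


WEIGHTS_GB = 16.5                # Q4_K_M weight size, from this page
MEASURED_TOK_S = 72.9            # RTX 5090 32GB estimate, from this page
ASSUMED_BANDWIDTH_GB_S = 1792.0  # ASSUMPTION: nominal RTX 5090 bandwidth, not from this page

ceiling = decode_upper_bound(ASSUMED_BANDWIDTH_GB_S, WEIGHTS_GB)
eff = implied_efficiency(MEASURED_TOK_S, ASSUMED_BANDWIDTH_GB_S, WEIGHTS_GB)
print(f"ceiling ~{ceiling:.0f} tok/s, implied efficiency ~{eff:.0%}")
```

The ceiling ignores KV cache reads and compute, so real speeds land below it. A card that offloads part of the model is limited by the slower system RAM or PCIe path instead, which this sketch does not model.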

Quick picks

Best budget (grade C): Mac mini M4 64GB, ~$1,099, 9 tok/s

Best overall (grade B): RTX 5090 32GB, ~$1,999, 73 tok/s

Best hardware

Top picks for gemma 3 27b it

RTX 5090 32GB (grade B): 32 GB

AMD Instinct MI100 32GB (grade B): 32 GB

NVIDIA A100 40GB (grade C): 40 GB

Quantization

gemma 3 27b it quantization — VRAM & quality by quant level

How much VRAM gemma 3 27b it (27B) needs at each GGUF quant, and whether it fits a 24 GB card (RTX 4090 / 3090). The recommended Q4_K_M uses ~16.5 GB — about 43% less VRAM than Q8_0, at a small quality cost.

Quant	Bits	VRAM (weights)	Quality	Fits 24 GB?
Q2_K	2	10.5 GB	Low	Fits
Q3_K_S	3	13.2 GB	Low	Tight
NVFP4	4	15.1 GB	Medium	Tight
Q4_K_M (recommended)	4	16.5 GB	Medium	Offloads
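
The weight sizes in the table can be converted into effective bits per weight, which shows how the nominal "Bits" column relates to actual storage. The sketch below uses only figures from this page; it assumes 1 GB = 10^9 bytes, and the helper name is hypothetical. The effective values come out higher than the nominal bit widths, likely because GGUF quant formats also store scaling metadata and keep some tensors at higher precision.

```python
PARAMS = 27e9  # 27B parameters, from this page

# Weight footprints in GB, copied from the table above.
WEIGHTS_GB = {"Q2_K": 10.5, "Q3_K_S": 13.2, "NVFP4": 15.1, "Q4_K_M": 16.5}


def effective_bits_per_weight(weights_gb: float, params: float = PARAMS) -> float:
    """Average storage per parameter in bits (assumes 1 GB = 1e9 bytes)."""
    return weights_gb * 1e9 * 8 / params


for quant, gb in WEIGHTS_GB.items():
    print(f"{quant}: ~{effective_bits_per_weight(gb):.2f} bits/weight")

# Q8_0 is not listed in the table; this is the footprint implied by the
# page's statement that Q4_K_M uses about 43% less VRAM than Q8_0.
implied_q8_gb = WEIGHTS_GB["Q4_K_M"] / (1 - 0.43)
print(f"implied Q8_0 weights: ~{implied_q8_gb:.1f} GB")
```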

Memory breakdown

Reference: RTX 2060 6GB

Weights: 16.5 GB

KV Cache: 3.2 GB

Runtime: 1.2 GB

Headroom: 0.6 GB
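
The four components above add up to the total VRAM requirement quoted elsewhere on this page. The sketch below sums them and applies a naive "total fits in memory" check; the function names are hypothetical, and the check is an assumption for illustration rather than the criterion the page's calculator uses.

```python
# Memory components in GB, copied from the breakdown above.
BREAKDOWN_GB = {"weights": 16.5, "kv_cache": 3.2, "runtime": 1.2, "headroom": 0.6}


def total_vram_gb(breakdown: dict) -> float:
    """Sum of all memory components in GB."""
    return sum(breakdown.values())


def fits(breakdown: dict, vram_gb: float) -> bool:
    """Naive check: the whole footprint must fit in device memory."""
    return total_vram_gb(breakdown) <= vram_gb


total = total_vram_gb(BREAKDOWN_GB)
print(f"total: ~{total:.1f} GB")
for card_gb in (6, 24, 32):
    status = "fits" if fits(BREAKDOWN_GB, card_gb) else "does not fit"
    print(f"{card_gb} GB card: {status}")
```

The sum is about 21.5 GB, which matches the quoted 21.4 GB up to rounding of the individual components. This naive check passes a 24 GB card, while the tables on this page mark Q4_K_M as "Offloads" on 24 GB, so the page presumably applies a stricter margin or a larger context allowance.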

Frequently asked questions

FAQ — gemma 3 27b it

How much VRAM does gemma 3 27b it need?

gemma 3 27b it (27B parameters) requires approximately 21.4 GB of VRAM with Q4_K_M quantization, including KV cache and runtime overhead. Lower-bit quantizations such as Q3_K_S or Q2_K use less memory but reduce quality.

Can I run gemma 3 27b it on a Mac mini M4 64GB?

Yes, Mac mini M4 64GB can run gemma 3 27b it with a compatibility score of 47/100. It provides 64 GB of memory and achieves approximately 8.7 tokens per second.

What is the best quantization for gemma 3 27b it?

The recommended quantization for gemma 3 27b it is Q4_K_M, which offers the best balance between model quality and memory efficiency. Higher-bit quantizations such as Q8_0 preserve more quality but require more VRAM.

What hardware is recommended for gemma 3 27b it?

The top recommended hardware for gemma 3 27b it: RTX 5090 32GB (score: 56/100), AMD Instinct MI100 32GB (score: 55/100), NVIDIA A100 40GB (score: 55/100). These provide the best combination of memory, bandwidth, and compute for running this model locally.

Is gemma 3 27b it good for chat?

Yes, gemma 3 27b it is well-suited for chat: the "it" suffix marks the instruction-tuned variant, which is designed for conversational use.