- Google unveiled Gemma 4 12B on June 3, 2026.
The model packs 12 billion parameters yet can operate on any laptop that has 16 GB of RAM. Google achieves this by switching to a novel encoding scheme and a token‑prediction strategy that reduces memory overhead. The model is distributed as a downloadable package, but the exact file size was not disclosed.
This matters because earlier Gemma releases required desktop‑class hardware or cloud resources to run. By shrinking the memory footprint, Gemma 4 makes local LLM experimentation more accessible to developers who only have a standard notebook.
The move echoes a broader push to democratise AI execution, though performance will still lag behind larger, server‑grade models.