Gemma 3n Edge Gallery Guide: Run Google’s Gemma 3n Locally on Your Phone
Step-by-step Gemma 3n Edge Gallery guide

TL;DR
Google’s Gemma 3n is a lightweight, multimodal language model tuned for phones. The free Google AI Edge Gallery app (Android 12+, ≥ 6 GB RAM) lets you download the Gemma 3n 4B model pack (~1.5 GB) and run chat or “ask-image” demos fully offline. iOS support is coming soon.
You’ll:
Install the Gallery APK
Fetch the model from the in-app catalog
Pick GPU or CPU back-end
Launch a chat, point the camera, or load a photo for Q&A
By the end, you’ll have a working on-device AI lab in your pocket—no coding, no cloud.
What Is Gemma 3n in Plain English?
Think of Gemma 3n as a smart assistant squeezed down to fit inside your phone. It’s part of Google’s open “Gemma 3” family but uses a new Matryoshka Transformer (“MatFormer”) design. Like nested dolls, smaller sub-models sit inside a bigger one; the phone activates only the layers it has power for, saving battery and memory. Gemma 3n understands text, images, and audio, handles 4,000-token conversations, and speaks 140+ languages.
💡 Fun fact: Loading Gemma 3n is like adding a camera filter: you download it once, and it keeps working offline afterward, no network needed.
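If the nested-doll picture feels abstract, here is a toy Kotlin sketch of the idea: keep a ladder of nested sub-models and activate the largest one the device can currently afford. The names, memory figures, and selection logic below are all hypothetical; the real decision happens inside Gemma 3n’s runtime, not in app code.

```kotlin
// Toy illustration of MatFormer-style nesting. Sizes are made up, not Gemma 3n's real footprints.
data class SubModel(val name: String, val ramNeededGb: Double)

// Ordered small -> large, like nested dolls; each larger model contains the smaller one.
val nested = listOf(
    SubModel("small nested core (E2B-like)", 2.0),
    SubModel("full model (E4B-like)", 3.0),
)

// Activate the biggest sub-model that fits the memory the phone can spare right now.
fun pickSubModel(freeRamGb: Double): SubModel =
    nested.lastOrNull { it.ramNeededGb <= freeRamGb } ?: nested.first()

fun main() {
    println(pickSubModel(freeRamGb = 2.5).name) // -> "small nested core (E2B-like)"
}
```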
How Google AI Edge Gallery Works
The Gallery is an experimental model manager: a storefront where you choose models, fetch them over the internet once, then run them offline using one of two back-ends (the sketch after this list shows one way to decide):
GPU back-end – runs inference on your phone’s graphics cores.
CPU back-end – runs on the main processor cores; often surprisingly fast on recent Google Tensor or Snapdragon chips, and a reliable fallback if the GPU path misbehaves.
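Which back-end makes sense mostly hinges on whether the phone’s GPU meets the Vulkan 1.1 bar in the requirements table below. Here is a minimal Kotlin sketch of that check using the standard Android PackageManager feature flags; the Gallery makes this decision for you in its UI, so this is only for the curious, and treating Vulkan 1.1 support as a proxy for GPU-back-end support is an assumption.

```kotlin
import android.content.Context
import android.content.pm.PackageManager

// Suggest a back-end based on whether the device advertises Vulkan 1.1 GPU support.
// 0x00401000 encodes Vulkan version 1.1.0 (major shl 22 or minor shl 12 or patch).
fun suggestBackend(context: Context): String {
    val hasVulkan11 = context.packageManager.hasSystemFeature(
        PackageManager.FEATURE_VULKAN_HARDWARE_VERSION,
        0x00401000
    )
    return if (hasVulkan11) "GPU" else "CPU"
}
```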
Device & OS Requirements
| Component | Minimum | Why It Matters |
|---|---|---|
| Android | 12 (API 31) | NNAPI 1.3 features |
| RAM | 6 GB | Prevents swap thrashing |
| Free storage | 2.5 GB | Model + cache |
| SoC | 64-bit with Vulkan 1.1 GPU or a hardware neural engine | Needed for real-time inference |
⚠️ Warning: Low-end devices may install but crash on first inference. Test before traveling.
Specs come from Google’s release notes and early reviewers; the sketch below shows a quick programmatic version of the same check.
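If you’d rather verify the minimums in code before installing, the standard Android APIs below cover the OS, RAM, and free-storage rows of the table. It is a rough sanity check under the table’s assumptions, not an official compatibility test.

```kotlin
import android.app.ActivityManager
import android.content.Context
import android.os.Build
import android.os.StatFs

// Rough check of the table's minimums: Android 12+ (API 31), ~6 GB RAM, 2.5 GB free storage.
fun meetsMinimums(context: Context): Boolean {
    val osOk = Build.VERSION.SDK_INT >= 31

    val am = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
    val memInfo = ActivityManager.MemoryInfo()
    am.getMemoryInfo(memInfo)
    // Nominal 6 GB devices report slightly less, so use ~5.5 GB as the threshold.
    val ramOk = memInfo.totalMem >= 5_500L * 1024 * 1024

    val stat = StatFs(context.filesDir.path)
    val storageOk = stat.availableBytes >= 2_500L * 1024 * 1024

    return osOk && ramOk && storageOk
}
```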
Hands-On Setup
Download & Install Edge Gallery
Visit the GitHub Releases page on your phone.
Grab the latest edge-gallery-v*.apk.
Accept the unknown-sources prompt and install. Path: Settings → Security → Install unknown apps (toggle once).

Fetch the Gemma 3n 4B Model Pack
Open Edge Gallery → select any capability (Chat, Ask Image, or Prompt Lab)

Tap Gemma 3n 4B (E4B-it-litert) → Download (≈ 1.5 GB). Note: Certain models, like Gemma 3n, have restricted access and can only be downloaded after signing in with a Hugging Face account and agreeing to the license terms.

Within each task, you can often customize model inference settings—such as switching between CPU and GPU, adjusting temperature, and more.
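If you later want the same knobs in your own Android app rather than in the Gallery UI, the MediaPipe LLM Inference API (tasks-genai) exposes comparable options. The sketch below follows its earlier documented Android quickstart; the model path is hypothetical, and newer releases move sampling options such as top-k and temperature onto session options, so treat the exact builder names as assumptions.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Build an on-device LLM engine with explicit sampling settings.
// Model path is hypothetical; you place the downloaded model file there yourself.
fun buildEngine(context: Context): LlmInference {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma-3n-e4b-it.task")
        .setMaxTokens(1024)      // prompt + response token budget
        .setTopK(40)             // sampling breadth
        .setTemperature(0.8f)    // higher = more varied output
        .build()
    return LlmInference.createFromOptions(context, options)
}
```

A single blocking call, engine.generateResponse("…"), then returns the reply as a string.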

Pick Chat → type a prompt → the reply streams back at roughly one second per token. For image Q&A, grant camera permission, snap a photo, then ask your question.
Stats shown below each chat response (combined in the quick estimate after this list):
Time to First Token (TTFT)
Prefill Speed (tokens/s)
Decode Speed (tokens/s)
Latency (sec)
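Those four numbers combine simply: total response time is roughly TTFT plus output length divided by decode speed. Here is a back-of-envelope Kotlin helper, with placeholder numbers rather than real measurements.

```kotlin
// Estimate total response time from the stats the app displays.
fun estimateSeconds(ttftSec: Double, decodeTokensPerSec: Double, outputTokens: Int): Double =
    ttftSec + outputTokens / decodeTokensPerSec

fun main() {
    // e.g. TTFT 1.5 s, decode 8 tokens/s, 120-token answer -> ~16.5 s
    println(estimateSeconds(ttftSec = 1.5, decodeTokensPerSec = 8.0, outputTokens = 120))
}
```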
“Try It Yourself” Mini-Projects
Mini-Project 1: Play around in Prompt Lab
Rewrite copy, summarize text, and create code snippets.

Mini-Project 2: Solving Equations
Point the camera at an equation.
Ask: “How do I solve for X?”

Mini-Project 3: Multilingual Quick-Chat
“Translate ‘Nice to meet you’ to Tamil, Mandarin, and Arabic.”
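If you’d rather script Mini-Project 3 than type each prompt, the hypothetical engine from the options sketch earlier can loop over target languages; generateResponse() is the blocking text call in the MediaPipe LLM Inference API, and the same version caveats apply.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Run Mini-Project 3 programmatically against a previously built engine
// (see the options sketch in the setup section); purely illustrative.
fun translateDemo(engine: LlmInference) {
    for (lang in listOf("Tamil", "Mandarin", "Arabic")) {
        println(engine.generateResponse("Translate 'Nice to meet you' to $lang."))
    }
}
```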
Limitations, Myths & Ethical Considerations
Not omniscient: Offline means no real-time web search; facts may be stale.
Built-in safety: Gemma 3n ships with a lightweight content filter blocking harmful outputs, enforced on device.
Privacy isn’t absolute: Screenshots or crash logs you share with support could expose prompts.
Myth: “Edge models are private by default.” True, unless you enable cloud backup of app data.
Quick-Start Recap ✔️
Android 12+ phone with ≥ 6 GB RAM
Install Edge Gallery APK
Download the Gemma 3n 4B pack (~1.5 GB)
Choose GPU or CPU
Open Chat, the Ask-Image demo, or Prompt Lab