This is a submission for the Gemma 4 Challenge: Build with Gemma 4
Most AI coding tools ask for the same tradeoff:
“Give me your code, and I’ll give you help.”
I wanted to try the opposite.
What if a coding mentor lived inside VS Code, understood your repository, helped with real developer tasks, and kept your code on your own machine by default?
So I built Gemma Local Code Mentor.
TL;DR
Gemma Local Code Mentor is a local-first VS Code extension powered by Gemma 4.
It can:
- Explain selected code
- Suggest refactors
- Generate tests
- Summarize files
- Summarize repository architecture
- Answer questions about the repo
- Run through a local FastAPI backend
- Use Ollama as the default local model runtime
- Keep Local Only Mode enabled by default
No telemetry.
No cloud fallback.
No external API calls while Local Only Mode is on.
Your code stays where it belongs: on your machine.
What I Built
I built a VS Code extension plus a Dockerized FastAPI backend for developers who want AI help without sending private code to a remote API.
The workflow is simple:
- Select code in VS Code.
- Run a `Gemma:` command.
- The extension sends context to `127.0.0.1:8765`.
- The backend builds a task-specific prompt.
- Gemma 4 responds through a local provider.
- The result appears in a VS Code side panel.
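To make the handoff concrete, here is a minimal sketch of the kind of payload the extension could send to the local backend. The field names (`task`, `code`, `local_only`, and so on) are illustrative placeholders, not the extension's actual wire format.

```python
import json


def build_explain_request(selection: str, language: str, file_path: str) -> str:
    """Sketch of a JSON body for the local backend.

    Field names here are illustrative; the real schema may differ.
    """
    payload = {
        "task": "explain_selection",  # one task id per Gemma: command
        "code": selection,
        "language": language,
        "file_path": file_path,
        "local_only": True,  # Local Only Mode travels with the request
    }
    return json.dumps(payload)
```

The point is that every request carries both the selected code and the privacy flag, so the backend can refuse non-local work without extra round trips.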
The extension currently includes these commands:
- `Gemma: Explain Selection`
- `Gemma: Refactor Selection`
- `Gemma: Generate Tests`
- `Gemma: Summarize File`
- `Gemma: Summarize Architecture`
- `Gemma: Ask Repository`
- `Gemma: Toggle Local Only Mode`
- `Gemma: Open Panel`
This is not just a chat box glued into an editor. The backend has structured prompt builders, response parsing, provider routing, tests, repository context handling, and privacy checks.
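For a flavor of what a structured prompt builder can look like, here is a minimal sketch. The template wording and task ids are placeholders, not the repo's actual prompts.

```python
# Minimal prompt-builder sketch; templates and task ids are placeholders,
# not the actual prompts shipped in the repo.
PROMPT_TEMPLATES = {
    "explain_selection": (
        "You are a code mentor. Explain what this {language} code does, "
        "step by step:\n\n{code}"
    ),
    "generate_tests": (
        "Write unit tests for this {language} code. "
        "Cover edge cases and name each test descriptively:\n\n{code}"
    ),
}


def build_prompt(task: str, code: str, language: str) -> str:
    """Render the template for a task, failing loudly on unknown tasks."""
    template = PROMPT_TEMPLATES.get(task)
    if template is None:
        raise ValueError(f"unknown task: {task}")
    return template.format(language=language, code=code)
```

Keeping one template per task, rather than one generic chat prompt, is what lets each command steer the model differently.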
Why I Built It
There are many AI coding assistants now, but the privacy model often feels backwards.
For open source code, cloud tools are usually fine.
For client code, internal company projects, security-sensitive prototypes, or early startup ideas, uploading code somewhere else can be a blocker.
I wanted a coding assistant with different defaults:
| Feature | Typical Cloud Assistant | Gemma Local Code Mentor |
|---|---|---|
| Runs in VS Code | Yes | Yes |
| Explains code | Yes | Yes |
| Generates tests | Yes | Yes |
| Refactors code | Yes | Yes |
| Sends code to cloud | Often | No by default |
| Works with local models | Usually no | Yes |
| Has a local-only switch | Rare | Yes |
| Hackable by contributors | Limited | Fully open source |
The goal is not to beat every commercial coding assistant.
The goal is to prove that a useful AI coding mentor can be local-first from day one.
Demo
Suggested demo flow:
- Open a real code file in VS Code.
- Select a function.
- Run `Gemma: Explain Selection`.
- Run `Gemma: Generate Tests`.
- Ask a repository-level question.
- Show the side panel with Local Only Mode: ON.
- Show the backend running locally.
Code
Repository: https://github.com/ennydev-2026/GemmaLocalCodeMentor
How I Used Gemma 4
I used Gemma 4 as the reasoning layer behind the local code mentor.
The project is designed around two model roles:
- Gemma 4 E4B for fast tasks like short explanations and lightweight chat
- Gemma 4 31B Dense for deeper tasks like refactoring, test generation, architecture summaries, and larger context
That choice was intentional.
A code mentor should not use the largest model for every single request. If I ask what a small function does, I want a fast answer. If I ask for tests, architecture, or a refactor, I want deeper reasoning.
So the backend includes a model router:
- `fast` mode uses the fast model
- `deep` mode uses the deep model
- `auto` mode chooses based on task type and context size
This makes Gemma 4 feel more like a practical local development tool instead of a single hardcoded model call.
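In code, the routing rule is roughly the sketch below. The task ids and the context-size cutoff are illustrative placeholders; the real router's task names and thresholds live in the backend.

```python
# Illustrative router sketch; task ids and the size cutoff are placeholders.
DEEP_TASKS = {"refactor_selection", "generate_tests", "summarize_architecture"}
CONTEXT_CUTOFF = 4_000  # characters; tune to your fast model's comfort zone


def route_model(task: str, context: str, mode: str = "auto") -> str:
    """Return 'fast' or 'deep' -- a model role, not a concrete model name."""
    if mode in ("fast", "deep"):
        return mode  # explicit override wins
    if task in DEEP_TASKS or len(context) > CONTEXT_CUTOFF:
        return "deep"
    return "fast"
```

Routing on task type first and context size second keeps quick questions quick, while anything that rewrites code gets the heavier model.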
Architecture
```mermaid
flowchart LR
A["VS Code Extension"] --> B["FastAPI Backend on 127.0.0.1:8765"]
B --> C["Prompt Builders"]
B --> D["Repository Context Builder"]
B --> E["Model Router"]
E --> F["Gemma 4 E4B Fast Model"]
E --> G["Gemma 4 31B Dense Deep Model"]
F --> H["Ollama"]
G --> H
B --> I["JSON Response Parser"]
I --> A
```
The stack:
- VS Code extension in TypeScript
- FastAPI backend in Python
- Ollama as the default local runtime
- Docker support
- Mock provider for development and tests
- `.gemmaignore` support
- Local URL safety checks
- Backend test coverage with `pytest`
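For a sense of how `.gemmaignore` exclusion can work, here is a minimal sketch using simple glob patterns. The real implementation's matching rules may be richer (negation, directory anchoring, and so on).

```python
import fnmatch


def load_patterns(gemmaignore_text: str) -> list[str]:
    """Parse .gemmaignore-style content: one glob per line, '#' for comments."""
    return [
        line.strip()
        for line in gemmaignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def is_excluded(path: str, patterns: list[str]) -> bool:
    """True if the path matches any ignore pattern (plain glob semantics)."""
    return any(fnmatch.fnmatch(path, pat) for pat in patterns)
```

Files matching a pattern are simply never packed into the repository context, so secrets cannot leak into a prompt even locally.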
Local-First Is a Product Decision
The privacy layer is not just a README promise.
The repo includes:
- Local Only Mode enabled by default
- Backend URL validation
- No telemetry
- No cloud fallback
- No external API calls while local-only is enabled
- `.gemmaignore` for excluding sensitive files
- Mock mode so contributors can work without installing a model first
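The backend URL validation idea fits in a few lines; this is a minimal sketch, and the actual check in the repo may be stricter.

```python
from urllib.parse import urlparse

# Hosts considered "this machine"; anything else is rejected in local-only mode.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def is_local_backend(url: str) -> bool:
    """Allow only loopback backend URLs while Local Only Mode is on."""
    host = urlparse(url).hostname
    return host in LOOPBACK_HOSTS
```

A check like this makes "no external API calls" an enforced invariant rather than a configuration convention.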
That matters because local AI changes who can safely use these tools.
A freelancer can use it on client code.
A company can test AI workflows without sending source code away.
A student can learn from a mentor without paying API costs.
An open-source maintainer can customize the whole stack.
Run It Locally
Backend:
```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 127.0.0.1 --port 8765 --reload
```
Mock mode, no model required:
```bash
cd backend
GEMMA_PROVIDER=mock uvicorn app.main:app --host 127.0.0.1 --port 8765 --reload
```
Extension:
```bash
cd extension
npm install
npm run compile
```
Then open the project in VS Code, press `F5`, and run any `Gemma:` command.
What I Want Help With
This is where I want the community involved.
I would love contributors for:
- Better repository indexing
- Smarter prompt templates
- More language-aware code analysis
- Inline code actions
- Diff previews before applying refactors
- Local embeddings for repo search
- Better test framework detection
- llama.cpp provider support
- MLX provider support
- A polished marketplace-ready VSIX
- UI improvements for the side panel
If you care about local AI, open models, privacy-respecting devtools, or VS Code extensions, jump in.
Fork it.
Open an issue.
Try another Gemma 4 model.
Add a provider.
Improve the prompts.
Make the UX better.
Install the VS Code Extension
You can install the extension directly in VS Code using this identifier:
```text
ennydev-2026.gemma-local-code-mentor
```
Final Thought
AI coding tools are becoming part of the daily developer workflow.
That means defaults matter.
The default should not always be:
> "Upload your code first."
Sometimes the best place for your code is exactly where it already is:
**on your machine.**
What would you add to a local-first VS Code code mentor?