I Built a Local-First VS Code Code Mentor with Gemma 4 — Your Code Never Leaves Your Machine

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

Most AI coding tools ask for the same tradeoff:

“Give me your code, and I’ll give you help.”

I wanted to try the opposite.

What if a coding mentor lived inside VS Code, understood your repository, helped with real developer tasks, and kept your code on your own machine by default?

So I built Gemma Local Code Mentor.

TL;DR

Gemma Local Code Mentor is a local-first VS Code extension powered by Gemma 4.

It can:

  • Explain selected code
  • Suggest refactors
  • Generate tests
  • Summarize files
  • Summarize repository architecture
  • Answer questions about the repo
  • Run through a local FastAPI backend
  • Use Ollama as the default local model runtime
  • Keep Local Only Mode enabled by default

No telemetry.
No cloud fallback.
No external API calls while Local Only Mode is on.
Your code stays where it belongs: on your machine.

What I Built

I built a VS Code extension plus a Dockerized FastAPI backend for developers who want AI help without sending private code to a remote API.

The workflow is simple:

  1. Select code in VS Code.
  2. Run a Gemma: command.
  3. The extension sends context to the local backend at 127.0.0.1:8765 (a request sketch follows this list).
  4. The backend builds a task-specific prompt.
  5. Gemma 4 responds through a local provider.
  6. The result appears in a VS Code side panel.
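
To make step 3 concrete, here is roughly what a request to the local backend could look like. This is a sketch only: the route and payload field names are my assumptions for illustration, not the extension's actual API; the real contract is defined by the repo's FastAPI routes.

```python
# Illustration only: what a request from the extension might look like on the
# wire. The endpoint path and field names are assumptions, not the repo's
# actual API; check the backend's FastAPI routes for the real contract.
import json
from urllib.request import Request, urlopen

payload = {
    "task": "explain",                            # which Gemma: command was run
    "code": "def add(a, b):\n    return a + b",   # the selected code
    "language": "python",                         # editor language ID
}

req = Request(
    "http://127.0.0.1:8765/v1/task",  # hypothetical route on the local backend
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    print(json.load(resp))
```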

The extension currently includes these commands:

  • Gemma: Explain Selection
  • Gemma: Refactor Selection
  • Gemma: Generate Tests
  • Gemma: Summarize File
  • Gemma: Summarize Architecture
  • Gemma: Ask Repository
  • Gemma: Toggle Local Only Mode
  • Gemma: Open Panel

This is not just a chat box glued into an editor. The backend has structured prompt builders, response parsing, provider routing, tests, repository context handling, and privacy checks.
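
To give a flavor of what a task-specific prompt builder can look like, here is a minimal sketch. The template text and function names are mine, not the repo's actual implementation.

```python
# Simplified sketch of a task-specific prompt builder; illustrative only.
TEMPLATES = {
    "explain": (
        "Explain what this {language} code does, step by step:\n\n{code}"
    ),
    "refactor": (
        "Suggest a cleaner refactor of this {language} code. "
        "Return the revised code and a short rationale:\n\n{code}"
    ),
    "tests": (
        "Write unit tests for this {language} code, covering normal "
        "and edge cases:\n\n{code}"
    ),
}

def build_prompt(task: str, code: str, language: str) -> str:
    """Pick the template for the requested task and fill in the selection."""
    try:
        template = TEMPLATES[task]
    except KeyError:
        raise ValueError(f"Unknown task: {task}") from None
    return template.format(language=language, code=code)

print(build_prompt("explain", "def add(a, b): return a + b", "python"))
```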

Why I Built It

There are many AI coding assistants now, but the privacy model often feels backwards.

For open source code, cloud tools are usually fine.

For client code, internal company projects, security-sensitive prototypes, or early startup ideas, uploading code somewhere else can be a blocker.

I wanted a coding assistant with different defaults:

| Feature | Typical Cloud Assistant | Gemma Local Code Mentor |
| --- | --- | --- |
| Runs in VS Code | Yes | Yes |
| Explains code | Yes | Yes |
| Generates tests | Yes | Yes |
| Refactors code | Yes | Yes |
| Sends code to cloud | Often | No by default |
| Works with local models | Usually no | Yes |
| Has a local-only switch | Rare | Yes |
| Can be hacked on by contributors | Limited | Fully open source |

The goal is not to beat every commercial coding assistant.

The goal is to prove that a useful AI coding mentor can be local-first from day one.

Demo

Suggested demo flow:

  1. Open a real code file in VS Code.
  2. Select a function.
  3. Run Gemma: Explain Selection.
  4. Run Gemma: Generate Tests.
  5. Ask a repository-level question.
  6. Show the side panel with Local Only Mode: ON.
  7. Show the backend running locally.

Code

Repository:

ennydev-2026/GemmaLocalCodeMentor – GLCM (Gemma Local Code Mentor)

Gemma Local Code Mentor is a local-first VS Code extension and Dockerized FastAPI backend for explaining, refactoring, testing, and summarizing code with local Gemma models.

What It Does

The project runs on the developer’s machine:

  • VS Code extension in TypeScript.
  • Local FastAPI backend on 127.0.0.1:8765.
  • Ollama as the default local model runtime.
  • Local sample provider for development and tests without installed models.
  • Dual-model routing:
    • Fast model for short explanations and lightweight chat.
    • Deep model for refactors, tests, architecture, and larger context.
  • Local Only Mode enabled by default.

Architecture

```mermaid
flowchart LR
    A["VS Code Extension"] --> B["FastAPI Backend :8765"]
    B --> C["Prompt Orchestrator"]
    B --> D["Repo Context Builder"]
    B --> E["Local Index Store"]
    B --> F["Model Router"]
    F --> G["Fast Gemma Model"]
    F --> H["Deep Gemma Model"]
    G --> I["Ollama"]
    H --> I
    B --> J["Response Parser"]
    J --> A
```



Commands

  • gemma.explainSelection
  • gemma.refactorSelection
  • gemma.generateTests
  • gemma.summarizeFile
  • gemma.summarizeArchitecture
  • gemma.askRepo
  • gemma.togglePrivacyMode
  • gemma.openPanel

Direct link:

https://github.com/ennydev-2026/GemmaLocalCodeMentor

How I Used Gemma 4

I used Gemma 4 as the reasoning layer behind the local code mentor.

The project is designed around two model roles:

  • Gemma 4 E4B for fast tasks like short explanations and lightweight chat
  • Gemma 4 31B Dense for deeper tasks like refactoring, test generation, architecture summaries, and larger context

That choice was intentional.

A code mentor should not use the largest model for every single request. If I ask what a small function does, I want a fast answer. If I ask for tests, architecture, or a refactor, I want deeper reasoning.

So the backend includes a model router (a sketch follows this list):

  • fast mode uses the fast model
  • deep mode uses the deep model
  • auto mode chooses based on task type and context size
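
In pseudocode terms, the routing is roughly this. It is a minimal sketch: the model tags, task names, and context threshold are placeholders, not the repo's actual configuration.

```python
# Minimal sketch of fast/deep/auto routing. Model names and the context
# threshold are placeholders, not the repo's actual configuration.
DEEP_TASKS = {"refactor", "tests", "architecture", "ask_repo"}
CONTEXT_LIMIT = 4_000  # chars; illustrative cutoff for "large context"

def pick_model(mode: str, task: str, context: str) -> str:
    if mode == "fast":
        return "gemma-fast"   # e.g. the smaller Gemma model tag in Ollama
    if mode == "deep":
        return "gemma-deep"   # e.g. the larger Gemma model tag in Ollama
    # auto: route by task type first, then by context size
    if task in DEEP_TASKS or len(context) > CONTEXT_LIMIT:
        return "gemma-deep"
    return "gemma-fast"

assert pick_model("auto", "explain", "def f(): pass") == "gemma-fast"
assert pick_model("auto", "tests", "def f(): pass") == "gemma-deep"
```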

This makes Gemma 4 feel more like a practical local development tool instead of a single hardcoded model call.

Architecture

```mermaid
flowchart LR
    A["VS Code Extension"] --> B["FastAPI Backend on 127.0.0.1:8765"]
    B --> C["Prompt Builders"]
    B --> D["Repository Context Builder"]
    B --> E["Model Router"]
    E --> F["Gemma 4 E4B Fast Model"]
    E --> G["Gemma 4 31B Dense Deep Model"]
    F --> H["Ollama"]
    G --> H
    B --> I["JSON Response Parser"]
    I --> A
```

The stack:

  • VS Code extension in TypeScript
  • FastAPI backend in Python
  • Ollama as the default local runtime
  • Docker support
  • Mock provider for development and tests
  • .gemmaignore support
  • Local URL safety checks
  • Backend test coverage with pytest (example sketch below)
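
Because the mock provider needs no installed model, backend tests can run anywhere. Here is a minimal sketch of such a test with FastAPI's TestClient; to avoid guessing the repo's own route names, it only asserts against /openapi.json, which FastAPI serves by default.

```python
# Sketch of a backend test in mock mode. GEMMA_PROVIDER is set before the app
# import in case the provider is chosen at import time (an assumption on my
# part). The test only touches /openapi.json, which FastAPI serves by default.
import os

os.environ["GEMMA_PROVIDER"] = "mock"

from fastapi.testclient import TestClient
from app.main import app  # same module path the uvicorn command uses

client = TestClient(app)

def test_backend_serves_openapi_schema():
    resp = client.get("/openapi.json")
    assert resp.status_code == 200
    assert "paths" in resp.json()
```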

Local-First Is a Product Decision

The privacy layer is not just a README promise.

The repo includes:

  • Local Only Mode enabled by default
  • Backend URL validation (sketched after this list)
  • No telemetry
  • No cloud fallback
  • No external API calls while local-only is enabled
  • .gemmaignore for excluding sensitive files
  • Mock mode so contributors can work without installing a model first
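
The backend URL check is conceptually simple: refuse any URL that does not point at the loopback interface. Here is a minimal sketch of the idea; this is my illustration of a loopback-only policy, not the repo's actual validator.

```python
# Minimal sketch of a local-only URL guard: reject any backend URL whose host
# does not resolve to the loopback interface. Illustrative, not the repo's code.
import ipaddress
import socket
from urllib.parse import urlparse

def is_local_url(url: str) -> bool:
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        # Resolve the hostname and require every resolved address to be loopback.
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return all(
        ipaddress.ip_address(info[4][0]).is_loopback for info in infos
    )

assert is_local_url("http://127.0.0.1:8765") is True
assert is_local_url("https://api.example.com") is False
```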

That matters because local AI changes who can safely use these tools.

A freelancer can use it on client code.
A company can test AI workflows without sending source code away.
A student can learn from a mentor without paying API costs.
An open-source maintainer can customize the whole stack.

Run It Locally

Backend:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --host 127.0.0.1 --port 8765 --reload
```
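
Before running against real models, you can confirm what the local Ollama daemon has pulled. This snippet uses Ollama's documented /api/tags endpoint on its default port (which models you need depends on your router configuration):

```python
# List the models the local Ollama daemon has pulled. /api/tags is part of
# Ollama's documented REST API; 11434 is its default port.
import json
from urllib.request import urlopen

with urlopen("http://127.0.0.1:11434/api/tags") as resp:
    models = json.load(resp)["models"]

for model in models:
    print(model["name"])
```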

Mock mode, no model required:

```bash
cd backend
GEMMA_PROVIDER=mock uvicorn app.main:app --host 127.0.0.1 --port 8765 --reload
```

Extension:

```bash
cd extension
npm install
npm run compile
```

Then open the project in VS Code, press F5, and run any Gemma: command.

What I Want Help With

This is where I want the community involved.

I would love contributors for:

  • Better repository indexing
  • Smarter prompt templates
  • More language-aware code analysis
  • Inline code actions
  • Diff previews before applying refactors
  • Local embeddings for repo search
  • Better test framework detection
  • llama.cpp provider support
  • MLX provider support
  • A polished marketplace-ready VSIX
  • UI improvements for the side panel

If you care about local AI, open models, privacy-respecting devtools, or VS Code extensions, jump in.

Fork it.
Open an issue.
Try another Gemma 4 model.
Add a provider.
Improve the prompts.
Make the UX better.

Install the VS Code Extension

You can install the extension directly in VS Code using this identifier:


```text
ennydev-2026.gemma-local-code-mentor
```

Final Thought

AI coding tools are becoming part of the daily developer workflow.

That means defaults matter.

The default should not always be:

> "Upload your code first."

Sometimes the best place for your code is exactly where it already is:

**on your machine.**

What would you add to a local-first VS Code code mentor?