By Geeta Kakrani, Google Developer Expert (AI) | AI Consultant

In my two decades of experience in software engineering, I have seen many waves of innovation. But the transition to Agentic AI feels different. We are no longer just “calling models”; we are building systems that can think, delegate, and optimise themselves.
Recently, I decided to tackle a major challenge: building a production-grade AI infrastructure that is fast, intelligent, and architecturally elegant. I didn’t want a simple chatbot. I wanted a Smart Orchestrator.
The Vision: The Right Agent for the Right Task
In a high-performing team, you don’t ask a Senior Architect to handle a routine greeting, and you don’t ask a junior intern to design a global failover strategy.
My project, the “Kifayati AI Router,” treats LLMs like a specialised team. It uses a central Agentic Controller to decide which “specialist” should handle a specific user request based on intent and complexity.
The Architecture: A Three-Tier Engineering Approach
To make this work, I combined the best of open-source flexibility and managed cloud power.
1. The Instant Memory Layer (Semantic Caching)
Before any model is even woken up, the system checks its Semantic Cache.
- My Approach: I implemented a cache that understands intent, not just keywords. If a user asks a question similar to one answered before, the system serves the response instantly.
- The Result: Repeat queries are answered straight from the cache without invoking a model, so latency drops to roughly the cost of a lookup, and we save precious compute cycles.
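To make the idea concrete, here is a minimal Python sketch of a similarity-based cache. It is an illustration under stated assumptions, not the code from the project: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the `SemanticCache` class, its method names, and the 0.8 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real semantic cache would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache that matches queries by similarity, not exact text."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical cut-off; tune per workload
        self.entries = []  # list of (embedding, response) pairs

    def get(self, query):
        """Return the response of the most similar stored query, or None."""
        vector = embed(query)
        best_score, best_response = 0.0, None
        for stored_vector, response in self.entries:
            score = cosine(vector, stored_vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        """Store a response under the embedding of its query."""
        self.entries.append((embed(query), response))


cache = SemanticCache()
cache.put("How do I reset my password?", "Use the reset link.")
hit = cache.get("How can I reset my password?")   # similar wording
miss = cache.get("What is the capital of France?")  # unrelated question
```

The linear scan keeps the sketch short; at scale, a vector index would typically replace it. The threshold is the key design choice: set it too low and users get answers to questions they did not ask, too high and the cache rarely hits.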
2. The Private Specialist (Gemma 3 on GKE)
For 70% of routine tasks, I deployed Gemma 3 on Google Kubernetes Engine (GKE).
- The Engineering: By utilising Spot VMs, I created a highly resilient and cost-effective environment for hosting open weights.
- The Result: This is my “In-house Expert” — private, lightning-fast, and running on my own terms.
3. The Expert Council (Gemini 3)
When the Agentic Router detects a high-reasoning task, such as complex architectural design or deep coding logic, it promotes the query to Gemini 3 on Vertex AI. This ensures that high-compute intelligence is used only when it is truly needed.
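The three tiers can be tied together in a few lines of Python. The sketch below is an assumption-laden illustration rather than the project's controller: the keyword heuristic in `needs_deep_reasoning` stands in for whatever intent and complexity detection the real router uses, the exact-match dictionary stands in for the semantic cache, the two model backends are passed in as plain callables instead of real GKE or Vertex AI clients, and the `[ROUTING TO VERTEX AI]` label is invented for symmetry.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword list; a real controller would more likely use a
# classifier or a small LLM to judge intent and complexity.
COMPLEX_HINTS = ("architecture", "design", "failover", "refactor", "algorithm")


def needs_deep_reasoning(query: str) -> bool:
    """Crude complexity check: does the query mention a 'hard' topic?"""
    text = query.lower()
    return any(hint in text for hint in COMPLEX_HINTS)


class Router:
    """Hypothetical three-tier router: cache, routine model, expert model."""

    def __init__(
        self,
        routine_model: Callable[[str], str],
        expert_model: Callable[[str], str],
    ) -> None:
        self.routine_model = routine_model  # e.g. a Gemma 3 endpoint on GKE
        self.expert_model = expert_model    # e.g. Gemini 3 on Vertex AI
        self.cache: Dict[str, str] = {}     # exact-match placeholder cache

    def handle(self, query: str) -> Tuple[str, str]:
        """Return (log label, answer) for a query."""
        key = " ".join(query.lower().split())  # normalise case and spacing
        if key in self.cache:
            return "[CACHE HIT]", self.cache[key]
        if needs_deep_reasoning(query):
            label, answer = "[ROUTING TO VERTEX AI]", self.expert_model(query)
        else:
            label, answer = "[ROUTING TO GKE]", self.routine_model(query)
        self.cache[key] = answer
        return label, answer


# Stub backends so the sketch runs without any cloud access.
router = Router(lambda q: "gemma: " + q, lambda q: "gemini: " + q)
for question in ("Hello there", "Design a global failover strategy", "hello  there"):
    label, answer = router.handle(question)
    print(label, answer)
```

Injecting the backends as callables keeps the routing decision testable on its own and makes it easy to swap either tier for a different model later.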
A Personal Reflection: The Feeling of Control
Building this was a massive “Aha!” moment for me. As a developer, there is a unique satisfaction in watching an Agentic Router make split-second decisions.
In my video demo, seeing the console log flash [CACHE HIT] or [ROUTING TO GKE] felt like watching a perfectly tuned engine. It isn’t just about saving resources; it’s about the Engineering Excellence of building a system that knows exactly how to handle itself under load.
https://medium.com/media/f07184a88973da020a16b9be3e4444ea/href
Beyond API Calls: How I Built an Agentic AI Orchestrator with Gemma, Gemini, and GKE was originally published in Google Developer Experts on Medium.