MCP servers are everywhere now: filesystem tools, web search, databases, Slack integrations. But connecting AI directly to these servers creates problems at scale: token bloat, security risks, and orchestration chaos.
An MCP gateway sits between your LLM and these servers, handling security, execution, and performance. Here’s what you need to know about building agents with a gateway.
The Problem with Direct MCP Connections
When Claude or GPT-4 connects directly to MCP servers, every request carries all tool definitions in context. Connect 5 servers with 100 tools? Every single LLM call includes 100 tool definitions—even for simple queries.
This creates three issues:
Token waste: Most of your context budget goes to tool catalogs instead of actual work. A 6-turn conversation with 100 tools burns through 600+ tokens just on definitions.
Security gaps: Tools execute without validation or approval. No audit trail, no user confirmation, no safety checks before destructive operations.
Coordination overhead: Each tool call requires a separate LLM round-trip. Fetching 5 pieces of data means 5 LLM calls, not one intelligent workflow.
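To see where the token waste comes from, here is a rough sketch of what a direct connection looks like on the wire: every chat completion request repeats the full tools array, no matter how trivial the user's message is. The single tool definition below is illustrative; in the 100-tool scenario above there would be roughly 100 such entries in every call.

// Illustrative sketch: with direct MCP connections, the full tool catalog
// rides along on every request, even for a question that needs no tools.
const allToolDefinitions = [
  {
    type: "function" as const,
    function: {
      name: "read_file",
      description: "Read a file from disk",
      parameters: { type: "object", properties: { path: { type: "string" } } },
    },
  },
  // ...and ~99 more definitions, re-sent on every single turn
];

const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "What's 2 + 2?" }], // trivial query
    tools: allToolDefinitions,                              // full catalog anyway
  }),
});
console.log((await response.json()).choices[0].message);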
What an MCP Gateway Does
A gateway like Bifrost provides a control plane for MCP:
Connection management: Connect to servers via STDIO, HTTP, or SSE protocols. The gateway discovers tools, maintains connections, and monitors health every 10 seconds.
Security layer: Tool calls from LLMs are suggestions, not commands. You review and approve each execution. Full audit trails for compliance.
Three execution modes: Manual (approve each tool), Agent Mode (auto-execute specific tools), or Code Mode (AI writes TypeScript to orchestrate everything).

maximhq/bifrost: Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
Bifrost: The fastest way to build AI applications that never go down.
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
That’s it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
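Because the gateway speaks the same OpenAI-compatible API as the curl example above, application code can talk to it through a standard OpenAI client by overriding the base URL. A minimal sketch, assuming the default local setup with no gateway-level auth:

import OpenAI from "openai";

// Point a standard OpenAI client at the local Bifrost gateway.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // placeholder; provider API keys live in the gateway's config
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello, Bifrost!" }],
});

console.log(completion.choices[0].message.content);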
Execution Modes Explained
Manual Mode: Full Control
Default behavior. LLM suggests tools, you execute them explicitly:
# 1. Chat request
POST /v1/chat/completions
→ LLM returns tool call suggestions
# 2. Your approval logic here
# Check permissions, validate args, get user OK
# 3. Execute approved tools
POST /v1/mcp/tool/execute
# 4. Continue conversation
POST /v1/chat/completions with results
No tools run without your explicit API call. Perfect for sensitive operations.
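Wired together, the loop looks roughly like this. The chat endpoint follows the OpenAI response shape; the request body for /v1/mcp/tool/execute is an assumption here, so treat this as a sketch rather than the exact Bifrost schema.

const GATEWAY = "http://localhost:8080";

async function post(path: string, body: unknown): Promise<any> {
  const res = await fetch(`${GATEWAY}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}

// Stand-in for your real approval logic.
async function userApproves(toolCall: any): Promise<boolean> {
  // Replace with real checks: permissions, argument validation, a user prompt, etc.
  return true;
}

// 1. Chat request: the LLM returns tool call suggestions, nothing executes yet.
const messages: any[] = [{ role: "user", content: "List the files in /tmp" }];
const chat = await post("/v1/chat/completions", { model: "openai/gpt-4o-mini", messages });
const assistant = chat.choices[0].message;
messages.push(assistant);

for (const toolCall of assistant.tool_calls ?? []) {
  // 2. Approval logic runs outside the gateway.
  if (!(await userApproves(toolCall))) continue;

  // 3. Execute the approved tool (request body shape assumed, not from the docs).
  const result = await post("/v1/mcp/tool/execute", toolCall);
  messages.push({ role: "tool", tool_call_id: toolCall.id, content: JSON.stringify(result) });
}

// 4. Continue the conversation with the tool results attached.
const followUp = await post("/v1/chat/completions", { model: "openai/gpt-4o-mini", messages });
console.log(followUp.choices[0].message.content);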
Agent Mode: Controlled Autonomy
Configure which tools can auto-execute. The gateway runs approved tools automatically:
{
  "tools_to_execute": ["*"],
  "tools_to_auto_execute": ["read_file", "list_directory", "search"]
}
Safe operations (read, search) run autonomously. Dangerous operations (write, delete) still need approval.
Agent Mode runs tools in parallel for speed, continues for up to 10 iterations by default, and stops when no more tool calls are needed or max depth is reached.
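Conceptually, the agent loop behaves something like the following simplified sketch. This is not Bifrost's internal code; it only illustrates the stopping conditions and parallel execution described above, and the helper functions are hypothetical.

// Hypothetical helpers, declared only so the sketch type-checks.
declare function llmChat(messages: any[]): Promise<any>;
declare function executeTool(call: any): Promise<any>;
declare function asToolMessage(call: any, result: any): any;

const MAX_ITERATIONS = 10; // default max depth

async function agentLoop(messages: any[], autoExecute: Set<string>): Promise<any> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const reply = await llmChat(messages);    // one LLM turn
    const toolCalls = reply.tool_calls ?? [];
    if (toolCalls.length === 0) return reply; // stop: nothing left to execute

    // Assumed behavior: if any suggested tool is not on the auto-execute list,
    // hand control back so it can be approved manually.
    if (!toolCalls.every((c: any) => autoExecute.has(c.function.name))) return reply;

    // Approved tools run in parallel for speed.
    const results = await Promise.all(toolCalls.map((c: any) => executeTool(c)));
    messages.push(reply, ...toolCalls.map((c: any, idx: number) => asToolMessage(c, results[idx])));
  }
  return messages[messages.length - 1];       // stop: max depth reached
}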
Code Mode: Orchestration at Scale
For 3+ servers, Code Mode solves the token problem differently.
Instead of exposing 100 tools directly, the gateway exposes three meta-tools:
- listToolFiles – discover available servers
- readToolFile – load TypeScript definitions on-demand
- executeToolCode – run TypeScript that orchestrates everything
The AI writes one TypeScript script. All tool calls happen in a sandboxed VM. Only the final result returns to the LLM.
Real numbers from production:
- Classic MCP with 5 servers: 6 LLM turns, 600+ tokens in tool definitions
- Code Mode: 3-4 LLM turns, ~50 tokens in tool definitions
- Result: 50% cost reduction, 40-50% faster execution
Code Mode Architecture
The gateway creates a virtual filesystem of TypeScript declarations:
servers/
  youtube.d.ts     ← all YouTube tools
  filesystem.d.ts  ← all filesystem tools
  database.d.ts    ← all database tools
The AI reads what it needs, writes coordinating code:
// Search YouTube, get top 3 videos, save to file
const results = await youtube.search({ query: "AI news", maxResults: 3 });
const titles = results.items.map(item => item.snippet.title);
await filesystem.write_file({
  path: "results.json",
  content: JSON.stringify(titles)
});
return { saved: titles.length };
This executes in a Goja VM sandbox with TypeScript transpilation, async/await support, and 30-second timeout protection. The LLM gets back a compact result instead of every intermediate step.
Security Model
Three layers of control:
Connection filtering: Choose which tools from each server are available. Use ["*"] for all tools, [] for none, or specify individual tools.
Execution approval: Even available tools don’t run automatically unless configured. Default is explicit execution.
Code validation: In Agent Mode with Code Mode, the gateway parses TypeScript code and checks every tool call against auto-execute lists before running.
Example safe config:
{
  "filesystem": {
    "tools_to_execute": ["*"],
    "tools_to_auto_execute": ["read_file", "list_directory"]
  },
  "database": {
    "tools_to_execute": ["*"],
    "tools_to_auto_execute": []
  }
}
Read operations auto-execute. Write/delete operations require approval.
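The decision the gateway makes for each suggested tool call can be pictured like this. The logic below is illustrative only, not the gateway's actual implementation, but it matches the semantics of the two lists above.

// Illustrative decision logic for the config above; not Bifrost's code.
interface ServerPolicy {
  tools_to_execute: string[];      // which tools are available at all ("*" = all)
  tools_to_auto_execute: string[]; // which of those may run without approval
}

type Decision = "auto-execute" | "needs-approval" | "blocked";

function decide(policy: ServerPolicy, toolName: string): Decision {
  const matches = (list: string[]) => list.includes("*") || list.includes(toolName);
  if (!matches(policy.tools_to_execute)) return "blocked";          // connection filtering
  if (matches(policy.tools_to_auto_execute)) return "auto-execute"; // safe list
  return "needs-approval";                                          // default: explicit execution
}

const filesystem: ServerPolicy = {
  tools_to_execute: ["*"],
  tools_to_auto_execute: ["read_file", "list_directory"],
};

console.log(decide(filesystem, "read_file"));  // "auto-execute"
console.log(decide(filesystem, "write_file")); // "needs-approval"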
When to Use Each Mode
Manual Mode:
- 1-2 simple MCP servers
- Every operation needs review
- Compliance requires approval trails
Agent Mode:
- Mix of safe and dangerous tools
- Want speed for reads, control for writes
- Building interactive assistants
Code Mode:
- 3+ MCP servers connected
- Complex multi-step workflows
- Token costs or latency matter
- Tools need to coordinate with each other
You can mix modes—enable Code Mode for heavy servers (web search, documents) while keeping small utilities as direct tools.
Connection Types in Practice
STDIO: Local tools like filesystem operations. Gateway spawns subprocesses and communicates via stdin/stdout:
{
  "name": "filesystem",
  "connection_type": "stdio",
  "stdio_config": {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem"]
  }
}
HTTP: Remote APIs and microservices. Standard HTTP requests:
{
  "name": "web_search",
  "connection_type": "http",
  "connection_string": "https://mcp-server.example.com/mcp"
}
SSE: Real-time data streams. Server-Sent Events for persistent connections:
{
  "name": "live_data",
  "connection_type": "sse",
  "connection_string": "https://stream.example.com/sse"
}
The gateway monitors all connections with health checks every 10 seconds. Disconnected clients can be reconnected via API without restarting.
Token Economics Example
Real scenario: E-commerce assistant with 10 MCP servers, 150 tools total.
Task: Find products, check inventory, compare prices, estimate shipping, create quote.
Classic MCP:
- 8-10 LLM turns
- 2,400 tokens in tool definitions per turn
- 4,000-5,000 avg request tokens
- $3.20-4.00 total cost
- 18-25 seconds latency
Code Mode:
- 3-4 LLM turns
- 100-300 tokens in tool definitions per turn
- 1,500-2,000 avg request tokens
- $1.20-1.80 total cost
- 8-12 seconds latency
The difference comes from keeping tool definitions out of context until needed, and executing all coordination logic in one sandbox call instead of multiple LLM round-trips.
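As a back-of-the-envelope check using midpoints of the ranges above, the definition overhead alone works out to roughly:

// Midpoints of the ranges above: 9 turns at ~2,400 definition tokens each
// versus 4 turns at ~200 definition tokens each.
const classic  = { llmTurns: 9, defTokensPerTurn: 2400 };
const codeMode = { llmTurns: 4, defTokensPerTurn: 200 };

const definitionOverhead = (m: { llmTurns: number; defTokensPerTurn: number }) =>
  m.llmTurns * m.defTokensPerTurn;

console.log(definitionOverhead(classic));  // 21,600 tokens of tool definitions per task
console.log(definitionOverhead(codeMode)); //    800 tokens per task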
Implementation
Bifrost provides this as open-source infrastructure. You can run it as a gateway (single binary), embed it via Go SDK, or deploy in Kubernetes.
The gateway sits between your app and LLM providers, routing both chat requests and MCP tool execution through one interface.
For production agents, this architecture gives you control over what tools can do, visibility into what they’re doing, and performance that scales with complexity instead of degrading.
Try it: https://github.com/maximhq/bifrost
The MCP gateway implementation handles connection management, security validation, and execution modes as infrastructure concerns so you can focus on building agents that work.

