MCP servers are everywhere now: filesystem tools, web search, databases, Slack integrations. But connecting AI directly to these servers creates problems at scale: token bloat, security risks, and orchestration chaos.
An MCP gateway sits between your LLM and these servers, handling security, execution, and performance. Here’s what you need to know about building agents with a gateway.
The Problem with Direct MCP Connections
When Claude or GPT-4 connects directly to MCP servers, every request carries all tool definitions in context. Connect 5 servers with 100 tools? Every single LLM call includes 100 tool definitions—even for simple queries.
This creates three issues:
Token waste: Most of your context budget goes to tool catalogs instead of actual work. A 6-turn conversation with 100 tools burns through 600+ tokens just on definitions.
Security gaps: Tools execute without validation or approval. No audit trail, no user confirmation, no safety checks before destructive operations.
Coordination overhead: Each tool call requires a separate LLM round-trip. Fetching 5 pieces of data means 5 LLM calls, not one intelligent workflow.
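To see where the token waste comes from, here is a rough sketch of what a direct connection looks like on the wire: every chat completion request repeats the full tools array, no matter how trivial the user's message is. The single tool definition below is illustrative; in the 100-tool scenario above there would be roughly 100 such entries in every call.

// Illustrative sketch: with direct MCP connections, the full tool catalog
// rides along on every request, even for a question that needs no tools.
const allToolDefinitions = [
  {
    type: "function" as const,
    function: {
      name: "read_file",
      description: "Read a file from disk",
      parameters: { type: "object", properties: { path: { type: "string" } } },
    },
  },
  // ...and ~99 more definitions, re-sent on every single turn
];

const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "What's 2 + 2?" }], // trivial query
    tools: allToolDefinitions,                              // full catalog anyway
  }),
});
console.log((await response.json()).choices[0].message);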
What an MCP Gateway Does
A gateway like Bifrost provides a control plane for MCP:
Connection management: Connect to servers via STDIO, HTTP, or SSE protocols. The gateway discovers tools, maintains connections, and monitors health every 10 seconds.
Security layer: Tool calls from LLMs are suggestions, not commands. You review and approve each execution. Full audit trails for compliance.
Three execution modes: Manual (approve each tool), Agent Mode (auto-execute specific tools), or Code Mode (AI writes TypeScript to orchestrate everything).

maximhq/bifrost: Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.
Bifrost: The fastest way to build AI applications that never go down.
Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.
Quick Start
Go from zero to production-ready AI gateway in under a minute.
Step 1: Start Bifrost Gateway
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Step 2: Configure via Web UI
# Open the built-in web interface
open http://localhost:8080
Step 3: Make your first API call
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
That’s it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…
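Because the gateway speaks the same OpenAI-compatible API as the curl example above, application code can talk to it through a standard OpenAI client by overriding the base URL. A minimal sketch, assuming the default local setup with no gateway-level auth:

import OpenAI from "openai";

// Point a standard OpenAI client at the local Bifrost gateway.
const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // placeholder; provider API keys live in the gateway's config
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello, Bifrost!" }],
});

console.log(completion.choices[0].message.content);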
Execution Modes Explained
Manual Mode: Full Control
Default behavior. LLM suggests tools, you execute them explicitly:
# 1. Chat request
POST /v1/chat/completions
→ LLM returns tool call suggestions
# 2. Your approval logic here
# Check permissions, validate args, get user OK
# 3. Execute approved tools
POST /v1/mcp/tool/execute
# 4. Continue conversation
POST /v1/chat/completions with results
No tools run without your explicit API call. Perfect for sensitive operations.
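Wired together, the loop looks roughly like this. The chat endpoint follows the OpenAI response shape; the request body for /v1/mcp/tool/execute is an assumption here, so treat this as a sketch rather than the exact Bifrost schema.

const GATEWAY = "http://localhost:8080";

async function post(path: string, body: unknown): Promise<any> {
  const res = await fetch(`${GATEWAY}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}

// Stand-in for your real approval logic.
async function userApproves(toolCall: any): Promise<boolean> {
  // Replace with real checks: permissions, argument validation, a user prompt, etc.
  return true;
}

// 1. Chat request: the LLM returns tool call suggestions, nothing executes yet.
const messages: any[] = [{ role: "user", content: "List the files in /tmp" }];
const chat = await post("/v1/chat/completions", { model: "openai/gpt-4o-mini", messages });
const assistant = chat.choices[0].message;
messages.push(assistant);

for (const toolCall of assistant.tool_calls ?? []) {
  // 2. Approval logic runs outside the gateway.
  if (!(await userApproves(toolCall))) continue;

  // 3. Execute the approved tool (request body shape assumed, not from the docs).
  const result = await post("/v1/mcp/tool/execute", toolCall);
  messages.push({ role: "tool", tool_call_id: toolCall.id, content: JSON.stringify(result) });
}

// 4. Continue the conversation with the tool results attached.
const followUp = await post("/v1/chat/completions", { model: "openai/gpt-4o-mini", messages });
console.log(followUp.choices[0].message.content);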
Agent Mode: Controlled Autonomy
Configure which tools can auto-execute. The gateway runs approved tools automatically:
{
  "tools_to_execute": ["*"],
  "tools_to_auto_execute": ["read_file", "list_directory", "search"]
}
Safe operations (read, search) run autonomously. Dangerous operations (write, delete) still need approval.
Agent Mode runs tools in parallel for speed, continues for up to 10 iterations by default, and stops when no more tool calls are needed or max depth is reached.
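Conceptually, the agent loop behaves something like the following simplified sketch. This is not Bifrost's internal code; it only illustrates the stopping conditions and parallel execution described above, and the helper functions are hypothetical.

// Hypothetical helpers, declared only so the sketch type-checks.
declare function llmChat(messages: any[]): Promise<any>;
declare function executeTool(call: any): Promise<any>;
declare function asToolMessage(call: any, result: any): any;

const MAX_ITERATIONS = 10; // default max depth

async function agentLoop(messages: any[], autoExecute: Set<string>): Promise<any> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const reply = await llmChat(messages);    // one LLM turn
    const toolCalls = reply.tool_calls ?? [];
    if (toolCalls.length === 0) return reply; // stop: nothing left to execute

    // Assumed behavior: if any suggested tool is not on the auto-execute list,
    // hand control back so it can be approved manually.
    if (!toolCalls.every((c: any) => autoExecute.has(c.function.name))) return reply;

    // Approved tools run in parallel for speed.
    const results = await Promise.all(toolCalls.map((c: any) => executeTool(c)));
    messages.push(reply, ...toolCalls.map((c: any, idx: number) => asToolMessage(c, results[idx])));
  }
  return messages[messages.length - 1];       // stop: max depth reached
}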
Code Mode: Orchestration at Scale
For 3+ servers, Code Mode solves the token problem differently.
Instead of exposing 100 tools directly, the gateway exposes three meta-tools:
- listToolFiles – discover available servers
- readToolFile – load TypeScript definitions on-demand
- executeToolCode – run TypeScript that orchestrates everything
The AI writes one TypeScript script. All tool calls happen in a sandboxed VM. Only the final result returns to the LLM.
Real numbers from production:
- Classic MCP with 5 servers: 6 LLM turns, 600+ tokens in tool definitions
- Code Mode: 3-4 LLM turns, ~50 tokens in tool definitions
- Result: 50% cost reduction, 40-50% faster execution
Code Mode Architecture
The gateway creates a virtual filesystem of TypeScript declarations:
servers/
  youtube.d.ts     ← all YouTube tools
  filesystem.d.ts  ← all filesystem tools
  database.d.ts    ← all database tools
The AI reads what it needs, writes coordinating code:
// Search YouTube, get top 3 videos, save to file
const results = await youtube.search({ query: "AI news", maxResults: 3 });
const titles = results.items.map(item => item.snippet.title);
await filesystem.write_file({
  path: "results.json",
  content: JSON.stringify(titles)
});
return { saved: titles.length };
This executes in a Goja VM sandbox with TypeScript transpilation, async/await support, and 30-second timeout protection. The LLM gets back a compact result instead of every intermediate step.
Security Model
Three layers of control:
Connection filtering: Choose which tools from each server are available. Use ["*"] for all tools, [] for none, or specify individual tools.
Execution approval: Even available tools don’t run automatically unless configured. Default is explicit execution.
Code validation: In Agent Mode with Code Mode, the gateway parses TypeScript code and checks every tool call against auto-execute lists before running.
Example safe config:
{
  "filesystem": {
    "tools_to_execute": ["*"],
    "tools_to_auto_execute": ["read_file", "list_directory"]
  },
  "database": {
    "tools_to_execute": ["*"],
    "tools_to_auto_execute": []
  }
}
Read operations auto-execute. Write/delete operations require approval.
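The decision the gateway makes for each suggested tool call can be pictured like this. The logic below is illustrative only, not the gateway's actual implementation, but it matches the semantics of the two lists above.

// Illustrative decision logic for the config above; not Bifrost's code.
interface ServerPolicy {
  tools_to_execute: string[];      // which tools are available at all ("*" = all)
  tools_to_auto_execute: string[]; // which of those may run without approval
}

type Decision = "auto-execute" | "needs-approval" | "blocked";

function decide(policy: ServerPolicy, toolName: string): Decision {
  const matches = (list: string[]) => list.includes("*") || list.includes(toolName);
  if (!matches(policy.tools_to_execute)) return "blocked";          // connection filtering
  if (matches(policy.tools_to_auto_execute)) return "auto-execute"; // safe list
  return "needs-approval";                                          // default: explicit execution
}

const filesystem: ServerPolicy = {
  tools_to_execute: ["*"],
  tools_to_auto_execute: ["read_file", "list_directory"],
};

console.log(decide(filesystem, "read_file"));  // "auto-execute"
console.log(decide(filesystem, "write_file")); // "needs-approval"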
When to Use Each Mode
Manual Mode:
- 1-2 simple MCP servers
- Every operation needs review
- Compliance requires approval trails
Agent Mode:
- Mix of safe and dangerous tools
- Want speed for reads, control for writes
- Building interactive assistants
Code Mode:
- 3+ MCP servers connected
- Complex multi-step workflows
- Token costs or latency matter
- Tools need to coordinate with each other
You can mix modes—enable Code Mode for heavy servers (web search, documents) while keeping small utilities as direct tools.
Connection Types in Practice
STDIO: Local tools like filesystem operations. Gateway spawns subprocesses and communicates via stdin/stdout:
{
  "name": "filesystem",
  "connection_type": "stdio",
  "stdio_config": {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem"]
  }
}
HTTP: Remote APIs and microservices. Standard HTTP requests:
{
  "name": "web_search",
  "connection_type": "http",
  "connection_string": "https://mcp-server.example.com/mcp"
}
SSE: Real-time data streams. Server-Sent Events for persistent connections:
{
  "name": "live_data",
  "connection_type": "sse",
  "connection_string": "https://stream.example.com/sse"
}
The gateway monitors all connections with health checks every 10 seconds. Disconnected clients can be reconnected via API without restarting.
Token Economics Example
Real scenario: E-commerce assistant with 10 MCP servers, 150 tools total.
Task: Find products, check inventory, compare prices, estimate shipping, create quote.
Classic MCP:
- 8-10 LLM turns
- 2,400 tokens in tool definitions per turn
- 4,000-5,000 avg request tokens
- $3.20-4.00 total cost
- 18-25 seconds latency
Code Mode:
- 3-4 LLM turns
- 100-300 tokens in tool definitions per turn
- 1,500-2,000 avg request tokens
- $1.20-1.80 total cost
- 8-12 seconds latency
The difference comes from keeping tool definitions out of context until needed, and executing all coordination logic in one sandbox call instead of multiple LLM round-trips.
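As a back-of-the-envelope check using midpoints of the ranges above, the definition overhead alone works out to roughly:

// Midpoints of the ranges above: 9 turns at ~2,400 definition tokens each
// versus 4 turns at ~200 definition tokens each.
const classic  = { llmTurns: 9, defTokensPerTurn: 2400 };
const codeMode = { llmTurns: 4, defTokensPerTurn: 200 };

const definitionOverhead = (m: { llmTurns: number; defTokensPerTurn: number }) =>
  m.llmTurns * m.defTokensPerTurn;

console.log(definitionOverhead(classic));  // 21,600 tokens of tool definitions per task
console.log(definitionOverhead(codeMode)); //    800 tokens per task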
Implementation
Bifrost provides this as open-source infrastructure. You can run it as a gateway (single binary), embed it via Go SDK, or deploy in Kubernetes.
The gateway sits between your app and LLM providers, routing both chat requests and MCP tool execution through one interface.
For production agents, this architecture gives you control over what tools can do, visibility into what they’re doing, and performance that scales with complexity instead of degrading.
Try it: https://github.com/maximhq/bifrost
The MCP gateway implementation handles connection management, security validation, and execution modes as infrastructure concerns so you can focus on building agents that work.

