I Benchmarked Lynkr Against LiteLLM on the Same Backends. Lynkr Was Cheaper for Tool-Heavy Workloads
Founder disclosure: I built Lynkr, so take this as a technical benchmark write-up, not a neutral industry report. The numbers below come from the same backend providers on both gateways.
If you’re routing AI coding traffic through a gateway, switching providers is not enough on its own. The real savings come from reducing the number of tokens that reach the model in the first place.
I ran Lynkr and LiteLLM against the same backends (Ollama locally, Moonshot, and Azure OpenAI) across 9 scenarios. On the scenarios that look like real agentic coding work, Lynkr was cheaper because it does three things before forwarding a request upstream: smart tool selection, TOON compression, and semantic caching.
The short version
Lynkr was measurably better on the cost-sensitive parts of the workload:
- Smart tool selection: 53% fewer input tokens, 52% lower cost
- TOON JSON compression: 87.6% fewer billed tokens on a large tool result, 50% lower cost
- Semantic cache: 171ms cache-hit response vs 3,282ms for LiteLLM on the repeat query
- Tier routing: escalated hard prompts to stronger models instead of blindly sending everything to the cheapest route
This matters if you’re running Claude Code, Codex, Cursor, or similar agent workflows where tools, file reads, grep output, and repeated context dominate your token bill.
Setup
Same benchmark inputs, same providers, same request shape.
- Machine: macOS on Apple Silicon
- Lynkr: v9.3.2 on Node 20
- LiteLLM: v1.87.1 on Python 3.12
- Backends used: Ollama local, Moonshot, Azure OpenAI
- Scenarios: 9 total across simple prompts, tools, history, cache, and routing
Each scenario sent the same HTTP request to both gateways at POST /v1/messages.
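To make "the same HTTP request" concrete, here is a minimal sketch of what such a harness could look like, assuming both gateways accept an Anthropic Messages-style body at /v1/messages. The base URLs, ports, and the GATEWAYS, buildRequest, and sendToBoth names are hypothetical, auth headers are omitted, and this is not the actual benchmark-tier-routing.js script.

```javascript
// Hypothetical gateway base URLs; adjust to wherever each proxy is listening.
const GATEWAYS = {
  lynkr: "http://localhost:8081",
  litellm: "http://localhost:4000",
};

// Build one Messages-style body (model, max_tokens, messages, tools).
function buildRequest(model, prompt, tools) {
  return {
    model,
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
    tools,
  };
}

// Serialize once and POST the identical string to each gateway, recording
// wall-clock latency and whatever usage block the gateway returns.
// Auth headers are omitted in this sketch.
async function sendToBoth(model, prompt, tools) {
  const body = JSON.stringify(buildRequest(model, prompt, tools));
  const results = {};
  for (const [name, baseUrl] of Object.entries(GATEWAYS)) {
    const started = Date.now();
    const res = await fetch(`${baseUrl}/v1/messages`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body,
    });
    const json = await res.json();
    results[name] = { ms: Date.now() - started, usage: json.usage };
  }
  return results;
}
```

Serializing the body once guarantees both gateways receive byte-identical input, and sending sequentially keeps one gateway's load from skewing the other's latency.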
Where Lynkr wins
1) Smart tool selection
A lot of coding requests are read-only, but the model still gets handed the full tool universe: write, edit, bash, git, file ops, everything.
Lynkr classifies the request first and strips irrelevant tool schemas before forwarding upstream. So a read-only question does not pay to carry write-capable tools.
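A minimal sketch of the idea, assuming a keyword heuristic for "read-only" intent and a hand-tagged mutates flag on each tool. The tool list, the flag, the regex, and selectTools are all hypothetical; real tool definitions also carry description and input_schema fields, and Lynkr's actual classifier is not shown here.

```javascript
// Hypothetical tool catalog: each tool is tagged with whether it can mutate state.
const TOOLS = [
  { name: "Read", mutates: false },
  { name: "Grep", mutates: false },
  { name: "Glob", mutates: false },
  { name: "Write", mutates: true },
  { name: "Edit", mutates: true },
  { name: "Bash", mutates: true },
  { name: "GitCommit", mutates: true },
];

// Rough intent check: treat the request as read-only unless it contains a write-ish verb.
const WRITE_VERBS = /\b(write|edit|fix|refactor|create|delete|rename|commit|install|run)\b/i;

function isReadOnly(prompt) {
  return !WRITE_VERBS.test(prompt);
}

// Keep only the tool schemas the request plausibly needs before forwarding upstream.
function selectTools(prompt, tools) {
  return isReadOnly(prompt) ? tools.filter((t) => !t.mutates) : tools;
}

console.log(selectTools("Where is the retry logic defined?", TOOLS).map((t) => t.name));
console.log(selectTools("Fix the failing test in auth.js", TOOLS).length);
```

Every schema dropped is one less block of JSON billed as input tokens. The cost of a misclassification is that the model cannot call a write tool it needed, so a production classifier would likely err toward keeping tools when unsure.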
Benchmark setup: 14 tool definitions attached to every request, which is pretty realistic for a Claude Code or Cursor style session.
- Lynkr: 959 billed input tokens, $0.0044
- LiteLLM: 2,085 billed input tokens, $0.0091
Result: 53% fewer input tokens and 52% lower cost on the same model and prompt.
This is the kind of optimization that compounds because it happens before every downstream model call.
2) TOON compression for tool results
Tool-heavy workflows often blow up because of structured JSON, not because the user wrote a long prompt.
Lynkr’s TOON path compresses large JSON payloads before they hit the provider. Plain text goes through unchanged. The useful effect is that file reads, grep arrays, tool traces, and other structured outputs stop dominating the request.
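To show where the savings come from, here is a simplified sketch of the tabular form TOON (Token-Oriented Object Notation) uses for uniform arrays of objects: field names are written once in a header instead of being repeated in every row. The encodeValue and toToonTable names are hypothetical, the quoting rules are reduced to common cases, nesting is not handled, and this is not Lynkr's TOON code path.

```javascript
// Strings that would be ambiguous as bare cells get JSON-style quotes.
function encodeValue(v) {
  if (v === null) return "null";
  if (typeof v === "number" || typeof v === "boolean") return String(v);
  const s = String(v);
  const needsQuotes =
    s === "" ||
    s !== s.trim() ||
    /[,:"\\\n]/.test(s) ||
    /^(true|false|null|-?\d+(\.\d+)?)$/.test(s); // keep "42" a string, not a number
  return needsQuotes ? JSON.stringify(s) : s;
}

// Encode a uniform array of flat objects as a TOON-style table:
//   key[N]{field1,field2}:
//     value1,value2
function toToonTable(key, rows) {
  if (rows.length === 0) return `${key}[0]:`;
  const fields = Object.keys(rows[0]);
  const signature = fields.join(",");
  // The tabular form only works when every row has the same keys in the same order.
  if (!rows.every((r) => Object.keys(r).join(",") === signature)) {
    throw new Error("toToonTable expects rows with identical keys");
  }
  const header = `${key}[${rows.length}]{${signature}}:`;
  const lines = rows.map((r) => "  " + fields.map((f) => encodeValue(r[f])).join(","));
  return [header, ...lines].join("\n");
}

// Example: two grep hits shaped like a Bash/grep tool result.
const hits = [
  { file: "src/auth.js", line: 42, text: "const token = sign(payload);" },
  { file: "src/db.js", line: 7, text: "retry(query, 3)" },
];
const toon = toToonTable("results", hits);
console.log(toon);
console.log(`JSON chars: ${JSON.stringify(hits).length}, TOON chars: ${toon.length}`);
```

This prints the header results[2]{file,line,text}: followed by one comma-separated row per hit, with "retry(query, 3)" quoted because it contains a comma. Character counts are only a proxy for tokens; the repeated keys and braces that JSON spends on every row are what the header form removes.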
Benchmark setup: a Bash tool returning 60 grep results as a JSON array, roughly 3,400 tokens unoptimized.
- Lynkr: 427 billed input tokens, $0.009, 12s latency
- LiteLLM: 3,458 billed input tokens, $0.018, 12s latency
Result: 87.6% token reduction and 50% lower cost at the same latency.
That last part matters. This was not a tradeoff where cost improved because the request got slower. Compression happened in-process and the wall-clock result stayed flat.
3) Semantic cache
The cheapest request is the one that never reaches the model.
Lynkr computes embeddings for the incoming prompt and returns a cached response when a semantically similar request shows up again. In the benchmark, the second prompt was just a paraphrase of the first:
- “Explain TCP vs UDP”
- “What is the difference between TCP and UDP?”
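A minimal sketch of a semantic cache, assuming an injected embedding function and a cosine-similarity threshold. The SemanticCache class, the 0.95 threshold, and the three-dimensional stub vectors are hypothetical stand-ins: a real embedder is usually an async model call, and a real cache would use a vector index instead of a linear scan.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class SemanticCache {
  // embed: hypothetical text -> number[] function (an embedding model in practice).
  // threshold: minimum cosine similarity that counts as a hit (illustrative value).
  constructor(embed, threshold = 0.95) {
    this.embed = embed;
    this.threshold = threshold;
    this.entries = []; // { vector, response }
  }

  get(prompt) {
    const v = this.embed(prompt);
    let best = null;
    let bestScore = -Infinity;
    // Linear scan over stored entries, keeping the most similar one.
    for (const e of this.entries) {
      const score = cosine(v, e.vector);
      if (score > bestScore) {
        best = e;
        bestScore = score;
      }
    }
    return best && bestScore >= this.threshold ? best.response : null;
  }

  set(prompt, response) {
    this.entries.push({ vector: this.embed(prompt), response });
  }
}

// Stub embedder with made-up vectors so the example runs offline.
const FAKE_VECTORS = {
  "Explain TCP vs UDP": [0.9, 0.1, 0.0],
  "What is the difference between TCP and UDP?": [0.88, 0.12, 0.01],
  "How do I rebase a branch?": [0.0, 0.2, 0.95],
};
const cache = new SemanticCache((text) => FAKE_VECTORS[text], 0.95);
cache.set("Explain TCP vs UDP", "TCP is connection-oriented; UDP is connectionless.");
console.log(cache.get("What is the difference between TCP and UDP?")); // hit: similarity is about 0.9996
console.log(cache.get("How do I rebase a branch?")); // null: unrelated vector
```

The threshold is the main design lever: set it too low and distinct questions start sharing answers, set it too high and paraphrases like the TCP/UDP pair miss.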
Cold run vs cache hit
- Lynkr cold: 2,857 tokens, 1,891ms
- Lynkr cache hit: served from cache in 171ms
- LiteLLM repeat path: 54 tokens, 3,282ms
The important part is not just token avoidance. Lynkr’s response time dropped from 1.9s on the cold run to 171ms on the cache hit, about 11x faster.
For interactive tooling, that difference is felt immediately.
4) Tier routing that looks at complexity, not just price
LiteLLM has routing too, but in this benchmark configuration it used the cost-based-routing strategy, which means the gateway picks the cheapest option first.
That works for simple questions. It breaks when the prompt genuinely needs a stronger model.
Lynkr scores each request across 15 dimensions, including token size, reasoning markers, code complexity, risk signals, and agentic traits, and then routes it automatically.
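A minimal sketch of complexity-based tier routing, assuming a weighted average of per-signal scores mapped to tiers by fixed thresholds. The five signals, their regexes and weights, the thresholds, and the tier names are all hypothetical stand-ins for a few of the 15 dimensions; this is not Lynkr's scorer.

```javascript
// Illustrative stand-ins for a few scoring dimensions; weights and regexes are hypothetical.
const SIGNALS = [
  { name: "length", weight: 1, score: (p) => Math.min(p.length / 2000, 1) },
  { name: "reasoning", weight: 2, score: (p) => (/\b(analy[sz]e|compare|trade-?offs?|why|design)\b/i.test(p) ? 1 : 0) },
  { name: "risk", weight: 3, score: (p) => (/\b(security|auth|banking|payments?|production)\b/i.test(p) ? 1 : 0) },
  { name: "code", weight: 1, score: (p) => (/\bfunction\b|=>|\{\s*$/m.test(p) ? 1 : 0) },
  { name: "agentic", weight: 2, score: (p) => (/\b(step by step|multi-?file|across the repo)\b/i.test(p) ? 1 : 0) },
];

const TOTAL_WEIGHT = SIGNALS.reduce((sum, s) => sum + s.weight, 0);

// Weighted average of signal scores in [0, 1], mapped to a tier by fixed thresholds.
function routeTier(prompt) {
  const score = SIGNALS.reduce((sum, s) => sum + s.weight * s.score(prompt), 0) / TOTAL_WEIGHT;
  const tier = score < 0.2 ? "simple" : score < 0.5 ? "standard" : "complex";
  return { score, tier };
}

console.log(routeTier("What does git stash do?").tier); // "simple"
console.log(
  routeTier("Compare JWT vs cookies for session security in a banking architecture and analyze the tradeoffs.").tier
); // "complex"
```

Unlike a price-first strategy, the tier here depends only on the prompt; the weights decide how far a single risk keyword can push a short prompt toward a stronger model.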
In the benchmark:
- Simple prompt: “What does git stash do?”
  - Lynkr routed to minimax-m2.5
  - LiteLLM routed to local Ollama
- Complex prompt: JWT vs cookies security analysis for a banking architecture
  - Lynkr escalated to moonshot-v1-auto
  - LiteLLM still sent it to local Ollama
That is the difference between “cheap by default” and “cheap when appropriate.”
Why this benchmark matters more than a generic proxy comparison
A lot of gateway comparisons collapse into “who can talk to more providers.” That is table stakes now.
The more important question is:
What does the gateway do to reduce spend before the request hits the model?
That is where Lynkr is different in practice.
It stacks three cost levers:
- Tool pruning so irrelevant tool schemas do not ride along
- TOON compression so large structured tool output stops inflating prompts
- Semantic cache so repeated or near-repeated requests do not call the model again
Then it adds tier routing on top, so the remaining requests go to the right model for the job.
That stack is why the benchmark result is interesting. It is not just “Lynkr can route too.” It is that Lynkr changes the size and shape of the request before routing even happens.
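The ordering described above can be sketched as a small pipeline, assuming the cache is checked first, the request is then slimmed, and routing sees only the slimmed request. The handleRequest function, the stages object, and every stub inside it are hypothetical placeholders for the real levers, and the exact ordering inside Lynkr is an assumption here.

```javascript
// Hypothetical ordering of the levers: cache first (skip the model entirely),
// then shrink the request, then route what is left.
function handleRequest(req, stages) {
  const cached = stages.cacheLookup(req);
  if (cached !== null) return { source: "cache", response: cached };

  const pruned = stages.pruneTools(req); // drop irrelevant tool schemas
  const compressed = stages.compress(pruned); // shrink large structured tool results
  const tier = stages.route(compressed); // choose a model tier for the slimmed request
  return { source: "upstream", tier, request: compressed };
}

// Stub stages so the example runs; each stands in for a real implementation.
const stages = {
  cacheLookup: (r) => (r.prompt === "Explain TCP vs UDP" ? "cached answer" : null),
  pruneTools: (r) => ({ ...r, tools: r.tools.filter((t) => !t.mutates) }),
  compress: (r) => r,
  route: (r) => (r.prompt.length > 200 ? "complex" : "simple"),
};

const tools = [{ name: "Read", mutates: false }, { name: "Edit", mutates: true }];
console.log(handleRequest({ prompt: "Explain TCP vs UDP", tools }, stages).source); // "cache"
const out = handleRequest({ prompt: "Where is retry defined?", tools }, stages);
console.log(out.source, out.tier, out.request.tools.length); // upstream simple 1
```

Putting the cache first means a hit skips every other stage, and routing last means the tier decision is made on the request the provider will actually be billed for.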
Cost projection at 100,000 requests/month
Using the large JSON tool-result test as a representative tool-heavy scenario:
- LiteLLM: about $818/month
- Lynkr: about $409/month
So on equal footing (same backend, same model class), Lynkr came out roughly 50% cheaper.
That is the distinction I’d care about if I were evaluating an LLM gateway for coding agents. Not whether the gateway has another provider adapter, but whether it reduces the number of tokens my provider ever sees.
What about Portkey?
Portkey is good at a different layer of the stack.
It is stronger on managed observability, prompt management, and governance. But this benchmark was not measuring dashboarding or policy UX. It was measuring request-path optimization.
On that axis, Lynkr is doing something Portkey does not really center on:
- automatic complexity detection
- semantic caching
- token compression
- drop-in routing for coding-tool workloads
So I would not frame this as “Portkey but cheaper.” They solve different primary problems.
Important caveats
To keep this honest, I want to state a few things clearly.
1) This is not a neutral benchmark
I built Lynkr. So the burden is on me to be explicit about methodology and where the numbers come from.
2) LiteLLM can look cheaper in headline totals
If LiteLLM routes everything to a free local model, the raw total can look lower. But that is not the useful comparison.
The fair comparison is same backend, same prompt, same model class. On those apples-to-apples paths, Lynkr was cheaper because it sent fewer tokens upstream.
3) Lynkr adds system-level context
In this benchmark, Lynkr injected a system prompt with memory and agent instructions, which added about 2,800 tokens of overhead in some scenarios. That is why comparing estimated raw request size to billed tokens can be misleading.
The correct comparison is billed tokens between Lynkr and LiteLLM on the same scenario.
Who this is for
Lynkr is for teams running things like:
- Claude Code
- Codex
- Cursor
- Hermes
- custom agents using an OpenAI-compatible endpoint
If your real problem is reducing spend on coding workflows without rewriting client-side integrations, the benchmark result is pretty simple:
Lynkr wins when the workload includes tools, structured outputs, repeated prompts, and mixed-complexity requests.
That is exactly what real coding-agent traffic looks like.
Reproducibility
You can rerun the benchmark from the Lynkr repo root:
node benchmark-tier-routing.js
Versions used in this run:
- Lynkr v9.3.2
- LiteLLM v1.87.1
Final takeaway
If all you want is a gateway that forwards requests, Lynkr is not interesting.
If you want a gateway that makes coding traffic cheaper before it reaches the model, that is where Lynkr starts to separate.
The three levers that mattered in this benchmark were:
- tool selection
- TOON compression
- semantic cache
And on top of that, tier routing kept the hard prompts from being sent to the wrong model just because it was cheaper.
If you want to dig into it, the repo is here:
GitHub: https://github.com/Fast-Editor/Lynkr
If you test it against your own coding workload, I would genuinely like to know where it holds up and where it doesn’t.