Per-Key Rate Limiting for Agent Tool Calls: Stop One User From Breaking Everything

May 25, 2026

Multi-tenant agents share infrastructure. When one user’s agent calls the web search tool 200 times in a minute, every other user’s agent slows down or gets errors.

Global rate limits protect the provider. Per-key rate limits protect your users from each other. agent-rate-fence is a sliding-window rate limiter where the key is whatever you want: user ID, session ID, tool name, or any combination.

The Shape of the Fix

from agent_rate_fence import RateFence, RateLimitExceeded

fence = RateFence(
    max_calls=10,
    window_seconds=60.0,
)

def rate_limited_search(user_id: str, query: str) -> list | dict:
    try:
        with fence.allow(key=user_id):
            return web_search(query)
    except RateLimitExceeded as e:
        return {"error": f"Rate limit exceeded. Retry after {e.retry_after_seconds:.0f}s"}

Ten calls per user per 60-second window. An eleventh call inside that window raises RateLimitExceeded, which carries a retry_after_seconds hint.

What It Does NOT Do

agent-rate-fence does not rate limit across processes or machines. It is in-process only. For distributed rate limiting across a fleet of workers, you need Redis or a similar shared store.

It does not differentiate between tool types. All calls through the same fence share the same counter for the same key. If you want different limits for different tools, create a fence per tool.
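If separate counters per user and per tool are enough, and the same limit for every tool is acceptable, a composite key is an alternative to one fence per tool. The helper below is a minimal sketch: fence_key is a hypothetical name, not part of agent-rate-fence, and it only assumes that the key passed to allow() is a plain string.

```python
import json


def fence_key(*parts: str) -> str:
    """Build one string key from several parts (hypothetical helper).

    JSON-encoding the list keeps the result unambiguous: ("a:b", "c") and
    ("a", "b:c") give different keys, which a naive ":".join would not.
    """
    return json.dumps(list(parts), separators=(",", ":"))


# One counter per (user, tool) pair inside a single fence.
user_tool_key = fence_key("user-42", "search_web")
```

Passing fence_key(user_id, tool_name) as the key gives each user an independent counter for each tool, all governed by the one fence's max_calls and window_seconds.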

It does not queue requests that exceed the limit. Excess requests fail immediately. For a queueing approach, you need a task queue.
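For callers that would rather wait than fail, a small retry wrapper can sit between "fail immediately" and a full task queue. The sketch below is an assumption-laden illustration: call_with_wait and its parameters are hypothetical, and RateLimitExceeded is a stand-in that mirrors only the retry_after_seconds attribute used in this post. It blocks the calling thread, so it is not a substitute for a real queue.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceeded(Exception):
    """Stand-in for the library's exception; only retry_after_seconds is used."""

    def __init__(self, retry_after_seconds: float):
        super().__init__(f"retry after {retry_after_seconds:.2f}s")
        self.retry_after_seconds = retry_after_seconds


def call_with_wait(
    attempt: Callable[[], T],
    max_wait_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
    min_delay_seconds: float = 0.01,
) -> T:
    """Run attempt(); on RateLimitExceeded, sleep for the hint and retry.

    Re-raises once the total time slept would exceed max_wait_seconds.
    """
    waited = 0.0
    while True:
        try:
            return attempt()
        except RateLimitExceeded as exc:
            # Never sleep for zero or negative time, so the loop always makes progress.
            delay = max(exc.retry_after_seconds, min_delay_seconds)
            if waited + delay > max_wait_seconds:
                raise
            sleep(delay)
            waited += delay
```

The wrapped callable has to let RateLimitExceeded propagate; a function that already converts the exception into an error dict would never trigger the retry.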

Inside the Library

Sliding window implementation using a deque of timestamps:

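from collections import deque
from contextlib import contextmanager
import threading
import time


# Minimal definition assumed here so the listing runs; the library's own class may differ.
class RateLimitExceeded(Exception):
    def __init__(self, retry_after_seconds: float):
        super().__init__(f"rate limit exceeded; retry after {retry_after_seconds:.1f}s")
        self.retry_after_seconds = retry_after_seconds
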

class RateFence:
    def __init__(self, max_calls: int, window_seconds: float):
        self._max = max_calls
        self._window = window_seconds
        self._calls: dict[str, deque] = {}
        self._lock = threading.Lock()

    @contextmanager
    def allow(self, key: str):
        with self._lock:
            now = time.monotonic()
            if key not in self._calls:
                self._calls[key] = deque()

            # Remove expired entries
            dq = self._calls[key]
            while dq and dq[0] < now - self._window:
                dq.popleft()

            if len(dq) >= self._max:
                oldest = dq[0]
                retry_after = self._window - (now - oldest)
                raise RateLimitExceeded(retry_after_seconds=retry_after)

            dq.append(now)

        yield

The sliding window is more accurate than a fixed window (which can allow 2x the limit at window boundaries). A deque per key is efficient: popleft() is O(1), and the deque only holds timestamps within the current window.
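The boundary effect can be checked with a small simulation. This is an illustrative sketch, not library code: both functions are hypothetical and take explicit timestamps so that the result does not depend on the wall clock.

```python
from collections import deque


def fixed_window_admits(timestamps, max_calls, window):
    """Calls admitted by a counter that resets every `window` seconds."""
    counts = {}
    admitted = []
    for t in timestamps:
        bucket = int(t // window)  # which fixed window this call falls into
        if counts.get(bucket, 0) < max_calls:
            counts[bucket] = counts.get(bucket, 0) + 1
            admitted.append(t)
    return admitted


def sliding_window_admits(timestamps, max_calls, window):
    """Calls admitted by a deque-of-timestamps sliding window."""
    dq = deque()
    admitted = []
    for t in timestamps:
        # Drop timestamps that have fallen out of the window ending at t.
        while dq and dq[0] < t - window:
            dq.popleft()
        if len(dq) < max_calls:
            dq.append(t)
            admitted.append(t)
    return admitted


# Ten calls just before the 60 s boundary, ten just after it.
burst = [59.0 + i * 0.05 for i in range(10)] + [60.0 + i * 0.05 for i in range(10)]

fixed = fixed_window_admits(burst, max_calls=10, window=60.0)
sliding = sliding_window_admits(burst, max_calls=10, window=60.0)
```

With a limit of 10 per 60 seconds, the fixed window admits all 20 calls in about a second and a half, while the sliding window admits only the first 10.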

Thread safety: the entire check-and-append is under one lock. This prevents the TOCTOU race where two concurrent callers both see len(dq) < max and both proceed.
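One way to convince yourself of this is to release many threads at once and count how many get through. The sketch below uses TinyFence, a hypothetical stand-in that copies the single-lock structure of the listing above but returns a bool instead of raising, so it runs without the library installed.

```python
import threading
import time
from collections import deque


class TinyFence:
    """Minimal stand-in: one lock around the whole check-and-append."""

    def __init__(self, max_calls, window_seconds):
        self._max = max_calls
        self._window = window_seconds
        self._calls = {}
        self._lock = threading.Lock()

    def try_acquire(self, key):
        with self._lock:
            now = time.monotonic()
            dq = self._calls.setdefault(key, deque())
            while dq and dq[0] < now - self._window:
                dq.popleft()
            if len(dq) >= self._max:
                return False
            dq.append(now)
            return True


def hammer(fence, key, n_threads):
    """Start n_threads that each try once at the same moment; return how many were admitted."""
    results = []
    results_lock = threading.Lock()
    start = threading.Barrier(n_threads)

    def worker():
        start.wait()  # line every thread up before the first attempt
        ok = fence.try_acquire(key)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


admitted = hammer(TinyFence(max_calls=10, window_seconds=60.0), "user-1", n_threads=50)
```

Because the length check and the append happen under the same lock, exactly max_calls of the 50 threads are admitted no matter how they interleave.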

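Cleanup: keys that have been idle for more than window_seconds accumulate in self._calls. In the listing above, expired timestamps are pruned only on that key's next call, so an idle key keeps at most max_calls stale timestamps in memory. That is cheap per key but grows with the number of distinct keys. A background cleanup thread is optional.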
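If the number of distinct keys is large, a periodic purge is straightforward. The function below is a sketch that assumes the internal layout shown in the listing (a dict of deques guarded by one lock); purge_idle_keys is a hypothetical name, and the library may expose something different or nothing at all.

```python
import threading
import time
from collections import deque
from typing import Dict, Optional


def purge_idle_keys(
    calls: Dict[str, deque],
    lock: threading.Lock,
    window_seconds: float,
    now: Optional[float] = None,
) -> int:
    """Drop keys whose newest timestamp is older than the window.

    Removing such a key is safe: every timestamp it holds has already
    expired, so the next call would start from an empty deque anyway.
    Returns the number of keys removed.
    """
    if now is None:
        now = time.monotonic()
    cutoff = now - window_seconds
    with lock:
        idle = [k for k, dq in calls.items() if not dq or dq[-1] < cutoff]
        for k in idle:
            del calls[k]
    return len(idle)
```

Calling this from a timer or at the start of every Nth request keeps memory bounded by the number of keys active within the last window.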

When to Use It

Use it for multi-tenant agents where users share tool infrastructure. Web search, database queries, external API calls — any tool where one user consuming excessive capacity affects others.

Use it by tool category, not just by user. A user making 10 web searches per minute might be fine. A user making 10 DELETE operations per minute might not be. Create a fence per tool category with different limits.

Use it for cost attribution and control. Pairing per-user rate limiting with per-user cost tracking gives you a complete picture of resource consumption per tenant.

Skip it for single-user agents or closed environments where all agent instances belong to the same user. The overhead is not worth it.

Install

pip install git+https://github.com/MukundaKatta/agent-rate-fence

from agent_rate_fence import RateFence, RateLimitExceeded

# Different limits for different tool categories
search_fence = RateFence(max_calls=20, window_seconds=60.0)
write_fence = RateFence(max_calls=5, window_seconds=60.0)
delete_fence = RateFence(max_calls=2, window_seconds=300.0)

FENCES = {
    "search_web": search_fence,
    "create_record": write_fence,
    "update_record": write_fence,
    "delete_record": delete_fence,
}

def execute_tool(tool_name: str, user_id: str, args: dict):
    fence = FENCES.get(tool_name)
    if fence:
        try:
            with fence.allow(key=user_id):
                return call_tool(tool_name, args)
        except RateLimitExceeded as e:
            return {
                "error": "rate_limit_exceeded",
                "retry_after": e.retry_after_seconds,
            }
    return call_tool(tool_name, args)

Sibling Libraries

Library	What it solves
`llm-rate-limit-bucket`	Token-bucket rate limiter for LLM API calls
`token-budget-pool`	Shared USD/token budget across concurrent agents
`llm-cost-cap`	Per-call cost gate
`llm-batch-coalesce`	Collapse duplicate calls from concurrent callers
`agent-deadline`	Time-bound agent execution

The multi-tenant stack: agent-rate-fence per user per tool, token-budget-pool for shared fleet budget, llm-cost-cap for individual call bounds.

What's Next

Redis backend for distributed rate limiting is the most requested missing feature. The interface would stay the same, for example RateFence(max_calls=10, window_seconds=60, backend=RedisBackend(url="redis://...")), but the deque would live in Redis with atomic Lua scripts for check-and-append.

Burst allowance: RateFence(max_calls=10, window_seconds=60, burst=20) would allow up to 20 calls in a short burst before rate limiting kicks in, while still enforcing the 10-per-60s average. This matches how most API rate limits actually work.
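One common way to get burst-plus-average behavior is a token bucket rather than a sliding window. The sketch below is that swapped-in technique, not the planned implementation: BurstBucket, try_acquire, and the injectable clock are all hypothetical names chosen for illustration.

```python
import threading
import time


class BurstBucket:
    """Token bucket: up to `burst` calls at once, refilled at max_calls / window_seconds."""

    def __init__(self, max_calls, window_seconds, burst, clock=time.monotonic):
        self._rate = max_calls / window_seconds  # tokens added per second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            now = self._clock()
            # Refill for the time elapsed since the last call, capped at capacity.
            elapsed = now - self._last
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

A bucket enforces the long-run average of max_calls per window_seconds, but a full burst followed by steady refill can exceed max_calls inside one particular window, which is the trade-off for allowing bursts at all.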

Per-key limit overrides: some users legitimately need higher limits (enterprise tiers). A fence.set_limit(key="enterprise-user-1", max_calls=100) would allow per-key overrides without a separate fence.
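A per-key override fits naturally into the sliding-window check: look up the limit for the key, falling back to the default. The class below is a sketch of that idea under stated assumptions. OverridableFence and try_acquire are hypothetical, and set_limit follows the signature proposed above rather than anything the library ships today.

```python
import threading
import time
from collections import deque


class OverridableFence:
    """Sliding-window limiter with optional per-key max_calls overrides."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self._default_max = max_calls
        self._window = window_seconds
        self._overrides = {}  # key -> max_calls for that key only
        self._calls = {}
        self._lock = threading.Lock()
        self._clock = clock

    def set_limit(self, key, max_calls):
        with self._lock:
            self._overrides[key] = max_calls

    def try_acquire(self, key):
        with self._lock:
            now = self._clock()
            dq = self._calls.setdefault(key, deque())
            while dq and dq[0] < now - self._window:
                dq.popleft()
            # Use the key's own limit if one was set, else the default.
            limit = self._overrides.get(key, self._default_max)
            if len(dq) >= limit:
                return False
            dq.append(now)
            return True
```

Only max_calls is overridable here; the window stays shared, which keeps the pruning logic identical for every key.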

Built as part of the agent-stack family: composable Python primitives for production LLM agents.
