API Rate Limiting Cheat Sheet


Gateway-level rate limiting

  • Gateway-level rate limiting enforces limits at the API gateway, before requests reach the backend services.
  • It is typically configured in API gateways such as Kong, Google’s Apigee, or Amazon API Gateway.
  • It is simple and effective, but may offer less fine-grained control than limits implemented in application code.

Token bucket algorithm

  • The token bucket algorithm keeps a bucket of tokens with a fixed capacity, and each API request must consume a token.
  • The tokens are refilled at a set rate up to that capacity, so short bursts are allowed while the long-run rate stays bounded.
  • If there are no tokens available, the request is rejected.
  • The token bucket algorithm is used in many rate limiting libraries and tools, such as rate-limiter, redis-rate-limiter, and Google Cloud Endpoints.
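
The refill-and-consume steps above can be sketched in a few lines. This is a minimal in-memory illustration, not how any of the libraries named above implement it; the class name TokenBucket, the method tryRemoveToken, and the optional now parameter (a timestamp in milliseconds, so the behavior can be checked deterministically) are assumptions made for this example.

```javascript
// Token bucket: holds up to `capacity` tokens and gains `refillPerSecond` tokens per second.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Consumes one token and returns true if one is available, otherwise returns false.
  tryRemoveToken(now = Date.now()) {
    // Refill lazily, based on the time elapsed since the previous call.
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;

    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With new TokenBucket(2, 1), two requests in quick succession pass, a third is rejected, and one more passes after a second has elapsed.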

More: Token Bucket vs Bursty Rate Limiter by @animir

Leaky bucket algorithm

  • The leaky bucket algorithm is similar to the token bucket algorithm, but instead of consuming tokens, incoming API requests are added to a “bucket” that drains (“leaks”) at a set rate.
  • If the bucket is full, new requests overflow and are rejected.
  • The leaky bucket algorithm can be useful for smoothing out request bursts, and for ensuring that requests are processed at a consistent rate.
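
As a rough sketch of the idea, the version below treats the bucket as a meter: each accepted request raises the level by one, and the level drains continuously at the leak rate. The names LeakyBucket and tryAdd and the optional now parameter (milliseconds) are assumptions for this example. A queue-based variant would instead hold the requests and process them at the leak rate.

```javascript
// Leaky bucket (meter variant): the level drains at `leakPerSecond` and may not exceed `capacity`.
class LeakyBucket {
  constructor(capacity, leakPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.leakPerSecond = leakPerSecond;
    this.level = 0; // the bucket starts empty
    this.lastLeak = now;
  }

  // Adds one request to the bucket and returns true, or returns false if it would overflow.
  tryAdd(now = Date.now()) {
    // Drain the bucket for the time elapsed since the previous call.
    const elapsedSeconds = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSeconds * this.leakPerSecond);
    this.lastLeak = now;

    if (this.level + 1 > this.capacity) return false; // bucket would overflow
    this.level += 1;
    return true;
  }
}
```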

Sliding window algorithm

  • The sliding window algorithm is a rate limiting approach that involves tracking the number of requests made in a sliding window of time.
  • If the number of requests exceeds a set limit, further requests are rejected.
  • The sliding window algorithm is used in many rate limiting libraries and tools, such as Django Ratelimit, Express Rate Limit, and Kubernetes rate limiting.
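
One way to track requests in a sliding window is to keep a log of timestamps per client and count only those that fall inside the window. The sketch below assumes that variant (often called a sliding window log); the names SlidingWindowLimiter and allow, and the optional now parameter in milliseconds, are made up for this example. It keeps everything in memory, so it only suits a single process.

```javascript
// Sliding window log: allow at most `limit` requests per key in any `windowMs` interval.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = new Map(); // key -> timestamps of accepted requests
  }

  // Returns true and records the request if the key is under its limit, otherwise false.
  allow(key, now = Date.now()) {
    const windowStart = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.timestamps.get(key) || []).filter((t) => t > windowStart);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.timestamps.set(key, recent);
    return allowed;
  }
}
```

Storing every timestamp is exact but costs memory proportional to the limit. Counter-based approximations trade some accuracy for a fixed footprint.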

More: Rate limiting using the Sliding Window algorithm by @satrobit

Distributed rate limiting

  • For high-traffic APIs, it may be necessary to implement rate limiting across multiple servers.
  • Approaches such as a shared Redis counter store, or consistent hashing that routes each client to the same limiter node, can be used to implement rate limiting across multiple servers.
  • Distributed rate limiting helps keep limits consistent no matter which server handles a request, and can help to reduce the impact of traffic spikes.

In this example, we’ll create a simple Next.js application with a rate-limited API endpoint using Redis and Upstash. Upstash is a serverless Redis database provider that allows you to interact with Redis easily and cost-effectively.

First, let’s create a new Next.js project:

npx create-next-app redis-rate-limit-example
cd redis-rate-limit-example

Install the required dependencies:

npm install ioredis@4.27.6 express-rate-limit@5.3.0

Create a .env.local file in the project root to store your Upstash Redis credentials:

UPSTASH_REDIS_URL=your_upstash_redis_url_here

Replace your_upstash_redis_url_here with your actual Upstash Redis URL.

Create a new API route in pages/api/limited.js:

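import rateLimit from 'express-rate-limit';
import { connectRedis } from '../../lib/redis';
import { RedisStore } from '../../lib/redis-store';

const windowMs = 60 * 1000; // 1 minute

const rateLimiter = rateLimit({
  store: new RedisStore({ client: connectRedis(), windowMs }),
  windowMs,
  max: 5, // limit each IP to 5 requests per minute
  // Next.js API routes do not set req.ip, so derive the client key here.
  keyGenerator: (req) =>
    (req.headers['x-forwarded-for'] || '').split(',')[0].trim() || req.socket.remoteAddress,
  handler: (req, res) => {
    res.status(429).json({ message: 'Too many requests, please try again later.' });
  },
});

// express-rate-limit is Express middleware, so adapt its next() callback to a Promise.
// The Promise also settles once a response has been sent (for example the 429 above).
function runMiddleware(req, res, middleware) {
  return new Promise((resolve, reject) => {
    res.once('finish', resolve);
    middleware(req, res, (err) => (err ? reject(err) : resolve()));
  });
}

export default async function handler(req, res) {
  try {
    await runMiddleware(req, res, rateLimiter);
  } catch (error) {
    return res.status(500).json({ message: 'Internal server error' });
  }
  if (res.writableEnded) return; // the limiter already replied with 429

  res.status(200).json({ message: 'Success! Your request was not rate-limited.' });
}

export const config = {
  api: {
    bodyParser: false,
  },
};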
Create a lib/redis.js file to handle Redis connections:

import Redis from 'ioredis';

let cachedRedis = null;

// Reuse a single connection instead of reconnecting on every request.
export function connectRedis() {
  if (cachedRedis) {
    return cachedRedis;
  }

  const redis = new Redis(process.env.UPSTASH_REDIS_URL);
  cachedRedis = redis;
  return redis;
}
Create a new RedisStore class in lib/redis-store.js:

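import { connectRedis } from './redis';

// Implements the store interface express-rate-limit v5 expects: incr, decrement, resetKey.
export class RedisStore {
  constructor({ client, windowMs = 60 * 1000 } = {}) {
    this.redis = client || connectRedis();
    this.windowMs = windowMs;
  }

  // Count a hit, start the window on the first one, and report when the window resets.
  async incr(key, callback) {
    let hits, ttl;
    try {
      hits = await this.redis.incr(key);
      if (hits === 1) await this.redis.pexpire(key, this.windowMs);
      ttl = await this.redis.pttl(key);
    } catch (err) {
      return callback(err);
    }
    callback(null, hits, new Date(Date.now() + Math.max(ttl, 0)));
  }

  async decrement(key) {
    await this.redis.decr(key);
  }

  async resetKey(key) {
    await this.redis.del(key);
  }
}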
Now you can test your rate-limited API endpoint by starting the development server:

npm run dev

Visit http://localhost:3000/api/limited in your browser or use a tool like Postman or curl to make requests. You should see the Success! Your request was not rate-limited. message. If you make more than 5 requests within a minute, you’ll receive the rate limit message:

Too many requests, please try again later.

User-based rate limiting

  • Some APIs may require rate limiting at the user level, rather than the IP address or client ID level.
  • User-based rate limiting involves tracking the number of requests made by a particular user account, and limiting requests if the user exceeds a set limit.
  • User-based rate limiting is supported by many API frameworks, such as Django REST Framework, and can be implemented using session-based or token-based authentication.
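
To illustrate tracking requests per user account, here is a small fixed-window counter keyed by user ID, with the IP address as a fallback for anonymous requests. Everything here is hypothetical: it assumes earlier authentication middleware has set req.user, and the names rateLimitKey and UserRateLimiter are invented for this sketch.

```javascript
// Hypothetical helper: prefer the authenticated user ID, fall back to the client IP.
function rateLimitKey(req) {
  if (req.user && req.user.id) return `user:${req.user.id}`;
  return `ip:${req.socket.remoteAddress}`;
}

// Fixed-window counter: at most `limit` requests per key in each `windowMs` window.
class UserRateLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowId, count }
  }

  allow(req, now = Date.now()) {
    const key = rateLimitKey(req);
    const windowId = Math.floor(now / this.windowMs);
    let entry = this.counters.get(key);
    // Start a fresh count when the key is new or a new window has begun.
    if (!entry || entry.windowId !== windowId) {
      entry = { windowId, count: 0 };
      this.counters.set(key, entry);
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Because the key is the user ID rather than the IP address, two users behind the same proxy each get their own quota.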

API key rate limiting

  • For APIs that require authentication with an API key, rate limiting can be implemented at the API key level.
  • API key rate limiting involves tracking the number of requests made with a particular API key, and limiting requests if the key exceeds a set limit.
  • API key rate limiting is supported by libraries such as Flask-Limiter, and can be implemented using API key-based authentication.
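
Limiting per API key also makes it easy to give different keys different quotas. The sketch below assumes a hypothetical plan table: the key names, plan names, and per-minute limits are invented for illustration, and checkApiKey is not part of any library.

```javascript
// Hypothetical plans and keys, for illustration only.
const LIMITS_PER_MINUTE = { free: 2, pro: 5 };
const API_KEYS = new Map([
  ['key-free-123', 'free'],
  ['key-pro-456', 'pro'],
]);
const usage = new Map(); // apiKey -> { windowId, count }

// Returns the HTTP status to use and how many requests remain in the current minute.
function checkApiKey(apiKey, now = Date.now()) {
  const plan = API_KEYS.get(apiKey);
  if (!plan) return { status: 401, remaining: 0 }; // unknown key

  const windowId = Math.floor(now / 60000);
  let entry = usage.get(apiKey);
  if (!entry || entry.windowId !== windowId) {
    entry = { windowId, count: 0 };
    usage.set(apiKey, entry);
  }

  const limit = LIMITS_PER_MINUTE[plan];
  if (entry.count >= limit) return { status: 429, remaining: 0 };
  entry.count += 1;
  return { status: 200, remaining: limit - entry.count };
}
```

The remaining value could be returned to clients in a response header so that they can pace their own requests.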

Custom rate limiting

  • Finally, many other rate limiting approaches can be customized to suit the needs of a particular API.
  • Some examples include adaptive rate limiting, which adjusts the rate limit based on the current traffic load, and request complexity-based rate limiting, which takes into account the complexity of individual requests when enforcing rate limits.
  • Custom rate limiting approaches can be useful for optimizing the rate limiting strategy for a specific API use case.
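
Both ideas can be sketched as small pure functions. The formulas below are assumptions chosen for illustration, not standard definitions: adaptiveLimit scales a base limit down linearly once load (a number from 0 to 1) passes a threshold, and requestCost charges more quota for larger or search-heavy requests using made-up weights.

```javascript
// Adaptive: keep the base limit under normal load, then shrink it as load approaches 1.
function adaptiveLimit(baseLimit, load, { threshold = 0.7, minLimit = 1 } = {}) {
  if (load <= threshold) return baseLimit;
  // Fraction of the way from the threshold to full load, capped at 1.
  const overload = Math.min(1, (load - threshold) / (1 - threshold));
  return Math.max(minLimit, Math.round(baseLimit * (1 - overload)));
}

// Complexity-based: a request consumes more quota the more work it asks for.
// The weights (50 items per unit, 5 units for a search) are hypothetical.
function requestCost({ pageSize = 10, includesSearch = false } = {}) {
  return Math.ceil(pageSize / 50) + (includesSearch ? 5 : 0);
}
```

The cost returned by requestCost could be deducted from a token bucket in place of a flat one token per request.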

For my latest project, Pub Index API, I am using an API gateway for rate limiting.

More: RESTful API Design Cheatsheet

