🚀 How We Reduced API Response Time by 60% in Node.js Using Caching, Query Optimization & Performance Profiling

Performance issues in backend systems rarely come from one obvious bug.

In our production system, a high-traffic API showed:

  • 📉 P95 latency ~ 820ms
  • 📈 Frequent database CPU spikes
  • ⛔ Increased timeout rates during traffic bursts

After a structured optimization process, we achieved:

  • ⚡ P95 latency: 820ms → 310ms
  • 🧠 Database CPU usage: down ~40%
  • 📊 Timeout rate: down ~70%

Here’s the exact breakdown of what we did.

1️⃣ Step One: Measure Before Optimizing

Before touching any code, we collected:

  • P50 / P95 / P99 latency
  • Slow query logs
  • DB execution plans
  • CPU & memory metrics
  • Event loop lag

We added a lightweight response-time logger:

// Log per-request latency using a monotonic clock
app.use((req, res, next) => {
  const start = process.hrtime.bigint();

  res.on("finish", () => {
    // nanoseconds → milliseconds
    const duration =
      Number(process.hrtime.bigint() - start) / 1_000_000;

    console.log(`${req.method} ${req.url} - ${duration.toFixed(2)}ms`);
  });

  next();
});
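
We also tracked event loop lag with Node's built-in perf_hooks. A minimal sketch (the sampling resolution and logging interval here are illustrative, not our exact production values):

import { monitorEventLoopDelay } from "perf_hooks";

// Sample event loop delay; values are reported in nanoseconds
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Log the p99 lag every 10 seconds, then reset the window
setInterval(() => {
  const p99Ms = histogram.percentile(99) / 1e6;
  console.log(`event loop lag p99: ${p99Ms.toFixed(2)}ms`);
  histogram.reset();
}, 10_000);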

🔎 Observation

  • ~68% of latency was database time
  • ~20% came from repeated identical queries
  • The remainder was serialization and network overhead

Conclusion:
🧩 Primary bottleneck = Database layer.

2️⃣ Query Optimization (Biggest Impact 🔥)

❌ Problem 1: Missing Composite Index

Original query:

SELECT * 
FROM orders
WHERE user_id = $1
AND status = 'completed'
ORDER BY created_at DESC
LIMIT 20;

Execution plan showed:

  • Sequential scan
  • High disk I/O
  • Large row filtering
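
(The plan itself came from EXPLAIN ANALYZE; PostgreSQL syntax, consistent with the $1 placeholders used throughout. A literal value stands in for $1 because EXPLAIN ANALYZE actually executes the query.)

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE user_id = 123   -- literal substituted for $1
  AND status = 'completed'
ORDER BY created_at DESC
LIMIT 20;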

✅ Fix: Composite Index

CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);

📊 Result

  • Query time: 180ms → 35ms
  • Removed full table scan
  • Reduced disk reads significantly

Indexing alone reduced endpoint latency by ~25%.

❌ Problem 2: Over-Fetching Data

Original:

SELECT * FROM users WHERE id = $1;

But the API only needed:

  • name
  • profile_picture

✅ Optimized Query

SELECT name, profile_picture
FROM users
WHERE id = $1;

🎯 Impact

  • Reduced payload size
  • Lower memory allocation
  • Faster JSON serialization

Small change. Measurable gain.

❌ Problem 3: N+1 Query Pattern

Original logic:

const orders = await getOrders(userId);

for (const order of orders) {
  order.items = await getItems(order.id);
}

With 20 orders, that's 21 queries (1 for the orders + 20 for their items).

✅ Optimized Join Query

SELECT o.id, o.created_at, i.product_id, i.quantity
FROM orders o
LEFT JOIN order_items i ON i.order_id = o.id
WHERE o.user_id = $1;
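
Since the join returns one flat row per order item, the application groups rows back into orders. A minimal sketch of that step (groupOrderRows is an illustrative helper, not from our codebase):

// Group flat join rows (one row per order item) back into orders
function groupOrderRows(rows) {
  const ordersById = new Map();

  for (const row of rows) {
    let order = ordersById.get(row.id);
    if (!order) {
      order = { id: row.id, created_at: row.created_at, items: [] };
      ordersById.set(row.id, order);
    }
    // LEFT JOIN yields null item columns for orders with no items
    if (row.product_id !== null) {
      order.items.push({ product_id: row.product_id, quantity: row.quantity });
    }
  }

  return [...ordersById.values()];
}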

📈 Result

  • Drastically reduced DB round trips
  • Improved P95 latency stability

3️⃣ Introducing Multi-Layer Caching 🧠

Important principle:

⚠️ Do NOT cache blindly.
Cache only:

  • Read-heavy endpoints
  • Expensive aggregations
  • Low-volatility data

🟢 Layer 1: In-Memory Cache (Short TTL)

Used for ultra-hot endpoints.

import NodeCache from "node-cache";

// 30-second TTL: hot data stays fresh without long-lived staleness
const cache = new NodeCache({ stdTTL: 30 });

async function getUserProfile(userId) {
  const key = `user:${userId}`;

  // Explicit undefined check so falsy-but-valid values still count as hits
  const cached = cache.get(key);
  if (cached !== undefined) return cached;

  const data = await fetchFromDB(userId);
  cache.set(key, data);

  return data;
}

Use case:

  • Frequently accessed profile endpoints
  • Dashboard metadata

Latency improvement: ~10–15%

🟡 Layer 2: Redis Distributed Cache

Used for:

  • Aggregated stats
  • Leaderboards
  • Expensive computations

import Redis from "ioredis"; // assuming ioredis, which accepts the ("EX", seconds) form used below

const redis = new Redis();

async function getDashboardStats(userId) {
  const key = `dashboard:${userId}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const data = await computeStats(userId);

  // Cache the computed stats for 120 seconds
  await redis.set(key, JSON.stringify(data), "EX", 120);

  return data;
}

📊 Impact

  • Significant DB load reduction
  • Improved P95 & P99 latency
  • Reduced spike sensitivity

4️⃣ Cache Invalidation Strategy ⚠️

Caching without invalidation creates stale data problems.

We used:

  • Event-based invalidation after writes
  • Short TTL for volatile data
  • Versioned keys when necessary

Example:

await redis.del(`dashboard:${userId}`);

Triggered immediately after order creation.
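
For versioned keys, writes bump a version counter instead of deleting, and stale entries simply age out via TTL. A minimal sketch (the key scheme and function names are illustrative):

// Reads resolve the current version, then fetch the versioned entry
async function getVersionedDashboard(userId) {
  const version = (await redis.get(`dashboard:ver:${userId}`)) ?? "0";
  const key = `dashboard:${userId}:v${version}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const data = await computeStats(userId);
  await redis.set(key, JSON.stringify(data), "EX", 120);
  return data;
}

// Writes just bump the version; old entries expire on their own
async function invalidateDashboard(userId) {
  await redis.incr(`dashboard:ver:${userId}`);
}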

We avoided:

  • Long-lived static caches
  • Global flushes
  • Blind TTL-only approaches

5️⃣ Connection Pool Optimization 🔌

We observed:

  • DB pool exhaustion during spikes
  • Increased wait time for connections

Original pool size: 10
Optimized to: 25 (after validating DB capacity)
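
With node-postgres, for example, this is a one-line pool change. A sketch assuming the pg driver (the timeout values are illustrative, not our exact settings):

import pg from "pg";

const pool = new pg.Pool({
  max: 25,                        // raised from 10 after validating DB capacity
  idleTimeoutMillis: 30_000,      // release idle connections
  connectionTimeoutMillis: 2_000, // fail fast instead of queuing indefinitely
});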

📈 Result

  • Reduced queuing delay
  • Stabilized latency under burst traffic

6️⃣ JSON Serialization Optimization ⚡

Large nested objects increased serialization cost.

Instead of returning deeply populated objects, we:

  • Reduced unnecessary fields
  • Flattened response structure
  • Avoided over-population
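
In practice this meant mapping rows to a lean response shape before res.json. A minimal sketch (field names and the getOrder helper are illustrative):

// Map a populated order to a flat, lean response shape
function toOrderResponse(order) {
  return {
    id: order.id,
    createdAt: order.created_at,
    total: order.total,
    // flatten the nested user object to the two fields clients actually use
    userName: order.user?.name,
    userAvatar: order.user?.profile_picture,
  };
}

app.get("/orders/:id", async (req, res) => {
  const order = await getOrder(req.params.id); // hypothetical fetch helper
  res.json(toOrderResponse(order));
});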

Serialization overhead dropped ~8–10%.

7️⃣ Final Optimized Flow 🏗️

Client
  ↓
Load Balancer
  ↓
Node.js (Clustered)
  ↓
In-Memory Cache
  ↓
Redis
  ↓
Optimized Indexed Queries
  ↓
Database
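
Putting the layers together, a read follows a cache-aside chain. A condensed sketch of the lookup order (getWithLayers is illustrative; cache and redis reuse the clients from the examples above):

async function getWithLayers(key, loadFromDB) {
  // Layer 1: in-process cache (short TTL)
  const local = cache.get(key);
  if (local !== undefined) return local;

  // Layer 2: shared Redis cache
  const shared = await redis.get(key);
  if (shared) {
    const data = JSON.parse(shared);
    cache.set(key, data);
    return data;
  }

  // Layer 3: optimized, indexed query against the database
  const data = await loadFromDB();
  await redis.set(key, JSON.stringify(data), "EX", 120);
  cache.set(key, data);
  return data;
}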

Thanks for reading :)
