Cursor Rules for MongoDB: The Complete Guide to AI-Assisted MongoDB Development
MongoDB is the database where “schemaless” gets sold as a feature for the first six months and confessed as a regret for the next six years. The first incident is always the same: a field that started as a string in the user document — country: "Argentina" — got written as an object three releases later — country: { code: "AR", name: "Argentina" } — because nobody enforced the shape, and now every dashboard query that does $match: { "country.code": "AR" } returns zero results for the older half of the user base. The second incident is a query against a 50-million-document collection that does a COLLSCAN because the index { created_at: -1 } doesn’t cover a query that filters on tenant_id and sorts on created_at, and somebody assumed MongoDB would “just figure it out.” The third is a $lookup inside an aggregation that joins a million documents to a million documents and blows past the 100MB per-stage memory limit at 03:14 on a Saturday.
Then you add an AI assistant.
Cursor and Claude Code were trained on MongoDB code that spans fifteen years — Mongoose tutorials from 2014 with Schema.Types.Mixed everywhere, raw pymongo from before transactions existed, callback-based Node drivers, find().toArray() over unbounded collections, aggregation pipelines copy-pasted from Stack Overflow with $lookup followed by $unwind followed by $group and no thought about index usage, _id as an auto-incrementing integer because someone “didn’t like ObjectId,” and connection-per-request patterns that destroy the connection pool. Ask for “a feed query that returns posts with their author and recent comments,” and you get an aggregation that does two $lookup joins on collections without compound indexes, returns the entire document tree with no projection, and runs in 8 seconds against a million-document collection. It runs. It is not the MongoDB you should ship in 2026.
The fix is .cursorrules — one file in the repo that tells the AI what idiomatic modern MongoDB looks like. Eight rules below, each with the failure mode, the rule, and a before/after. Copy-paste .cursorrules at the end. Examples use the Node.js driver and Mongoose conventions, but the rules apply equally to PyMongo, the Go driver, and the Rust driver.
How Cursor Rules Work for MongoDB Projects
Cursor reads project rules from two locations: .cursorrules (a single file at the repo root, still supported) and .cursor/rules/*.mdc (modular files with frontmatter, recommended for any non-trivial project). For MongoDB I recommend modular rules so the OLTP collection conventions don’t bleed into analytical-collection patterns and per-collection invariants are documented near the queries:
.cursor/rules/
mongo-schema.mdc # JSON Schema validation, embedded vs referenced
mongo-indexes.mdc # ESR rule, compound indexes, partial / sparse
mongo-queries.mdc # find() projection, hints, no $where, cursor
mongo-aggregation.mdc # pipeline order, $match early, allowDiskUse
mongo-transactions.mdc # session, writeConcern, retryWrites
mongo-driver.mdc # connection pool, bulkWrite, change streams
mongo-testing.mdc # real Mongo (testcontainers), no mocking driver
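As a sketch, one of these files might open with a header like this (frontmatter fields per Cursor's `.mdc` format; the description text is illustrative):

```markdown
---
description: MongoDB query conventions (projection, cursors, keyset pagination)
globs: ["**/*.{ts,js,py}"]
alwaysApply: false
---
Rule text goes here, in plain markdown.
```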
Frontmatter controls activation: globs: ["**/*.{ts,js,py}", "**/*.aggregation.json"] with alwaysApply: false. Now the rules.
Rule 1: Schema Validation in the Database — JSON Schema, $jsonSchema Validators, Not “Just Trust the App”
The most common AI failure in MongoDB is “the application validates the shape.” Cursor returns a Mongoose schema with mongoose.Schema.Types.Mixed, a Pydantic model that allows extra = "allow", or — worst case — a raw db.collection("users").insertOne(req.body) with no shape check at all. Three releases later you have ten thousand documents with country as a string, ten thousand with country as an object, and a hundred with country: null because somebody saved a partially-filled form. The fix is layered: validate at the application boundary AND at the database via $jsonSchema collection validators, so any insert that bypasses your code (a one-off mongosh script, a migration, a different microservice) still fails fast.
The rule:
Two layers of schema validation:
LAYER 1 — application boundary: Mongoose schemas (TS/JS) or Pydantic v2
models (Python) with `extra='forbid'` / `strict: true`. Every required
field is required; every optional field has an explicit default.
LAYER 2 — database collection validators: `$jsonSchema` validator on
EVERY collection that holds business data. Set `validationLevel: "strict"`
and `validationAction: "error"` once you've migrated existing data;
during migrations use `"warn"` to log violations.
Example collection creation:
db.createCollection("orders", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["_id", "tenant_id", "customer_id", "status", "total", "created_at"],
properties: {
_id: { bsonType: "objectId" },
tenant_id: { bsonType: "objectId" },
customer_id: { bsonType: "objectId" },
status: { enum: ["pending","paid","shipped","cancelled","refunded"] },
total: { bsonType: "decimal" },
created_at: { bsonType: "date" },
line_items: {
bsonType: "array",
minItems: 1,
items: {
bsonType: "object",
required: ["product_id", "qty", "price"],
properties: {
product_id: { bsonType: "objectId" },
qty: { bsonType: "int", minimum: 1 },
price: { bsonType: "decimal" },
},
additionalProperties: false,
},
},
},
additionalProperties: false,
},
},
validationLevel: "strict",
validationAction: "error",
});
Money is `decimal128`, never `double`. Timestamps are `Date`, never
strings. ObjectId references are typed `objectId`, never strings.
`additionalProperties: false` is mandatory on every embedded shape —
without it, every typo lands as a new field.
Schema changes are migrations: a script that applies the new validator,
backfills missing fields with defaults, then re-applies in `"strict"`
mode. The migration is in `migrations/` and tracked.
Before — Mongoose with Mixed, no DB validator, drift inevitable:
const Order = new mongoose.Schema({
customer_id: mongoose.Schema.Types.Mixed,
status: String,
total: Number,
line_items: mongoose.Schema.Types.Mixed,
});
customer_id is sometimes a string, sometimes an ObjectId. total loses precision past 9 quadrillion cents. line_items is whatever shape you wrote that day.
After — typed schema + DB validator that the driver enforces:
const lineItemSchema = new mongoose.Schema({
product_id: { type: mongoose.Schema.Types.ObjectId, required: true, ref: "Product" },
qty: { type: Number, required: true, min: 1, validate: Number.isInteger },
price: { type: mongoose.Schema.Types.Decimal128, required: true },
}, { _id: false });
const orderSchema = new mongoose.Schema({
tenant_id: { type: mongoose.Schema.Types.ObjectId, required: true, index: true },
customer_id: { type: mongoose.Schema.Types.ObjectId, required: true, ref: "Customer" },
status: { type: String, required: true,
enum: ["pending","paid","shipped","cancelled","refunded"],
default: "pending" },
total: { type: mongoose.Schema.Types.Decimal128, required: true },
line_items: { type: [lineItemSchema], required: true,
validate: (v: unknown[]) => v.length > 0 },
created_at: { type: Date, required: true, default: () => new Date() },
}, {
strict: "throw",
versionKey: false,
});
// migrations/2026-04-add-order-validator.ts
await db.command({
collMod: "orders",
validator: { $jsonSchema: orderJsonSchema },
validationLevel: "strict",
validationAction: "error",
});
The application rejects bad shape before the driver call. The database rejects it again if anyone bypasses the application. Drift is impossible.
Rule 2: Intentional Denormalization — Embed Children You Always Read, Reference Children You Sometimes Read
The “embed or reference?” question is the only document-modeling question that matters, and Cursor gets it wrong by defaulting to whatever the prompt suggests. Ask for “a blog post with comments” and you get comments embedded inside the post — fine for ten comments, headed for the 16MB document limit at scale, and broken even sooner under the write contention of every new comment rewriting the same document. Ask for “a user with their orders” and you get orders referenced via user_id, requiring a second query for every “show this user’s recent orders” view. The decision is mechanical: embed when child docs are read together with the parent and bounded in count; reference when they’re read independently or unbounded.
The rule:
Embed when ALL of:
- The child is always (or almost always) read with the parent.
- The child is bounded — you can prove a hard upper limit (line items
in an order: ~50; address book entries: ~100). If you cannot prove
a bound, do not embed.
- The child does not need to be queried independently of the parent
(no "find all comments by user X across all posts").
- The total document size will stay <100KB even at the bound (16MB
is the hard limit; 100KB is the soft limit for healthy reads).
Reference when ANY of:
- The child is unbounded (comments on a post, events on an aggregate).
- The child is read independently of the parent (find a post by id).
- The child has its own write hotspot (comments arrive faster than
the parent could be re-read).
- The child is shared by multiple parents (tags, categories).
Hybrid (denormalize a summary, reference the full): Order embeds a
denormalized customer name and email for read-side display, references
`customer_id` for the source of truth. Acceptable when you accept the
write cost of keeping the summary fresh (change stream, dual write,
materialized view).
Never use Mongoose `populate()` for hot paths — it issues at least one
extra query per populated path. For aggregates, use `$lookup` with a
justified index; for OLTP, denormalize.
No deeply nested arrays of unbounded growth. `posts: [{ comments: [...] }]`
is forbidden. Move to a separate collection at the FIRST design pass.
Decisions and reasoning live in a `docs/data-model.md` next to the
schema files. Every embed-vs-reference choice has a one-line rationale.
Before — comments embedded in posts, document grows unbounded:
{
_id: ObjectId(),
title: "Hello",
body: "...",
comments: [ // grows forever; eventually breaks at 16MB
{ author: "ada", text: "...", at: ISODate(...) },
{ author: "babb", text: "...", at: ISODate(...) },
// ... 50,000 more
],
}
Every new comment rewrites a 12MB document. Reading the post pulls every comment.
After — referenced collection, bounded reads, paginated:
// posts collection
{
_id: ObjectId(),
tenant_id: ObjectId(),
title: "Hello",
body: "...",
comment_count: 50847, // denormalized counter for display
last_comment_at: ISODate(), // denormalized for "active" sort
created_at: ISODate(),
}
// comments collection
{
_id: ObjectId(),
post_id: ObjectId(), // indexed
tenant_id: ObjectId(),
author_id: ObjectId(),
body: "...",
created_at: ISODate(), // indexed with post_id for pagination
}
Post reads are constant size. Comments paginate via keyset on (post_id, created_at, _id). Counter is updated via change stream or $inc on the post in the same transaction.
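The change-stream option for the counter can be sketched like this (collection and field names from this section; `watchComments` is a hypothetical name, and per Rule 7 there is one watcher per process on the shared client):

```javascript
// Hypothetical change-stream watcher that keeps the denormalized
// comment_count / last_comment_at fields on posts fresh.
// `db` is the shared driver Db handle.
async function watchComments(db) {
  const stream = db.collection("comments").watch(
    [{ $match: { operationType: "insert" } }],
    { fullDocument: "updateLookup" }
  );
  // ChangeStream is async-iterable in the modern Node driver
  for await (const event of stream) {
    const comment = event.fullDocument;
    await db.collection("posts").updateOne(
      { _id: comment.post_id },
      { $inc: { comment_count: 1 }, $set: { last_comment_at: comment.created_at } }
    );
  }
}
```

A production version would also persist the resume token (Rule 7) so a restart doesn't drop increments.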
Rule 3: Compound Indexes Built for Your Queries — The ESR Rule, Not “Index Every Field”
Cursor’s index advice is either “no indexes, you can add them later” (ship-then-burn) or “index every field” (waste write throughput on unused indexes). Both are wrong. The right model is the ESR rule from MongoDB itself: in a compound index, fields go in the order Equality, Sort, Range. A query { tenant_id: X, status: "paid" } sorted by created_at with a total > 100 filter wants the index { tenant_id: 1, status: 1, created_at: -1, total: 1 }. Get the order wrong and the query plan does an in-memory sort or, worse, a collection scan.
The rule:
Every query that runs more than ~10 times per second has an index.
Every query that scans more than ~10,000 documents without an index
is a code-review reject.
Compound index field order follows ESR:
1. EQUALITY — fields used with `$eq` (incl. `tenant_id`, `status`,
specific category)
2. SORT — fields used in `.sort()` or `$sort`
3. RANGE — fields used with `$gt`, `$lt`, `$in` (large set), regex
Example query:
db.orders.find({ tenant_id: X, status: "paid" })
.sort({ created_at: -1 })
.limit(20)
Index:
db.orders.createIndex({ tenant_id: 1, status: 1, created_at: -1 })
Verify with `.explain("executionStats")` — `IXSCAN` not `COLLSCAN`,
`totalKeysExamined` close to `nReturned`, no `SORT` stage in memory.
Use partial indexes when ~80% of documents will never match the query
(`{ status: "active" }` when 95% are inactive):
db.orders.createIndex(
{ customer_id: 1, created_at: -1 },
{ partialFilterExpression: { status: { $in: ["pending", "paid"] } } }
)
Use TTL indexes for time-bounded data: sessions, OTPs, soft-deletes,
analytics events:
db.sessions.createIndex({ expires_at: 1 }, { expireAfterSeconds: 0 })
Unique indexes have `name` set explicitly so migrations don't recreate
them. Drop unused indexes; `db.collection.aggregate([{ $indexStats: {} }])`
shows usage.
Index builds in production go via `createIndex({ ... }, { background: true })`
on Mongo <4.2; on >=4.2 the build is automatically online but still
slow on big collections — schedule deliberately.
Migrations create indexes; never `Model.ensureIndexes()` on app boot
in production. Index management is intentional, not a side effect.
Tests assert `executionStats.totalKeysExamined / nReturned < 5` for
every list endpoint.
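The migration-side index helper this implies might look like the following sketch (index specs taken from this section; the `applyIndexes` name and grouping are illustrative — `createIndex` is a no-op when an identical index already exists, so re-running is safe):

```javascript
// Hypothetical applyIndexes helper: every production index declared in one
// place, applied by migrations (and by test setup), never on app boot.
async function applyIndexes(db) {
  // ESR compound index for the orders list query
  await db.collection("orders").createIndex(
    { tenant_id: 1, status: 1, created_at: -1 },
    { name: "tenant_status_created" }
  );
  // Partial index: only open orders are worth indexing per customer
  await db.collection("orders").createIndex(
    { customer_id: 1, created_at: -1 },
    { name: "customer_open_orders",
      partialFilterExpression: { status: { $in: ["pending", "paid"] } } }
  );
  // TTL index: sessions expire at the stored timestamp
  await db.collection("sessions").createIndex(
    { expires_at: 1 },
    { name: "sessions_ttl", expireAfterSeconds: 0 }
  );
}
```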
Before — single-field indexes, query plan does in-memory sort:
db.orders.createIndex({ tenant_id: 1 });
db.orders.createIndex({ status: 1 });
db.orders.createIndex({ created_at: -1 });
db.orders.find({ tenant_id: X, status: "paid" })
.sort({ created_at: -1 })
.limit(20)
.explain("executionStats");
// → uses tenant_id index, fetches 200,000 docs, sorts 200,000 in memory
// → SORT stage in winningPlan, totalKeysExamined = 200000, nReturned = 20
After — compound ESR index, IXSCAN with no in-memory sort:
db.orders.createIndex(
{ tenant_id: 1, status: 1, created_at: -1 },
{ name: "tenant_status_created" }
);
db.orders.find({ tenant_id: X, status: "paid" })
.sort({ created_at: -1 })
.limit(20)
.explain("executionStats");
// → IXSCAN on tenant_status_created, totalKeysExamined = 20, nReturned = 20
// → no SORT stage; cursor returns directly from index
test("orders list uses index, examines <=5 keys per result", async () => {
await OrderFactory.createMany(10000, { tenant_id: t1 });
const explain = await db.collection("orders")
.find({ tenant_id: t1, status: "paid" })
.sort({ created_at: -1 }).limit(20).explain("executionStats");
const stats = explain.executionStats;
expect(stats.totalKeysExamined / stats.nReturned).toBeLessThan(5);
expect(JSON.stringify(explain.queryPlanner.winningPlan)).not.toContain("COLLSCAN");
});
Twenty index keys for twenty results. The query stays fast as the collection grows.
Rule 4: Find Queries Always Project, Always Cursor, Never .toArray() on Unbounded Collections
The MongoDB query that ate the production memory is always db.orders.find({}).toArray() — twenty million documents materialized into the Node.js heap at once. Cursor’s reflex is to return await Model.find(filter) because that’s what the Mongoose tutorial shows. The rule is mechanical: declare the fields you need (projection), iterate via cursor for >1k results, and add a limit to every find that doesn’t have a hard bound.
The rule:
Every `find({...})` declares a projection of the fields actually used:
db.orders.find({ tenant_id: X }, { _id: 1, status: 1, total: 1, created_at: 1 })
Bare `find({...})` (returning every field) is forbidden in production code.
The driver retrieves a few KB per doc instead of every embedded array.
Every `find` returning more than ~1000 documents iterates as a cursor:
for await (const doc of db.orders.find(filter, projection).batchSize(1000)) {
process(doc);
}
NEVER `.toArray()` an unbounded query — use cursor + chunked processing.
Every list endpoint has an explicit `.limit(N)` matched to the page
size, plus a hard upper bound (max 200).
Pagination uses keyset (cursor) on `(sort_key, _id)` — never `.skip(N)`,
which gets slower the deeper the page goes:
// BAD
db.orders.find({...}).sort({ created_at: -1 }).skip(page * 20).limit(20)
// GOOD
const after = decodeCursor(opaqueCursor);
db.orders.find({
...filter,
$or: [
{ created_at: { $lt: after.ts } },
{ created_at: after.ts, _id: { $lt: after.id } }
]
}).sort({ created_at: -1, _id: -1 }).limit(20)
Counts: `.estimatedDocumentCount()` for collection-wide (uses metadata,
constant time); `.countDocuments(filter)` only when accuracy matters.
NEVER `await coll.find(filter).toArray().then(a => a.length)`.
`$where` and `$function` are forbidden — they execute JavaScript per
document and prevent index use. Use aggregation operators instead.
Read concerns explicit: `readConcern: { level: "majority" }` for reads
that need to be globally durable; `"local"` for everything else.
`readPreference: "secondaryPreferred"` only when you've thought about
staleness — defaults to primary.
Tests assert `executionStats.totalDocsExamined < 100 * nReturned` for
every list-style query (verifies the index is doing its job).
Before — full document return, in-memory .toArray(), skip pagination:
async function listOrders(tenantId: string, page: number) {
return Order.find({ tenant_id: tenantId })
.sort({ created_at: -1 })
.skip(page * 20)
.limit(20)
.lean(); // still pulls every field
}
Page 5000 = skip(100000) = scans 100,020 index entries to return 20 docs.
After — projection, keyset cursor, bounded:
const ORDERS_PROJECTION = { _id: 1, status: 1, total: 1, created_at: 1, customer_id: 1 };
async function listOrders(
tenantId: ObjectId,
cursor: { ts: Date; id: ObjectId } | null,
limit = 20,
) {
const safeLimit = Math.min(limit, 200);
const filter: Filter<OrderDoc> = { tenant_id: tenantId };
if (cursor) {
filter.$or = [
{ created_at: { $lt: cursor.ts } },
{ created_at: cursor.ts, _id: { $lt: cursor.id } },
];
}
const docs = await db
.collection<OrderDoc>("orders")
.find(filter, { projection: ORDERS_PROJECTION })
.sort({ created_at: -1, _id: -1 })
.limit(safeLimit + 1)
.toArray();
const hasNext = docs.length > safeLimit;
const items = hasNext ? docs.slice(0, safeLimit) : docs;
const next = hasNext
? encodeCursor(items[items.length - 1].created_at, items[items.length - 1]._id)
: null;
return { items, next };
}
Constant query cost regardless of cursor depth. Only the fields the UI uses cross the wire.
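The `encodeCursor` / `decodeCursor` helpers above are assumed; a minimal sketch is base64url-encoded JSON over the sort key and the `_id` in its 24-char hex-string form (the caller rebuilds an ObjectId from the decoded string with `new ObjectId(id)` before querying):

```javascript
// Hypothetical opaque-cursor helpers for the keyset pagination above.
// Encode: (Date, ObjectId-or-string) -> URL-safe token.
function encodeCursor(ts, id) {
  const payload = JSON.stringify({ t: ts.getTime(), i: id.toString() });
  return Buffer.from(payload, "utf8").toString("base64url");
}

// Decode: token -> { ts: Date, id: string (hex) }.
function decodeCursor(opaque) {
  const { t, i } = JSON.parse(Buffer.from(opaque, "base64url").toString("utf8"));
  return { ts: new Date(t), id: i };
}
```

The token is opaque to clients, so the sort key can change later without breaking the API shape.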
Rule 5: Aggregation Pipelines — $match First, $project Early, allowDiskUse Knowingly
Cursor writes aggregation pipelines like SQL queries — $lookup everything together, then $match at the end to filter. That’s the slowest possible order. The rule for MongoDB is: $match as early as possible (uses indexes if it’s the first stage), $project right after $match (drops fields the pipeline doesn’t need), $lookup only when the foreign collection is indexed on the join key, $unwind only when you’ll re-group, $group last, allowDiskUse: true only when you’ve measured.
The rule:
Pipeline stage ordering, applied mechanically:
1. `$match` — uses indexes ONLY if it's the first stage. Always first.
2. `$project` (or `$set`/`$unset`) — drop fields the rest of the
pipeline doesn't need. Smaller docs = less memory per stage.
3. `$sort` + `$limit` — combine when possible; the optimizer fuses
them and uses the index.
4. `$lookup` — only when the foreign collection has an index on
`foreignField`. Use the `pipeline` form to project early on
the joined side.
5. `$unwind` — only when followed by `$group` or `$project` that
consumes the unwound docs.
6. `$group` — final aggregation; verify memory bound (default 100MB
per stage).
`$lookup` rules:
- Foreign collection MUST have an index on the field used in
`foreignField` (or the `pipeline` `let` variables).
- Use the `pipeline` form to project + match on the joined side
before pulling rows back:
{ $lookup: {
from: "comments",
let: { post_id: "$_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$post_id", "$$post_id"] } } },
{ $sort: { created_at: -1 } },
{ $limit: 5 },
{ $project: { author_id: 1, body: 1, created_at: 1 } },
],
as: "recent_comments",
}}
- NEVER `$lookup` followed by `$unwind: "$results"` followed by `$group`
to re-collapse — that's an N×M materialization. Use `$lookup` with
pipeline-side aggregation instead.
`allowDiskUse: true` is opt-in per query, not a default. Setting it
hides memory bugs; remove it after the pipeline is reshaped.
`$facet` runs multiple sub-pipelines over the same input documents for
"compute aggregates and a list in one round trip" — useful for dashboard
tiles, dangerous if any sub-pipeline scans the whole collection.
`$out` and `$merge` (write the result to a collection) are ONLY for
materialized-view jobs in a scheduler, never as part of an OLTP request.
`.explain()` runs in CI for every aggregation that ships. The CI test
asserts `executionStats.executionTimeMillis < threshold` for representative
data sizes.
Aggregation pipelines >5 stages live in `aggregations/*.ts` as exported
arrays, not inline in business logic. Reviewable, testable, indexable.
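The `$facet` dashboard-tile shape mentioned above can be sketched like this (field names reuse the orders examples; the `dashboardPipeline` name and page sizes are illustrative):

```javascript
// Hypothetical $facet pipeline: one round trip returns both a bounded
// recent-orders list and per-status totals. $match runs first so every
// sub-pipeline starts from the indexed tenant subset, not the whole
// collection.
const dashboardPipeline = (tenantId) => [
  { $match: { tenant_id: tenantId } },
  { $facet: {
      recent: [
        { $sort: { created_at: -1 } },
        { $limit: 10 },
        { $project: { _id: 1, status: 1, total: 1, created_at: 1 } },
      ],
      byStatus: [
        { $group: { _id: "$status", count: { $sum: 1 } } },
      ],
  } },
];
```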
Before — $match last, full doc through every stage, no projection:
db.orders.aggregate([
{ $lookup: { from: "customers", localField: "customer_id",
foreignField: "_id", as: "customer" } },
{ $unwind: "$customer" },
{ $lookup: { from: "line_items", localField: "_id",
foreignField: "order_id", as: "items" } },
{ $match: { tenant_id: X, status: "paid" } }, // filters AFTER joining everything
{ $sort: { created_at: -1 } },
{ $limit: 50 },
]);
Joins all orders to all customers and line items first, filters at the end. Reads gigabytes to return 50 documents.
After — $match first, projection early, indexed $lookup:
db.orders.aggregate([
{ $match: { tenant_id: X, status: "paid" } }, // uses ESR index
{ $sort: { created_at: -1 } }, // uses same index
{ $limit: 50 },
{ $project: { _id: 1, customer_id: 1, total: 1, created_at: 1 } },
{ $lookup: {
from: "customers",
let: { cid: "$customer_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$_id", "$$cid"] } } },
{ $project: { _id: 1, name: 1, email: 1 } },
],
as: "customer",
}},
{ $unwind: { path: "$customer", preserveNullAndEmptyArrays: true } },
]);
test("orders dashboard pipeline runs in <100ms on 1M orders", async () => {
await OrderFactory.createMany(1_000_000);
const t0 = Date.now();
const result = await db.collection("orders").aggregate(orderDashboardPipeline(t1)).toArray();
expect(Date.now() - t0).toBeLessThan(100);
expect(result.length).toBe(50);
});
Filters down to 50 orders before any join. Joins are constant work. Result is bounded.
Rule 6: Transactions Only When Justified — writeConcern: "majority", Idempotent Operations, Single-Document Atomicity First
Multi-document transactions in MongoDB exist (since 4.0 on replica sets, 4.2 on sharded clusters) and they work, but they’re not free. They lock more, retry under conflict, and have a 60-second default cap. Cursor will reach for them whenever there’s “more than one write,” when 90% of the time the operation can be modeled as a single-document update or as two writes that don’t actually need to be atomic. The rule: prefer single-document atomic updates; use transactions only when the invariant genuinely spans documents and the operation cannot be made idempotent.
The rule:
Transaction decision tree:
1. Can the writes target a SINGLE document? Use atomic update operators
(`$inc`, `$set`, `$push`, `$addToSet`, `$pull`) — already atomic.
No transaction needed.
2. Can the writes be made IDEMPOTENT and re-runnable? (E.g., "set
order.status = 'paid' AND insert a payment doc with idempotency_key.")
Use two writes with the idempotency key as the safety net. No
transaction needed.
3. Does the invariant genuinely span two collections AND must be
all-or-nothing AND cannot be made idempotent? Use a transaction.
Transaction rules:
- `writeConcern: { w: "majority" }` mandatory. The pre-5.0 default of
`w: 1` acknowledges before replication and can lose writes on failover.
- `readConcern: "snapshot"` for read-then-write within the txn.
- Wrap in a retry helper that catches `TransientTransactionError` and
`UnknownTransactionCommitResult` and retries. Use the driver's
`withTransaction` which does this for you.
- Keep transactions SHORT. <1 second wall time. No external API calls
inside. No `await` on slow I/O.
- No reads outside the transaction's session inside the txn — pass
the session to every operation.
Atomic update operators preferred for everything possible:
- "increment a counter" → `$inc`
- "set a field if missing" → `$setOnInsert` with upsert
- "add to a set without duplicates" → `$addToSet`
- "remove an item from an array" → `$pull`
- "compare-and-swap" → `findOneAndUpdate({ _id, version: N }, { $set: {...}, $inc: { version: 1 } })`
Optimistic concurrency via a `version` field on documents that have
contended writes. The update filter includes `version: N`; the increment
bumps to `N+1`. A failed match means someone else wrote — re-read,
re-merge, retry.
Bulk writes use `bulkWrite([...], { ordered: false })` — unordered is
faster but a single failure doesn't stop the rest. Batch size 500-1000.
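The version-field compare-and-swap pairs naturally with a small retry wrapper — a sketch, where the `withOptimisticRetry` name and retry budget are illustrative:

```javascript
// Hypothetical wrapper for the optimistic-concurrency pattern above.
// `attempt` performs one read-merge-write cycle (re-reading the document
// each time so a retry sees the latest version) and resolves true when
// the version-filtered update matched, false when another writer won.
async function withOptimisticRetry(attempt, maxAttempts = 3) {
  for (let i = 0; i < maxAttempts; i++) {
    if (await attempt()) return true; // write landed
  }
  return false; // still contended after the retry budget
}
```

Callers that get `false` back surface a conflict error rather than looping forever on a hot document.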
Before — transaction for what should be one update, no idempotency, no retry:
const session = client.startSession();
session.startTransaction();
const order = await orders.findOne({ _id }, { session });
await orders.updateOne({ _id }, { $set: { status: "paid" } }, { session });
await payments.insertOne({ order_id: _id, amount: order.total }, { session });
await session.commitTransaction();
The transaction wraps two operations that an idempotent payment-doc upsert plus a guarded status update would cover without one. No retry on transient errors, no abort on failure, no write concern, and the session is never ended.
After — single transaction with correct retries, OR no transaction:
// Option A — no transaction needed: idempotency key on the payment doc
async function recordPayment(orderId: ObjectId, amount: Decimal128, idempotencyKey: string) {
await db.collection("payments").updateOne(
{ idempotency_key: idempotencyKey },
{ $setOnInsert: { order_id: orderId, amount, idempotency_key: idempotencyKey, created_at: new Date() } },
{ upsert: true, writeConcern: { w: "majority" } },
);
await db.collection("orders").updateOne(
{ _id: orderId, status: "pending" },
{ $set: { status: "paid", paid_at: new Date() } },
{ writeConcern: { w: "majority" } },
);
}
// Option B — when the invariant genuinely spans collections:
async function transferFunds(fromAccount: ObjectId, toAccount: ObjectId, amount: Decimal128) {
await client.withSession(async (session) => {
await session.withTransaction(
async () => {
const debit = await accounts.updateOne(
{ _id: fromAccount, balance: { $gte: amount } },
{ $inc: { balance: Decimal128.fromString("-" + amount.toString()) } }, // Decimal128 has no negate helper
{ session },
);
if (debit.modifiedCount === 0) throw new InsufficientFunds();
await accounts.updateOne(
{ _id: toAccount },
{ $inc: { balance: amount } },
{ session },
);
await ledger.insertOne(
{ from: fromAccount, to: toAccount, amount, at: new Date() },
{ session },
);
},
{
readConcern: { level: "snapshot" },
writeConcern: { w: "majority" },
maxCommitTimeMS: 1000,
},
);
});
}
Idempotency key prevents double-charge whether the call retries or not. withTransaction handles the retry loop. Write concern is majority.
Rule 7: One Connection Pool Per Application — Configure It, Reuse It, Don’t Re-Open Per Request
Cursor’s default Node.js MongoDB code reads await MongoClient.connect(uri) inside the request handler and sometimes even inside for loops. The driver builds and discards connection pools per request, the database sees thousands of TCP handshakes, and at peak the ops team sees getaddrinfo ENOTFOUND because the DNS cache is exhausted. The rule: one client, one pool, configured deliberately, attached to the application’s lifecycle, exposed via dependency injection or a singleton.
The rule:
The MongoDB client (and its pool) is built ONCE at app startup, closed
ONCE at app shutdown, and shared across all requests.
// app.ts (Node)
const client = new MongoClient(env.MONGO_URI, {
maxPoolSize: 50, // tune to (connections per app * num apps) ≤ Atlas tier
minPoolSize: 5,
maxIdleTimeMS: 60_000,
serverSelectionTimeoutMS: 5_000,
socketTimeoutMS: 30_000,
connectTimeoutMS: 5_000,
waitQueueTimeoutMS: 2_000, // bound how long a request waits for a connection
retryWrites: true,
retryReads: true,
readPreference: "primaryPreferred",
writeConcern: { w: "majority", wtimeout: 5_000 },
});
await client.connect();
app.locals.db = client.db(env.MONGO_DB);
// graceful shutdown
process.on("SIGTERM", async () => {
server.close();
await client.close();
process.exit(0);
});
NEVER `new MongoClient(uri)` inside a request handler, controller, or
service constructor. Inject the existing client.
Connection pool sizing:
- Atlas M10 = 1500 connections; per-app pool = 1500 / num_app_instances.
- Don't set `maxPoolSize` higher than the database's per-app share.
- `waitQueueTimeoutMS` bounds the worst-case request latency under
pool exhaustion.
Health check endpoint runs `db.admin().ping()` (cheap) and reports the
pool's current/available connections. Liveness vs readiness probes
distinguish "process up" from "DB reachable."
Mongoose: one `mongoose.connect(uri, options)` per process; don't
create new `mongoose.Connection` instances per request.
Change streams (`coll.watch(pipeline)`) live on the SAME shared client.
ONE watcher per (collection, pipeline) per process. `resumeAfter` /
`startAfter` token persisted to a checkpoint store so a restart picks
up where it left off.
Driver version: pin the major version in `package.json` and update
deliberately. Connection-pool and API semantics have shifted across
driver majors; don't straddle versions.
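The readiness probe described above can be sketched framework-agnostically (the `readiness` name and timeout are illustrative; `client` is the shared MongoClient):

```javascript
// Hypothetical readiness check: ping through the shared client with a
// short server-side timeout so a wedged cluster fails the probe fast
// instead of hanging it.
async function readiness(client) {
  try {
    await client.db("admin").command({ ping: 1 }, { maxTimeMS: 1000 });
    return { status: 200, body: { db: "up" } };
  } catch {
    return { status: 503, body: { db: "down" } };
  }
}
```

Wire this to the readiness endpoint only; liveness should not depend on the database, or a DB outage restarts every app instance.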
Before — connect per request, no shutdown, leaking pools:
app.get("/orders", async (req, res) => {
const client = await MongoClient.connect(process.env.MONGO_URI);
const orders = await client.db().collection("orders").find({}).toArray();
res.json(orders);
// client never closed; leaks
});
After — single client, lifecycle-managed, configured pool:
// db.ts
export const client = new MongoClient(env.MONGO_URI, {
maxPoolSize: Number(env.MONGO_POOL_SIZE ?? 50),
minPoolSize: 5,
maxIdleTimeMS: 60_000,
serverSelectionTimeoutMS: 5_000,
waitQueueTimeoutMS: 2_000,
retryWrites: true,
writeConcern: { w: "majority", wtimeout: 5_000 },
});
// server.ts
await client.connect();
const db = client.db(env.MONGO_DB);
const app = createApp({ db });
const server = app.listen(env.PORT);
process.on("SIGTERM", async () => {
server.close();
await client.close();
});
// orders.ts
export async function listOrders(db: Db, tenantId: ObjectId) {
return db.collection<OrderDoc>("orders")
.find({ tenant_id: tenantId }, { projection: ORDERS_PROJECTION })
.limit(50).toArray();
}
One pool, reused across thousands of requests. Graceful shutdown drains it. The DB sees a stable connection count.
Rule 8: Tests Hit a Real MongoDB — Testcontainers, Schema Validators, No Mock Driver
Cursor’s default test mocks the driver: jest.mock("mongodb") with findOne returning a fake doc. That tests nothing about what MongoDB will do — not the index, not the validator, not the aggregation operator semantics, not the transaction retry, not the wire format. The only useful tests run against a real MongoDB. With Testcontainers (or the official mongodb-memory-server for fast unit tests), spinning up a fresh instance per test suite is one line and runs in CI without ceremony.
The rule:
Tests run against a REAL MongoDB. No mocking the driver. No mocking
collection methods. No `jest.mock("mongodb")`.
Two flavors:
- `mongodb-memory-server` — in-process MongoDB (downloads binaries),
fast, no Docker required. Suitable for unit/integration tests of
business logic.
- `testcontainers/mongodb` — real Docker MongoDB matching production
version. Suitable for tests that exercise replica-set features
(transactions, change streams) or need the real wire protocol.
Test setup:
- Spin up Mongo once per test FILE (or per-suite for slow tests).
- Apply schema validators ($jsonSchema collMod) before any insert.
- Apply indexes that production has — tests against an unindexed
collection lie about query plans.
- Seed with factories (factory-bot equivalent: @factory-girl/core,
fishery, polyfactory). Hand-typed docs in tests are forbidden.
Per-test isolation:
- Drop ONLY the data, not the database, between tests:
`await db.collection(name).deleteMany({})` for each non-system
collection. Faster than recreating indexes.
- Use a unique tenant_id per test to avoid cross-test interference
when running in parallel.
Assert on real query behavior:
- `.explain("executionStats")` checked in tests for indexed-query
contracts (Rule 3).
- For aggregations, snapshot the explain output's `stages` array to
  detect plan regressions.
- Schema-validator tests: insert a bad document, expect
`MongoServerError` with code 121 (DocumentValidationFailure).
External services that hit Mongo via change streams: test the consumer
with a real `coll.watch()` and an inserted doc, assert the consumer
sees it. Don't fake the change-stream payload. (Change streams need a
replica set: use MongoMemoryReplSet or a single-node replica-set
container.)
Migrations have tests too: a test that runs the migration up, asserts
the new state, runs it down, asserts the original state.
Coverage: >85% on data-access code (repositories, query builders),
>75% on services. Mutation testing for any invariant-critical write.
Before — mocked driver, tests prove nothing:
jest.mock("mongodb");
test("listOrders returns filtered orders", async () => {
const find = jest.fn().mockReturnValue({ toArray: () => Promise.resolve([{ _id: "1" }]) });
const collection = jest.fn().mockReturnValue({ find });
const db = { collection } as any;
const result = await listOrders(db, new ObjectId());
expect(find).toHaveBeenCalled();
expect(result).toHaveLength(1);
});
Doesn’t test the projection. Doesn’t test the index usage. Doesn’t catch a query that scans the collection.
After — real MongoDB, real index, real validator:
let mongo: MongoMemoryServer;
let client: MongoClient;
let db: Db;
beforeAll(async () => {
mongo = await MongoMemoryServer.create({ binary: { version: "7.0.5" } });
client = await MongoClient.connect(mongo.getUri());
db = client.db("test");
await applyValidators(db);
await applyIndexes(db);
});
afterAll(async () => {
await client.close();
await mongo.stop();
});
beforeEach(async () => {
for (const c of await db.listCollections().toArray()) {
if (!c.name.startsWith("system.")) await db.collection(c.name).deleteMany({});
}
});
test("listOrders uses tenant_status_created index", async () => {
const tenantId = new ObjectId();
await OrderFactory.createMany(1000, db, { tenant_id: tenantId, status: "paid" });
const explain = await db.collection("orders")
.find({ tenant_id: tenantId, status: "paid" }, { projection: ORDERS_PROJECTION })
.sort({ created_at: -1 }).limit(20).explain("executionStats");
// the IXSCAN sits below the LIMIT/PROJECTION/FETCH stages, so search the whole winning plan
expect(JSON.stringify(explain.queryPlanner.winningPlan)).toContain('"indexName":"tenant_status_created"');
expect(explain.executionStats.totalKeysExamined).toBeLessThanOrEqual(20);
});
test("inserting an order with bad status fails the validator", async () => {
await expect(db.collection("orders").insertOne({
tenant_id: new ObjectId(),
customer_id: new ObjectId(),
status: "frobnicated",
total: Decimal128.fromString("10"),
line_items: [{ product_id: new ObjectId(), qty: 1, price: Decimal128.fromString("10") }],
created_at: new Date(),
})).rejects.toMatchObject({ code: 121 });
});
Real index plan. Real validator. A regression that drops the index OR removes the validator fails CI.
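The OrderFactory in these tests is assumed rather than shown. Here is a minimal sketch of the shape it implies; `newId` and `money` are injected so the sketch stays free of driver imports, and in production you would pass `() => new ObjectId()` and `(s) => Decimal128.fromString(s)`.

```typescript
// Hypothetical factory behind OrderFactory.createMany in the tests above.
// The id/decimal constructors are injected stand-ins for ObjectId/Decimal128.
type OrderDoc = {
  tenant_id: unknown;
  customer_id: unknown;
  status: string;
  total: unknown;
  line_items: { product_id: unknown; qty: number; price: unknown }[];
  created_at: Date;
};

export function makeOrderFactory(
  newId: () => unknown,
  money: (amount: string) => unknown,
) {
  let seq = 0;
  const build = (overrides: Partial<OrderDoc> = {}): OrderDoc => ({
    tenant_id: newId(),
    customer_id: newId(),
    status: "paid",
    total: money("10.00"),
    line_items: [{ product_id: newId(), qty: 1, price: money("10.00") }],
    // Monotonic timestamps so sort/pagination tests get a stable order.
    created_at: new Date(Date.UTC(2026, 0, 1) + seq++ * 1000),
    ...overrides,
  });
  const buildMany = (n: number, overrides: Partial<OrderDoc> = {}): OrderDoc[] =>
    Array.from({ length: n }, () => build(overrides));
  // createMany as used in the tests would wrap buildMany with an insert:
  // const createMany = (n, db, o) => db.collection("orders").insertMany(buildMany(n, o));
  return { build, buildMany };
}
```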
The Complete .cursorrules File
Drop this in the repo root. Cursor and Claude Code both pick it up.
# MongoDB — Production Patterns
## Schema Validation
- Two layers: app boundary (Mongoose strict / Pydantic extra=forbid)
AND DB collection $jsonSchema validators.
- Money is decimal128, never double. Timestamps are Date, never strings.
References are objectId, never strings.
- additionalProperties: false on every embedded shape.
- Schema changes are migrations: apply validator, backfill, switch to strict.
## Embed vs Reference
- Embed when child is bounded, always read with parent, doc <100KB.
- Reference when unbounded, queried independently, write-hot, or shared.
- Hybrid: denormalize a summary, reference the source of truth.
- No deeply nested unbounded arrays. Move to a separate collection at
the first design pass.
- Decisions documented in docs/data-model.md.
## Indexes
- Every >10qps query has an index. Every >10k-doc scan without index
is a reject.
- Compound order = ESR: Equality, Sort, Range.
- Verify with .explain("executionStats"): IXSCAN, no in-memory SORT,
totalKeysExamined ≈ nReturned.
- Partial indexes when ~80% of docs won't match; TTL for time-bounded.
- Named indexes; drop unused (check $indexStats).
- Indexes managed via migrations, never ensureIndexes() on boot.
- Tests assert totalKeysExamined / nReturned < 5 for list endpoints.
## Find Queries
- Every find() has an explicit projection. Bare find returning full
docs is forbidden.
- >1000 docs = cursor + batchSize, never .toArray().
- .limit(N) on every list query, hard upper bound 200.
- Keyset (cursor) pagination, never .skip(N).
- .estimatedDocumentCount() for collection counts; .countDocuments()
only when accuracy required.
- $where / $function forbidden.
- readConcern / readPreference declared explicitly.
## Aggregation
- Pipeline order: $match → $project → $sort+$limit → $lookup → $unwind
→ $group.
- $match first stage = uses index.
- $lookup foreign field MUST be indexed; use pipeline form to project
on the joined side.
- No $lookup + $unwind + $group to re-collapse — use pipeline-side
aggregation.
- allowDiskUse=true is opt-in per query; never default.
- $out / $merge for materialized views in scheduler only, never OLTP.
- Aggregations >5 stages live in aggregations/*.ts as exported arrays.
- CI explain assertion on representative data sizes.
## Transactions
- Decision tree: single doc → atomic operators. Idempotent → idempotency
key. Cross-collection invariant → transaction.
- writeConcern: majority. readConcern: snapshot when needed.
- Use withTransaction for retry loop on TransientTransactionError /
UnknownTransactionCommitResult.
- Transactions <1s, no external I/O inside.
- Optimistic concurrency via version field for hot writes.
- bulkWrite ordered=false in batches of 500-1000.
## Driver & Connection
- ONE MongoClient per process, built at startup, closed on SIGTERM.
- Configure maxPoolSize, waitQueueTimeoutMS, serverSelectionTimeoutMS,
retryWrites, writeConcern explicitly.
- Pool size ≤ Atlas tier connection limit / app instances.
- Health endpoint pings DB; readiness vs liveness probes distinguished.
- One change-stream watcher per (collection, pipeline) per process,
resume token checkpointed.
- Driver major version pinned; updates deliberate.
## Testing
- Real MongoDB via mongodb-memory-server or testcontainers; never
mock the driver.
- Apply validators + indexes before tests.
- Factories for test data; hand-typed docs forbidden.
- deleteMany({}) per test, not full DB recreate.
- Unique tenant_id per test for parallel safety.
- Index plan assertions (.explain in tests) on every list endpoint.
- Schema-validator tests: bad doc → MongoServerError code 121.
- Migration tests: up + assert + down + assert.
- Coverage >85% data access, >75% services.
End-to-End Example: A Tenant-Scoped Paginated Order Feed
Without rules: unindexed scan, full doc return, skip pagination, in-memory sort.
async function feed(tenantId: string, page: number) {
return db.collection("orders").find({
tenant_id: tenantId,
status: { $in: ["paid", "shipped"] },
}).sort({ created_at: -1 }).skip(page * 20).limit(20).toArray();
}
On a 50M-doc collection with no compound index, page 5000 means an in-memory sort plus walking past 100,000 entries just to discard them: an 8-second response that ships full documents over the wire.
With rules: ESR index, projection, keyset cursor, query-plan asserted in tests.
// migrations/2026-04-orders-index.ts
await db.collection("orders").createIndex(
{ tenant_id: 1, status: 1, created_at: -1, _id: -1 },
{ name: "tenant_status_created_id" },
);
// orders/queries.ts
const PROJECTION = {
_id: 1, status: 1, total: 1, created_at: 1,
customer_id: 1, "shipping_address.city": 1,
};
export async function feed(
db: Db,
tenantId: ObjectId,
cursor: { ts: Date; id: ObjectId } | null,
limit = 20,
) {
const safeLimit = Math.min(limit, 100);
const filter: Filter<OrderDoc> = {
tenant_id: tenantId,
status: { $in: ["paid", "shipped"] },
};
if (cursor) {
filter.$or = [
{ created_at: { $lt: cursor.ts } },
{ created_at: cursor.ts, _id: { $lt: cursor.id } },
];
}
const docs = await db.collection<OrderDoc>("orders")
.find(filter, { projection: PROJECTION })
.sort({ created_at: -1, _id: -1 })
.limit(safeLimit + 1)
.toArray();
return buildPage(docs, safeLimit);
}
// orders/queries.test.ts
test("feed uses index, scans <=21 keys per page", async () => {
await OrderFactory.createMany(50000, db, { tenant_id: t1 });
const explain = await db.collection("orders")
.find({ tenant_id: t1, status: { $in: ["paid", "shipped"] } }, { projection: PROJECTION })
.sort({ created_at: -1, _id: -1 }).limit(21).explain("executionStats");
expect(explain.executionStats.totalKeysExamined).toBeLessThanOrEqual(21);
// search the whole plan tree: the IXSCAN is nested under LIMIT/PROJECTION/FETCH
expect(JSON.stringify(explain.queryPlanner.winningPlan))
.toContain('"indexName":"tenant_status_created_id"');
});
Constant cost regardless of cursor depth. Projection limits wire bytes. The test prevents an index regression from shipping.
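buildPage in feed() above is left undefined. A sketch of the keyset-page builder it implies, leaning on the limit(safeLimit + 1) over-fetch to detect whether another page exists (the `_id` type here is a simplified stand-in for the driver's ObjectId):

```typescript
// Hypothetical buildPage behind feed(). The over-fetched extra document
// (limit was safeLimit + 1) is the "has next page" signal; the sort key of
// the last *returned* doc becomes the cursor for the next call.
export function buildPage<T extends { _id: unknown; created_at: Date }>(
  docs: T[],
  safeLimit: number,
): { items: T[]; nextCursor: { ts: Date; id: T["_id"] } | null } {
  const hasMore = docs.length > safeLimit;
  const items = hasMore ? docs.slice(0, safeLimit) : docs;
  const last = items[items.length - 1];
  return {
    items,
    // Only hand out a cursor when there is actually more to fetch.
    nextCursor: hasMore && last ? { ts: last.created_at, id: last._id } : null,
  };
}
```

A null nextCursor tells the client it has reached the final page; otherwise the cursor feeds straight back into feed()'s cursor parameter.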
Get the Full Pack
These eight rules cover the MongoDB patterns where AI assistants consistently reach for the wrong idiom. Drop them into .cursorrules and the next prompt you write will look different — schema-validated, intentionally-modeled, ESR-indexed, projection-bounded, pipeline-ordered, transaction-justified, pool-managed, real-Mongo-tested code, without having to re-prompt.
If you want the expanded pack — these eight plus rules for change streams as event sources, time-series collections, MongoDB Atlas Search vs $text, vector search with $vectorSearch, sharding strategy and shard-key selection, online schema migrations on multi-TB collections, monitoring with the MongoDB profiler and slow-query logs, and the deploy patterns I use for replica sets on Kubernetes — it is bundled in Cursor Rules Pack v2 ($27, one payment, lifetime updates). Drop it in your repo, stop fighting your AI, ship MongoDB you would actually merge.