Structure of the Storage Area in Tree Page of SQLite

Hello, I’m Maneshwar. I’m working on git-lrc: a Git hook for Checking AI generated code.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs — without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

A tree page is not just a flat array of cells. It is a carefully managed miniature heap.

Cells are stored at the very end of the page (high addresses), and they grow backward toward the beginning of the page.

Meanwhile, the cell pointer array begins immediately after the page header and grows forward.

fig 6.7

The two regions grow toward each other, with free space in between.

The Cell Pointer Array

The cell pointer array is the logical directory of the page.

Each entry in this array is:

  • A 2-byte integer
  • Representing the offset (from the beginning of the page)
  • Pointing to the actual cell inside the cell content area

The number of entries in this array is stored in the page header at offset 3.
fig 6.6

Two important observations:

  1. Pointers are stored in sorted key order.
  2. Cells themselves may not be physically sorted.

This separation between logical order and physical placement is crucial.

Even if cells are scattered in memory due to inserts and deletes, the pointer array maintains sorted traversal order.

That’s why search operations rely on the pointer array, not physical layout.

Why Cells Are Not Always Contiguous

Over time, random inserts and deletes create holes in the cell content area.

Imagine:

  • Insert
  • Delete
  • Insert
  • Delete

Cells move around. Free regions appear between them.

Instead of compacting immediately on every change (which would be expensive), SQLite allows fragmentation to accumulate temporarily.

To manage this safely, unused regions inside the cell content area are organized into a free block chain.

Free Blocks: The Internal Linked List

Unused space inside the cell content area is collected into a singly linked list of free blocks.

  • Blocks are arranged in ascending address order
  • The head pointer to the list is stored in the page header at offset 1
  • Each free block is at least 4 bytes

Each free block begins with 4 bytes of control information:

  • First 2 bytes → pointer to next free block (0 means end of list)
  • Next 2 bytes → size of this free block (including header)

This design allows SQLite to reuse space without reorganizing the entire page.

Fragments: Tiny Leftovers

A free block must be at least 4 bytes. So what happens if there are 1–3 unused bytes?

Those are called fragments.

Fragments cannot be placed in the free block chain because they are too small to store block metadata.

Instead their total size is tracked in the page header at offset 7

The cumulative fragment size is limited to 255 bytes, if fragmentation grows too large, SQLite triggers page defragmentation

Defragmentation rewrites the page:

  • Cells are packed tightly
  • Free space becomes contiguous
  • Fragment count resets

This keeps the storage structure healthy over time.

The Cell Content Boundary

The page header also stores the offset of the first byte of the cell content area, at offset 5.

This value acts as the moving boundary between:

  • Cell content area (growing downward)
  • Unallocated space (middle region)

If this value is 0, it is treated as offset 65,536, a special encoding for large page sizes.

Together with the pointer array growth, this boundary defines how much free space remains on the page.

What This Design Achieves

Let’s step back and appreciate the elegance.

Each tree page behaves like a tiny memory allocator:

  • Pointer array → logical ordering
  • Cell content area → physical storage
  • Free block chain → reusable space
  • Fragment tracking → fragmentation control
  • Defragmentation → periodic cleanup

All of this happens inside a single disk page.

This is why SQLite can perform insert and delete operations efficiently without rewriting entire tables.

It also explains why page-level journaling is sufficient for crash recovery: each page is internally self-consistent.

Where We Go Next

We’ve now covered:

  • Page header
  • Cell pointer array
  • Cell content area
  • Free block management
  • Fragment tracking

What remains is the final and most detailed component:

Structure of a cell

That’s where keys, payloads, rowids, overflow pointers, and index entries are encoded into bytes.

git-lrc

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

GitHub logo

HexmosTech
/
git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs — without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

  • 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won’t notice until production.
  • 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
  • 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
  • 🔗 Why git? Git is universal. Every editor, every IDE, every AI…

Total
0
Shares
Leave a Reply

Your email address will not be published. Required fields are marked *

Previous Post

Spec Is Not the Cure — Unless It’s Discovered Through Discussion

Related Posts