An Algebraic View of Databases

Hello, I’m Maneshwar. I’m working on FreeDevTools online currently building **one place for all dev tools, cheat codes, and TLDRs* — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.*

Yesterday, we closed the gap between relations and files, showing how an RDBMS hides physical storage behind a logical relational view.

Today, we move one level deeper, not into files yet, but into how relations themselves are manipulated.

Once data is represented as relations, the DBMS must support operations that transform one set of relations into another, while preserving the relational abstraction. This is where formal mathematical foundations come in.

Two Ways to Specify Relational Operations

There are two established mathematical frameworks for expressing operations on relations:

Relational Algebra

Relational algebra is an algebraic system of operations defined on relations.

Key properties:

  • Each operation takes relations as input
  • Each operation produces a relation as output
  • The system is closed, meaning operations can be freely composed

Because of this closure, complex queries can be expressed as compositions of simpler operations. This makes relational algebra ideal as an internal representation for query execution.

Relational Calculus

Relational calculus, in contrast, is declarative.

Instead of specifying how to compute results, it specifies what the result should satisfy. While powerful and expressive, its notation is more involved.

For now, we restrict ourselves to relational algebra, since it aligns closely with how DBMSs reason about and execute queries internally.

Relations as Mathematical Sets

A relation can be treated as a mathematical set of tuples:

  • All tuples are distinct
  • All tuples share the same schema
  • Each tuple is an ordered list of attribute values

This allows us to define operations using familiar set-theoretic concepts, adapted to structured data.

Primitive Relational Operations

The foundational operations of relational algebra are:

  • Projection
  • Selection
  • Cartesian product (cross-product)
  • Set-difference
  • Union

From these, more complex operations like intersection, join, and division are derived.

Each operation defines not only which tuples appear in the result, but also what the schema of the result relation looks like.

Projection (π)

Projection is a unary operation that removes unwanted columns from a relation.

  • Operates on a single relation
  • Returns only selected attributes
  • Eliminates duplicate tuples in the result

Conceptually, projection reshapes a relation horizontally, keeping only the attributes that matter.

Selection (σ)

Selection is also a unary operation, but it removes unwanted rows.

  • Filters tuples based on a condition
  • Preserves the original schema
  • Only tuples satisfying the condition appear in the result

Selection narrows a relation vertically, without changing its structure.

Set-Based Binary Operations

These operations require schema compatibility (same attributes and types).

Set-Difference (−)

Returns tuples that appear in the first relation but not in the second.

Union (∪)

Returns tuples that appear in either or both relations.

Intersection (∩)

Returns tuples that appear in both relations.

In all three cases, the output schema is identical to the input schemas.

Cross-Product (×)

Cross-product is a binary operation that combines every tuple of one relation with every tuple of another.

  • Input relations may have different schemas
  • Output contains all possible tuple pairs
  • Output schema is the concatenation of both schemas
  • Attribute name conflicts are resolved by renaming

While rarely used directly by users, cross-product is fundamental, it forms the basis of joins.

Join

Join is a derived operation built on cross-product + selection.

Different types of joins arise based on the selection condition:

  • Theta join: arbitrary comparison condition
  • Equi-join: equality conditions only
  • Natural join: equi-join on all common attributes, with duplicate columns removed

Joins are the primary way relations are meaningfully combined in practice, even though they are formally reducible to simpler operations.

Why This Matters

At this point, we’ve established:

  • Relations as mathematical objects
  • A closed set of operations over relations
  • A formal way to compose complex data transformations

This algebraic foundation is not just theoretical. It directly influences:

  • Query optimization
  • Execution planning
  • How SQL queries are translated and evaluated inside the DBMS

Where We Go Next

With relational algebra in place, the next steps are natural:

  • How SQL maps onto relational algebra
  • How a DBMS parses, optimizes, and executes queries
  • The internal components that make this possible

That takes us into SQL and the architecture of a typical RDBMS, which is where the abstraction finally meets real system design.

FreeDevTools

👉 Check out: FreeDevTools

Any feedback or contributors are welcome!

It’s online, open-source, and ready for anyone to use.

⭐ Star it on GitHub: freedevtools

Total
0
Shares
Leave a Reply

Your email address will not be published. Required fields are marked *

Previous Post

Cincoze MD-3000 DIN-Rail Machine Vision Computer

Related Posts