Postgres window functions are a powerful SQL feature that lets you perform calculations across rows related to the current row without collapsing results. Unlike aggregate functions that reduce multiple rows into a single value, window functions retain each row while allowing computations such as running totals, moving averages, and rankings. They work well for real-time analytics, user scoring, or time-series comparisons, which are commonly needed in modern data-driven apps.
Window functions allow you to express powerful analytics logic in a single SQL statement that’s easier to write and debug, and that Postgres can optimize under the hood. This makes your queries both faster and easier to maintain.
In this article, I’ll explain what window functions are, how they differ from traditional aggregate functions, and how to use them effectively in Postgres. I’ll walk you through real-world examples of their usage, such as ranking users, calculating moving averages, tracking historical changes, and computing running totals, all while using clean, performant SQL that runs directly in your database.
All examples in this blog are tested on Supabase, an open source backend-as-a-service (BaaS) platform that simplifies backend development for web and mobile applications. Supabase provides full support for Postgres databases.
Understanding Postgres Window Functions
The term window function in SQL comes from the analogy of a sliding window that moves across data, allowing for calculations on a specific subset (“window” or “view”) of rows at a time. A window is a set of rows that are related to the current row and over which a window function is executed.
Window functions retain each row while still allowing calculations across neighboring rows. This makes them incredibly useful for analytical tasks like running totals, rankings, percent changes, or comparing a value to a group average. Window functions are used to write clean, expressive, and high-performance SQL. Here’s the basic structure of a window function:
<function_name>(<column>) OVER (
[PARTITION BY column_list]
[ORDER BY column_list]
[ROWS or RANGE frame_clause]
)
The previous snippet’s components function as follows:
- PARTITION BY divides the data set into separate groups (like departments or users). Each group is treated as an independent mini-table.
- ORDER BY specifies the order of the rows within each partition, affecting how cumulative or ranking operations are performed.
- ROWS / RANGE defines the frame—how many rows to include relative to the current row (a combined example follows this list).
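To see how these pieces fit together, here’s a minimal sketch. It assumes a hypothetical orders table with customer_id, order_date, and amount columns and computes a per-customer running total:
SELECT
  customer_id,
  order_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY customer_id    -- one window per customer
    ORDER BY order_date         -- rows accumulate in date order
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- frame: start of partition up to this row
  ) AS running_total
FROM orders;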
Window Functions vs. Aggregate Functions
The easiest way to understand the difference between window functions and aggregate functions is with an example. Imagine you want to analyze sales data for the following table:
id | customer_name | region | product | category | quantity | unit_price | amount | sale_date |
---|---|---|---|---|---|---|---|---|
1 | Alice | North | Laptop A | Electronics | 1 | 800.00 | 800.00 | 2024-01-01 |
2 | Bob | North | Phone X | Electronics | 2 | 500.00 | 1000.00 | 2024-01-02 |
3 | Charlie | North | Mouse | Accessories | 3 | 20.00 | 60.00 | 2024-01-03 |
4 | David | South | Keyboard | Accessories | 1 | 50.00 | 50.00 | 2024-01-01 |
5 | Eva | South | Laptop B | Electronics | 1 | 900.00 | 900.00 | 2024-01-02 |
6 | Frank | East | Monitor | Electronics | 1 | 200.00 | 200.00 | 2024-01-03 |
7 | Grace | East | Headphones | Accessories | 2 | 100.00 | 200.00 | 2024-01-04 |
8 | Heidi | West | Phone X | Electronics | 1 | 500.00 | 500.00 | 2024-01-05 |
If you’re analyzing sales by region, an aggregate query using SUM(amount)
with GROUP BY region
will give you the total per region but discard individual transactions:
SELECT region, SUM(amount) FROM sales GROUP BY region;
So the above statement will give you the following result:
region | sum |
---|---|
West | 500.00 |
North | 1860.00 |
East | 400.00 |
South | 950.00 |
A window function instead adds a column with the total sales per region alongside each original sale record. This means you get both the group-level insight and the detailed row-level data in a single result—ideal for use cases like percentiles or ranking users without losing context. Take the following window function as an example:
SELECT
id, region, amount,
SUM(amount) OVER (PARTITION BY region) AS total_sales_per_region
FROM sales;
This will output the following:
id | region | amount | total_sales_per_region |
---|---|---|---|
6 | East | 200.00 | 400.00 |
7 | East | 200.00 | 400.00 |
3 | North | 60.00 | 1860.00 |
2 | North | 1000.00 | 1860.00 |
1 | North | 800.00 | 1860.00 |
5 | South | 900.00 | 950.00 |
4 | South | 50.00 | 950.00 |
8 | West | 500.00 | 500.00 |
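Because the group total is available on every row, you can combine it with the row’s own values. For instance, this sketch (using the same sales table) adds each sale’s share of its region’s total, rounded to one decimal place:
SELECT
  id,
  region,
  amount,
  SUM(amount) OVER (PARTITION BY region) AS total_sales_per_region,
  -- Each sale's percentage of its region's total
  ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY region), 1) AS pct_of_region_sales
FROM sales;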
Window Frames
While the “window” is a set of rows that a window function can operate on that is defined using the PARTITION BY
(and optionally, ORDER BY
) clause, a “window frame” is a subset of the “window”, used for row-by-row calculation. It’s defined using ROWS
or RANGE
clauses (like ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
). The “window frame” controls how many rows around the current row are used in the calculation. The window frame answers the question “Which nearby rows should be considered for this calculation?”
Window frames can be defined in two ways:
- ROWS: Selects a fixed number of rows before and after the current row, no matter what the values are. For example, ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING includes the row before, the current row, and the row after.
- RANGE: Selects rows that have similar values based on the ORDER BY column. For example, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes all rows with values less than or equal to the current one and treats tied values as a group.
The following are some common window-frame clause keywords:
Clause | Meaning |
---|---|
UNBOUNDED PRECEDING | The very first row in the partition |
UNBOUNDED FOLLOWING | The very last row in the partition |
CURRENT ROW | The row currently being evaluated |
N PRECEDING | N rows before the current row |
N FOLLOWING | N rows after the current row |
Complete frame clauses combine these keywords to define a range. For example, this creates a three-row window—the previous row, the current row, and the next row:
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
This includes all rows from the partition’s start up to the current row:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Together, these give you precise control over which rows are used in each window function’s calculation.
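As a quick illustration using the sales table from earlier, the following query computes a three-row moving average of amount within each region (the previous, current, and next sale by date); the exact averages depend on the sample data:
SELECT
  id,
  region,
  sale_date,
  amount,
  AVG(amount) OVER (
    PARTITION BY region
    ORDER BY sale_date
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING  -- previous row, current row, next row
  ) AS moving_avg_amount
FROM sales;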
Common Types of Window Functions
The common types of window functions are divided into three major categories:
- Ranking functions: These functions assign a rank or position to each row within a partition based on a specified ordering—for example, ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE(n).
- Aggregate functions: These functions perform a calculation on a set of rows and return a single value. When used as window functions, they operate within a defined window (partition) instead of collapsing rows like in a GROUP BY clause. The aggregate functions are SUM(), AVG(), COUNT(), MAX(), and MIN().
- Value functions: These functions retrieve values from another row in the window frame relative to the current row. The functions are LAG(n), LEAD(n), FIRST_VALUE(), LAST_VALUE(), and NTH_VALUE(n).
The following table summarizes each of these function types:
Function Type | What does it do? | Why is it useful? |
---|---|---|
Aggregate | Calculates totals, averages, etc. | Compare rows with group totals; calculate trends |
Ranking | Assigns rank or order to rows | Leaderboards, top-N queries, grouped rankings |
Value | Retrieves relative row values | Change detection, trend analysis, time comparisons |
Let’s now explore each of these function types with real-world examples.
Ranking Functions in Action
Postgres provides a powerful set of ranking functions that allow you to assign positions to rows based on a given order. As an example, this article uses a leaderboard-building scenario, where you can use the functions to perform actions such as identifying top performers, dividing users into cohorts, or analyzing performance relative to peers.
Ranking functions are always used with the OVER(ORDER BY ...)
clause, and they do not collapse rows like GROUP BY
; they preserve row-level detail while adding new insights. The examples below use the following student_results
table:
student_id | student_name | subject | score |
---|---|---|---|
1 | Abigail | Math | 92 |
2 | Abigail | Science | 90 |
3 | Abigail | English | 88 |
4 | Bruce | Math | 78 |
5 | Bruce | Science | 75 |
6 | Bruce | English | 82 |
7 | Sarah | Math | 92 |
8 | Sarah | Science | 88 |
9 | Sarah | English | 91 |
10 | David | Math | 68 |
11 | David | Science | 72 |
12 | David | English | 70 |
13 | Adam | Math | 88 |
14 | Adam | Science | 85 |
15 | Adam | English | 90 |
ROW_NUMBER()
ROW_NUMBER()
assigns a unique sequential number to each row within a partition based on the order specified. Tied values are given different row numbers arbitrarily. The following example ranks students within each subject based on their score:
SELECT
subject,
student_name,
score,
ROW_NUMBER() OVER (
PARTITION BY subject
ORDER BY score DESC
) AS row_num
FROM student_results;
The output assigns a unique row number starting from 1 for the top scorer within each subject:
subject | student_name | score | row_num |
---|---|---|---|
English | Sarah | 91 | 1 |
English | Adam | 90 | 2 |
English | Abigail | 88 | 3 |
English | Bruce | 82 | 4 |
English | David | 70 | 5 |
Math | Abigail | 92 | 1 |
Math | Sarah | 92 | 2 |
Math | Adam | 88 | 3 |
Math | Bruce | 78 | 4 |
Math | David | 68 | 5 |
Science | Abigail | 90 | 1 |
Science | Sarah | 88 | 2 |
Science | Adam | 85 | 3 |
Science | Bruce | 75 | 4 |
Science | David | 72 | 5 |
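Because Abigail and Sarah are tied at 92 in Math, their row numbers could come back in either order from one run to the next. If you need deterministic numbering, one option is to add a tiebreaker column, as in this sketch:
SELECT
  subject,
  student_name,
  score,
  ROW_NUMBER() OVER (
    PARTITION BY subject
    ORDER BY score DESC, student_name   -- student_name breaks ties deterministically
  ) AS row_num
FROM student_results;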
RANK()
RANK()
assigns a rank to each row. Tied values receive the same rank, but the next rank is skipped to preserve the total count. Here’s an example that assigns a rank to each student within their subject based on descending scores, giving equal ranks for tied scores and skipping the next rank(s) accordingly:
SELECT
subject,
student_name,
score,
RANK() OVER (
PARTITION BY subject
ORDER BY score DESC
) AS rank
FROM student_results;
In the output, Abigail and Sarah share rank 1 in Math. Adam is ranked 3 because rank 2 is skipped to account for the tie:
subject | student_name | score | rank |
---|---|---|---|
English | Sarah | 91 | 1 |
English | Adam | 90 | 2 |
English | Abigail | 88 | 3 |
English | Bruce | 82 | 4 |
English | David | 70 | 5 |
Math | Abigail | 92 | 1 |
Math | Sarah | 92 | 1 |
Math | Adam | 88 | 3 |
Math | Bruce | 78 | 4 |
Math | David | 68 | 5 |
Science | Abigail | 90 | 1 |
Science | Sarah | 88 | 2 |
Science | Adam | 85 | 3 |
Science | Bruce | 75 | 4 |
Science | David | 72 | 5 |
DENSE_RANK()
DENSE_RANK()
works like RANK()
, but there are no gaps (ie no numbers are skipped) in the ranking sequence. Ties get the same rank, and the next rank increases by one. The following query assigns a dense rank to each student within their subject based on descending scores:
SELECT
subject,
student_name,
score,
DENSE_RANK() OVER (
PARTITION BY subject
ORDER BY score DESC
) AS dense_rank
FROM student_results;
As it gives equal ranks for tied scores without skipping the next rank (ranks remain consecutive), Adam gets rank 2 in Math this time:
subject | student_name | score | dense_rank |
---|---|---|---|
English | Sarah | 91 | 1 |
English | Adam | 90 | 2 |
English | Abigail | 88 | 3 |
English | Bruce | 82 | 4 |
English | David | 70 | 5 |
Math | Abigail | 92 | 1 |
Math | Sarah | 92 | 1 |
Math | Adam | 88 | 2 |
Math | Bruce | 78 | 3 |
Math | David | 68 | 4 |
Science | Abigail | 90 | 1 |
Science | Sarah | 88 | 2 |
Science | Adam | 85 | 3 |
Science | Bruce | 75 | 4 |
Science | David | 72 | 5 |
NTILE(n)
NTILE(n)
distributes rows into n groups of as equal size as possible. Each group is assigned a bucket number starting from 1. This query divides students within each subject into two equally sized groups (tiles) based on descending scores:
SELECT
subject,
student_name,
score,
NTILE(2) OVER (
PARTITION BY subject
ORDER BY score DESC
) AS performance_bucket
FROM student_results;
It also labels them as performance buckets (eg 1 = higher performers; 2 = lower performers):
subject | student_name | score | performance_bucket |
---|---|---|---|
English | Sarah | 91 | 1 |
English | Adam | 90 | 1 |
English | Abigail | 88 | 1 |
English | Bruce | 82 | 2 |
English | David | 70 | 2 |
Math | Abigail | 92 | 1 |
Math | Sarah | 92 | 1 |
Math | Adam | 88 | 1 |
Math | Bruce | 78 | 2 |
Math | David | 68 | 2 |
Science | Abigail | 90 | 1 |
Science | Sarah | 88 | 1 |
Science | Adam | 85 | 1 |
Science | Bruce | 75 | 2 |
Science | David | 72 | 2 |
Aggregate Window Functions in Action
Aggregate window functions allow you to perform aggregate calculations like SUM()
, AVG()
, and COUNT()
without collapsing the result set. These functions are useful in calculating things like running totals, moving averages, or group-level statistics while still seeing individual records. Let’s walk through these key aggregate window functions and how they’re used in real-world scenarios using the following sales_commissions
table:
sale_id | employee_name | product_name | region | units_sold | unit_price | commission_rate | sale_date |
---|---|---|---|---|---|---|---|
1 | Alice | Phone | North | 10 | 500.00 | 0.08 | 2024-01-01 |
2 | Bob | Phone | North | 8 | 500.00 | 0.08 | 2024-01-01 |
3 | Charlie | Phone | South | 12 | 500.00 | 0.08 | 2024-01-02 |
4 | Alice | Laptop | East | 3 | 1000.00 | 0.10 | 2024-01-02 |
5 | Bob | Laptop | East | 4 | 1000.00 | 0.10 | 2024-01-02 |
6 | David | Laptop | West | 5 | 1000.00 | 0.10 | 2024-01-03 |
7 | Charlie | Monitor | South | 6 | 300.00 | 0.07 | 2024-01-03 |
8 | Eva | Monitor | North | 6 | 300.00 | 0.07 | 2024-01-03 |
9 | Frank | Monitor | East | 7 | 300.00 | 0.07 | 2024-01-04 |
10 | Alice | Monitor | North | 10 | 250.00 | 0.05 | 2024-01-02 |
SUM()
SUM()
calculates either a running total (cumulative sum) as you move through rows or the total for an entire group, without collapsing the original data. When calculating a running total, it adds values row by row in order (eg daily sales accumulating over time). When calculating a group total, it sums all values in a category (eg total sales per product) while keeping individual rows intact. This query calculates each employee’s commission and shows the total and running (cumulative) commission per product over time:
SELECT
employee_name,
product_name,
sale_date,
units_sold * unit_price * commission_rate AS total_commission,
-- Sum of total commission per product
SUM(units_sold * unit_price * commission_rate) OVER (
PARTITION BY product_name
) AS total_product_commission,
-- Running total commission per product (ordered by date)
SUM(units_sold * unit_price * commission_rate) OVER (
PARTITION BY product_name
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_commission
FROM sales_commissions;
Here’s the output:
employee_name | product_name | sale_date | total_commission | total_product_commission | running_commission |
---|---|---|---|---|---|
Alice | Laptop | 2024-01-02 | 300.0000 | 1200.0000 | 300.0000 |
Bob | Laptop | 2024-01-02 | 400.0000 | 1200.0000 | 700.0000 |
David | Laptop | 2024-01-03 | 500.0000 | 1200.0000 | 1200.0000 |
Alice | Monitor | 2024-01-02 | 125.0000 | 524.0000 | 125.0000 |
Charlie | Monitor | 2024-01-03 | 126.0000 | 524.0000 | 251.0000 |
Eva | Monitor | 2024-01-03 | 126.0000 | 524.0000 | 377.0000 |
Frank | Monitor | 2024-01-04 | 147.0000 | 524.0000 | 524.0000 |
Bob | Phone | 2024-01-01 | 320.0000 | 1200.0000 | 320.0000 |
Alice | Phone | 2024-01-01 | 400.0000 | 1200.0000 | 720.0000 |
Charlie | Phone | 2024-01-02 | 480.0000 | 1200.0000 | 1200.0000 |
AVG()
The AVG()
window function calculates the average (mean) of a set of values over a specific group of rows called a “window.” Instead of computing a single average for the whole table, it can return a per-group average or a moving average that changes row by row as the frame slides. This is useful for things like moving averages or spotting trends over time. This query computes each employee’s commission and shows the overall average and moving (cumulative) average commission per product over time based on the sale date:
SELECT
employee_name,
product_name,
sale_date,
units_sold * unit_price * commission_rate AS total_commission,
-- Overall average commission for the product
AVG(units_sold * unit_price * commission_rate) OVER (
PARTITION BY product_name
) AS average_commission,
-- Moving average commission per product, ordered by sale date
AVG(units_sold * unit_price * commission_rate) OVER (
PARTITION BY product_name
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS moving_average_commission
FROM sales_commissions;
This is the output:
employee_name | product_name | sale_date | total_commission | average_commission | moving_average_commission |
---|---|---|---|---|---|
Alice | Laptop | 2024-01-02 | 300.0000 | 400.0000000000000000 | 300.0000000000000000 |
Bob | Laptop | 2024-01-02 | 400.0000 | 400.0000000000000000 | 350.0000000000000000 |
David | Laptop | 2024-01-03 | 500.0000 | 400.0000000000000000 | 400.0000000000000000 |
Alice | Monitor | 2024-01-02 | 125.0000 | 131.0000000000000000 | 125.0000000000000000 |
Charlie | Monitor | 2024-01-03 | 126.0000 | 131.0000000000000000 | 125.5000000000000000 |
Eva | Monitor | 2024-01-03 | 126.0000 | 131.0000000000000000 | 125.6666666666666667 |
Frank | Monitor | 2024-01-04 | 147.0000 | 131.0000000000000000 | 131.0000000000000000 |
Bob | Phone | 2024-01-01 | 320.0000 | 400.0000000000000000 | 320.0000000000000000 |
Alice | Phone | 2024-01-01 | 400.0000 | 400.0000000000000000 | 360.0000000000000000 |
Charlie | Phone | 2024-01-02 | 480.0000 | 400.0000000000000000 | 400.0000000000000000 |
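Note that the query above is a cumulative average. If you want a bounded sliding window instead, you can narrow the frame; this sketch averages each commission with only the one immediately before it:
SELECT
  employee_name,
  product_name,
  sale_date,
  units_sold * unit_price * commission_rate AS total_commission,
  AVG(units_sold * unit_price * commission_rate) OVER (
    PARTITION BY product_name
    ORDER BY sale_date
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW  -- only the previous row and the current row
  ) AS two_row_moving_average
FROM sales_commissions;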
COUNT()
COUNT()
counts how many rows fall within a defined window frame. It can be used for cumulative and group-wise counting. This query shows each employee’s sale and calculates the total number of sales for each product and a running (cumulative) count of sales per product over time based on sale_date
:
SELECT
employee_name,
product_name,
sale_date,
-- Group-wise count: total sales records for this product
COUNT(*) OVER (
PARTITION BY product_name
) AS total_sales_per_product,
-- Cumulative count: running count of sales records per product by sale_date
COUNT(*) OVER (
PARTITION BY product_name
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sales_count
FROM sales_commissions;
The output from this query is as follows:
employee_name | product_name | sale_date | total_sales_per_product | cumulative_sales_count |
---|---|---|---|---|
Alice | Laptop | 2024-01-02 | 3 | 1 |
Bob | Laptop | 2024-01-02 | 3 | 2 |
David | Laptop | 2024-01-03 | 3 | 3 |
Alice | Monitor | 2024-01-02 | 4 | 1 |
Charlie | Monitor | 2024-01-03 | 4 | 2 |
Eva | Monitor | 2024-01-03 | 4 | 3 |
Frank | Monitor | 2024-01-04 | 4 | 4 |
Bob | Phone | 2024-01-01 | 3 | 1 |
Alice | Phone | 2024-01-01 | 3 | 2 |
Charlie | Phone | 2024-01-02 | 3 | 3 |
Value Window Functions in Action
Value window functions are focused on fetching data from neighboring rows within a window frame and can compare the current row to past or future data. These functions are particularly useful in time-series analysis, trend detection, and change tracking. Let’s explore different value window functions through the following package_tracking
table example:
tracking_id | package_id | checkpoint | city | status_time |
---|---|---|---|---|
1 | 101 | Picked Up | New York | 2024-06-01 09:00:00 |
2 | 101 | Distribution Hub | Newark | 2024-06-01 12:30:00 |
3 | 101 | In Transit | Harrisburg | 2024-06-01 18:00:00 |
4 | 101 | Out for Delivery | Baltimore | 2024-06-02 08:15:00 |
5 | 101 | Delivered | Baltimore | 2024-06-02 10:00:00 |
6 | 102 | Picked Up | San Francisco | 2024-06-03 08:00:00 |
7 | 102 | Distribution Hub | Oakland | 2024-06-03 10:45:00 |
8 | 102 | In Transit | Palo Alto | 2024-06-03 15:30:00 |
9 | 102 | Out for Delivery | San Jose | 2024-06-04 09:00:00 |
10 | 102 | Delivered | San Jose | 2024-06-04 11:30:00 |
LAG()
LAG()
returns the value of a column from a previous row based on the ORDER BY
clause. You can specify how many rows back to look, and the default is 1. This SQL query analyzes how much time passed between each checkpoint update for every package in the package_tracking
table—that is, it calculates the delay or duration between checkpoints:
SELECT
package_id,
checkpoint,
city,
status_time,
LAG(status_time) OVER (PARTITION BY package_id ORDER BY status_time) AS previous_time,
EXTRACT(EPOCH FROM (status_time - LAG(status_time) OVER (
PARTITION BY package_id ORDER BY status_time))) / 3600 AS hours_since_last
FROM package_tracking;
The first row for each package returns null because there’s no previous row to look back at:
package_id | checkpoint | city | status_time | previous_time | hours_since_last |
---|---|---|---|---|---|
101 | Picked Up | New York | 2024-06-01 09:00:00 | null | null |
101 | Distribution Hub | Newark | 2024-06-01 12:30:00 | 2024-06-01 09:00:00 | 3.5000000000000000 |
101 | In Transit | Harrisburg | 2024-06-01 18:00:00 | 2024-06-01 12:30:00 | 5.5000000000000000 |
101 | Out for Delivery | Baltimore | 2024-06-02 08:15:00 | 2024-06-01 18:00:00 | 14.2500000000000000 |
101 | Delivered | Baltimore | 2024-06-02 10:00:00 | 2024-06-02 08:15:00 | 1.7500000000000000 |
102 | Picked Up | San Francisco | 2024-06-03 08:00:00 | null | null |
102 | Distribution Hub | Oakland | 2024-06-03 10:45:00 | 2024-06-03 08:00:00 | 2.7500000000000000 |
102 | In Transit | Palo Alto | 2024-06-03 15:30:00 | 2024-06-03 10:45:00 | 4.7500000000000000 |
102 | Out for Delivery | San Jose | 2024-06-04 09:00:00 | 2024-06-03 15:30:00 | 17.5000000000000000 |
102 | Delivered | San Jose | 2024-06-04 11:30:00 | 2024-06-04 09:00:00 | 2.5000000000000000 |
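LAG() also accepts an offset and a default value. As a sketch, the query below looks two checkpoints back and substitutes the placeholder text 'No earlier checkpoint' when there isn’t one (the label is purely illustrative):
SELECT
  package_id,
  checkpoint,
  status_time,
  -- Third argument replaces the null that LAG() would otherwise return
  LAG(checkpoint, 2, 'No earlier checkpoint') OVER (
    PARTITION BY package_id
    ORDER BY status_time
  ) AS checkpoint_two_steps_back
FROM package_tracking;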
LEAD()
LEAD()
returns the value of a column from the next row. Like LAG()
, you can define how far ahead to look. You can expose this info in your app or dashboard and show the next expected checkpoint, which is useful for live tracking systems or ETAs.
This query shows the most recent checkpoint and what’s expected next:
SELECT
package_id,
city,
checkpoint AS current_checkpoint,
status_time,
LEAD(checkpoint) OVER (PARTITION BY package_id ORDER BY status_time) AS next_checkpoint
FROM package_tracking
WHERE package_id = 101
ORDER BY status_time DESC
LIMIT 1;
Here’s the output:
package_id | city | current_checkpoint | status_time | next_checkpoint |
---|---|---|---|---|
101 | Baltimore | Delivered | 2024-06-02 10:00:00 | null |
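If you also want to know how long each leg of the journey takes, a similar sketch looks forward with LEAD() and subtracts the current timestamp, mirroring the LAG() example above:
SELECT
  package_id,
  checkpoint,
  status_time,
  LEAD(status_time) OVER (PARTITION BY package_id ORDER BY status_time) AS next_time,
  -- Hours until the next checkpoint (null for the final checkpoint)
  EXTRACT(EPOCH FROM (LEAD(status_time) OVER (
    PARTITION BY package_id ORDER BY status_time) - status_time)) / 3600 AS hours_until_next
FROM package_tracking;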
FIRST_VALUE()
The FIRST_VALUE()
function returns the value of the first row in the window frame. This query retrieves the dispatch time for each package and the earliest recorded status_time
per package_id
:
SELECT
package_id,
checkpoint,
city,
status_time,
-- Show origin dispatch time for the package
FIRST_VALUE(status_time) OVER (
PARTITION BY package_id
ORDER BY status_time
) AS dispatched_on
FROM package_tracking;
This is the output, which allows you to compare all other checkpoints against when the package was first sent:
package_id | checkpoint | city | status_time | dispatched_on |
---|---|---|---|---|
101 | Picked Up | New York | 2024-06-01 09:00:00 | 2024-06-01 09:00:00 |
101 | Distribution Hub | Newark | 2024-06-01 12:30:00 | 2024-06-01 09:00:00 |
101 | In Transit | Harrisburg | 2024-06-01 18:00:00 | 2024-06-01 09:00:00 |
101 | Out for Delivery | Baltimore | 2024-06-02 08:15:00 | 2024-06-01 09:00:00 |
101 | Delivered | Baltimore | 2024-06-02 10:00:00 | 2024-06-01 09:00:00 |
102 | Picked Up | San Francisco | 2024-06-03 08:00:00 | 2024-06-03 08:00:00 |
102 | Distribution Hub | Oakland | 2024-06-03 10:45:00 | 2024-06-03 08:00:00 |
102 | In Transit | Palo Alto | 2024-06-03 15:30:00 | 2024-06-03 08:00:00 |
102 | Out for Delivery | San Jose | 2024-06-04 09:00:00 | 2024-06-03 08:00:00 |
102 | Delivered | San Jose | 2024-06-04 11:30:00 | 2024-06-03 08:00:00 |
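Building on that comparison, here’s a sketch that turns the difference into elapsed hours since dispatch for each checkpoint:
SELECT
  package_id,
  checkpoint,
  status_time,
  -- Hours elapsed between dispatch and this checkpoint
  EXTRACT(EPOCH FROM (status_time - FIRST_VALUE(status_time) OVER (
    PARTITION BY package_id
    ORDER BY status_time
  ))) / 3600 AS hours_since_dispatch
FROM package_tracking;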
LAST_VALUE()
LAST_VALUE()
gives you the value from the last row in your window frame. By default, the window goes up only to the current row (not beyond it). So unless you extend the frame (using something like ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
), it might return the current row’s value, not the true last row of the whole group. This query checks whether a package has reached its final destination city (not the delivery status) by comparing the current checkpoint’s city with the last city in the delivery path:
SELECT
package_id,
checkpoint,
city,
status_time,
-- Get the final destination city (same for all rows in that package)
LAST_VALUE(city) OVER (
PARTITION BY package_id
ORDER BY status_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS final_destination_city,
-- Compare current city to final destination
CASE
WHEN city = LAST_VALUE(city) OVER (
PARTITION BY package_id
ORDER BY status_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) THEN 'Yes'
ELSE 'No'
END AS is_final_destination_reached
FROM package_tracking;
This is the output:
package_id | checkpoint | city | status_time | final_destination_city | is_final_destination_reached |
---|---|---|---|---|---|
101 | Picked Up | New York | 2024-06-01 09:00:00 | Baltimore | No |
101 | Distribution Hub | Newark | 2024-06-01 12:30:00 | Baltimore | No |
101 | In Transit | Harrisburg | 2024-06-01 18:00:00 | Baltimore | No |
101 | Out for Delivery | Baltimore | 2024-06-02 08:15:00 | Baltimore | Yes |
101 | Delivered | Baltimore | 2024-06-02 10:00:00 | Baltimore | Yes |
102 | Picked Up | San Francisco | 2024-06-03 08:00:00 | San Jose | No |
102 | Distribution Hub | Oakland | 2024-06-03 10:45:00 | San Jose | No |
102 | In Transit | Palo Alto | 2024-06-03 15:30:00 | San Jose | No |
102 | Out for Delivery | San Jose | 2024-06-04 09:00:00 | San Jose | Yes |
102 | Delivered | San Jose | 2024-06-04 11:30:00 | San Jose | Yes |
Remember, the clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
is essential when using LAST_VALUE()
in window functions because without it, LAST_VALUE()
returns the current row’s value and not the true last value in the partition due to the default frame ending at the current row.
Common Pitfalls and Troubleshooting with Window Functions
Window functions are powerful, but small mistakes can lead to confusing results or slow performance. Here are ways to avoid common issues:
- Unstable ordering: If ORDER BY values are tied, the row order becomes unpredictable. Fix this by adding a secondary column: ORDER BY score DESC, student_id.
- Unexpected ranks: If two rows have the same value, RANK() gives them the same rank but skips the next number (creating a gap). DENSE_RANK() also gives the same rank but doesn’t skip any numbers. So tied values affect how ranks are counted.
- Performance issues: Using complex window ranges or partitioning by columns with too many unique values (like IDs) can make queries slow. Instead, use simpler window frames and partition by useful categories like departments or regions.
- Null behavior: By default, Postgres sorts null values as if they were larger than any non-null value, so they come last in ascending order and first in descending order. Use NULLS LAST (or NULLS FIRST) to control where they appear. LAG() and LEAD() also return null when there’s no previous or next row. Use a default value like LAG(sales, 1, 0) to avoid this (see the combined sketch after this list).
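As a combined sketch of these fixes using the student_results table from earlier, the query below adds a secondary ordering column, pushes nulls to the end, and gives LAG() a default of 0 (the sample score column contains no nulls, so NULLS LAST is shown purely for illustration):
SELECT
  subject,
  student_name,
  score,
  RANK() OVER (
    PARTITION BY subject
    ORDER BY score DESC NULLS LAST, student_id   -- secondary column keeps tied rows in a stable order
  ) AS rank,
  LAG(score, 1, 0) OVER (
    PARTITION BY subject
    ORDER BY score DESC NULLS LAST, student_id   -- default of 0 instead of null for the first row
  ) AS previous_score
FROM student_results;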
Tip: Debug by starting with a minimal query and adding complexity step by step. It helps reveal how window frames and partitions are applied.
Conclusion
Postgres window functions let you perform advanced analytics—like running totals, rankings, and comparisons—without losing row-level detail. They are more flexible and readable than complex subqueries or joins. We saw real-world examples like sales analysis, student rankings, and time-based trends using different window functions.