Web Analytics Made Easy - Statcounter

Comprehensive Guide to SQL Window Functions

🔷What are Window Functions?

Window functions are a class of SQL functions. They perform calculations across a set of table rows related to the current row. Unlike aggregate functions (e.g., SUM, COUNT), window functions do not collapse rows into a single result. Instead, they calculate values for each row while maintaining row granularity.

A window is defined by a PARTITION BY clause. This clause is used to group rows. It also includes an optional ORDER BY clause. This optional clause defines the order of the rows.

Syntax :

function_name(column) OVER (
PARTITION BY partition_column
ORDER BY order_column
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

🔷Key Components of Window Functions

  1. OVER Clause:
    • Defines the window for the function.
    • Example: OVER (PARTITION BY column_name ORDER BY column_name).
  2. Window Frame:
    • Specifies the subset of rows for computation.
    • Frame options:
      • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Includes all rows from the start to the current row.
      • RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: Includes the current row and all following rows.

🔷Types of Window Functions

  1. Aggregate Functions as Window Functions:
    • Functions like SUM, AVG, COUNT, MAX, and MIN can be applied as window functions.
    • Example:
      • SELECT employee_id, department_id, SUM(salary) OVER (PARTITION BY department_id) AS total_salary FROM employees;
  2. Ranking Functions:
    • Used for assigning ranks to rows:
      • ROW_NUMBER(): Assigns a unique sequential number.
      • RANK(): Assigns ranks with gaps for duplicates.
      • DENSE_RANK(): Similar to RANK but without gaps.
      • NTILE(n): Divides rows into n buckets.
  3. Value Functions:
    • Retrieve values from other rows:
      • LEAD(): Fetches a value from a subsequent row.
      • LAG(): Fetches a value from a preceding row.
      • FIRST_VALUE(): Gets the first value in the window frame.
      • LAST_VALUE(): Gets the last value in the window frame.
  4. Statistical Functions:
    • Calculate percentiles or cumulative distributions:
      • PERCENT_RANK(): Computes the relative rank as a percentage.
      • CUME_DIST(): Calculates the cumulative distribution.

🔷Applications of Window Functions

  1. Analytics and Reporting:
    • Running Totals and Cumulative Sums:
      • SELECT order_id, customer_id, SUM(order_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS cumulative_sum FROM orders;
    • Moving Averages:
      • SELECT order_id, AVG(order_amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg FROM orders;
  2. Rankings:
    • Assign ranks to sales representatives based on their performance:
      • SELECT sales_rep, RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS rank FROM sales_data;
  3. Comparisons Across Rows:
    • Calculate the difference in salary between consecutive employees:
      • SELECT employee_id, salary, LAG(salary) OVER (PARTITION BY department_id ORDER BY salary) AS previous_salary FROM employees;
  4. Aggregations Without Collapsing Rows:
    • Determine total department salaries without collapsing individual employee records:
      • SELECT employee_id, department_id, SUM(salary) OVER (PARTITION BY department_id) AS dept_total_salary FROM employees;
  5. Highlighting Extremes:
    • Identify the highest-paid employee in each department:
      • SELECT employee_id, department_id, FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS top_salary FROM employees;

🔷 Common SQL Window Functions

FunctionDescription
ROW_NUMBER()Assigns a unique sequential number
RANK()Ranks with gaps for ties
DENSE_RANK()Ranks without gaps for ties
NTILE(n)Divides rows into n equal buckets
LAG()Accesses value from previous row
LEAD()Accesses value from next row
FIRST_VALUE()Gets first value in the window
LAST_VALUE()Gets last value in the window
SUM()Running total
AVG()Moving average

🛠️ Example Dataset: Sales

SaleIDEmployeeRegionSaleAmountSaleDate
1AliceEast2002024-01-01
2BobEast4002024-01-02
3AliceEast1002024-01-03
4JohnWest5002024-01-02
5AliceEast3002024-01-04

🔹 1. ROW_NUMBER() – Unique Ranking

SELECT Employee, SaleAmount,
ROW_NUMBER() OVER (PARTITION BY Employee ORDER BY SaleDate) AS RowNum
FROM Sales;

Use case: Get latest N transactions per employee.

🔹 2. RANK() and DENSE_RANK() – Ranking with Ties

SELECT Employee, SaleAmount,
RANK() OVER (PARTITION BY Region ORDER BY SaleAmount DESC) AS Rank,
DENSE_RANK() OVER (PARTITION BY Region ORDER BY SaleAmount DESC) AS DenseRank
FROM Sales;

Use case: Sales competition results per region.

🔹 3. NTILE(n) – Divide Rows into Buckets

SELECT Employee, SaleAmount,
NTILE(3) OVER (ORDER BY SaleAmount) AS Bucket
FROM Sales;

Use case: Divide sales into performance tiers (low, mid, high).

🔹 4. LAG() and LEAD() – Access Adjacent Rows

SELECT Employee, SaleAmount,
LAG(SaleAmount, 1) OVER (PARTITION BY Employee ORDER BY SaleDate) AS PrevSale,
LEAD(SaleAmount, 1) OVER (PARTITION BY Employee ORDER BY SaleDate) AS NextSale
FROM Sales;

Use case: Sales trend analysis (up or down).

🔹 5. FIRST_VALUE() and LAST_VALUE()

SELECT Employee, SaleAmount, SaleDate,
FIRST_VALUE(SaleAmount) OVER (PARTITION BY Employee ORDER BY SaleDate) AS FirstSale,
LAST_VALUE(SaleAmount) OVER (
PARTITION BY Employee ORDER BY SaleDate
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS LastSale
FROM Sales;

Use case: Find when and how much was the first and last sale per employee.

🔹 6. Aggregation as Window Function (Running Total)

SELECT Employee, SaleDate, SaleAmount,
SUM(SaleAmount) OVER (PARTITION BY Employee ORDER BY SaleDate) AS RunningTotal
FROM Sales;

Use case: Track cumulative sales over time.

🔹Advantages of Window Functions

  • Preserve row-level details while performing advanced analytics.
  • Eliminate the need for self-joins or subqueries in many scenarios.
  • Offer flexible partitioning and ordering to handle complex use cases.

🧠 Best Practices

  • Always define ORDER BY for predictable results.
  • Use PARTITION BY to scope your calculation to a group (like per employee).
  • Use frame clauses (ROWS BETWEEN) to control the window size in moving averages.

⚠️ Performance Tips

  • Window functions can be resource intensive on large datasets.
  • Use indexed columns in PARTITION BY and ORDER BY for performance.
  • Avoid using window functions inside subqueries without filtering them.

📚 Real-World Use Cases

  1. Leaderboard ReportsRANK(), DENSE_RANK()
  2. Stock/Inventory TrendLAG(), LEAD()
  3. Sales ProjectionsSUM(), FIRST_VALUE(), AVG()
  4. Churn Prediction → Time-difference using LAG() and user behavior
  5. Customer Lifetime Value (CLTV) → Rolling aggregates per customer

🔚 Summary

SQL Window Functions are essential for advanced analytics, reporting, and business logic without collapsing data into groups. They offer powerful and readable syntax for handling ranking, cumulative totals, and value comparisons across rows.


Discover more from Technology with Vivek Johari

Subscribe to get the latest posts sent to your email.

Leave a Reply

Scroll to Top

Discover more from Technology with Vivek Johari

Subscribe now to keep reading and get access to the full archive.

Continue reading