Introduction Into Window Functions on SQL Server

Introduction Into Window Functions on SQL Server

·

5 min read

Why window functions may come in handy

Let us dive directly into a real world example. You’ve got a table with salaries of different job functions and you are asked to find out what the n-tiles (quantiles, mean etc.) are of each function within a company. So this means you’ve got many people that hold a position which you want to compare with others. You want to compare salaries of marketing professionals to their peers, but you don’t want to compare said salaries to marketing directors because they have their own salary bands. So that means you need to frame (also called partition) job titles with their own with which you calculate the n-tiles. Without window functions, this could be a quite fussy task.

In order to walk through the problem, create a new database, I’m going to name mine FoehnWind [^1]. The following script creates a new table and inserts some sample data of 20 rows into it.

Tutorial

USE [FoehnWind];
GO
CREATE TABLE dbo.Employee
(ID           TINYINT IDENTITY(1, 1) NOT NULL, 
 FirstName    VARCHAR(150) NOT NULL, 
 LastName     VARCHAR(150) NULL, 
 JobTitle     VARCHAR(100) NOT NULL, 
 AnnualSalary INT NOT NULL, 
 CONSTRAINT [PK_dbo.Employee] PRIMARY KEY CLUSTERED(ID ASC)
)
ON [PRIMARY];
GO
INSERT INTO [dbo].[Employee] 
VALUES
 ( 'Ken','Sanchez','Chief Executive Officer', 530000)
,( 'Jo','Berry','Marketing Assistant', 71875 )
,( 'Lori','Penor','Marketing Assistant', 73398 )
,( 'Pat','Coleman','Marketing Assistant', 51874 )
,( 'Stuart','Macrae','Marketing Assistant', 59108 )
,( 'Mary','Dempsey','Marketing Assistant', 57641 )
,( 'Wanida','Benshoof','Marketing Assistant', 60779 )
,( 'David','Bradley','Marketing Manager', 130000 )
,( 'Jill','Williams','Marketing Specialist', 108117 )
,( 'John','Wood','Marketing Specialist', 99407 )
,( 'Sariya','Pit','Marketing Specialist', 95048 )
,( 'Terry','Eminhizer','Marketing Specialist', 106316 )
,( 'Michael','Sullivan','Marketing Specialist', 87047 )
,( 'Ovidiu','Cracium','Marketing Specialist', 107883 )
,( 'Rob','Walters','Marketing Specialist', 109369 )
,( 'Garrett','Young','Senior Marketing Specialist', 118051 )
,( 'George','Li','Senior Marketing Specialist', 117253 )
,( 'Russell','King','Senior Marketing Specialist', 123362 )
,( 'Sandra','Alayo','Senior Marketing Specialist', 118185 )
,( 'Denise','Smith','Vice President Marketing', 250000 );
GO

Firstly, lets use a window function which calculates the averages and percentiles of salaries for each job title:

SELECT distinct e.JobTitle
        ,AVG(e.AnnualSalary) OVER (PARTITION BY e.JobTitle) AS AverageSalary
        ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) 
            OVER (PARTITION BY e.JobTitle) as MedianContinous
        ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) 
            OVER (PARTITION BY e.JobTitle) as MedianDisContinous
FROM dbo.Employee e
order by e.JobTitle

You perhaps now wondered how the functions PERCENTILE_CONT() and PERCENTILE_DISC() differ. The discontinous variant returns an actual value that exists in the calculated set and pointing out that this values is the closest one being to the calculative median. The continous percentile in contrast returns a value that is actually the calculated median. For example, Marketing Assistant Stuart Macrae has an Annual Salary of 59’108. This amount is also the median in this job role, but the calculated median is 59’943.5 so Stuart is slightly below the median but his salary is the closest one to the median.

JobTitleEmployeeNameAnnualSalaryAverageSalaryMedianContinousMedianDisContinous
Chief Executive OfficerKen Sanchez530000530000530000530000
Marketing AssistantPat Coleman518746244559943.559108
Marketing AssistantMary Dempsey576416244559943.559108
Marketing AssistantStuart Macrae591086244559943.559108
Marketing AssistantWanida Benshoof607796244559943.559108
Marketing AssistantJo Berry718756244559943.559108
Marketing AssistantLori Penor733986244559943.559108
Marketing ManagerDavid Bradley130000130000130000130000
Marketing SpecialistMichael Sullivan87047101883106316106316
Marketing SpecialistSariya Pit95048101883106316106316
Marketing SpecialistJohn Wood99407101883106316106316
Marketing SpecialistTerry Eminhizer106316101883106316106316
Marketing SpecialistOvidiu Cracium107883101883106316106316
Marketing SpecialistJill Williams108117101883106316106316
Marketing SpecialistRob Walters109369101883106316106316
Senior Marketing SpecialistGeorge Li117253119212118118118051
Senior Marketing SpecialistGarrett Young118051119212118118118051
Senior Marketing SpecialistSandra Alayo118185119212118118118051
Senior Marketing SpecialistRussell King123362119212118118118051
Vice President MarketingDenise Smith250000250000250000250000

Further reading

This was just an example of what window functions are handy for. More window functions are documented here. Ranked functions are used quite often to find out what happened over time, e.g. if a status of a product changed. If you want to dig even deeper into this topic, I recommend (no affiliation) the book T-SQL Window Functions: For data analysis and beyond (2nd Edition) (Developer Reference) by Itzik Ben-Gan. He's authored some excellent books with great knowledge.

[^1]: Fohen wind is a fairly common wind in the European Alps, which is a reference to the popular sample database Nordwind.

Cover Photo Credit: msandersmusic, on Pixabay