T-SQL is mean, there is no median

Published Friday, July 24, 2009 3:57 PM

SQL-Server and I has had a bit of a fallout. I don't talk to it and it doesn't talk to me. A shame really since we used to be very close back in 2001. Today we once again stood face to face. I said: "Oh, you look good. You have changed, and I like it. It looks nice on you." The reason for our meeting was that a colleague of mine asked a very straightforward question today: "Is there any T-SQL function for calculating the median value?". The short answer is: "No". The longer answer is "No, but there really should be should it not? Why is that?".

Some discourse ensued and consensus was that in order to calculate the median the data has to be iterated in not only a read-only forward manor, like for calculating the mean and therefore median has been excluded from the otherwise extensive list of functions available in T-SQL. Also there is the point of a median for all the results in the table or per grouping of some sort.

This is in no way a new problem and Google turned up these lines of code:

SELECT Identity(int, 1,1) AS Id, InvoiceTotal
INTO #Temp
FROM Orders
ORDER BY InvoiceTotal

SELECT @Median = InvoiceTotal FROM #Temp WHERE Id = ((SELECT Count(*) FROM #Temp) /2)

DROP TABLE #Temp

RETURN @Median

(Thanks to alias BobBarker)

This seems to be a simple and understandable solution. The problem is that this does not actually solve the problem as it only works for lists with even number of rows and that won't do.

Itzik Ben-Gan solves the problem beautifully though in this article. You simply make use of TOP and percent:

SELECT(
(SELECT MAX(Value) FROM
(SELECT TOP 50 PERCENT Value FROM dbo.VOrders ORDER BY Value) AS H1)
+
(SELECT MIN(Value) FROM
(SELECT TOP 50 PERCENT Value FROM dbo.VOrders ORDER BY Value DESC) AS H2)
) / 2 AS Median;

Now if you have access to SQL server 2005 or 2008 you can use an even nicer solution, making use of such things as PARTITION and Common Table Expressions, but by then we remembered why we do not talk to each other anymore (SQL server and me) and I hit Ctrl+F4.

There is a nice code snippet that works just great though and maybe someday I will make use of it. Until then "we'll always have Paris".

Filed under: ,

Comments

# Henrik Lukkarila said on Tuesday, July 28, 2009 1:55 PM

Have you thought about performance? I'm not sure but I've a feeling the second solution is the faster once since the other has a insert that takes longer to execute. But I'm not sure.

# Mikael Sand said on Wednesday, July 29, 2009 2:28 PM

Actually, I have not paid ANY kind of attention or thought about preformance, since it was not about that at all. Comparing the two examples I would not even use the first one.

But I think you are right.

# Mikael Sand - Trying to be different said on Thursday, June 03, 2010 3:46 PM

Today I started reading my new book but I also wanted to find out a bit more about Hyper V , just to

Leave a Comment

(required) 
(required) 
(optional)
(required)