1

I was trying to implement a median from this solution (among others, but this seemed the simplest Median code): Function to Calculate Median in Sql Server

However, I'm having difficulty in its application. This is my current SQL query. My goal is to find the Median TotalTimeOnCall for CallerComplaintTypeID on a given Week, Month, and Department. I think my biggest issue is that I'm just fundamentally not understanding how to apply this Median function to achieve my results.

For example, if I needed an Average, instead, I could just change that ORDER BY to a GROUP BY and then slap an AVG(TotalTimeOnCall) instead. How do I accomplish this idea with this Median solution, instead?

This is the "raw data" query:

WITH rawData as (
SELECT 
    DepartmentName
    ,MONTH(PlacedOnLocal) AS MonthNumber
    ,CASE 
    WHEN Datepart(day, PlacedOnLocal) < 8 THEN '1' 
    WHEN Datepart(day, PlacedOnLocal) < 15 THEN '2' 
    WHEN Datepart(day, PlacedOnLocal) < 22 THEN '3' 
    WHEN Datepart(day, PlacedOnLocal) < 29 THEN '4' 
    ELSE '5' 
    END AS WeekNumber
    ,CallerComplaintTypeID
    ,TotalTimeOnCall
FROM [THE_RELEVANT_TABLE]
WHERE PlacedOnLocal BETWEEN '2014-09-01' AND '2014-12-31'
    AND CallerComplaintTypeID IN (5,89,9,31,203)
    AND TotalTimeOnCall IS NOT NULL
)
SELECT 
DepartmentName,
MonthNumber,
WeekNumber,
CallerComplaintTypeID,
TotalTimeOnCall
FROM
rawData
ORDER BY DepartmentName, MonthNumber, WeekNumber, CallerComplaintTypeID

with this sample output:

DepartmentName  MonthNumber WeekNumber  CallerComplaintTypeID   TotalTimeOnCall
Dept_01     9   1   5   654
Dept_01     9   1   5   156
Dept_01     9   1   5   21
Dept_01     9   1   5   67
Dept_01     9   1   5   13
Dept_01     9   1   5   97
Dept_01     9   1   5   87
Dept_01     9   1   5   16

this is the Median solution from above:

SELECT
(
    (
        SELECT MAX(TotalTimeOnCall)
        FROM
            (
                SELECT TOP 50 PERCENT TotalTimeOnCall
                FROM rawData
                WHERE TotalTimeOnCall IS NOT NULL
                ORDER BY TotalTimeOnCall
            ) AS BottomHalf
    )
    +
    (
        SELECT MIN(TotalTimeOnCall)
        FROM
            (
                SELECT TOP 50 PERCENT TotalTimeOnCall
                FROM rawData
                WHERE TotalTimeOnCall IS NOT NULL
                ORDER BY TotalTimeOnCall DESC
            ) AS TopHalf
    )
) / 2 AS Median
Community
  • 1
  • 1
Matt
  • 1,356
  • 7
  • 22
  • 37
  • 1
    That median approach does not support per-group medians. It can compute only whole-table medians. You could consider instead creating a user-defined `MEDIAN()` aggregate function. – John Bollinger Jan 21 '15 at 17:37
  • 1
    Alternatively, I suspect the `ROW_NUMBER()`-based alternative also presented as an answer to the question you linked could be adapted to grouped medians, but I'm not prepared to work out the details. – John Bollinger Jan 21 '15 at 17:39
  • @JohnBollinger Thanks, I was able to piece something together from that that works nicely. – Matt Jan 21 '15 at 21:42
  • http://sqlperformance.com/2014/02/t-sql-queries/grouped-median – Aaron Bertrand Jan 22 '15 at 05:05

1 Answers1

0

Here is a simple median solution that allows you to get a median per group.

-- Example of how to get median from a set of data
;with cte_my_query as (
    -- this cte simulates the query that would return your data
    select '2016-01-01' as dt, 1 as val
    union 
    select '2016-01-01' as dt, 10 as val
    union 
    select '2016-01-01' as dt, 7 as val
    union 
    select '2016-01-01' as dt, 16 as val
    union 
    select '2016-01-01' as dt, 11 as val
    union 
    select '2016-01-01' as dt, 2 as val
    union
    select '2016-01-01' as dt, 5 as val
    union 
    select '2016-01-02' as dt, 6 as val
    union 
    select '2016-01-02' as dt, 13 as val
    union 
    select '2016-01-02' as dt, 7 as val
    union   
    select '2016-01-02' as dt, 9 as val
    union   
    select '2016-01-02' as dt, 18 as val
)
,cte_dates as (
    -- get the distinct key we want to get median for
    select distinct dt from cte_my_query
)
select  dt, median.val
from    cte_dates
    cross apply (
        -- of the top 50% (below), take the top 1, desc, which is the median value
        select top 1 val from (
            -- for each date, get the top 50% of the values
            select top 50 percent val
            from cte_my_query
            where cte_dates.dt = cte_my_query.dt
            order by dt
        ) as inner_median
        order by inner_median.val desc
    ) median
Jim Clouse
  • 7,366
  • 5
  • 29
  • 25