1

I am using SQL Server 2012 and I know it is quite simple to calculate moving averages. But what I need is to get the mode and the median for a defined window frame like so (with a window of 2 preceding to current row; month unique):

MONTH | CODE | MEDIAN | MODE
   1      0        0      0
   2      3        1.5    0
   3      2        2      0
   4      2        2      2
   5      2        2      2
   6      5        2      2
   7      3        3      2

If several values qualify as mode, than pick the first.

Jens
  • 152
  • 8
  • What does this mean? "with a window of 2 preceding to current row"? If the code you have posted is sample data, can you post desired results for that data? – Tab Alleman Apr 09 '15 at 15:49
  • The median and mode columns contain the desired result data. – Jens Apr 10 '15 at 07:05
  • Well in that case, do SELECT median, mode FROM MyTable and you will get your desired results. – Tab Alleman Apr 10 '15 at 13:24

2 Answers2

2

I commented my code thoroughly. Read my comments on my Mode calculations and let me know it needs tweaking. Overall, it's a relatively simple query. It just has a lot of ugly subqueries and it has a lot of comments. Check it out:

DECLARE @Table TABLE ([Month] INT,[Code] INT);
INSERT INTO @Table
    VALUES  (1,0),
            (2,3),
            (3,2),
            (4,2), --Try commenting this out to test my special mode thingymajig
            (5,2),
            (6,5),
            (7,3);

WITH CTE
AS
(
SELECT  ROW_NUMBER() OVER (ORDER BY [Month]) row_num,
        [Month],
        CAST(Code AS FLOAT) Code
FROM @Table
)

SELECT [Month],
        Code,
        ISNULL((
                SELECT  CASE
                            --When there is only one previous value at row_num = 2, find Mean of first two codes
                            WHEN A.row_num = 2 THEN (LAG(B.code,1) OVER (ORDER BY [Code]) + B.Code)/2.0
                            --Else find middle code value of current and previous two rows
                            ELSE B.Code
                        END
                FROM CTE B 
                --How subquery relates to outer query
                WHERE B.row_num BETWEEN A.row_num - 2 AND A.row_num 
                ORDER BY B.[Code] 
                --Order by code and offset by 1 so don't select the lowest value, but fetch the one above the lowest value
                OFFSET 1 ROW FETCH NEXT 1 ROW ONLY),
        0) AS Median,
        --I did mode a little different
            --Instead of Avg(D.Code) you could list the values because with mode, 
                --If there's a tie with more than one of each number, you have multiple modes
                --Instead of doing that, I simply return the mean of the tied modes
                    --When there's one, it doesn't change anything.
                        --If you were to delete the month 4, then your number of Codes 2 and number of Codes 3 would be the same in the last row.
                        --Proper mode would be 2,3. I instead average them out to be 2.5.
        ISNULL((
                SELECT AVG(D.Code)
                FROM (
                    SELECT  C.Code,
                            COUNT(*) cnt,
                            DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) dnse_rank
                    FROM CTE C
                    WHERE C.row_num <= A.row_num
                    GROUP BY C.Code
                    HAVING COUNT(*) > 1) D
                WHERE D.dnse_rank = 1),
        0) AS Mode
FROM CTE A

Results:

Month       Code                   Median                 Mode
----------- ---------------------- ---------------------- ----------------------
1           0                      0                      0
2           3                      1.5                    0
3           2                      2                      0
4           2                      2                      2
5           2                      2                      2
6           5                      2                      2
7           3                      3                      2
Stephan
  • 4,619
  • 1
  • 12
  • 22
0

If I understood your requirements correctly, your source table contains MONTH and CODE columns, and you want to calculate MEDIAN and MODE.

The query below calculates MEDIAN and MODE with moving window <= than 3 month ("2 preceding to current row") and returns the results matching your example.

-----------------------------------------------------
--Demo data
-----------------------------------------------------

CREATE TABLE #Data(
    [Month] INT NOT NULL,
    [Code] INT NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED 
(
    [Month] ASC
));

INSERT #Data
([Month],[Code])
VALUES
(1,0),
(2,3),
(3,2),
(4,2),
(5,2),
(6,5),
(7,3);

-----------------------------------------------------
--Query
-----------------------------------------------------

DECLARE @PrecedingRowsLimit INT = 2;

WITH [MPos] AS
(
    SELECT [R].[Month] 
        , [RB].[Month] AS [SubId]
        , [RB].[Code] 
        , ROW_NUMBER() OVER(PARTITION BY [R].[Month] ORDER BY [RB].[Code]) AS [RowNumberInPartition]
        , CASE 
            WHEN [R].[Count] % 2 = 1 THEN ([R].[Count] + 1) / 2
            ELSE NULL
            END AS [MedianPosition]
        , CASE 
            WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2
            ELSE NULL
            END AS [MedianPosition1]
        , CASE 
            WHEN [R].[Count] % 2 = 0 THEN [R].[Count] / 2 + 1
            ELSE NULL
            END AS [MedianPosition2]
    FROM 
    (
        SELECT [RC].[Month] 
            , [RC].[RowNumber] 
            , CASE WHEN [RC].[Count] > @PrecedingRowsLimit + 1 THEN @PrecedingRowsLimit + 1 ELSE [RC].[Count] END AS [Count]
        FROM 
        (
            SELECT [Month] 
                , ROW_NUMBER() OVER(ORDER BY [Month]) AS [RowNumber]
                , ROW_NUMBER() OVER(ORDER BY [Month]) AS [Count]
            FROM #Data 
        ) [RC]
    ) [R]
        INNER JOIN #Data [RB]
        ON [R].[Month] >= [RB].[Month] 
            AND [RB].[Month] >= [R].[RowNumber] - @PrecedingRowsLimit
) 
SELECT DISTINCT [M].[Month] 
    , [ORIG].[Code] 
    , COALESCE([ME].[Code],([M1].[Code] + [M2].[Code]) / 2.0) AS [Median]
    , [MOD].[Mode] 
FROM [MPos] [M]
    LEFT JOIN [MPOS] [ME]
    ON [M].[Month] = [ME].[Month] 
        AND [M].[MedianPosition] = [ME].[RowNumberInPartition]
    LEFT JOIN [MPOS] [M1]
    ON [M].[Month] = [M1].[Month] 
        AND [M].[MedianPosition1] = [M1].[RowNumberInPartition]
    LEFT JOIN [MPOS] [M2]
    ON [M].[Month] = [M2].[Month] 
        AND [M].[MedianPosition2] = [M2].[RowNumberInPartition]
    INNER JOIN 
    (
        SELECT [MG].[Month] 
            , FIRST_VALUE([MG].[Code]) OVER (PARTITION BY [MG].[Month] ORDER BY [MG].[Count] DESC , [MG].[SubId] ASC) AS [Mode]
        FROM
        (
            SELECT [Month] , MIN([SubId]) AS [SubId], [Code]  , COUNT(1) AS [Count]
            FROM [MPOS] 
            GROUP BY [Month] , [Code] 
        ) [MG]
    ) [MOD]
    ON [M].[Month] = [MOD].[Month]
    INNER JOIN #Data [ORIG]
    ON [ORIG].[Month] = [M].[Month]
ORDER BY [M].[Month];
helix
  • 346
  • 2
  • 9
  • That is some SERIOUS code, but perhaps a little too complicated. Check out my code. I did all the calculations in two correlated subqueries. – Stephan Apr 11 '15 at 04:15
  • Yes, @Stephan, your code is way cleaner. Especially `FETCH NEXT 1 ROW ONLY` usage in median calculation is very neat. – helix Apr 11 '15 at 13:27
  • It was introduced in SQL 2012 and it's not the most useful. I'm just happy I finally got a chance to use it! :) – Stephan Apr 11 '15 at 16:10