0

I have tried all possibilities in working out the Median value on a specific column but keep on getting a different value from Excel.

I am working on SQL2008R2

This is what I have tried so far :

SELECT  Entity,
                BillPeriod,
                Cost,
                PatientDays,
                0 AS Median
        FROM GroupCTE
        UNION 
        SELECT  Entity,
                BillPeriod,
                Cost,
                PatientDays,
                0 AS Median
        FROM RegionCTE
                    UNION 
        SELECT  Entity,
                BillPeriod,
                Cost,
                PatientDays,
                0 AS Median
        FROM InstitutionCTE
        Union
        SELECT  'Median' as Entity,
                BillPeriod,
                0 as Cost,
                0 as PatientDays,
                (((
                    (SELECT MAX(Cost) FROM
                        (SELECT TOP 50 PERCENT Cost FROM InstitutionCTE WHERE Cost <> 0 ORDER BY Cost ASC) AS BottomHalf)
                    +
                    (SELECT MIN(Cost) FROM
                        (SELECT TOP 50 PERCENT Cost FROM InstitutionCTE WHERE Cost <> 0 ORDER BY Cost DESC) AS TopHalf)
                )) / 2)  AS Median

Am I doing something terribly wrong here?

I tried this : Function to Calculate Median in Sql Server

Thank you

Community
  • 1
  • 1
PKirby
  • 843
  • 2
  • 13
  • 33
  • You could watch this, Itzik Ben-Gan gives some good examples how to calculate median: https://www.youtube.com/watch?v=goyWzAu-AA0 – Evaldas Buinauskas Mar 20 '15 at 08:30
  • This calculation doesn't match with the definition of Median. Whereas Excel Median function might be implementing the exact formula for median. So you see difference in the result. – wonderbell Mar 20 '15 at 08:43
  • (1) Provide an example of how the calculation is incorrect. (2) If `cost` is an integer, then you are doing integer division. – Gordon Linoff Mar 20 '15 at 11:00
  • @GordonLinoff, with the current data, I get a big difference when comparing to SQL results : SQL - 401970.095 Excel - 397361.495. Cost data type is float. – PKirby Mar 23 '15 at 06:15
  • Count the number of values bigger and less than those values and figure out which is correct. – Gordon Linoff Mar 24 '15 at 01:39

1 Answers1

0

So I managed to fix this issue. The problem was, for some reason, doing the Median calculation gave be different results than that of Excels Median function.

I created a TEMP table and imported the result of my UNION, then updated the Median column using my calculation :

INSERT INTO #Temp
            SELECT  Entity,
                    BillPeriod,
                    ((Cost) / (PatientDays)) * 100 AS Cost,
                    PatientDays,
                    0 AS Median
            FROM GroupCTE
            UNION 
            SELECT  Entity,
                    BillPeriod,
                    ((Cost) / (PatientDays)) * 100 AS Cost,
                    PatientDays,
                    0 AS Median
            FROM RegionCTE
            UNION 
            SELECT  Entity,
                    BillPeriod,
                    ((Cost) / (PatientDays)) * 100 AS Cost,
                    PatientDays,
                    0 AS Median
            FROM InstitutionCTE
            UNION       
            SELECT  'Median' as Entity,
                    BillPeriod,
                    0 AS Cost,
                    0 AS PatientDays,
                    0 AS Median


            FROM InstitutionCTE
            GROUP BY Entity, BillPeriod, PatientDays

            UPDATE #Temp
            SET Median =    ((
                        ((SELECT MAX(Cost) FROM
                            (SELECT TOP 50 PERCENT Cost FROM #Temp WHERE Cost <> 0 ORDER BY Cost ASC) AS BottomHalf)
                        +
                        (SELECT MIN(Cost) FROM
                            (SELECT TOP 50 PERCENT Cost FROM #Temp WHERE Cost <> 0 ORDER BY Cost DESC) AS TopHalf)
                    )) / 2)     


SELECT * FROM #Temp
DROP TABLE #Temp
PKirby
  • 843
  • 2
  • 13
  • 33