1

I'm using postgres 9.5 and trying to calculate median and average price per unit with a GROUP BY id. Here is the query in DBFIDDLE

Here is the data

id   | price | units
-----+-------+--------
1    |  100  | 15
1    |  90   | 10
1    |  50   |  8
1    |  40   |  8
1    |  30   |  7
2    |  110  | 22
2    |  60   |  8
2    |  50   | 11

Using percentile_cont this is my query:

SELECT id,
  ceil(avg(price)) as avg_price,
  percentile_cont(0.5) within group (order by price) as median_price,
  ceil( sum (price) / sum (units) ) AS avg_pp_unit,
  ceil( percentile_cont(0.5) within group (order by price)  / 
        percentile_cont(0.5) within group (order by units) ) as median_pp_unit
FROM t
GROUP by id

This query returns:

id| avg_price | median_price | avg_pp_unit  | median_pp_unit 
--+-----------+--------------+--------------+---------------
1 |   62      |     50       |      6       |      7 
2 |   74      |     60       |      5       |      5

I'm pretty sure average calculation is correct. Is this the correct way to calculate median price per unit?

This post suggests this is correct (although performance is poor) but I'm curious if the division in the median calculation could skew the result.

Calculating median with PERCENTILE_CONT and grouping

moonshot
  • 509
  • 1
  • 3
  • 10

1 Answers1

2

The median is the value separating the higher half from the lower half of a data sample (a population or a probability distribution). For a data set, it may be thought of as the "middle" value. https://en.wikipedia.org/wiki/Median

So your median price is 55, and the median units is 9

        Sort by price                  Sort by units
  id    |   price   |  units |  | id    |  price  |   units  
 -------|-----------|--------|  |-------|---------|---------- 
      1 | 30        |      7 |  |     1 |      30 | 7        
      1 | 40        |      8 |  |     1 |      40 | 8        
      1 | 50        |      8 |  |     1 |      50 | 8        
 >>>  2 | 50        |     11 |  |     2 |      60 | 8    <<<<    
 >>>  2 | 60        |      8 |  |     1 |      90 | 10   <<<<
      1 | 90        |     10 |  |     2 |      50 | 11       
      1 | 100       |     15 |  |     1 |     100 | 15       
      2 | 110       |     22 |  |     2 |     110 | 22       
        |           |        |  |       |         |          
         (50+60)/2                               (8+10)/2 
          55                                        9        

I'm unsure what you intend for "median price per unit":

CREATE TABLE t(
   id    INTEGER  NOT NULL
  ,price INTEGER  NOT NULL
  ,units INTEGER  NOT NULL
);
INSERT INTO t(id,price,units) VALUES (1,30,7);
INSERT INTO t(id,price,units) VALUES (1,40,8);
INSERT INTO t(id,price,units) VALUES (1,50,8);
INSERT INTO t(id,price,units) VALUES (2,50,11);
INSERT INTO t(id,price,units) VALUES (2,60,8);
INSERT INTO t(id,price,units) VALUES (1,90,10);
INSERT INTO t(id,price,units) VALUES (1,100,15);
INSERT INTO t(id,price,units) VALUES (2,110,22);

SELECT
       percentile_cont(0.5) WITHIN GROUP (ORDER BY price) med_price
     , percentile_cont(0.5) WITHIN GROUP (ORDER BY units) med_units
FROM
  t;

     | med_price | med_units 
 ----|-----------|----------- 
   1 |        55 |         9 

If column "price" represents a "unit price" then you don't need to divide 55 by 9, but if "price" is an "order total" then you would need to divide by units: 55/9 = 6.11

Paul Maxwell
  • 26,417
  • 3
  • 25
  • 46