What is data x and data y in the context of the original question?
Usually FROM is followed by the table name. But, then why are 2 tables
listed when the question refers to only one?
In the original question, data x, data y
joins the table to itself, creating a cartesian product. The original table had 7 rows, and by joining every row against every other row, the resulting product is 49 rows.
Also, I didn't understand this part: HAVING
SUM(SIGN(1-SIGN(y.val-x.val)))
.
Essentially, this function determines for every value how many values are less than the one being examined. It then compares this total to half the count + 1... and then selects that value as the median.
It does this by subtracting the value (x.val
) from the value it is comparing (y.val
). It then uses the SIGN
function to convert the result to -1
, 0
, or 1
. It then subtracts this value, and then takes the SIGN
again. So if the y.val
value is less than the x.val
value it that is being compared to, the end result would be a 1
. For example, let's say y
is 3
, and x
is 5
.
3 - 5 = -2
SIGN(-2) = -1
1 - (-1) = 2
SIGN(2) = 1
If y
were 5
, and x
was 3
... the end result would be 0
:
5 - 3 = 2
SIGN(2) = 1
1 - 1 = 0
SIGN(0) = 0
Summing the results of these comparisons gives us a number that indicates how many values come before the value that we're examining. It then compares this SUM
against COUNT(*) + 1 / 2
to find the middle range...