0

Problem is to calculate the median of values that are distributed in a table. I have multiple values in a table column against the same uids, Where I need to find the median of those values for each uid. In order to do that, first I concatenated the values and then trying to find the median among the grouped values against each uid. After group concatenation, I have got the values into a comma separated list.

Now, I need to find the median from the comma separated values given in the following table in column "text_responded_in_hour".

uId |text_responded_in_hour 
----|-----------------------
176 |70,660,70
177 |102      
194 |102,102  

I have tried Google but didn't able to find any related solution.

Nomiluks
  • 1,783
  • 5
  • 24
  • 49
  • 2
    Find the median BEFORE you do group_concat. – jarlh Apr 26 '17 at 09:02
  • Actually, I have grouped the values that are associated with the relevant ids. After making a group list I just decided to find median. – Nomiluks Apr 26 '17 at 09:06
  • @Nomi as jarlh wrote: find the median before you concat the values - this is pretty much the only viable way in sql. Otherwise, do the calculation in the application logic, not in sql. – Shadow Apr 26 '17 at 09:09
  • There are several posts on SE regarding Median, e.g. http://stackoverflow.com/a/27273310 (but a group_concat version is probably not performing well) – dnoeth Apr 26 '17 at 09:39
  • @shadow I cannot calculate the median before group concatenation. I have multiple values on the same uid, Where I need to find the median among those values for each uid. In order to do that, first I concatenated the values and then finding the median among the grouped values against each uid. – Nomiluks Apr 28 '17 at 07:10
  • Then redesign your data structure. – Shadow Apr 28 '17 at 08:21
  • I have used this code " SUBSTRING_INDEX( SUBSTRING_INDEX( GROUP_CONCAT(field ORDER BY field), ',', (( ROUND( LENGTH(GROUP_CONCAT(field)) - LENGTH( REPLACE( GROUP_CONCAT(field), ',', '' ) ) ) / 2) + 1 )), ',', -1 )" it is working... – Nomiluks Apr 28 '17 at 09:03

3 Answers3

3

Doing this isn't really viable. Comma separated lists are generally a very bad idea in a database due to problems like this.

It is potentially possible, if not quick or flexible.

The following does it by generating a list of numbers from 1 to 1000 (or 1 more than the number of commas). It only works if the max number of comma separated values is less than 1000. Can easily be expanded to cope with larger numbers but will become even less efficient as it is expanded.

It then uses that range of numbers to extract a value from the comma separated list.

Once that is done the AVG function can be used.

SELECT uID, 
        AVG(individual_responded_in_hour)
FROM
(
    SELECT uID,
            CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(text_responded_in_hour, ',', (hundreds.aCnt * 100 + tens.aCnt * 10 + units.aCnt + 1)), ',', -1) AS SIGNED) AS individual_responded_in_hour
    FROM some_table
    CROSS JOIN (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10) units
    CROSS JOIN (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10) tens
    CROSS JOIN (SELECT 1 AS aCnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10) hundreds
    WHERE (hundreds.aCnt * 100 + tens.aCnt * 10 + units.aCnt) <= (LENGTH(text_responded_in_hour) - LENGTH((REPLACE(text_responded_in_hour, ',', '')))
) sub0
GROUP BY uID

It would be possible to do this using a custom written MySQL function which might be more efficient.

But either way I suggest that the complexity of doing this is more of a good reason to not store the values in a comma separated list.

Kickstart
  • 21,106
  • 2
  • 17
  • 32
0

You really should use programming language, like python, to do that. You can't do that easily with MySQL, you could do it with postgres or MSSQL or any other dbms for that matter. In MySQL you can hack it if you know how many values there are, by using json functions. But this looks wrong (applies to MySQL 5.7.9+):

insert into yourtable
select uId as id, concat('[', text_responded_in_hour, ']') as jsoncol
from startingtable;

select ID, AVG(val)
(
select id, jsoncol->'$[0]' as val from yourtable
union all select id, jsoncol->'$[1]' from yourtable
union all select id, jsoncol->'$[2]' from yourtable
-- as many times as needed
) as a
group by ID
Adam Owczarczyk
  • 2,350
  • 11
  • 20
0

The following code for calculating medians can work for even and odd number of values. This piece of code served the purpose :)

SELECT  
uId,    
date,    
(SUBSTRING_INDEX(
        SUBSTRING_INDEX(
            GROUP_CONCAT(responded_text_time_in_hour ORDER BY responded_text_time_in_hour),
            ',',
            ((
                ROUND(
                    LENGTH(GROUP_CONCAT(responded_text_time_in_hour)) - 
                    LENGTH(
                        REPLACE(
                            GROUP_CONCAT(responded_text_time_in_hour),
                            ',',
                            ''
                        )
                    )
                ) / 2) + 1
            )),
            ',',
            -1
        ) +
        SUBSTRING_INDEX( 
            SUBSTRING_INDEX( 
                GROUP_CONCAT(responded_text_time_in_hour ORDER BY responded_text_time_in_hour), ',', (COUNT(*)/2) )
                , ',', -1))/2 as median

FROM outTable
WHERE
    (responded_text_time_in_hour>0 AND responded_text_time_in_hour <=3600)
GROUP BY 1,2
Nomiluks
  • 1,783
  • 5
  • 24
  • 49