-2

I need to group row's in one row if they have same id or same attribute. So i suppose i need to use INNER JOIN and GROUP_CONCAT, but i don't know how. Problem is that if two users do not have a common attribute, but fall into the same group with the same third user, all three must be combined into one group. Also i don't have group_id column in table.

group_id, user_id, group_attributes
1, 1, "red, green, yellow, grey, purple, coffeemaker"
1, 2, "red, green, yellow, grey, purple, coffeemaker"
1, 3, "red, green, yellow, grey, purple, coffeemaker"
1, 4, "red, green, yellow, grey, purple, coffeemaker"
1, 5, "red, green, yellow, grey, purple, coffeemaker"
2, 6, "coffee, milk, croissant"
2, 7, "coffee, milk, croissant"
2, 8, "coffee, milk, croissant"

Raw data to reduce your answer time.

CREATE TABLE task (
    user_id INT(10) NOT NULL,
    attribute VARCHAR(50) NULL DEFAULT NULL);
INSERT INTO task (user_id, attribute)
VALUES
(1, 'red'),
(1, 'green'),
(2, 'green'),
(2, 'yellow'),
(3, 'grey'),
(3, 'coffeemaker'),
(4, 'grey'),
(4, 'purple'),
(5, 'purple'),
(5, 'red'),
(6, 'black'),
(7, 'black'),
(7, 'milk'),
(8, 'milk'),
(8, 'croissant');

1 Answers1

0

This is a graph-walking problem, so a simple JOIN is not sufficient. One approach is to get all attributes associates with a given attribute. The following recursive CTE does this:

with recursive aa as (
      select distinct t1.attribute as at1, t2.attribute as at2
      from task t1 join
           task t2
           on t1.user_id = t2.user_id
     ),
     cte as (
      select at1, at2, at1 as found, 1 as lev
      from aa
      union all
      select cte.at1, aa.at2, concat_ws(',', found, aa.at2), lev + 1
      from cte join
           aa
           on cte.at2 = aa.at1
      where find_in_set(aa.at2, found) = 0 
     )
select distinct at1, at2
from cte;

You can then use the same recursive CTE to combine the values into a string:

with recursive aa as (
      select distinct t1.attribute as at1, t2.attribute as at2
      from task t1 join
           task t2
           on t1.user_id = t2.user_id
     ),
     cte as (
      select at1, at2, at1 as found, 1 as lev
      from aa
      union all
      select cte.at1, aa.at2, concat_ws(',', found, aa.at2), lev + 1
      from cte join
           aa
           on cte.at2 = aa.at1
      where find_in_set(aa.at2, found) = 0 
     )
select dense_rank() over (order by pairs.all_attributes) as group_id, t.user_id, pairs.all_attributes
from (select at1, group_concat(at2) as all_attributes
      from cte
      group by at1
     ) pairs join
     (select user_id, min(attribute) as min_attribute
      from task
      group by user_id
     ) t
     on t.min_attribute = pairs.at1;

I see nothing wrong with this code. But db<>fiddle is insisting on creating a hex string for pairs. However, I think this will work on your database. Here is the fiddle.

Gordon Linoff
  • 1,122,135
  • 50
  • 484
  • 624