Group if COUNT(*) > X, otherwise don't group items

Question

I'm trying to return a list of map items from my DB grouping the items ONLY if the group count is greater than 4 otherwise I don't want the items to be grouped.

My project is built to return all entries within a set area and I'll be using grouping to break up that area into a grid. If each cell in the grid has too many results then I will show a group marker with the group count instead of a single entry marker.

My current query works to determine if there are grouped items by the COUNT(*)

SELECT *, COUNT(*) as groupCount, floor(longitude/0.0007) AS groupLong, floor(latitude/0.0007) AS groupLat 
FROM items 
WHERE longitude>=151.1 
    AND longitude<=151.2 
    AND latitude>=-33.9 
    AND latitude<=-33.8 
GROUP BY floor(longitude/0.0007), floor(latitude/0.0007)

What I would like to do is only group the items if the group count > 4 and items in a group <=4 will be returned as ungrouped items.

I know that I can use HAVING COUNT(*)>4 to only return the groups of 5 and over but what can I do to return the ungrouped items that would be in the groups of 4 and under?

I would be happy to do two queries to get the results but if there was a way of doing it in one then that would be great!

If you have the option of switching database engines, this would be a good use case for window functions. MySQL doesn't support them (that I can find), but PostgreSQL does. — Chris Bouchard, Jan 14 '15 at 01:51

Chris Bouchard · Accepted Answer · 2015-01-14T02:50:55.940

Here's a SQL Fiddle for my solution: http://sqlfiddle.com/#!8/e40ba/1

The idea is to first figure out what the groups will be, which is done in the subquery grouping. We then join these groups to the original table, but we use a left outer join so any value that isn't in a group will get have nulls for the grouping columns. Finally, we use AVG to come up with a representative coordinate for the group. For ungrouped values, this will be the coordinate itself, which is nice.

I would recommend against actually using this without doing some testing and benchmarking. That join is probably going to be terrible if items is large. I really just wanted to find a way to do this in a single query. As I said in my comment, the right way to do this would be to use window functions, but MySQL doesn't have those.

SELECT AVG(longitude) AS longitude
     , AVG(latitude) AS latitude
     , COUNT(*) AS count

FROM items
    LEFT OUTER JOIN
        ( SELECT COUNT(*) AS group_count
               , FLOOR(longitude/0.0007) AS group_longitude
               , FLOOR(latitude/0.0007) AS group_latitude
          FROM items
          -- Repeat the filter to avoid computing unnecessary groups
          WHERE longitude >= 151.1
            AND longitude <= 151.2
            AND latitude >= -33.9
            AND latitude <= -33.8
          GROUP BY group_longitude, group_latitude
          HAVING group_count > 4
        ) AS grouping
    -- Match each row up with its group
    ON FLOOR(longitude/0.0007) = group_longitude
    AND FLOOR(latitude/0.0007) = group_latitude

WHERE longitude >= 151.1
  AND longitude <= 151.2
  AND latitude >= -33.9
  AND latitude <= -33.8

GROUP BY COALESCE(group_longitude, id)
       , COALESCE(group_latitude, id)

Really really great idea. In my case, on millions of rows, grouping without condition is 80ms, with condition (this strategy) its about 160ms. So not super fast — atomkirk, Jan 25 '18 at 15:37

score 1 · Answer 2 · answered Jan 14 '15 at 01:38

1

I think you can use union the two subquery result to solve this problem. Perhaps you can use the subselect by divide the two part, first part got the COUNT(*)>4 condition result, the second got the other items.id then retrieve the all other items.

answered Jan 14 '15 at 01:38

Feng Lin

650
4
8

Group if COUNT(*) > X, otherwise don't group items

2 Answers2