82

Here's my code:

select yr,count(*)
from movie
join casting on casting.movieid=movie.id
join actor on casting.actorid = actor.id
where actor.name = 'John Travolta'
group by yr;

Here's the question:

Which were the busiest years for 'John Travolta'. Show the number of movies he made for each year.

Here's the table structure:

movie(id, title, yr, score, votes, director)
actor(id, name)
casting(movieid, actorid, ord)

This is the output I am getting:

yr      count(*)
1976    1
1977    1
1978    1
1981    1
1994    1
-- etc.

I need to get the rows for which count(*) is max. How do I do this?

Erwin Brandstetter
  • 479,275
  • 111
  • 893
  • 1,042
Alex Gordon
  • 51,480
  • 273
  • 609
  • 976
  • 1
    The question fails to disclose RDBMS and version. The answer very much depends on it. – Erwin Brandstetter Jun 12 '16 at 23:17
  • The smartest (and probably fastest answer w/o a sub-query) is [here](https://stackoverflow.com/a/14249738/376483): Use `LIMIT 1` to avoid the subquery. – kaiser Nov 18 '19 at 21:44

11 Answers11

96

Use:

  SELECT m.yr, 
         COUNT(*) AS num_movies
    FROM MOVIE m
    JOIN CASTING c ON c.movieid = m.id
    JOIN ACTOR a ON a.id = c.actorid
                AND a.name = 'John Travolta'
GROUP BY m.yr
ORDER BY num_movies DESC, m.yr DESC

Ordering by num_movies DESC will put the highest values at the top of the resultset. If numerous years have the same count, the m.yr will place the most recent year at the top... until the next num_movies value changes.

Can I use a MAX(COUNT(*)) ?


No, you can not layer aggregate functions on top of one another in the same SELECT clause. The inner aggregate would have to be performed in a subquery. IE:

SELECT MAX(y.num)
  FROM (SELECT COUNT(*) AS num
          FROM TABLE x) y
OMG Ponies
  • 300,587
  • 73
  • 490
  • 482
  • 3
    Yes you can use MAX(COUNT(*)), but in Oracle. http://www.techonthenet.com/sql/max.php – Andrejs Oct 28 '15 at 15:59
  • 1
    @OMG Ponies - This answer was REALLY excellent - The first SQL you provided does work (obvs) BUT the second SQL you provided is so very elegant AND Enabled me to also gain a better understanding of subquery! THANK YOU for taking the trouble to give a fully expanded answer. I was trying to accomplish this - but with a Group By in there as well - and doing this made it quite possible! – kiltannen Mar 17 '17 at 22:15
  • 1
    Just in trying to understand this further - How would you use this second query to obtain the year which had the max number of movies? As it is right now it can identify how many movies happened in the year that had the most movies - but it will not give which year that was. I would quite like to understand how to pass a second value back from the subquery which is the Group By value correlated to the MAX result – kiltannen Mar 18 '17 at 00:15
  • this won't work after and including 5.7 – Yazgan Jan 11 '21 at 15:00
30

Just order by count(*) desc and you'll get the highest (if you combine it with limit 1)

Wolph
  • 69,888
  • 9
  • 125
  • 143
  • 7
    Now what if I had several rows with the max value and wanted to display all values that have a "max value" – Adam Apr 27 '15 at 15:33
  • @WhyCry: Not entirely sure what you're trying to ask this, but if you can't find the answer on Stackoverflow you should just ask it as a separate question :) – Wolph Apr 29 '15 at 09:54
  • @Wolph hes saying that you can have multiple rows with same max value, with limit 1 you cannot see that – urSus May 10 '18 at 22:37
  • Upon rereading that (3 year old) comment I think he's looking for `HAVING MAX(...) = ...` – Wolph May 11 '18 at 00:20
7
SELECT * from 
(
SELECT yr as YEAR, COUNT(title) as TCOUNT
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr
order by TCOUNT desc
) res
where rownum < 2
vigour
  • 71
  • 1
  • 1
6

This question is old, but was referenced in a new question on dba.SE. I feel the best solutions haven't been provided. Plus, there are new, faster options.

Question in the title

Can I do a max(count(*)) in SQL?

Yes, you can achieve that by nesting an aggregate function in a window function:

SELECT m.yr, count(*) AS movie_count
     , max(count(*)) OVER () AS max_ct
FROM   casting c
JOIN   movie   m ON c.movieid = m.id
WHERE  c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP  BY m.yr
ORDER  BY count(*) DESC;

db<>fiddle here

That's standard SQL. Postgres introduced it with version 8.4 (released 2009-07-01, before this question was asked. Other RDBMS should be capable of the same. Consider the sequence of events in a SELECT query:

Possible downside: window functions do not aggregate rows. You get all rows left after the aggregate step. Useful in some queries, but not ideal for this one.

To get one row with the highest count, you can use ORDER BY ct LIMIT 1:

SELECT c.yr, count(*) AS ct
FROM   actor   a
JOIN   casting c ON c.actorid = a.id
WHERE  a.name = 'John Travolta'
GROUP  BY c.yr
ORDER  BY ct DESC
LIMIT  1;

Using only basic SQL features, available in any halfway decent RDBMS - the LIMIT implementation varies:

Or you can get one row per group with the highest count with DISTINCT ON (only Postgres):

Actual Question

I need to get the rows for which count(*) is max.

There may be more than one row with the highest count.

SQL Server has had the feature WITH TIES for some time - with non-standard syntax:

SELECT TOP 1 WITH TIES
       m.yr, count(*) AS movie_count
FROM   casting c
JOIN   movie   m ON c.movieid = m.id
WHERE  c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP  BY m.yr
ORDER  BY count(*) DESC;  -- can't sort by year for this

db<>fiddle here

PostgreSQL 13 added WITH TIES with standard SQL syntax:

SELECT m.yr, count(*) AS movie_count
FROM   casting c
JOIN   movie   m ON c.movieid = m.id
WHERE  c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
GROUP  BY m.yr
ORDER  BY count(*) DESC  -- can't sort by year for this
FETCH  FIRST 1 ROWS WITH TIES;

db<>fiddle here

This should be the fastest possible query. Further reading:

To sort results by additional criteria (or for older versions of Postgres or other RDBMS without WITH TIES), use the window function rank() in a subquery:

SELECT yr, movie_count
FROM  (
   SELECT m.yr, count(*) AS movie_count
        , rank() OVER (ORDER BY count(*) DESC) AS rnk
   FROM   casting c
   JOIN   movie   m ON c.movieid = m.id
   WHERE  c.actorid = (SELECT id FROM actor WHERE name = 'John Travolta')
   GROUP  BY m.yr
   ) sub
WHERE  rnk = 1
ORDER  BY yr;  -- optionally sort by year

All major RDBMS support window functions nowadays.

Erwin Brandstetter
  • 479,275
  • 111
  • 893
  • 1,042
  • [...you do not need to join to the table movie at all]. A join to the table 'movie' is required as this is the only table with the column 'yr' (year of movie). – Kevin Swann Oct 06 '17 at 14:38
  • @KevinSwann: I missed this somehow, sorry. You are right, I fixed accordingly. Added new, better options while being at it. – Erwin Brandstetter Mar 12 '21 at 23:44
4

it's from this site - http://sqlzoo.net/3.htm 2 possible solutions:

with TOP 1 a ORDER BY ... DESC:

SELECT yr, COUNT(title) 
FROM actor 
JOIN casting ON actor.id=actorid
JOIN movie ON movie.id=movieid
WHERE name = 'John Travolta'
GROUP BY yr
HAVING count(title)=(SELECT TOP 1 COUNT(title) 
FROM casting 
JOIN movie ON movieid=movie.id 
JOIN actor ON actor.id=actorid
WHERE name='John Travolta'
GROUP BY yr
ORDER BY count(title) desc)

with MAX:

SELECT yr, COUNT(title) 
FROM actor  
JOIN casting ON actor.id=actorid    
JOIN movie ON movie.id=movieid
WHERE name = 'John Travolta'
GROUP BY yr
HAVING 
    count(title)=
        (SELECT MAX(A.CNT) 
            FROM (SELECT COUNT(title) AS CNT FROM actor 
                JOIN casting ON actor.id=actorid
                JOIN movie ON movie.id=movieid
                    WHERE name = 'John Travolta'
                    GROUP BY (yr)) AS A)
Ropman
  • 41
  • 1
3

Using max with a limit will only give you the first row, but if there are two or more rows with the same number of maximum movies, then you are going to miss some data. Below is a way to do it if you have the rank() function available.

SELECT
    total_final.yr,
    total_final.num_movies
    FROM
    ( SELECT 
        total.yr, 
        total.num_movies, 
        RANK() OVER (ORDER BY num_movies desc) rnk
        FROM (
               SELECT 
                      m.yr, 
                      COUNT(*) AS num_movies
               FROM MOVIE m
               JOIN CASTING c ON c.movieid = m.id
               JOIN ACTOR a ON a.id = c.actorid
               WHERE a.name = 'John Travolta'
               GROUP BY m.yr
             ) AS total
    ) AS total_final 
   WHERE rnk = 1
3

The following code gives you the answer. It essentially implements MAX(COUNT(*)) by using ALL. It has the advantage that it uses very basic commands and operations.

SELECT yr, COUNT(title)
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr HAVING COUNT(title) >= ALL
  (SELECT COUNT(title)
   FROM actor
   JOIN casting ON actor.id = casting.actorid
   JOIN movie ON casting.movieid = movie.id
   WHERE name = 'John Travolta'
   GROUP BY yr)
user765195
  • 413
  • 4
  • 13
2

Thanks to the last answer

SELECT yr, COUNT(title)
FROM actor
JOIN casting ON actor.id = casting.actorid
JOIN movie ON casting.movieid = movie.id
WHERE name = 'John Travolta'
GROUP BY yr HAVING COUNT(title) >= ALL
  (SELECT COUNT(title)
   FROM actor
   JOIN casting ON actor.id = casting.actorid
   JOIN movie ON casting.movieid = movie.id
   WHERE name = 'John Travolta'
   GROUP BY yr)

I had the same problem: I needed to know just the records which their count match the maximus count (it could be one or several records).

I have to learn more about "ALL clause", and this is exactly the kind of simple solution that I was looking for.

shaedrich
  • 1,879
  • 18
  • 24
Andres
  • 21
  • 1
2

Depending on which database you're using...

select yr, count(*) num from ...
order by num desc

Most of my experience is in Sybase, which uses some different syntax than other DBs. But in this case, you're naming your count column, so you can sort it, descending order. You can go a step further, and restrict your results to the first 10 rows (to find his 10 busiest years).

Tim
  • 8,474
  • 3
  • 37
  • 54
1
     select top 1 yr,count(*)  from movie
join casting on casting.movieid=movie.id
join actor on casting.actorid = actor.id
where actor.name = 'John Travolta'
group by yr order by 2 desc
Eugene
  • 2,895
  • 2
  • 32
  • 38
1
create view sal as
select yr,count(*) as ct from
(select title,yr from movie m, actor a, casting c
where a.name='JOHN'
and a.id=c.actorid
and c.movieid=m.id)group by yr

-----VIEW CREATED-----

select yr from sal
where ct =(select max(ct) from sal)

YR 2013