51

suppose I have this table

id | cash 
1    200
2    301
3    101
4    700

and I want to return the first row in which the sum of all the previous cash is greater than a certain value:

So for instance, if I want to return the first row in which the sum of all the previous cash is greater than 500, is should return to row 3

How do I do this using mysql statement?

using WHERE SUM(cash) > 500 doesn't work

Dolph
  • 44,962
  • 13
  • 58
  • 86
kamikaze_pilot
  • 12,874
  • 29
  • 103
  • 169
  • 1
    You want to select `id=3` because `200 + 300 >= 500` or because `501 > 500`? – Dolph Jul 19 '10 at 19:42
  • Are you including an ORDER BY clause? You can't really say get the "next" record because the rows aren't stored in any particular order. – Robot Jul 19 '10 at 19:45
  • it's because 200+301 >= 500... yea i tried the order by and having clauses as well but in this scenario they would return row 4 because row 4 is >= 500, not row 3 as intended – kamikaze_pilot Jul 19 '10 at 19:49

4 Answers4

107

You can only use aggregates for comparison in the HAVING clause:

GROUP BY ...
  HAVING SUM(cash) > 500

The HAVING clause requires you to define a GROUP BY clause.

To get the first row where the sum of all the previous cash is greater than a certain value, use:

SELECT y.id, y.cash
  FROM (SELECT t.id,
               t.cash,
               (SELECT SUM(x.cash)
                  FROM TABLE x
                 WHERE x.id <= t.id) AS running_total
         FROM TABLE t
     ORDER BY t.id) y
 WHERE y.running_total > 500
ORDER BY y.id
   LIMIT 1

Because the aggregate function occurs in a subquery, the column alias for it can be referenced in the WHERE clause.

OMG Ponies
  • 300,587
  • 73
  • 490
  • 482
  • I think his edit clarified what he meant and made this no longer true – Michael Mrozek Jul 19 '10 at 19:43
  • @Michael Mrozek: Thx for the heads up, think I got it. – OMG Ponies Jul 19 '10 at 19:50
  • hi there, thanks for the help I'm wondering on whether or not this code would perform well on a large table though Any comments on the performance costs of using this approach on a large database? – kamikaze_pilot Jul 19 '10 at 19:59
  • @user380714: You could add indexes if there was a need, but being a calculated column there's no way to specify an index for the running_total column. And the more the data can be limited - say a particular account - the better. – OMG Ponies Jul 19 '10 at 20:02
  • +1 for using a derived table. I had a similar problem using HAVING on a normal grouped select statement with a counter and it seemed to create a temp table before applying the having. On a table with millions of rows, without the HAVING condition (but a limit 10 for testing) the data would be back in half a sec, with the HAVING it was 2 mins. A derived table as shown above solved my problem, and the query over the millions of rows now takes 0.4sec. – Cloudranger Dec 03 '15 at 14:42
6

Not tested, but I think this will be close?

SELECT m1.id
FROM mytable m1
INNER JOIN mytable m2 ON m1.id < m2.id
GROUP BY m1.id
HAVING SUM(m1.cash) > 500
ORDER BY m1.id
LIMIT 1,2

The idea is to SUM up all the previous rows, get only the ones where the sum of the previous rows is > 500, then skip one and return the next one.

Eric Petroelje
  • 57,359
  • 8
  • 118
  • 174
  • This does not work as it returns 2. Using `SELECT m2.id` instead of `SELECT m1.id` returns 3 - the expected output. But what if the `id`s are not in order. Should using `ORDER BY m2.id` instead of `ORDER BY m1.id` be the solution ? But the `ORDER BY ` clause makes the ordering while showing the output, right ? So can you explain how the `id`s can be sorted in `ASC` order when the `JOIN` relation works instead of when the result is being presented ? – Istiaque Ahmed Oct 25 '17 at 15:16
4

In general, a condition in the WHERE clause of an SQL query can reference only a single row. The context of a WHERE clause is evaluated before any order has been defined by an ORDER BY clause, and there is no implicit order to an RDBMS table.

You can use a derived table to join each row to the group of rows with a lesser id value, and produce the sum of each sum group. Then test where the sum meets your criterion.

CREATE TABLE MyTable ( id INT PRIMARY KEY, cash INT );

INSERT INTO MyTable (id, cash) VALUES
  (1, 200), (2, 301), (3, 101), (4, 700);

SELECT s.*
FROM (
  SELECT t.id, SUM(prev.cash) AS cash_sum
  FROM MyTable t JOIN MyTable prev ON (t.id > prev.id)
  GROUP BY t.id) AS s
WHERE s.cash_sum >= 500
ORDER BY s.id
LIMIT 1;

Output:

+----+----------+
| id | cash_sum |
+----+----------+
|  3 |      501 |
+----+----------+
Bill Karwin
  • 462,430
  • 80
  • 609
  • 762
  • shows error : '#1052 - Column 'cash' in field list is ambiguous' – Istiaque Ahmed Oct 25 '17 at 15:34
  • @IstiaqueAhmed, thanks for catching that. I added a correlation name so it's `SUM(prev.cash)` now. That should fix the error. – Bill Karwin Oct 25 '17 at 15:39
  • And now it shows `id` to be 2 instead of 3 and `cash_sum` 501.00. – Istiaque Ahmed Oct 25 '17 at 15:45
  • @IstiaqueAhmed, I changed `>=` to `>` in the join and now it identifies id 3 as the id of the row where *previous* rows totaled more than 500. – Bill Karwin Oct 25 '17 at 16:10
  • Can you explain what `GROUP BY t.id` does here ? As `t.id` is always distinct , what does grouping do here ? – Istiaque Ahmed Oct 25 '17 at 17:12
  • @IstiaqueAhmed, `t.id` is always distinct in the table, but it is joined to multiple rows based on the inequality in the join condition. So there are multiple rows per grouping. – Bill Karwin Oct 25 '17 at 17:40
  • 'there is no implicit order to an RDBMS table' - without `ASC` or `DESC`, the `SELECT` query will output rows in the order they appear in the DB, right ? – Istiaque Ahmed Oct 26 '17 at 11:02
  • @IstiaqueAhmed, yes, the SQL standard says that the implementation is free to return rows in any order if the user's query does not specify the order. In practice, this tends to be whatever order is less work for the RDBMS, and that's the order in which the RDBMS read the rows. In the case of InnoDB for example, it's the order of the index from which MySQL read the rows. – Bill Karwin Oct 26 '17 at 14:42
0

When using aggregate functions to filter, you must use a HAVING statement.

SELECT *
FROM tblMoney
HAVING Sum(CASH) > 500
Robot
  • 76
  • 4