4

I've been searching for a solution to this problem for a few days now and could not find anything that would reduce the time it takes to run the query.

I have 2 tables:

"product_db":  
unique_id - [index] 
image 
url_title 
status - [index]

"product_page"
id
product_unique_id - [index]
page_id - [index]

What I want to select is a random image from product_db where status = 'Online' and the product must be in page id = 3

product_db has over 90,000 products and product_page has over 150000 rows.

The query that I am using now is:

SELECT image FROM product_db a, product_page b WHERE b.page_id = 3 AND a.status = 'Online' AND a.unique_id = b.product_unique_id ORDER BY RAND() LIMIT 1

This query takes around 2.3secs to run. This is quite a long time for a web page to load. I have tried a few other queries that first returns a random row from product_page with page_id = 3 and then querying product_db (it did reduce the time it takes) but the problem with that is I cannot compare if the product is 'Online' or not.

hakre
  • 178,314
  • 47
  • 389
  • 754
James
  • 67
  • 1
  • 8

2 Answers2

7

It's the sorting that's slowing you down. Rather than sorting by random, just select a random product_db.unique_id

In your query, replace ORDER BY RAND() with:

AND product_db.unique_id >= ROUND(RAND()*(SELECT MAX(unique_id) FROM product_db))

using >= instead of = in case that unique_id has been deleted from the database. Not as random a result as ordering by rand but the query will execute much faster. If you wish, you can run multiple queries with = until a result is found and it still may be quite faster than sorting all those results.

With an explicit JOIN it would be:

SELECT product_db.image
FROM product_db
JOIN product_page ON product_db.unique_id = product_page.product_unique_id
WHERE product_page.page_id = 3 
AND product_db.status = 'Online' 
AND product_db.unique_id >= ROUND(RAND()*(SELECT MAX(unique_id) FROM product_db))
LIMIT 1
webbiedave
  • 46,141
  • 7
  • 83
  • 96
  • SELECT image FROM product_db a, product_page b WHERE b.page_id = 3 AND a.status = 'Online' AND a.unique_id = b.product_unique_id AND a.unique_id >= ROUND(RAND()*MAX(a.id)) – James Nov 17 '10 at 23:13
  • #1111 - Invalid use of group function Am I missing something? – James Nov 17 '10 at 23:14
  • I get the same error as before: #1111 - Invalid use of group function I've read somewhere that mysql generates this error because "aggregate functions such as COUNT, MAX, etc., is that they are not actually calculated until the rest of the query is evaluated" http://bytes.com/topic/mysql/answers/691615-invalid-use-group-function#post2749228 – James Nov 17 '10 at 23:30
  • James, I've edited as the aggregate function needs to be placed in a separate query. – webbiedave Nov 17 '10 at 23:40
  • Thanks a lot for all your help! :) – James Nov 18 '10 at 08:36
1

The problem is that MySQL has no way to pick a random row so it retrieves all your products and sorts them (unnecessarily).

You could write a stored procedure that picks a random unique_id between MIN and MAX and just tries to fetch that product until it gets one. You can employ a limit on the attempts made.

AndreKR
  • 28,030
  • 13
  • 86
  • 146
  • Hi, thanks for the suggestion. I am not very familiar with stored procedures but I will definitely look into it! :) Thanks again – James Nov 18 '10 at 08:37
  • No, please use the method alex and webbiedave pointed you to. It's much easier and needs only deterministic time. – AndreKR Nov 18 '10 at 11:46