1

I have a table which will be about 2 - 5 million rows on average. It has a primary key/index called 'instruction_id' and another indexed field called 'mode'. now 'instruction_id' is of course unique since it is the primary key but 'mode' will only be one of 3 different values. The query I run all the time is

SELECT * FROM tablename WHERE mode = 'value1' ORDER BY instruction_id LIMIT 50

This currently takes about 25 sec ( > 1 sec is unacceptably long) but there are only 600K rows right now so it will get worse as the table grows. Would indexing in a different way help? If I index instruction_id and mode together will that make a difference? If I somehow am able to naturally order the table by instruction_id so I don't have to ask for the order by would be another way around this but I don't know how to do that... Any help would be great.

Brian Tompsett - 汤莱恩
  • 5,195
  • 62
  • 50
  • 120
hackartist
  • 4,968
  • 4
  • 29
  • 47
  • Are you sure the query is taking that long to execute. It will take time to retrieve 200K rows - especially if the server is remote. Providing the EXPLAIN output of your query would help too – Adrian Cornish Dec 09 '11 at 19:45
  • 1
    @AdrianCornish, note the `LIMIT 50`. – Albin Sunnanbo Dec 09 '11 at 20:11
  • @AlbinSunnanbo Good point - although the storage engine may be returning 200K rows then because the ORDER BY/LIMIT is being applied outside the storage engine by MySQL itself. – Adrian Cornish Dec 09 '11 at 20:31

4 Answers4

5

You should try index on (mode, instruction_id), in that order.

The reasoning behind that index is that it creates an index like this

mode  instruction_id
A     1
A     3
A     4
A     5
A     10
A     11
B     2
B     8
B     12
B     13
B     14
C     6
C     7
C     9
C     15
C     16
C     17

If you search for mode B the sql server can search the index with a binary search on mode until it finds the first B, then it can simply output the next n rows. This would be really fast, about 22 compares for 4M rows.

Always use ORDER BY if you expect the result to be ordered, regardless of how the data is stored. The query engine might choose a query plan that output the rows in a different order than the order of the PK (maybe not in such simple cases as this, but in general).

Albin Sunnanbo
  • 44,354
  • 8
  • 64
  • 104
3

You should check out the following links relating to innodb clustered indexes

Then build your schema something along the lines of:

drop table if exists instruction_modes;
create table instruction_modes
(
mode_id smallint unsigned not null,
instruction_id int unsigned not null,
primary key (mode_id, instruction_id), -- note the clustered composite PK order !
unique key (instruction_id)
)
engine = innodb;

Cold (mysql restarted) runtime performance as follows:

select count(*) from instruction_modes;
+----------+
| count(*) |
+----------+
|  6000000 |
+----------+
1 row in set (2.54 sec)

select distinct mode_id from instruction_modes;
+---------+
| mode_id |
+---------+
|       1 |
|       2 |
|       3 |
+---------+
3 rows in set (0.06 sec)

select * from instruction_modes where mode_id = 2 order by instruction_id limit 10;
+---------+----------------+
| mode_id | instruction_id |
+---------+----------------+
|       2 |              2 |
|       2 |              3 |
|       2 |              4 |
|       2 |              5 |
|       2 |              6 |
|       2 |              9 |
|       2 |             14 |
|       2 |             25 |
|       2 |             28 |
|       2 |             32 |
+---------+----------------+
10 rows in set (0.04 sec)

0.04 seconds cold seems pretty performant.

Hope this helps :)

Community
  • 1
  • 1
Jon Black
  • 15,289
  • 5
  • 40
  • 41
  • Thank you. I had already accpeted Albin Sunnanbo answer but I +1'ed this one because I think that this is what I will do to get the extra performance boost I am looking for. I actually had already checked out the xaprb link and it was very helpful. – hackartist Dec 09 '11 at 20:18
2

Here is one possible solution:

ALTER TABLE `tablename` ADD UNIQUE  (`mode`, instruction_id);

Then:

SELECT A.* FROM tablename A JOIN (
     SELECT instruction_id FROM tablename 
     WHERE mode = 'value1' 
     ORDER BY instruction_id LIMIT 50
     ) B 
ON (A.instruction_id = B.instruction_id);

I have found for large tables that approach seems to work good for speed as the subquery should only be using the index.

I use a similar query on a table with >100mil records and it returns results in 1-2 seconds.

Matt MacLean
  • 17,724
  • 7
  • 47
  • 51
1

Is 'mode' a character field? If it's only ever going to hold 3 possible values, it sounds like you should make it an enum field, which will still return you the text string but is stored internally as a number.

You should also follow Albin's advice on indexing, which will benefit you further.

Greg D'Arcy
  • 898
  • 10
  • 18
  • Thank you and good point. Unfortunately what I am using it for provides more maintainability and is easier to modify if I don't use enum type values but if I need some extra performance I will look back here. – hackartist Dec 09 '11 at 20:19