7

From time to time I encounter a strange MySQL behavior. Let's assume I have indexes (type, rel, created), (type), (rel). The best choice for a query like this one:

SELECT id FROM tbl
WHERE rel = 3 AND type = 3
ORDER BY created;

would be to use index (type, rel, created). But MySQL decides to intersect indexes (type) and (rel), and that leads to worse perfomance. Here is an example:

mysql> EXPLAIN
    -> SELECT id FROM tbl
    -> WHERE rel = 3 AND type = 3
    -> ORDER BY created\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
         type: index_merge
possible_keys: idx_type,idx_rel,idx_rel_type_created
          key: idx_type,idx_rel
      key_len: 1,2
          ref: NULL
         rows: 4343
        Extra: Using intersect(idx_type,idx_rel); Using where; Using filesort

And the same query, but with a hint added:

mysql> EXPLAIN
    -> SELECT id FROM tbl USE INDEX (idx_type_rel_created)
    -> WHERE rel = 3 AND type = 3
    -> ORDER BY created\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: tbl
         type: ref
possible_keys: idx_type_rel_created
          key: idx_type_rel_created
      key_len: 3
          ref: const,const
         rows: 8906
        Extra: Using where

I think MySQL takes an execution plan which contains less number in the "rows" column of the EXPLAIN command. From that point of view, index intersection with 4343 rows looks really better than using my combined index with 8906 rows. So, maybe the problem is within those numbers?

mysql> SELECT COUNT(*) FROM tbl WHERE type=3 AND rel=3;
+----------+
| COUNT(*) |
+----------+
|     3056 |
+----------+

From this I can conclude that MySQL is mistaken at calculating approximate number of rows for combined index.

So, what can I do here to make MySQL take the right execution plan?

I can not use optimizer hints, because I have to stick to Django ORM The only solution I found yet is to remove those one-field indexes.

MySQL version is 5.1.49.

The table structure is:

CREATE TABLE tbl (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `type` tinyint(1) NOT NULL,
  `rel` smallint(2) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_type` (`type`),
  KEY `idx_rel` (`rel`),
  KEY `idx_type_rel_created` (`type`,`rel`,`created`)
) ENGINE=MyISAM;
Ivan Virabyan
  • 1,518
  • 1
  • 16
  • 25

2 Answers2

12

It's hard to tell exactly why MySQL chooses index_merge_intersection over the index scan, but you should note that with the composite indexes, statistics up to the given column are stored for the composite indexes.

The value of information_schema.statistics.cardinality for the column type of the composite index will show the cardinality of (rel, type), not type itself.

If there is a correlation between rel and type, then cardinality of (rel, type) will be less than product of cardinalities of rel and type taken separately from the indexes on corresponding columns.

That's why the number of rows is calculated incorrectly (an intersection cannot be larger in size than a union).

You can forbid index_merge_intersection by setting it to off in @@optimizer_switch:

SET optimizer_switch = 'index_merge_intersection=off'
Quassnoi
  • 381,935
  • 83
  • 584
  • 593
  • Wow, I didn't know that only statistics for the first columns are used. This explains everything. But why is it so? It means I have to place most selective columns at the left part of an index. Thank you for pointing out how to disable index_merge_intersection. – Ivan Virabyan Dec 24 '10 at 14:09
  • 1
    @Ivan: I must have worded it incorrectly. Statistics on all columns are stored, but with respect to the leading columns. What is stored in `cardinality` field of secondary column is in fact cardinality of the tuples up to the given column. If there is a correlation between the columns, the statistics shown by different indexes are skewed. – Quassnoi Dec 24 '10 at 14:30
  • Why should I forbid `index_merge_intersection`? I believe it's always good. May you please take a look at my [similar question](http://stackoverflow.com/questions/37729731/can-mysql-use-indexes-when-there-is-or-between-conditions)? – Martin AJ Jun 09 '16 at 15:08
  • @Stack: quoting the op: *Surprisingly, it decides to intersect indexes (type) and (rel), and that leads to worse performance*. In this particular case, the composite index is faster than intersection of the two indexes. – Quassnoi Jun 09 '16 at 15:56
3

Another thing is worth mentioning: you would not have the problem if you deleted the index on type only. the index is not required since it duplicates a part of the composite index.

newtover
  • 28,176
  • 11
  • 78
  • 85
  • hm, and i do not see a composite index on (type, rel). – newtover Dec 24 '10 at 20:01
  • I mean you have a typo in your table definition: there is no relsite3 column – newtover Dec 24 '10 at 20:15
  • 1
    You're right, here I can delete index on type, maybe this is not very good example. But that's not always the case. There might be cases when it uses intersection of two indexes, none of which is at the leftmost part of a composite index. For example, (a, b, c), (b), (c) - and mysql does use intersection of (b) and (c). Anyway mysql makes wrong decision - and I would like to know why, and what can I do to prevent it from doing so apart from deleting other index – Ivan Virabyan Dec 24 '10 at 22:16