I have a mysql table with the potential for millions of rows of data - in some extreme cases up to 100mil. There an application I've developed that often queries this data and I've done what I can to optimize it - for the most part it works very quickly because we're only searching a very small subset of the data (tied to locations).
Table structure:
CREATE TABLE `prism_actions` (
`id` int(11) unsigned NOT NULL auto_increment,
`action_time` timestamp NOT NULL default CURRENT_TIMESTAMP,
`action_type` varchar(25) NOT NULL,
`player` varchar(16) NOT NULL,
`world` varchar(255) NOT NULL,
`x` int(11) NOT NULL,
`y` int(11) NOT NULL,
`z` int(11) NOT NULL,
`block_id` mediumint(5) unsigned NOT NULL,
`block_subid` mediumint(5) unsigned NOT NULL,
`data` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `x` (`x`),
KEY `action_type` (`action_type`),
KEY `player` (`player`),
KEY `block_id` (`block_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I have several basic indexes for the fields we're using in our WHERE statements most, and when used in queries with only a single condition - it's quite fast.
The example table I'm running these tests on has 22 million records.
Examples:
SELECT prism_actions.id FROM prism_actions WHERE prism_actions.action_type = 'block-break' LIMIT 1000;
1000 rows in set (0.00 sec)
SELECT prism_actions.id FROM prism_actions WHERE prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (0.01 sec)
My problem is that for each condition we use in a query (most queries usually have several conditions), the query takes much longer.
SELECT prism_actions.id FROM prism_actions WHERE prism_actions.action_type = 'block-break' AND prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (0.79 sec)
.79 seconds would be acceptable for the full query but that's just using part of the conditions.
A real query is more like:
SELECT prism_actions.id FROM prism_actions WHERE prism_actions.action_type = 'block-break' AND prism_actions.player = 'viveleroi' AND prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (2.22 sec)
With a single condition we ran with 0.01
, with two we ran with 0.79
and with three we ran at 2.2
seconds which is too long.
I'm going to research how I can better design my indexes but I'm mostly comfortable with the current db schema and indexes.
However, what can I try to make the conditions faster when used together like that?
Update
I spent time converting the tables into a foreign-key format. The player, action_type, and world column data were moved to separate tables, and the IDs for those were stored in the original tables. Took several hours to migrate the data.
However, I'm re-running the same queries I had used earlier and while I see a speed increase in some, I see little change in others.
The converted version of the .79 second query above runs about the same speed:
SELECT prism_actions.id FROM prism_actions WHERE prism_actions.actiontype_id = 1 AND prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (0.73 sec)
The block_id col still has an index from the original table schema.
Queries with the player_id as a condition were running very slowly so I added an index to the column and now the queries are blazing fast.
However, after taking examples of several queries from a real user, and updating them for this table structure, I see no change in speeds.
SELECT prism_actions.id FROM prism_actions WHERE (prism_actions.actiontype_id = 2 OR prism_actions.actiontype_id = 1) AND (prism_actions.player_id = 1127) AND prism_actions.action_time >= '2013-02-22 07:47:54' LIMIT 1000;
Previously took 5.83 sec
, currently takes 5.29 sec
Edit - seems to be the timestamp. Taking the timestamp condition out of the above query returns the results in 0.01 seconds. Adding an index for the timestamp does nothing - ideas?
So far, all I'm really seeing as is a slight speed increase in certain areas, a small file space savings since we store duplicate strings - but nothing so far that would warrant asking hundreds of users with databases this large to spend a day's worth of time converting data.
Any suggestions for some other way I might index stuff, etc?