42

There is a big database, 1,000,000,000 rows, called threads (these threads actually exist; I'm not making things harder just because I enjoy it). The threads table holds only a few columns, to keep things fast: (int id, string hash, int replycount, int dateline (timestamp), int forumid, string title)

Query:

select * from thread where forumid = 100 and replycount > 1 order by dateline desc limit 10000, 100

Since there are 1G of records it's quite a slow query. So I thought: let's split this 1G of records into as many tables as I have forums (categories)! That is almost perfect. With many tables I have fewer records to search through, and it's much faster. The query now becomes:

select * from thread_{forum_id} where replycount > 1 order by dateline desc limit 10000, 100

This is really fast with 99% of the forums (categories), since most of them have only a few topics (100k-1M). However, there are some with about 10M records, and for those some queries are still too slow (0.1-0.2 seconds; too much for my app! I'm already using indexes!).

I don't know how to improve this using MySQL. Is there a way?

For this project I will use 10 servers (12GB RAM, 4x 7200rpm hard disks in software RAID 10, quad core).

The idea was to simply split the databases among the servers, but with the problem explained above that is still not enough.

If I install Cassandra on these 10 servers (supposing I find the time to make it work as it's supposed to), should I expect a performance boost?

What should I do? Keep working with MySQL distributed across multiple machines, or build a Cassandra cluster?

I was asked to post the indexes; here they are:

mysql> show index in thread;
PRIMARY id
forumid
dateline
replycount

Select explain:

mysql> explain SELECT * FROM thread WHERE forumid = 655 AND visible = 1 AND open <> 10 ORDER BY dateline ASC LIMIT 268000, 250;
+----+-------------+--------+------+---------------+---------+---------+-------------+--------+-----------------------------+
| id | select_type | table  | type | possible_keys | key     | key_len | ref         | rows   | Extra                       |
+----+-------------+--------+------+---------------+---------+---------+-------------+--------+-----------------------------+
|  1 | SIMPLE      | thread | ref  | forumid       | forumid | 4       | const,const | 221575 | Using where; Using filesort | 
+----+-------------+--------+------+---------------+---------+---------+-------------+--------+-----------------------------+
Peter O.
cedivad
  • So, what indexes are you using? And what does `EXPLAIN SELECT` say about your query? – Victor Nicollet Dec 11 '10 at 23:30
  • Re the index info added: apparently "knowing about indexes" does not include actually knowing which ones and how to use them. See Victor's answer. – Andrew Barber Dec 11 '10 at 23:38
  • @andrew, I forgot to mention the replycount index; however, that is an index on the table. – cedivad Dec 11 '10 at 23:42
  • See Victor's answer. Also read about what "Using filesort" might mean. Also, your query in the EXPLAIN was different from the queries you first showed, which could totally change the answer. – Andrew Barber Dec 11 '10 at 23:50
  • I see you've already accepted an answer, but I posted mine anyway as you might find the information of use :) – Jon Black Dec 12 '10 at 12:43

5 Answers

85

You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes - only available with InnoDB!

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

then design your system something along the lines of the following simplified example:

Example schema (simplified)

The important features are that the tables use the InnoDB engine, and that the primary key of the threads table is no longer a single auto-incrementing key but a composite clustered key based on the combination of forum_id and thread_id, e.g.

threads - primary key (forum_id, thread_id)

forum_id    thread_id
========    =========
1                   1
1                   2
1                   3
1                 ...
1             2058300  
2                   1
2                   2
2                   3
2                  ...
2              2352141
...

Each forum row includes a counter called next_thread_id (unsigned int) which is maintained by a trigger and incremented every time a thread is added to the forum. This also means we can store 4 billion threads per forum, rather than 4 billion threads in total as with a single auto_increment primary key for thread_id.

forum_id    title   next_thread_id
========    =====   ==============
1          forum 1        2058300
2          forum 2        2352141
3          forum 3        2482805
4          forum 4        3740957
...
64        forum 64       3243097
65        forum 65      15000000 -- ooh a big one
66        forum 66       5038900
67        forum 67       4449764
...
247      forum 247            0 -- still loading data for half the forums !
248      forum 248            0
249      forum 249            0
250      forum 250            0

The disadvantage of using a composite key is that you can no longer just select a thread by a single key value as follows:

select * from threads where thread_id = y;

you have to do:

select * from threads where forum_id = x and thread_id = y;

However, your application code should be aware of which forum a user is browsing, so it's not exactly difficult to implement - store the currently viewed forum_id in a session variable or hidden form field, etc.

Here's the simplified schema:

drop table if exists forums;
create table forums
(
    forum_id smallint unsigned not null auto_increment primary key,
    title varchar(255) unique not null,
    next_thread_id int unsigned not null default 0 -- count of threads in each forum
)engine=innodb;


drop table if exists threads;
create table threads
(
    forum_id smallint unsigned not null,
    thread_id int unsigned not null default 0, -- filled in by the insert trigger below
    reply_count int unsigned not null default 0,
    hash char(32) not null,
    created_date datetime not null,
    primary key (forum_id, thread_id, reply_count) -- composite clustered index
)engine=innodb;

delimiter #

create trigger threads_before_ins_trig before insert on threads
for each row
begin
declare v_id int unsigned default 0;

  -- allocate the next per-forum thread id; under heavy concurrent inserts,
  -- consider "select ... for update" here so two sessions cannot read the same value
  select next_thread_id + 1 into v_id from forums where forum_id = new.forum_id;
  set new.thread_id = v_id;
  update forums set next_thread_id = v_id where forum_id = new.forum_id;
end#

delimiter ;
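
As a quick sanity check, here's how an insert flows through the trigger (hypothetical data):

insert into forums (title) values ('forum 1');

-- the trigger assigns thread_id = next_thread_id + 1 = 1
-- and bumps forums.next_thread_id to 1
insert into threads (forum_id, hash, created_date)
values (1, md5('my first thread'), now());

select forum_id, thread_id, reply_count from threads; -- returns (1, 1, 0)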

You may have noticed that I've included reply_count as part of the primary key, which is a bit strange, as the (forum_id, thread_id) composite is unique in itself. It is just an index optimisation which saves some I/O when queries that use reply_count are executed. Please refer to the two links above for further info on this.

Example queries

I'm still loading data into my example tables; so far I have loaded approx. 500 million rows (half as many as your system). When the load process is complete I should expect to have approx:

250 forums * 5 million threads = 1,250,000,000 (1.25 billion rows)

I've deliberately made some of the forums contain more than 5 million threads; for example, forum 65 has 15 million threads:

forum_id    title   next_thread_id
========    =====   ==============
65        forum 65      15000000 -- ooh a big one

Query runtimes

select sum(next_thread_id) from forums;

sum(next_thread_id)
===================
539,155,433 (500 million threads so far and still growing...)

Under InnoDB, summing the next_thread_id values to give a total thread count is much faster than the usual:

select count(*) from threads;

How many threads does forum 65 have?

select next_thread_id from forums where forum_id = 65

next_thread_id
==============
15,000,000 (15 million)

again this is faster than the usual:

select count(*) from threads where forum_id = 65

Ok now we know we have about 500 million threads so far and forum 65 has 15 million threads - let's see how the schema performs :)

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 64 order by thread_id desc limit 32;

runtime = 0.022 secs

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 1 order by thread_id desc limit 10000, 100;

runtime = 0.027 secs

Looks pretty performant to me - that's a single table with 500+ million rows (and growing), with a query that covers 15 million rows in 0.02 seconds (while under load!).

Further optimisations

These would include:

  • partitioning by range

  • sharding

  • throwing money and hardware at it

etc...

hope you find this answer helpful :)

Jon Black
  • Thank you a lot f00! This indeed is useful! I think some answers should be worth money! Thank you! =) – cedivad Dec 12 '10 at 17:15
  • No worries - glad to help. You should still explore other optimisation methods too, as you seem to have the hardware for this :) – Jon Black Dec 12 '10 at 18:11
  • Wow. You did not get enough recognition for this answer. It's super awesome. – Ori Jul 29 '11 at 06:05
  • Just a comment on storage engines and clustered indexes - TokuDB supports multiple user-defined clustered indexes; it's an engine for MySQL and it scales much better than InnoDB. Great answer tho, +1 from me. – N.B. Dec 16 '11 at 12:09
  • @f00 Thanks for the design. I have a query: the 'threads' table that you designed is prone to a lot of fragmentation (e.g. inserting a new thread in forum 5 when threads exist in all 250 forums), increasing memory consumption, even for indexes. I understand this table will see heavy insert activity. Is there some way this big table can be changed to minimise fragmentation? (Running OPTIMIZE TABLE will not help much, as it will lead to page splits on subsequent inserts at the 15/16 fill ratio; it helps more for tables that are mostly read.) – Ethan Jul 15 '12 at 22:56
  • Thank you foo for this excellent answer. I have a few questions related to this. Could you please have a look at http://stackoverflow.com/questions/14642772/innodb-forum-db-design-and-composite-primary-key-cluster-primary-key ? Thank you – Bujji Feb 01 '13 at 09:44
  • @ethan: yes, fragmentation and record chaining could be a problem, but a good dba will sort that out. Export and re-import in PK order as a matter of maintenance. – Jon Black Nov 26 '13 at 21:12
  • What is the configuration of the related server? Both hardware and my.cnf? – frankish Jan 27 '16 at 11:36
  • Didn't know where else to ask this, but @Jon Black, what did you use to create, load, and test your example tables/database? I've been searching for how to do this and have not found clear direction. – MikelG Jul 27 '16 at 23:47
  • @MikeIG http://stackoverflow.com/questions/2463602/mysql-load-data-infile-acceleration/2504211#2504211 – Jon Black Aug 05 '16 at 19:44
  • @MikeIG http://stackoverflow.com/questions/3983325/calculate-distance-between-zip-codes-and-users/3989830#3989830 – Jon Black Aug 05 '16 at 19:44
  • @MikeIG http://stackoverflow.com/questions/5125096/for-loop-example-in-mysql/5126655#5126655 – Jon Black Aug 05 '16 at 19:45
  • @MikeIG http://stackoverflow.com/questions/4129071/optimal-mysql-settings-for-queries-that-deliver-large-amounts-of-data/4144514#4144514 – Jon Black Aug 05 '16 at 19:45
  • @MikeIG http://stackoverflow.com/questions/5451190/60-million-entries-select-entries-from-a-certain-month-how-to-optimize-databas/5451389#5451389 – Jon Black Aug 05 '16 at 19:45
  • @MikeIG http://stackoverflow.com/questions/8495258/how-do-i-increase-the-speed-of-this-data-analysis/8524437#8524437 – Jon Black Aug 05 '16 at 19:59
  • @MikeIG http://stackoverflow.com/questions/4758936/automatically-populate-table-with-dummy-data-in-mysql/4760065#4760065 – Jon Black Aug 05 '16 at 20:04
25

EDIT: Your one-column indices are not enough. You would need to, at least, cover the three involved columns.

More advanced solution: replace replycount > 1 with hasreplies = 1 by creating a new hasreplies field that equals 1 when replycount > 1. Once this is done, create an index on the three columns, in that order: INDEX(forumid, hasreplies, dateline). Make sure it's a BTREE index to support ordering.
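
A sketch of that change, using the table and column names from the question (keeping hasreplies in sync, via application code or a trigger, is left up to you):

ALTER TABLE thread ADD COLUMN hasreplies TINYINT NOT NULL DEFAULT 0;

UPDATE thread SET hasreplies = 1 WHERE replycount > 1;

-- BTREE is the default for InnoDB/MyISAM; stated explicitly for clarity
CREATE INDEX forumid_hasreplies_dateline USING BTREE
    ON thread (forumid, hasreplies, dateline);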

You're selecting based on:

  • a given forumid
  • a given hasreplies
  • ordered by dateline

Once you do this, your query execution will involve:

  • moving down the BTREE to find the subtree that matches forumid = X. This is a logarithmic operation (duration : log(number of forums)).
  • moving further down the BTREE to find the subtree that matches hasreplies = 1 (while still matching forumid = X). This is a constant-time operation, because hasreplies is only 0 or 1.
  • moving through the dateline-sorted subtree in order to get the required results, without having to read and re-sort the entire list of items in the forum.

My earlier suggestion to index on replycount was incorrect: because that would have been a range query, it would have prevented the use of dateline to sort the results (so you would have selected the threads with replies very fast, but the resulting million-line list would have had to be sorted completely before looking for the 100 elements you needed).

IMPORTANT: while this improves performance in all cases, your huge OFFSET value (10000!) is going to decrease performance, because MySQL does not seem to be able to skip ahead efficiently even when reading straight through a BTREE. So, the larger your OFFSET is, the slower the request will become.

I'm afraid the OFFSET problem is not automagically solved by spreading the computation over several computers (how do you skip an offset in parallel, anyway?) or by moving to NoSQL. All solutions (including NoSQL ones) will boil down to simulating OFFSET based on dateline (basically saying dateline > Y LIMIT 100 instead of LIMIT Z, 100, where Y is the date of the item at offset Z). This works, and eliminates any performance issues related to the offset, but prevents going directly to page 100 out of 200.
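
In SQL terms, that keyset approach looks like this (a sketch: since the original query sorts by dateline descending, the comparison becomes <, and @last_dateline stands for the dateline of the last row on the previous page):

-- first page: no anchor yet
SELECT * FROM thread
WHERE forumid = 100 AND hasreplies = 1
ORDER BY dateline DESC LIMIT 100;

-- following pages: anchor on the last dateline already shown
SELECT * FROM thread
WHERE forumid = 100 AND hasreplies = 1 AND dateline < @last_dateline
ORDER BY dateline DESC LIMIT 100;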

Victor Nicollet
  • +1 Excellent answer. I hope you get cedivad's very first +1 vote and answer "Accept" in his 9 months here ;) – Andrew Barber Dec 11 '10 at 23:39
  • I'm sorry, but as already explained, indexes are not enough, unfortunately. I'm also sorry to have asked and never used the "Accept" function; I didn't know I could use it! (Usually popups with "you don't have enough rep to do this" appear.) – cedivad Dec 11 '10 at 23:46
  • You can always select the correct answer for your question. As for the indexes, check out what that "Using filesort" means on your EXPLAIN... – Andrew Barber Dec 11 '10 at 23:49
  • Thank you for the amazing updated reply! I'm finally starting to understand something more about how MySQL works! =) Unfortunately, I didn't understand the last part about the offset, which is important since queries with an offset of about 200k take about 20 seconds to run. Does the "dateline > Y LIMIT 100" approach work well, supposing we already know the value of Y (cached in some way)? Would it make sense to add a field to the table, "Month, 201012", and add MONTH = 201012 to the query, to avoid offsetting over huge amounts of data? Then if results < 100 we do another query with month = 201101. – cedivad Dec 12 '10 at 10:28
  • The `dateline > Y` works like this: on page one, `Y = 9999-12-31`, so you get the most recent elements. You would try to get 101 of them, display the first 100, and remember what the `dateline` of line 101 was. Then, to get items 101-200, you would use that remembered value. The value can be remembered by adding it to the link for page 2 (`/page?date=2010-09-28`). This is how many NoSQL solutions handle paging (CouchDB does, for instance: see "Fast Paging" on http://guide.couchdb.org/draft/recipes.html). As a middle ground: use OFFSET if less than 1000 and `dateline` otherwise. – Victor Nicollet Dec 12 '10 at 12:20
  • Thank you, thank you and thank you! Thank you! This community and its users are amazing! =) – cedivad Dec 12 '10 at 17:12
4

Part of this question concerns the NoSQL vs. MySQL choice, and there is one fundamental thing hidden here: SQL is easy for a human to write but a bit difficult for a computer to read. In high-volume databases I would recommend avoiding a SQL backend, as it requires an extra step - command parsing. I have done extensive benchmarking, and there are cases where the SQL parser is the slowest point. There is nothing you can do about it, although you can possibly use pre-parsed statements and access them directly.

BTW, it is not widely known, but MySQL grew out of a NoSQL database. The company where MySQL's authors David and Monty worked was a data warehousing company, and they often had to write custom solutions for uncommon tasks. This led to a big stack of homebrew C libraries used to hand-write database functions when Oracle and others performed poorly. SQL was added to this nearly 20-year-old zoo in 1996, almost for fun. What came after, you know.

Actually, you can avoid the SQL overhead with MySQL. Usually SQL parsing is not the slowest part, but it is good to know about. To measure parser overhead you can simply benchmark "SELECT 1", for example ;)
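
For example, a server-side prepared statement is parsed once and can then be executed many times (a sketch against the question's schema):

PREPARE thread_page FROM
    'SELECT * FROM thread
     WHERE forumid = ? AND replycount > 1
     ORDER BY dateline DESC LIMIT 100';

SET @fid = 100;
EXECUTE thread_page USING @fid; -- no parse cost on repeated executions
DEALLOCATE PREPARE thread_page;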

Tõnu Samuel
3

You should not be trying to fit a database architecture to the hardware you're planning to buy; instead, plan to buy hardware that fits your database architecture.

Once you have enough RAM to keep the working set of indexes in memory, all your queries that can make use of indexes will be fast. Make sure your key buffer is set large enough to hold the indexes.

So if 12GB is not enough, don't use 10 servers with 12GB of RAM, use fewer with 32GB or 64GB of RAM.
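
For reference, both caches are configured in my.cnf; the numbers below are purely illustrative for a 12GB box (you would size one or the other depending on the storage engine, not both at once):

[mysqld]
# MyISAM: the server caches only indexes; data pages rely on the OS cache
key_buffer_size = 4G
# InnoDB: data and indexes are cached together in the buffer pool
innodb_buffer_pool_size = 9G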

Dan Grossman
  • You say that by having the indexes fully cached in RAM, performance will be ok? It's cheaper to have 10 servers @ 12GB than 3 @ 64GB, unfortunately. I will buy one and make my tests; I suppose that's the only way to go. – cedivad Dec 11 '10 at 23:45
  • You don't have to buy anything. Start an Amazon EC2 instance and only pay for the hours you run it to do your tests. The largest instance has 68.4GB of memory. – Dan Grossman Dec 11 '10 at 23:47
  • That's also true... I can use EC2 to test memory usage and performance! Thank you! – cedivad Dec 11 '10 at 23:49
0

Indexes are a must - but remember to choose the right type of index: BTREE is more suitable for queries using "<" or ">" in the WHERE clause, while HASH is more suitable when a column has many distinct values and you use "=" or "<=>" in the WHERE clause.
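
For illustration, an index type can be requested explicitly; as the comment below notes, in stock MySQL effectively only the MEMORY engine honours USING HASH (a hypothetical lookup table):

CREATE TABLE session_cache
(
    session_id CHAR(32) NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    PRIMARY KEY USING HASH (session_id) -- exact-match lookups only, no range scans
) ENGINE=MEMORY;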

Further reading http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

descent89
  • Not really relevant, since nobody's going to run a forum on MEMORY tables that are never persisted to disk, and only MEMORY tables support HASH indexes. If his server crashed, or MySQL hiccupped and had to be restarted, all his data would be lost. – Dan Grossman Dec 11 '10 at 23:35
  • There are several techniques for implementing a safe and durable in-memory database, such as snapshotting, transaction logging, and replication/failover clusters. In-memory database systems are definitely faster than ANY disk-optimized one, so they may be worth a shot. – descent89 Dec 12 '10 at 00:11