2

I am looking for some help with someone with MySQL expertise. I don't need an exact solution - just some ideas and places to look for optimizing.

A little bit about the problem:

  • I need to insert a large number of rows into an InnoDB tables.
  • Each table has only one index (which is also the primary key)
  • Each row has about 1KB of data in it.
  • I am using Load Data INFILE queries of about 5000 rows at a time.
  • I am using 8 threads for the writes (each writing separate data ).

Ok, so with these characteristics, I am getting a throughput of around 1 million rows per hour written to the DB. This is about 1 GB of data or ~300KB/sec, based on the upper-end of how much data is in a row.

However, when I look at my machine statistics, I notice the I/O graph for disk writes flatlines at about 20 MB / sec, which suggest that I am I/O bound. (The CPU graph also shoots to 100%, but about 90% of that is iowait). So, my question is why would MySQL be writing about 20 MB / sec of data to disk, when the amount of data sent through queries is about 5 KB / sec.

I am guessing the discrepancy is due to log files, temp tables and transaction doubling - but I am wondering why this ratio is near 100:1? And how can shrink this ratio to something more reasonable? What sort of internal variables are causing MYSQL to write out so much data to disk instead of storing it in memory? For instance, I have already set innodb_buffer_pool_size = 12G, max_heap_table_size = 8G and tmp_table_size = 6G in an attempt to make MySQL use more memory instead of disk - but still the same result.

I appreciate any help and advice you can give me!

Multifarious
  • 206
  • 1
  • 5

2 Answers2

1

Eight threads for writings might be far too high or far too low, depending upon what your storage actually looks like.

If you have one spinning metal drive in your computer, this is far too high -- your drive will be seeking all over to perform writes. Use one thread.

If you have fragmented your database tables over eight or more SSD drives, this might be fine, but perhaps more threads would let you make the most of the very low "seek" latency. ("Seek" doesn't really apply to newer SSD devices, but I'm using the term by analogy with older drive technologies.)

sarnold
  • 96,852
  • 21
  • 162
  • 219
1

My best guess is that 90%+ of this time is disk seeks.

If you update an index and a transaction log with every row, and these things are physically far away from each other, it will result in 2-3 seeks per write. With seek time about 10ms, it will limit writing to unimpressive ~33-50 rows per second. This must not be the case with 'load data' because it avoids transactions, but it seems to update indexes still. If the tablespace is fragmented, results may be even worse. Several concurrent threads further worsen the situation.

Try disabling the index during loading. Try using fewer threads, possibly only one.

Disclaimer: I don't know exactly how 'load data' works; docs from mysql.com don't mention transactions at all.

9000
  • 37,110
  • 8
  • 58
  • 98