
I'd like to understand the quirks of MySqlDataReader (or IDataReader in general). Whilst researching on the internet I found many resources on how to use MySqlDataReader, but very little about what is happening behind the scenes. I am asking because in a benchmark I performed, the time needed to execute MySqlCommand.ExecuteReader() is orders of magnitude smaller than the time needed to read all of the data with MySqlDataReader.Read(). This holds especially for large data sets. To provide an example: reading ~740000 rows takes 80-100 ms for the query to execute and about 15 s to read all data. Another example is reading ~2200 rows with a query time of ~200 ms and ~1 s to read all data.

According to High Performance MySQL (3rd Edition, p. 212), retrieved data is buffered by common connectors, and I'd assume this also holds for Connector/Net. I understand that in the case of 740000 rows maybe not all data can or should be buffered, but it should be possible to buffer the 2200 rows in the second example with ease (I am requesting no more than 5-7 columns).

Creating a comparable amount of data structures without reading from a database takes <1 ms (measured with System.Diagnostics.Stopwatch), so that is not the bottleneck. I am wondering why reading from the reader takes so much time if the data is buffered.
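Roughly, my measurement is split like the following sketch (simplified; the query, column names and the already-open `connection` are placeholders):

```csharp
using System;
using System.Diagnostics;
using MySql.Data.MySqlClient;

// 'connection' is an already-open MySqlConnection; the SQL is a placeholder.
var stopwatch = Stopwatch.StartNew();
using (var command = new MySqlCommand("SELECT col1, col2, col3 FROM my_table", connection))
using (var reader = command.ExecuteReader())
{
    Console.WriteLine("ExecuteReader: {0} ms", stopwatch.ElapsedMilliseconds); // ~80-100 ms

    stopwatch.Restart();
    while (reader.Read())
    {
        // materialize the row into my own data structures
    }
    Console.WriteLine("Read all rows: {0} ms", stopwatch.ElapsedMilliseconds); // ~15 s
}
```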

Paul Kertscher
  • Related (SQL Server): http://stackoverflow.com/questions/1383920/how-datareader-works – Tim Schmelter Oct 10 '14 at 08:16
  • _"Creating a comparable amount of data structure"_ It's not clear what you have measured, <1ms sounds as if you'd used a LINQ query which is not executed(f.e. via `ToList()`). While a `DataReader` streams rows in batches it still needs to execute the query before it can start processing rows. If the query is expensive the reader has to wait until the first rows are streamed. Maybe you need to add missing indexes or optimize your query in another way. – Tim Schmelter Oct 10 '14 at 08:17

1 Answer


To understand how MySqlDataReader works, you need to understand the MySQL Protocol. Assuming MySqlCommand.Prepare() isn't called, the text protocol will be used.
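To make the distinction concrete, here is a minimal sketch (connection string, SQL, and column names are placeholders, not code from any particular application):

```csharp
using MySql.Data.MySqlClient;

using (var connection = new MySqlConnection("Server=localhost;Database=test;Uid=user;Pwd=pass"))
{
    connection.Open();

    using (var command = new MySqlCommand("SELECT id, name FROM my_table", connection))
    {
        // Without Prepare(), ExecuteReader sends the SQL as a COM_QUERY packet (text protocol).
        // Calling Prepare() first would (depending on connector settings) use the
        // prepared-statement (binary) protocol instead.
        // command.Prepare();

        using (var reader = command.ExecuteReader())
        {
            // ...
        }
    }
}
```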

MySqlCommand.ExecuteReader sends a COM_QUERY packet to the server. The MySQL Server replies with a text resultset. This contains a header with metadata about the columns in the resultset, then all the rows.

In practice, I've found that the column metadata is not returned by the server until the query is "complete" (e.g., all the WHERE and ORDER BY clauses have been evaluated); in a complex query this can take quite some time. Once the column metadata is returned, MySqlCommand.ExecuteReader returns a MySqlDataReader object. So "executing the query" is the first delay you measure.

The standard while (reader.Read()) { } loop then keeps reading the row packets that are streamed back from the server. The speed of this loop depends on how quickly the server sends packets and how quickly the client library can deserialize them. Some libraries are much faster than others, e.g., MySqlConnector can read large numbers of rows almost twice as fast as Connector/NET (due to more efficient code). But the bulk of the time you observe is simply spent receiving and then deserializing the rows.
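For clarity, this is the loop in question (column ordinals and types are placeholders); each successful Read() consumes the next row that has arrived from the server:

```csharp
using (var reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        // Each Read() advances to the next row packet streamed by the server
        // (or already sitting in the library's receive buffer).
        int id = reader.GetInt32(0);
        string name = reader.GetString(1);
        // process the row...
    }
}
```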

There is a certain amount of overhead when retrieving data from the networking stack before it can be deserialized, and this probably accounts for a significant part of the overall time. The new "Pipelines" feature of .NET is an attempt to address this problem, so we may see much faster MySQL connection libraries in the future.
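As an illustration only (this is not code from any existing connector), a driver built on System.IO.Pipelines would consume raw bytes roughly like this, parsing protocol packets out of the pipe's buffer rather than copying them through intermediate arrays:

```csharp
using System.Buffers;
using System.IO.Pipelines;
using System.Threading.Tasks;

async Task ConsumePacketsAsync(PipeReader reader)
{
    while (true)
    {
        ReadResult result = await reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;

        // A real MySQL client would parse complete protocol packets out of 'buffer' here;
        // this sketch simply marks everything as consumed.
        reader.AdvanceTo(buffer.End, buffer.End);

        if (result.IsCompleted)
            break;
    }

    reader.Complete();
}
```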

Bradley Grainger