12

I heard a rumor that when inserting binary data (files and such) into MySQL, you should use the bin2hex() function and send it as a HEX-coded value, rather than just use mysql_real_escape_string on the binary string and use that.

// That you should do
$hex = bin2hex($raw_bin);
$sql = "INSERT INTO `table`(`file`) VALUES (X'{$hex}')";

// Rather than
$bin = mysql_real_escape_string($raw_bin);
$sql = "INSERT INTO `table`(`file`) VALUES ('{$bin}')";

It is supposedly for performance reasons. Something to do with how MySQL handles large strings vs. how it handles HEX-coded values

However, I am having a hard time confirming this. All my tests indicate the exact oposite; that the bin2hex method is ~85% slower and uses ~24% more memory.
(I am testing this on PHP 5.3, MySQL 5.1, Win7 x64 - Using a farily simple insert loop.)

For instance, this graph shows the private memory usage of the mysqld process while the test code was running:

Private Bytes used by the mysqld process
(source: advefir.com)

Does anybody have any explainations or reasources that would clarify this?

Thanks.

Glorfindel
  • 19,729
  • 13
  • 67
  • 91
Atli
  • 7,479
  • 2
  • 27
  • 42
  • There might be a difference performance when you use `"INSERT INTO \`table\`(\`file\`) VALUES (X{$hex})";` (remove the quotes from aroud the hex value)? (+1 btw) – Jacco Apr 02 '10 at 10:47
  • @Jacco Thanks for the suggestion. I did a couple of tests and the two methods seem to perform almost identically. The `X'...'` method seems to have a slight edge though, both in memory and CPU usage. - I edited the results together and uploaded them, in case you are interested: http://atli.advefir.com/images/myisam_joined.png, http://atli.advefir.com/images/innodb_joined.png – Atli Apr 02 '10 at 15:01
  • Interesting, I would really like to a DBA to explain the 'why' here. – Jacco Apr 02 '10 at 16:04

4 Answers4

9

This sounds like an urban legend to me.

bin2hex() maps each byte in the input to two bytes in the output ('a' -> '61'), so you should notice a significant memory increase of the script performing the query - it should use at least as much memory more as the byte length of the binary data to be inserted.

Furthermore, this implies that running bin2hex() on a long string takes much longer than running mysql_real_escape string(), which - as explained in MySQL's documentation - just escapes 6 characters: NULL, \r, \n, \, , and 'Control-Z'.

That was for the PHP part, now for MySQL: The server needs to do the reverse operation to store the data correctly. Reversing either of the functions takes almost as long as the original operation - the reverse function of mysql_real_escape_string() needs to replace escaped values (\\) with unescaped ones (\), whereas the reverse of bin2hex() would need to replace each and every byte tuple with a new byte.

Since calling mysql_real_escape_string() on binary data is safe (according to MySQL's and PHP's documentation or even when just considering that the operation does not do any other conversions than the ones listed above), it would make absolutely no sense to perform such a costly operation.

soulmerge
  • 68,989
  • 18
  • 113
  • 147
  • That makes sense. The added memory required to store the query string in PHP alone would seem enough of a reason to avoid the `bin2hex` function, and my own tests indicate that MySQL suffers as well. And then there is the added CPU cost of the conversions. - This is indeed sounding more and more like an urban legend. Still, I have to wonder what got this started; why people think this is a good idea. – Atli Apr 11 '10 at 13:24
  • I would think some people thought it might be unwise to encode binary data with a function called `..._escape_string` or to send binary data within a human-readable SQL statement, but there is actually nothing wrong with that (although there could have been an alias for the function - like `mysql_escape_data()` or similar) – soulmerge Apr 11 '10 at 20:22
  • Good points. I can see how people might look at it that way, especially those coming from strongly-typed languages. - I never really considered them all that different, though. I mean, PHP strings are basically just byte arrays, just like the binary data. (Until PHP 6, at least.) – Atli Apr 12 '10 at 06:33
  • The reason I read for this was: MySQL could use a stream operator for the hex data so it did not have to load the entire string into memory. Apparently, this is not true. – Jacco Apr 13 '10 at 09:10
5

I've been testing this myself, and I've come up with pretty consistent results. (Even though my tests are a tad crude.)

I've tested three computers

  1. Windows 7 (x64), PHP 5.3, MySQL 5.1
  2. Ubuntu 9.10 (x64) PHP 5.2, MySQL 5.1
  3. Ubuntu 10.04 (x32) PHP 5.3, MySQL 5.1

So far the tests on all three platforms have indicated the same tings:

  • Inserting into a BLOB is 2x to 8x faster on MyISAM than on InnoDB. The difference seems to be higher on binary strings than HEX-coded strings. (See the data below)
  • Using a HEX coded string (bin2hex into a X'...') uses more memory, on average, than using an escaped binary string (mysql_real_escape_string on the raw data). - This seems true for both MyISAM and InnoDB.
  • The binary string is faster on MyISAM, but the HEX-coded data is faster on InnoDB.

The test was basically just a simple loop that escaped or hex-coded the raw data (a 2.4 MiB image retrieved once at the top of the script), constructed the query string, and executed it via the mysql_query or mysqli::query functions. - I tested with both extensions. Didn't seem to be any difference.

I put the results from the Ubuntu 10.04 (#3) up in spreadsheets. The results from the Ubuntu 9.10 (#2) machine were pretty much the same, so I didn't bother set them up:
(Finally an excuse to test the Google Docs thing properly! xD)

These graphs show the private memory usage by the mysqld process on the Win7 (#1) machine.

Atli
  • 7,479
  • 2
  • 27
  • 42
4

A hex string is significantly longer than the corresponding binary string. Simply the transfer time and the copying it around inside PHP's and MySQL's memory may do the trick.

In all honesty I'm not expert on the underlying implementation, but wouldn't it be better to not pass the data inside the SQL at all, but using e.g. PDOStatement's parameter binding? Maybe someone more knowledgeable here can confirm whether that will indeed cause the data to be sent as a binary string, outside any SQL statement at all, or whether PDO just does the escaping and query string manipulation under the hood.

Either way, you get a security (and simplicity) benefit right there.

skrebbel
  • 9,557
  • 6
  • 32
  • 34
  • Thanks for the reply. That was also my first though; that the conversion process and the added length of the string would cause decreased performance. And it appears we are correct. - However, I've now found several pages that show the use of the `bin2hex` function (or even, more disturbingly, the Base64 functions), and I can't see a reason why. It makes no sense... -- Btw, personally I do use prepared statements (mysqli, usually). This question is more hypothetical than practical :) – Atli Apr 01 '10 at 08:13
  • bin2hex/base64 would avoid any character set issues, if the table were (mistakenly) created with TEXT fields, instead of BLOB. But at the cost of an up-to-3x increase in data size (assuming the data is fully non-ascii and fully converted to %xx%yy%zz...) – Marc B Apr 01 '10 at 14:29
  • Both hex and base64 increase the size of the data send. With hex the data is stored as binary. With later the data is stored in base64 encoded format and thus 33% larger. (But this does not answer the OP's question) – Jacco Apr 02 '10 at 10:51
0

for example if you encounter similar problem as described here: http://www.php.net/manual/en/function.mysql-real-escape-string.php#82015

e.g. even though mysql_real_escape_string seems to be "binary-safe" you can't use it (as an example) in combination with igbinary_serialize - unserializing will just fail.

in that case you need to bin2hex prior inserting the data into mysql.

Also, usually you read more often data from mysql than inserting :)

lifeofguenter
  • 1,066
  • 1
  • 13
  • 19