
Currently I am using this code to convert a string to a byte array:

var tempByte = System.Text.Encoding.UTF8.GetBytes(tempText);

I call this line very often in my application, and I would really like something faster. How can I convert a string to a byte array faster than the default GetBytes method? Maybe with unsafe code?

Wheeler
  • Are you a) actually running into performance problems and b) sure it is this part that is causing those problems? – Bart Friederichs Nov 28 '13 at 19:28
  • I'd like to optimize the code, and according to the profiler this line is the most time-critical one. – Wheeler Nov 28 '13 at 19:30
  • Why would unsafe code help? What makes you think this code is a bottleneck? What makes you think it can be improved? What are your performance requirements? – David Heffernan Nov 28 '13 at 19:31
  • `GetBytes` *does* use unsafe code already. – Peter Ritchie Nov 28 '13 at 19:32
  • First, why do you want to optimize it? Is it actually problematic as it is? And second, have you considered optimizing the code, instead of trying to make the most-called function faster? Perhaps you can do other things like loop unrolling or a better algorithm that will call this method less often. Use caching, dynamic programming, etc. More often than not, trying to optimize a built-in function is not the way to go. – Bart Friederichs Nov 28 '13 at 19:33
  • If you need to be using UTF8 a lot, it might be faster to simply work with byte arrays rather than convert from Unicode to UTF8 all the time. – Peter Ritchie Nov 28 '13 at 19:34
  • I don't know if this could be improved; that is why I asked the question. A lot of built-in functions can be outrun by a faster implementation, like the GDI or the Crypto ones. – Wheeler Nov 28 '13 at 19:34
  • Peter Ritchie just gave me an idea, thank you, it can be a huge improvement! – Wheeler Nov 28 '13 at 19:35
  • How about that approach: http://stackoverflow.com/questions/472906/net-string-to-byte-array-c-sharp? – MarcinJuraszek Nov 28 '13 at 21:14
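One way to act on Peter Ritchie's suggestion is to stop re-encoding the same strings at all and keep the UTF-8 bytes once they have been computed. A minimal sketch, assuming the same strings are encoded repeatedly; the Utf8Cache class and its GetBytes wrapper are illustrative names, not anything from the original code:

using System.Collections.Concurrent;

static class Utf8Cache
{
    // Each distinct string is encoded once; later calls return the cached bytes.
    // Callers must treat the returned array as read-only.
    private static readonly ConcurrentDictionary<string, byte[]> cache =
        new ConcurrentDictionary<string, byte[]>();

    public static byte[] GetBytes(string text)
    {
        return cache.GetOrAdd(text, s => System.Text.Encoding.UTF8.GetBytes(s));
    }
}

Usage would then be var tempByte = Utf8Cache.GetBytes(tempText); this only pays off when the same strings recur, and the cache grows with the number of distinct strings seen.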

1 Answer


If you don't care too much about using a specific encoding and your code is performance-critical (for instance, it is some kind of DB serializer that needs to run millions of times per second), try:

int len = tempText.Length * sizeof(char);   // UTF-16: two bytes per char
var tempByte = new byte[len];
fixed (char* ptr = tempText)                // requires an unsafe context
{
    System.Runtime.InteropServices.Marshal.Copy(new IntPtr(ptr), tempByte, 0, len);
}

Edit: Marshal.Copy was around ten times faster than UTF8.GetBytes and gets you UTF-16 encoding. To convert it back to a string you can use:

fixed (byte* bptr = tempByte)               // requires an unsafe context
{
    // offset is the byte offset to start at (use 0 for the whole array)
    char* cptr = (char*)(bptr + offset);
    tempText = new string(cptr, 0, len / 2); // len bytes yield len / 2 chars
}
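
For reference, here is how the two snippets can be wrapped into a complete round trip (compiled with /unsafe); the method names are illustrative, and offset is 0 here because the whole array is read:

using System;

static unsafe byte[] StringToUtf16Bytes(string tempText)
{
    int len = tempText.Length * sizeof(char);   // two bytes per char
    var tempByte = new byte[len];
    fixed (char* ptr = tempText)
    {
        System.Runtime.InteropServices.Marshal.Copy(new IntPtr(ptr), tempByte, 0, len);
    }
    return tempByte;
}

static unsafe string Utf16BytesToString(byte[] tempByte, int offset, int len)
{
    fixed (byte* bptr = tempByte)
    {
        char* cptr = (char*)(bptr + offset);
        return new string(cptr, 0, len / 2);    // len bytes yield len / 2 chars
    }
}

// Usage:
// byte[] bytes = StringToUtf16Bytes(tempText);
// string back  = Utf16BytesToString(bytes, 0, bytes.Length);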
MagnatLU
  • This is utterly bizarre. Optimise converting to UTF8 by, er, what exactly? – David Heffernan Nov 28 '13 at 23:09
  • By using UTF-16 instead of UTF-8 and exploiting the fact that the internal memory representation of a .NET string is already in that format: all you need to do to get it is copy a memory block, instead of actually converting the string character by character to the desired encoding. – MagnatLU Nov 29 '13 at 07:09
  • I just cannot see how it relates to the question which clearly and deliberately converts to UTF8. If you want a UTF16 representation then the code in your answer is just as pointless. Just take a copy of the string reference! Why even bother with byte[]. And the use of unsafe code here seems pointless also. – David Heffernan Nov 29 '13 at 07:14
  • I had a very similar problem to Wheeler's, and for my project speed was much more important than the particular encoding used (as long as there was a fast way to decode it as well), so I shared my opinion on this topic. Wheeler wrote he needs to convert a string to a byte array and my code snippets do just that. If you disagree with my answer, you are free to downvote it and provide yours. – MagnatLU Nov 29 '13 at 16:19
  • I'm coming at this from the perspective of answering the question that was asked rather than solving the problem of the question asker. – David Heffernan Nov 29 '13 at 16:26
  • @MagnatLU "If you don't care too much about using specific encoding". My comment will be "you have to". The problem with this approach is **endianness**. This code is dangerous if you want to use it on different machines. Maybe it works in many situations, but it is contrary to the standards. It probably causes problems when you want to scale. You should care about encoding after all. To solve performance problems you'd better deal with binary arrays instead. – Ehsan88 May 20 '17 at 08:01
  • How do I use this? Is it a method? Also, `len` was undefined. – nyconing Sep 27 '19 at 09:29
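
Regarding the endianness concern: when the bytes never leave the process, the raw UTF-16 copy is unambiguous, but for data that crosses machine boundaries an explicit encoding is the safer choice. A small sketch using the built-in UTF-16 encodings (Encoding.Unicode is little-endian, Encoding.BigEndianUnicode is big-endian):

// Pins down the byte order explicitly instead of relying on the machine's layout.
byte[] wireBytes = System.Text.Encoding.Unicode.GetBytes(tempText);
string roundTripped = System.Text.Encoding.Unicode.GetString(wireBytes);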