How to convert UTF-8 byte[] to string?

Question

I have a byte[] array that is loaded from a file that I happen to know contains UTF-8.

In some debugging code, I need to convert it to a string. Is there a one-liner that will do this?

Under the covers it should be just an allocation and a memcopy, so even if it is not implemented, it should be possible.

"should be just an allocation and a memcopy" is not correct, because a .NET string is UTF-16 encoded. One Unicode character might be one UTF-8 code unit and one UTF-16 code unit; another might be two UTF-8 code units and one UTF-16 code unit; another might be three UTF-8 code units and one UTF-16 code unit; and another might be four UTF-8 code units and two UTF-16 code units. A memcopy might be able to widen, but it wouldn't be able to handle UTF-8 to UTF-16 conversion. — Tom Blodget, Nov 19 '16 at 01:01
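
The code-unit counts described in the comment above can be checked directly. This is a minimal sketch; the class and method names (`CodeUnits`, `Utf8Count`, `Utf16Count`) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text;

static class CodeUnits
{
    // Number of UTF-8 code units (bytes) needed to encode the string.
    public static int Utf8Count(string s) => Encoding.UTF8.GetByteCount(s);

    // Number of UTF-16 code units (chars) held by the .NET string.
    public static int Utf16Count(string s) => s.Length;
}

// "A"          -> 1 UTF-8 unit,  1 UTF-16 unit
// "\u00E9"     -> 2 UTF-8 units, 1 UTF-16 unit  (é)
// "\u20AC"     -> 3 UTF-8 units, 1 UTF-16 unit  (€)
// "\U0001F600" -> 4 UTF-8 units, 2 UTF-16 units (an emoji outside the BMP)
```

Because the ratio between the two counts varies per character, the conversion has to inspect every byte rather than copy a block of memory.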

score 1570 · Accepted Answer · edited Feb 12 '15 at 20:19


string result = System.Text.Encoding.UTF8.GetString(byteArray);
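
For debugging code it may also help to know whether the bytes really are valid UTF-8. The sketch below contrasts the lenient one-liner with a strict decoder built from `new UTF8Encoding(false, true)`, whose second argument asks the decoder to throw on malformed input. The wrapper names (`Utf8Decode`, `Lenient`, `TryStrict`) are hypothetical.

```csharp
using System;
using System.Text;

static class Utf8Decode
{
    // Strict decoder: throws DecoderFallbackException on malformed UTF-8.
    private static readonly UTF8Encoding Strict = new UTF8Encoding(false, true);

    // Same call as the one-liner above; invalid bytes become U+FFFD.
    public static string Lenient(byte[] utf8) => Encoding.UTF8.GetString(utf8);

    // Returns false instead of silently substituting replacement characters.
    public static bool TryStrict(byte[] utf8, out string text)
    {
        try
        {
            text = Strict.GetString(utf8);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }
}
```

With the bytes `68 C3 A9` both paths return "hé"; with a lone `C3` the lenient path returns a single U+FFFD and the strict path reports failure.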

answered Jun 16 '09 at 18:49 by Zanoni · edited by James Webster


how does it handle null ended strings ? – maazza May 12 '15 at 12:43

@maazza for some unknown reason it doesn't handle them at all. I'm calling it like `System.Text.Encoding.UTF8.GetString(buf).TrimEnd('\0');`. – Hi-Angel Jul 27 '15 at 07:53

@Hi-Angel Unknown reason? The only reason null-terminated strings ever became popular was the C language - and even that was only because of a historical oddity (CPU instructions that dealt with null-terminated strings). .NET only uses null-terminated strings when interopping with code that uses null-terminated strings (which are *finally* disappearing). It's perfectly valid for a string to contain NUL characters. And of course, while null-terminated strings are dead simple in ASCII (just build until you get the first zero byte), other encodings, including UTF-8, are not so simple. – Luaan Nov 23 '15 at 10:05

One of the beautiful features of UTF-8 is that a shorter sequence is never a subsequence of a longer sequence. So a null terminated UTF-8 string is simple. – plugwash Nov 24 '15 at 17:00

Well, good luck unpacking it if it has non-ascii. Just use Convert.ToBase64String. – Erik Bergstedt Dec 12 '15 at 10:30

[Example](https://dotnetfiddle.net/iNVGzi) demonstrating that this does not terminate at null characters. `Encoding.ASCII` yields the same results – Assimilater Jun 29 '17 at 00:02
I am very happy to be able to use your knowledge, dear friends. Good luck and thank you for your individual explanations and answers. @Hi-Angel May I ask why you used TrimEnd? – elnaz jangi Jun 12 '20 at 19:43
@elnazjangi I haven't used C# for a long time, but AFAIR in C# a null byte is a valid element of a string. Not a useful one though, so the `.TrimEnd('\0')` call simply removes these if they're found at the end. As for why one would expect them to be there: in C and C++ a null byte has a special meaning, it marks the end of the string. So if the buffer you're reading from may hold a zero-terminated string, you'd use this call. – Hi-Angel Jun 12 '20 at 21:37
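
As an alternative to `TrimEnd('\0')`, the buffer can be cut at the first zero byte before decoding, which also discards any leftover bytes after the terminator. A minimal sketch, assuming the buffer holds a single C-style string; the names `CString` and `Decode` are invented for this example.

```csharp
using System;
using System.Text;

static class CString
{
    // Decodes the bytes up to (not including) the first zero byte.
    // If there is no zero byte, the whole buffer is decoded.
    public static string Decode(byte[] buffer)
    {
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0) end = buffer.Length;
        return Encoding.UTF8.GetString(buffer, 0, end);
    }
}
```

Scanning for a zero byte is safe for UTF-8 because, as noted in the comments above, a zero byte never appears inside a multi-byte sequence.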

score 346 · Answer 2 · edited Jul 29 '17 at 02:32

There are at least four different ways to do this conversion.

1. Encoding's GetString, but you won't be able to get the original bytes back if those bytes are not valid text in that encoding (for example, arbitrary binary data).
2. BitConverter.ToString. The output is a "-" delimited string, but there's no .NET built-in method to convert the string back to a byte array.
3. Convert.ToBase64String. You can easily convert the output string back to a byte array by using Convert.FromBase64String.
   Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it.
4. HttpServerUtility.UrlTokenEncode. You can easily convert the output string back to a byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is that it needs the System.Web assembly if your project is not a web project.

A full example:

byte[] bytes = { 130, 200, 234, 23 }; // A byte array containing bytes that are not valid UTF-8

string s1 = Encoding.UTF8.GetString(bytes); // ��� (three U+FFFD replacement characters plus one control character)
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1);  // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using another Encoding object will give similar results

string s2 = BitConverter.ToString(bytes);   // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
    decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes

string s3 = Convert.ToBase64String(bytes);  // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes

string s4 = HttpServerUtility.UrlTokenEncode(bytes);    // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
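
If a dependency on System.Web is not wanted, a URL-friendly variant can be sketched on top of Convert.ToBase64String alone. This is an illustration, not a drop-in replacement: the helper names (`Base64Url`, `Encode`, `Decode`) are hypothetical, and the output differs from UrlTokenEncode, which appends a padding-count digit.

```csharp
using System;

static class Base64Url
{
    // Standard Base64 with padding removed and '+' and '/' swapped
    // for the URL-safe characters '-' and '_'.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Reverses Encode: restores the alphabet and the '=' padding.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        int pad = (4 - s.Length % 4) % 4;
        return Convert.FromBase64String(s + new string('=', pad));
    }
}
```

For the sample array `{ 130, 200, 234, 23 }` this yields `gsjqFw`, which decodes back to the same four bytes.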

LINQ it: `var decBytes2 = str.Split('-').Select(ch => Convert.ToByte(ch, 16)).ToArray();` — drtf, Jul 13 '14 at 14:43
This should be the accepted answer. It perfectly illustrates the output of multiple methods. The current accepted answer shows only one, which may be problematic for some developers who don't scroll this far down. - unless you sort by votes, of course. — dimitar.bogdanov, Apr 11 '21 at 08:36

score 29 · Answer 3 · edited Dec 22 '15 at 14:31


A general solution to convert from byte array to string when you don't know the encoding:

static string BytesToStringConverted(byte[] bytes)
{
    using (var stream = new MemoryStream(bytes))
    {
        using (var streamReader = new StreamReader(stream))
        {
            return streamReader.ReadToEnd();
        }
    }
}

answered Sep 20 '15 at 08:24 by Nir · edited by slavoo


But this assumes that there is either an encoding BOM in the byte stream or that it is in UTF-8. But you can do the same with Encoding anyway. It doesn't magically solve the problem when you don't know the encoding. – Sebastian Zander Sep 26 '17 at 17:05
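
The behaviour described in the comment above can be made explicit by passing the fallback encoding and the BOM-detection flag to StreamReader. A small sketch, with the made-up names `BomSniff` and `Decode`: a byte order mark selects the encoding and is stripped, and anything without one is read as UTF-8.

```csharp
using System.IO;
using System.Text;

static class BomSniff
{
    // Uses the BOM (if any) to pick the encoding; otherwise assumes UTF-8.
    // No other kind of encoding detection takes place.
    public static string Decode(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Bytes in some other encoding without a BOM, for example a Latin-1 `E9`, are not recognised and decode to U+FFFD.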

score 12 · Answer 4 · edited Aug 01 '15 at 20:56


Definition:

public static string ConvertByteToString(this byte[] source)
{
    return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}

Using:

string result = input.ConvertByteToString();

answered Oct 16 '14 at 01:04 by Erçin Dedeoğlu · edited by Peter Mortensen

score 9 · Answer 5 · edited Jun 29 '15 at 07:42


Converting a byte[] to a string seems simple but any kind of encoding is likely to mess up the output string. This little function just works without any unexpected results:

private string ToString(byte[] bytes)
{
    string response = string.Empty;

    foreach (byte b in bytes)
        response += (Char)b;

    return response;
}

answered Apr 22 '15 at 11:48 by AndrewJE · edited by Erçin Dedeoğlu

I received System.FormatException using your method when I unpacked it with Convert.FromBase64String. – Erik Bergstedt Dec 12 '15 at 10:20
@AndrewJE this will take forever to compute if you have a large byte array, like one read from a picture. – user3841581 Nov 04 '17 at 16:55
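
Two points about the per-byte cast may be worth illustrating. Casting each byte to a char maps it to U+0000 through U+00FF, which amounts to decoding the data as ISO-8859-1 (Latin-1), so UTF-8 multi-byte characters come out as two or more separate characters. A StringBuilder also avoids the repeated string concatenation mentioned in the comment above. The sketch below uses the hypothetical names `Latin1Cast` and `CastEachByte`.

```csharp
using System;
using System.Text;

static class Latin1Cast
{
    // Same result as appending (char)b for every byte, but built in
    // linear time with a pre-sized StringBuilder.
    public static string CastEachByte(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length);
        foreach (byte b in bytes)
        {
            sb.Append((char)b);
        }
        return sb.ToString();
    }
}
```

For the UTF-8 bytes `C3 A9` (the letter é), the cast produces the two characters U+00C3 and U+00A9, while Encoding.UTF8.GetString produces the single character U+00E9.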

metadings · Answer 6 · 2015-11-29T16:44:16.653

Using b.ToString("x2") on each byte outputs a lowercase hex string such as b4b5dfe475e58b67:

public static class Ext {

    public static string ToHexString(this byte[] hex)
    {
        if (hex == null) return null;
        if (hex.Length == 0) return string.Empty;

        var s = new StringBuilder();
        foreach (byte b in hex) {
            s.Append(b.ToString("x2"));
        }
        return s.ToString();
    }

    public static byte[] ToHexBytes(this string hex)
    {
        if (hex == null) return null;
        if (hex.Length == 0) return new byte[0];

        int l = hex.Length / 2;
        var b = new byte[l];
        for (int i = 0; i < l; ++i) {
            b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return b;
    }

    public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
    {
        if (bytes == null && bytesToCompare == null) return true; // ?
        if (bytes == null || bytesToCompare == null) return false;
        if (object.ReferenceEquals(bytes, bytesToCompare)) return true;

        if (bytes.Length != bytesToCompare.Length) return false;

        for (int i = 0; i < bytes.Length; ++i) {
            if (bytes[i] != bytesToCompare[i]) return false;
        }
        return true;
    }

}
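
On .NET 5 and later, the Convert class has built-in hex conversions that could stand in for the two hand-written helpers above. The sketch below assumes such a runtime; the wrapper names (`HexBuiltIn`, `ToLowerHex`, `FromHex`) are made up. Convert.ToHexString returns uppercase digits, so the result is lowercased to match the "x2" output.

```csharp
using System;

static class HexBuiltIn
{
    // Convert.ToHexString yields uppercase hex; lowercase it to match "x2".
    public static string ToLowerHex(byte[] bytes)
    {
        return Convert.ToHexString(bytes).ToLowerInvariant();
    }

    // Convert.FromHexString accepts both uppercase and lowercase digits.
    public static byte[] FromHex(string hex)
    {
        return Convert.FromHexString(hex);
    }
}
```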

score 4 · Answer 7 · answered May 18 '15 at 13:38


There is also the UnicodeEncoding class, which is quite simple to use:

var ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);

Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));

answered by P.K.

But not UTF-8 methinks? – david.pfx Jul 14 '15 at 10:36

`UnicodeEncoding` is the worst class name ever; unicode isn't an encoding at all. That class is actually UTF-16. The little-endian version, I think. – Nyerguds Nov 17 '16 at 08:16
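
The point made in the comments above, that UnicodeEncoding is UTF-16 (little-endian by default) and not UTF-8, can be seen from the bytes it produces. A short sketch with hypothetical helper names (`Utf16VsUtf8`, `AsUtf16`, `AsUtf8`, `MisreadUtf8AsUtf16`):

```csharp
using System;
using System.Text;

static class Utf16VsUtf8
{
    // Two bytes per BMP character, low byte first.
    public static byte[] AsUtf16(string s) => new UnicodeEncoding().GetBytes(s);

    // One byte per ASCII character.
    public static byte[] AsUtf8(string s) => Encoding.UTF8.GetBytes(s);

    // What happens when UTF-8 bytes are decoded with the wrong class.
    public static string MisreadUtf8AsUtf16(byte[] utf8)
    {
        return new UnicodeEncoding().GetString(utf8);
    }
}
```

"Hi" becomes `48 00 69 00` in UTF-16 and `48 69` in UTF-8. Feeding the two UTF-8 bytes to UnicodeEncoding yields the single unrelated character U+6948, so this class is not suitable for the byte array in the question.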

score 4 · Answer 8 · answered Jan 05 '17 at 10:53


The BitConverter class can be used to convert a byte[] to a string.

var convertedString = BitConverter.ToString(byteArray);

Documentation for the BitConverter class can be found on MSDN.

answered by Sagar


This converts the byte array to a hexadecimal string representing each byte, which is generally not what you want when converting bytes to a string. If you do, then that's another question, see for example [How do you convert Byte Array to Hexadecimal String, and vice versa?](http://stackoverflow.com/questions/311165/how-do-you-convert-byte-array-to-hexadecimal-string-and-vice-versa). – CodeCaster Jan 05 '17 at 10:59
Not what OP asked – Winter Jul 19 '17 at 13:46

score 3 · Answer 9 · answered Sep 15 '16 at 05:55


Alternatively:

 var byteStr = Convert.ToBase64String(bytes);

answered by Fehr

Nyerguds · Answer 10 · 2016-11-17T08:24:19.190

A LINQ one-liner for converting a byte array byteArrFilename, read from a file, to a pure ASCII C-style zero-terminated string would be the following. It is handy for reading things like file index tables in old archive formats.

String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
                              .Select(x => x < 128 ? (Char)x : '?').ToArray());

I use '?' as default char for anything not pure ascii here, but that can be changed, of course. If you want to be sure you can detect it, just use '\0' instead, since the TakeWhile at the start ensures that a string built this way cannot possibly contain '\0' values from the input source.

Assimilater · Answer 11 · 2017-06-29T00:24:18.723

To my knowledge, none of the given answers guarantees correct behavior with null termination. Until someone shows me differently, I wrote my own static class for handling this with the following methods:

// Mimics the functionality of strlen() in C/C++
// Needed because neither StringBuilder nor Encoding.*.GetString() stops at \0
static int StringLength(byte[] buffer, int startIndex = 0)
{
    int strlen = 0;
    while
    (
        (startIndex + strlen) < buffer.Length     // Stay within the bounds of the buffer
        && buffer[startIndex + strlen] != 0       // The typical null termination check
    )
    {
        ++strlen;
    }
    return strlen;
}

// This is messy, but I haven't found a built-in way in C# that guarantees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
    strlen = StringLength(buffer, startIndex);
    byte[] c_str = new byte[strlen];
    Array.Copy(buffer, startIndex, c_str, 0, strlen);
    return Encoding.UTF8.GetString(c_str);
}

The reason for startIndex is that, in the example I was working on, I needed to parse a byte[] as an array of null-terminated strings. It can be safely ignored in the simple case.

Mine does, actually. `byteArr.TakeWhile(x => x != 0)` is a quick and easy way to solve the null termination problem. — Nyerguds, Sep 21 '17 at 09:11
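
For the case mentioned in this answer, where one buffer holds several null-terminated strings back to back, a compact variant can be sketched with Array.IndexOf and the ranged GetString overload. The names `CStringTable` and `Split` are invented for illustration, and the sketch assumes each entry is UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CStringTable
{
    // Splits a buffer of consecutive null-terminated UTF-8 strings.
    // A final entry without a terminator is still returned.
    public static List<string> Split(byte[] buffer)
    {
        var result = new List<string>();
        int start = 0;
        while (start < buffer.Length)
        {
            int end = Array.IndexOf(buffer, (byte)0, start);
            if (end < 0) end = buffer.Length;
            result.Add(Encoding.UTF8.GetString(buffer, start, end - start));
            start = end + 1; // skip past the terminator
        }
        return result;
    }
}
```

The bytes of `ab\0c\0` come back as the two entries "ab" and "c", with no temporary byte arrays allocated along the way.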

The One · Answer 12 · 2019-02-21T16:38:04.820


In addition to the accepted answer: if you're using .NET 3.5 or .NET 3.5 CE, you have to specify the index of the first byte to decode and the number of bytes to decode:

string result = System.Text.Encoding.UTF8.GetString(byteArray,0,byteArray.Length);
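
The index-and-count overload is also useful on any framework version when only part of a buffer contains data, for example after a single Stream.Read call. A minimal sketch under that assumption; `PartialBuffer` and `ReadSome` are hypothetical names.

```csharp
using System.IO;
using System.Text;

static class PartialBuffer
{
    // Reads once from the stream and decodes only the bytes actually read,
    // not the unused remainder of the buffer.
    public static string ReadSome(Stream stream, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        return Encoding.UTF8.GetString(buffer, 0, bytesRead);
    }
}
```

Note that a single read may end in the middle of a multi-byte character, so this pattern suits debugging output more than general stream decoding.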

answered Feb 01 '19 at 19:55

Antonio Leonardo · Answer 13 · 2020-05-21T21:40:08.520

I read the answers to this question, and together they cover the basics well, since C# offers several approaches to the same problem. One more thing to consider is the difference between pure UTF-8 and UTF-8 with a BOM (byte order mark).

Last week at my job I had to develop a feature that outputs some CSV files with a BOM and others in pure UTF-8 (without a BOM). Each kind of file is consumed by a different non-standardized API: one API reads UTF-8 with a BOM and the other reads UTF-8 without one. To build my approach I researched the concept, reading the Stack Overflow discussion "What's the difference between UTF-8 and UTF-8 without B.O.M.?" and the Wikipedia article "Byte order mark".

In the end, my C# code for both UTF-8 encoding types (with a BOM and pure) looked similar to the example below:

// For UTF-8 with a BOM; same as the answer shared by Zanoni (at top)
string resultWithBom = System.Text.Encoding.UTF8.GetString(byteArray);

// For pure UTF-8 (without a BOM)
// Note: the constructor argument controls whether a BOM is emitted when writing;
// see the comments below about whether GetString strips a BOM when decoding.
string resultPure = (new UTF8Encoding(false)).GetString(byteArray);

Don't you need to specifically strip the BOM off the start though? As far as I know, even if you use a UTF8Encoding with BOM, it will not strip that off automatically. — Nyerguds, Jan 14 '21 at 13:03
@Nyerguds, the UTF8Encoding object with "false" value at parameter is without BOM. — Antonio Leonardo, Feb 12 '21 at 17:42
No, I mean, if the text has a BOM, even the `System.Text.Encoding.UTF8` will _not_ automatically strip that off. Try it out. — Nyerguds, Feb 14 '21 at 01:41
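
The behaviour discussed in these comments can be checked, and worked around, with a few lines. GetString keeps a leading BOM as the character U+FEFF, so one option is to skip the preamble bytes manually before decoding. This is a sketch with made-up names (`BomStrip`, `Decode`), not an official recipe.

```csharp
using System;
using System.Text;

static class BomStrip
{
    // Decodes UTF-8 and skips a leading byte order mark (EF BB BF) if present.
    public static string Decode(byte[] bytes)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // the three BOM bytes
        bool hasBom = bytes.Length >= bom.Length
            && bytes[0] == bom[0]
            && bytes[1] == bom[1]
            && bytes[2] == bom[2];
        int offset = hasBom ? bom.Length : 0;
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

For the input `EF BB BF 68 69`, plain Encoding.UTF8.GetString returns a three-character string beginning with U+FEFF, while `BomStrip.Decode` returns "hi".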

score 1 · Answer 14 · answered Sep 29 '19 at 07:49

Try this console app:

static void Main(string[] args)
{
    string _mainString = "Héllo World";
    Console.WriteLine("Main String: " + _mainString);

    // Convert a string to UTF-8 bytes.
    byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString);

    // Convert UTF-8 bytes back to a string.
    string _stringUnicode = Encoding.UTF8.GetString(_utf8Bytes);
    Console.WriteLine("String Unicode: " + _stringUnicode);
}

score 0 · Answer 15 · answered Jul 06 '18 at 13:27

Here is a solution where you don't have to bother with encoding. I use it in my network class to send binary objects as strings.

        public static byte[] String2ByteArray(string str)
        {
            char[] chars = str.ToArray();
            byte[] bytes = new byte[chars.Length * 2];

            for (int i = 0; i < chars.Length; i++)
                Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);

            return bytes;
        }

        public static string ByteArray2String(byte[] bytes)
        {
            char[] chars = new char[bytes.Length / 2];

            for (int i = 0; i < chars.Length; i++)
                chars[i] = BitConverter.ToChar(bytes, i * 2);

            return new string(chars);
        }

Didn't have one. But this function is in use for binary transmission in our company network, and so far 20 TB have been encoded and decoded correctly. So for me this function works :) — Marco Pardo, Sep 17 '18 at 19:10
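
The same raw two-bytes-per-char packing can be sketched with Buffer.BlockCopy, which also makes one caveat easy to see: a byte array of odd length loses its last byte, because every char needs two bytes. The names `RawChars`, `Pack` and `Unpack` are hypothetical, and like the BitConverter version this uses the platform's native byte order.

```csharp
using System;

static class RawChars
{
    // Packs pairs of bytes into chars; an odd trailing byte is dropped.
    public static string Pack(byte[] bytes)
    {
        var chars = new char[bytes.Length / 2];
        Buffer.BlockCopy(bytes, 0, chars, 0, chars.Length * 2);
        return new string(chars);
    }

    // Unpacks every char back into its two raw bytes.
    public static byte[] Unpack(string s)
    {
        var bytes = new byte[s.Length * 2];
        Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }
}
```

An even-length array such as `{ 1, 2, 3, 4 }` survives the round trip unchanged, while `{ 1, 2, 3 }` comes back as only two bytes.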
