251

Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing this than just iterating through the bytes and converting each one?

public String openFileToString(byte[] _bytes)
{
    String file_string = "";

    for(int i = 0; i < _bytes.length; i++)
    {
        file_string += (char)_bytes[i];
    }

    return file_string;    
}
Raedwald
skeryl
  • 19
    Why can't you just do this `String fileString = new String(_bytes,"UTF-8");` ? – CoolBeans Dec 14 '11 at 21:51
  • 1
    Alternatively, you could use BufferedReader to read into a char array. – Andy Thomas Dec 14 '11 at 21:51
  • possible duplicate of [In Java, how do I read/convert an InputStream to a String?](http://stackoverflow.com/questions/309424/in-java-how-do-i-read-convert-an-inputstream-to-a-string) – Bruno Dec 14 '11 at 21:58
  • 1
    @CoolBeans I could if I had known to do that ;) Thank you. – skeryl Dec 14 '11 at 22:08
  • Depending on the file size, I'm not sure loading the whole `byte[]` in memory and converting it via `new String(_bytes,"UTF-8")` (or even by chunks with `+=` on the string) is the most efficient. Chaining InputStreams and Readers might work better, especially on large files. – Bruno Dec 14 '11 at 22:13
  • @Bruno - That's a valid observation. I guess he will find out if he starts getting out of memory exceptions :) – CoolBeans Dec 14 '11 at 22:14
  • Your provided code does **not** decode UTF-8. It does not handle *any* of the code points that require multiple bytes. – Raedwald Apr 10 '15 at 07:11

11 Answers

511

Look at the constructor for String

String str = new String(bytes, StandardCharsets.UTF_8);

And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:

String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
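To see why the charset-aware constructor matters, here is a minimal sketch comparing it with the per-byte cast from the question. The class and method names (`Utf8DecodeDemo`, `decode`, `castEachByte`) are made up for illustration, and the sample bytes are just "café" encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeDemo {

    // Decodes the whole array as UTF-8 in a single call.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The per-byte cast from the question, kept only for comparison.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "é" is U+00E9, which UTF-8 encodes as the two bytes 0xC3 0xA9.
        byte[] bytes = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};

        System.out.println(decode(bytes).length());            // 4
        System.out.println(castEachByte(bytes).length());      // 5
        System.out.println(decode(bytes).equals("caf\u00e9")); // true
    }
}
```

The cast produces one `char` per byte, so any character that needs more than one byte comes out as garbage, while the constructor combines the bytes into the intended character.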
gladed
Jason Nichols
  • 13
    Or Guava's [Charsets.UTF_8](https://code.google.com/p/guava-libraries/wiki/StringsExplained#Charsets) if you are on JDK older than 1.7 – siledh Oct 01 '13 at 08:24
  • 6
    Use Guava's Charsets.UTF_8 if you are on Android API below 19 too – Ben Clayton Oct 23 '14 at 09:50
  • And if checkstyle says: "Illegal Instantiation: Instantiation of java.lang.String should be avoided.", then what? – Attila Neparáczki Oct 29 '14 at 08:58
  • 1
    You can see in here the `java.nio.charset.Charset.availableCharsets()` map all the charsets not just the charsets in the `StandardCharsets`. And if you want to use some other charset and still want to prevent the String constructor from throwing `UnsupportedEncodingException` you may use `java.nio.charset.Charset.forName()` – nyxz Feb 15 '15 at 15:46
  • 2
    IOUtils.toString(inputStream, StandardCharsets.UTF_8) is deprecated now. – Aung Myat Hein May 11 '16 at 04:48
  • This will crash on large data with out-of-memory on a TC75 Zebra device. – Dayan May 03 '18 at 18:26
  • Isn't using the String constructor discouraged as it may result in having two different string objects containing the same character data? – Sero Jun 09 '18 at 11:11
41

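Java's String class has a built-in constructor for converting a byte array to a string.

byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};

// StandardCharsets.UTF_8 (java.nio.charset, Java 7+) avoids the checked
// UnsupportedEncodingException thrown by the String-named variant.
String value = new String(byteArray, StandardCharsets.UTF_8);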
Mmir
Kashif Khan
11

To convert UTF-8 data, you can't assume a one-to-one correspondence between bytes and characters. Try this:

String file_string = new String(bytes, "UTF-8");

(Bah. I see I'm way too slow in hitting the Post Your Answer button.)

To read an entire file as a String, do something like this:

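import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public String openFileToString(String fileName) throws IOException
{
    InputStream is = new BufferedInputStream(new FileInputStream(fileName));

    try {
        InputStreamReader rdr = new InputStreamReader(is, "UTF-8");
        StringBuilder contents = new StringBuilder();
        char[] buff = new char[4096];
        int len = rdr.read(buff);
        while (len >= 0) {
            contents.append(buff, 0, len);
            // Read the next chunk; read() returns -1 at end of stream.
            len = rdr.read(buff);
        }
        // Return the accumulated text, not the scratch buffer.
        return contents.toString();
    } finally {
        try {
            is.close();
        } catch (Exception e) {
            // log error in closing the file
        }
    }
}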
Ted Hopp
4

You can use the `String(byte[] bytes)` constructor for that. See this link for details. EDIT: You also have to consider your platform's default charset, as per the Javadoc:

Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
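As a rough illustration of the `CharsetDecoder` route the Javadoc mentions, the sketch below decodes strictly and throws on invalid input instead of silently substituting replacement characters. The class name `StrictUtf8` and the method `decodeStrict` are hypothetical; the charset is assumed to be UTF-8, as in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {

    // Decodes bytes as UTF-8 and reports malformed input as an exception.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] valid = "WOW...".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = {(byte) 0xC3}; // lead byte with no continuation byte
        try {
            System.out.println(decodeStrict(valid));
            System.out.println(decodeStrict(invalid)); // throws before printing
        } catch (CharacterCodingException e) {
            System.out.println("Invalid UTF-8: " + e);
        }
    }
}
```

This is probably only worth the extra code when you need to detect corrupt input; for trusted data the two-argument `String` constructor is simpler.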

GETah
  • 1
    And if your bytes are not in the platform's default charset, you can use the version that has the second `Charset` argument to make sure the conversion is correct. – Mike Daniels Dec 14 '11 at 21:50
  • 1
    @MikeDaniels Indeed, I did not want to include all the details. Just edited my answer – GETah Dec 14 '11 at 21:54
2

Here's a simplified function that will read in bytes and create a string. It assumes you probably already know what encoding the file is in (and otherwise defaults).

static final int BUFF_SIZE = 2048;
static final String DEFAULT_ENCODING = "utf-8";

public static String readFileToString(String filePath, String encoding) throws IOException {

    if (encoding == null || encoding.length() == 0)
        encoding = DEFAULT_ENCODING;

    StringBuilder content = new StringBuilder();

    // Decode through a Reader rather than per byte chunk, so a multi-byte
    // character split across two reads is still decoded correctly.
    try (Reader reader = new InputStreamReader(new FileInputStream(new File(filePath)), encoding)) {
        char[] buffer = new char[BUFF_SIZE];

        int charsRead;
        while ((charsRead = reader.read(buffer)) != -1)
            content.append(buffer, 0, charsRead);
    }

    return content.toString();
}
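One thing to watch when a file is decoded in fixed-size byte chunks: a multi-byte UTF-8 sequence can straddle a chunk boundary, and decoding each chunk independently then yields replacement characters. The sketch below contrasts that with decoding through an `InputStreamReader`, which carries partial sequences over between reads. All names here (`ChunkBoundaryDemo`, `decodePerChunk`, `decodeWithReader`) are illustrative, and the tiny chunk size is chosen only to force the split.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkBoundaryDemo {

    // Decodes each byte chunk on its own; breaks characters that span chunks.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buffer = new byte[chunkSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Lets the reader keep decoder state across reads.
    static String decodeWithReader(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[chunkSize];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'a' followed by U+00E9 (bytes 0xC3 0xA9); a 2-byte chunk splits the "é".
        byte[] data = {'a', (byte) 0xC3, (byte) 0xA9};

        String perChunk = decodePerChunk(new ByteArrayInputStream(data), 2);
        String viaReader = decodeWithReader(new ByteArrayInputStream(data), 2);

        System.out.println(perChunk.equals("a\u00e9"));  // false
        System.out.println(viaReader.equals("a\u00e9")); // true
    }
}
```

With realistic buffer sizes the per-chunk version works most of the time, which is what makes the bug easy to miss on mostly-ASCII files.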
scottt
2

Knowing that you are dealing with a UTF-8 byte array, you'll definitely want to use the String constructor that accepts a charset name. Otherwise you may leave yourself open to some charset encoding based security vulnerabilities. Note that it throws UnsupportedEncodingException which you'll have to handle. Something like this:

public String openFileToString(byte[] _bytes) {
    String file_string;
    try {
        file_string = new String(_bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // this should never happen because "UTF-8" is hard-coded.
        throw new IllegalStateException(e);
    }
    return file_string;
}
Asaph
2

You could use the methods described in this question (especially since you start off with an InputStream): Read/convert an InputStream to a String

In particular, if you don't want to rely on external libraries, you can try this answer, which reads the InputStream via an InputStreamReader into a char[] buffer and appends it into a StringBuilder.

Community
Bruno
1

String has a constructor that takes a byte[] and a charset name as parameters :)

soulcheck
0

I do it this way:

String strIn = new String(_bytes, 0, numBytes);

  • 1
    This doesn't specify a character set so you get the platform default character set which may well not be UTF-8. – greg-449 Apr 17 '17 at 10:24
0

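This also involves iterating, but it is much better than concatenating strings, since each concatenation copies the whole string and is therefore very costly.

public String openFileToString(byte[] _bytes)
{
    StringBuilder s = new StringBuilder(_bytes.length);

    // Note: casting each byte to char is only correct for ASCII data;
    // multi-byte UTF-8 sequences are not decoded this way.
    for(int i = 0; i < _bytes.length; i++)
    {
        s.append((char)_bytes[i]);
    }

    return s.toString();    
}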
bragboy
0

Why not get what you are looking for from the get go and read a string from the file instead of an array of bytes? Something like:

BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), Charset.forName("UTF-8")));

then readLine from in until it's done.
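The `readLine` loop could look roughly like the sketch below. The helper `readAll` and the class name `ReadLinesDemo` are hypothetical; to keep the example runnable without a file on disk, `main` feeds it an in-memory stream instead of `foo.txt`. Note that `readLine` strips line terminators, so this version re-adds `\n` after every line (including the last one).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadLinesDemo {

    // Reads every line from the source and joins them with '\n'.
    static String readAll(Reader source) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // readLine() returns null once the end of the stream is reached.
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a UTF-8 file; a FileInputStream would be used the same way.
        byte[] data = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.print(readAll(reader));
    }
}
```

If the original line endings matter, reading into a `char[]` buffer instead of using `readLine` would preserve them.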

digitaljoel