You seem to be misunderstanding something.
For all the system cares, and, MOST OF THE TIME, for all the developer cares, chars could as well be carrier pigeons, and Strings sequences of said carrier pigeons. Although yes, internally, strings are sequences of chars (which are more precisely UTF-16 code units), this is not the problem at hand here.
You don't write chars into files, nor do you read chars from files. You write, and read, bytes.
And in order to read a sequence of bytes as a sequence of chars/carrier pigeons, you need a decoder; similarly (and this is what you do here), in order to turn chars/carrier pigeons into bytes, you need an encoder. In Java, both of these are available from a Charset.
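To make the encoder/decoder distinction concrete, here is a minimal sketch that uses the convenience methods Charset.encode() and Charset.decode(). The class name CharsetRoundTrip and its two helper methods are made up for illustration, and UTF-8 is just an arbitrary choice of charset:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class CharsetRoundTrip {
    // Encoding: chars (carrier pigeons) -> bytes.
    static byte[] encode(String text, Charset charset) {
        ByteBuffer buffer = charset.encode(text);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // Decoding: bytes -> chars (carrier pigeons).
    static String decode(byte[] bytes, Charset charset) {
        CharBuffer chars = charset.decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("ABC", StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(bytes));                  // [65, 66, 67]
        System.out.println(decode(bytes, StandardCharsets.UTF_8));   // ABC
    }
}
```

Note that the charset has to be the same on both sides; decoding with a different one than was used for encoding generally does not give back the original text.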
String.getBytes() just happens to use an encoder for the platform's default character encoding (obtained using Charset.defaultCharset()), and it happens that for your input string "ABC" and your JRE implementation, the sequence of bytes generated is 65, 66, 67. Hence the result.
Now, try String.getBytes(Charset.forName("UTF-32LE")), and you'll get a different result.
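A small sketch of that experiment, assuming your JRE ships the UTF-32LE charset (it is not one of the charsets every Java platform is required to support, although common JREs do provide it). The class name GetBytesDemo and the helper utf32le() are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class GetBytesDemo {
    // Same string, explicit charset: four bytes per character, little-endian.
    static byte[] utf32le(String text) {
        return text.getBytes(Charset.forName("UTF-32LE"));
    }

    public static void main(String[] args) {
        String text = "ABC";

        // Depends on the platform: whatever Charset.defaultCharset() returns.
        System.out.println(Charset.defaultCharset());
        System.out.println(Arrays.toString(text.getBytes()));

        // Explicit charsets give the same bytes on every platform.
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.US_ASCII)));
        // [65, 66, 67]
        System.out.println(Arrays.toString(utf32le(text)));
        // [65, 0, 0, 0, 66, 0, 0, 0, 67, 0, 0, 0]
    }
}
```

Same three carrier pigeons, twelve bytes instead of three: the bytes are a property of the encoding, not of the string.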