I am writing a server program that reads a UTF-8 encoded byte stream from a network socket and decodes the characters continuously.
For characters that take more than one byte to encode, sometimes only the first byte of a character has arrived on the socket, and the program decodes that lone byte as an invalid character.
For example, the client runs the following code:
String s = "Cañ";
byte[] b = s.getBytes("UTF-8"); // 4 bytes: 'C', 'a', then 0xC3 0xB1 for 'ñ'
// sending the first three bytes splits the two-byte sequence for 'ñ'
send(b, 0, 3); // send(byte[], offset, length)
// sending the last byte
send(b, 3, 1);
When the server receives the first three bytes, it decodes them to Ca?, because the trailing byte 0xC3 is an incomplete multi-byte sequence on its own.
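For what it's worth, the truncated decode can be reproduced locally without a socket (a minimal sketch; the class name is mine):

```java
import java.nio.charset.StandardCharsets;

public class SplitDemo {
    public static void main(String[] args) {
        byte[] b = "Cañ".getBytes(StandardCharsets.UTF_8); // 4 bytes: 'C', 'a', 0xC3, 0xB1
        // Decoding only the first three bytes truncates the two-byte sequence for 'ñ',
        // so the decoder substitutes U+FFFD (often rendered as '?').
        String partial = new String(b, 0, 3, StandardCharsets.UTF_8);
        System.out.println(partial); // prints "Ca" followed by U+FFFD
    }
}
```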
How can I detect character boundaries on the server?
The code above is contrived to reproduce the issue; in practice, I believe TCP sometimes splits a character across segments.
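One approach that seems to fit this situation is `java.nio.charset.CharsetDecoder`, which can decode incrementally: when a read ends mid-character, calling `decode(..., false)` leaves the incomplete tail bytes in the input buffer so they can be combined with the next read. A sketch simulating the two "packets" from the example above (class name and buffer sizes are mine):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

public class StreamDecoderDemo {
    public static void main(String[] args) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(16);
        CharBuffer out = CharBuffer.allocate(16);

        byte[] all = "Cañ".getBytes(StandardCharsets.UTF_8);

        // First "packet": three bytes, ending mid-character.
        in.put(all, 0, 3).flip();
        decoder.decode(in, out, false); // underflow: the trailing 0xC3 stays in `in`
        in.compact();                   // keep the undecoded tail for the next read

        // Second "packet": the final byte completes 'ñ'.
        in.put(all, 3, 1).flip();
        decoder.decode(in, out, false);

        out.flip();
        System.out.println(out); // prints "Cañ"
    }
}
```

In a real server loop you would `read()` from the socket into `in`, decode with `endOfInput=false`, `compact()`, and repeat, passing `true` only once the stream is finished.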