
I am writing a server program that reads a UTF-8 encoded byte stream from a network socket and continuously interprets the characters.

For characters that take more than one byte to represent, sometimes only the first byte of a character has arrived on the socket, and the program interprets that byte as an invalid character.

For example, the client runs the code below:

  String s = "Cañ";

  byte[] b = s.getBytes("UTF-8");

  //sending first three bytes
  send(b, 0, 3);   //send(byte[], offset, length)

  //sending last byte
  send(b, 3, 1);

When the server receives the first three bytes, it decodes them to Ca? (the trailing byte of ñ is missing, so the partial sequence becomes a replacement character).

How can I detect character boundaries on the server?

The code given is contrived to reproduce the issue; I believe TCP sometimes splits a character across segments.
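For what it's worth, one way to detect the boundary is to let java.nio.charset.CharsetDecoder track it: when called with endOfInput=false, it stops at an incomplete sequence and leaves those bytes in the input buffer instead of decoding them to a replacement character. A minimal sketch reproducing the two chunks from the example above (the class name DecodeDemo is made up):

  import java.nio.ByteBuffer;
  import java.nio.CharBuffer;
  import java.nio.charset.CharsetDecoder;
  import java.nio.charset.StandardCharsets;

  public class DecodeDemo {
      public static void main(String[] args) throws Exception {
          byte[] b = "Cañ".getBytes(StandardCharsets.UTF_8); // 4 bytes; ñ is 0xC3 0xB1

          CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
          ByteBuffer in = ByteBuffer.allocate(16);
          CharBuffer out = CharBuffer.allocate(16);

          // First chunk: three bytes, ending in the middle of ñ.
          in.put(b, 0, 3).flip();
          decoder.decode(in, out, false); // endOfInput=false: 0xC3 stays in 'in'
          in.compact();                   // carry the incomplete byte forward

          // Second chunk: the last byte completes ñ.
          in.put(b, 3, 1).flip();
          decoder.decode(in, out, false);
          in.compact();

          out.flip();
          System.out.println(out);        // prints Cañ, not Ca?
      }
  }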

JFreeman
  • possible duplicate of http://stackoverflow.com/questions/8512121/utf-8-byte-to-string – Yoav Gur May 19 '17 at 05:12
  • How exactly does the server "receive" the bytes? When reading character data you should not try reading a raw `InputStream` but rather wrap it up in an `InputStreamReader` that knows about things like characters and UTF-8 – piet.t May 19 '17 at 06:17
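
A minimal sketch of what piet.t describes, assuming a blocking server on a made-up port 9000; the Reader buffers the bytes of a partially received character until it is complete:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.ServerSocket;
  import java.net.Socket;
  import java.nio.charset.StandardCharsets;

  public class Utf8Server {
      public static void main(String[] args) throws Exception {
          try (ServerSocket server = new ServerSocket(9000);
               Socket socket = server.accept();
               // The Reader decodes UTF-8 and holds back the bytes of an
               // incomplete character until the rest of it arrives.
               BufferedReader reader = new BufferedReader(
                       new InputStreamReader(socket.getInputStream(),
                                             StandardCharsets.UTF_8))) {
              int ch;
              while ((ch = reader.read()) != -1) {
                  System.out.print((char) ch); // whole characters only
              }
          }
      }
  }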

1 Answer


TCP is reliable, but it delivers a byte stream, not messages: when the network is congested, your data may arrive split into arbitrary chunks. You can design a protocol yourself. By setting a start and end tag on each data frame (or prefixing each frame with its length), you can easily check whether you have received the full data.
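
A minimal sketch of one such framing scheme, using a length prefix rather than start/end tags (the method names are made up):

  import java.io.DataInputStream;
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.nio.charset.StandardCharsets;

  public class Framing {
      // Sender: prefix each message with its byte length.
      static void sendMessage(DataOutputStream out, String s) throws IOException {
          byte[] payload = s.getBytes(StandardCharsets.UTF_8);
          out.writeInt(payload.length); // 4-byte length header
          out.write(payload);
          out.flush();
      }

      // Receiver: read the header, then block until the whole
      // payload has arrived, so a character can never be split.
      static String readMessage(DataInputStream in) throws IOException {
          int length = in.readInt();
          byte[] payload = new byte[length];
          in.readFully(payload); // waits for all 'length' bytes
          return new String(payload, StandardCharsets.UTF_8);
      }
  }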

dawnfly