
Short version:

I'm trying to segment a stream of bytes that may represent anything (for example, a TCP stream) and send them over multiple (unreliable) links to a receiving system under my control. There, I want to reorder these segments into the order the sending system first received them. See the second ascii image for an example. The resulting tunnel doesn't need to be as reliable as TCP (UDP-ish is fine), but it should handle the fact that multiple inner links are being used (which is not trivial).

Using a header that contains a size and a sequence number field would solve all problems, but only in a perfect world. If one of the links misbehaves and the peers get out of sync (meaning the receiver reads bytes that it thinks constitute a header, but don't), some mechanism for resyncing is needed. In regard to UDP, I suppose the major network stacks parse the complete UDP header and make an educated guess on whether it got out of sync, but are there good ways of getting back on track after such a failure?

Longer version, use if more details/context is needed:

I'm experimenting with creating a single tunnel on top of several distinct links/tunnels. These links may be unreliable and it doesn't matter how they have been set up: sockets, tun, tap, what have you. The tunnel that I'm creating doesn't care. What it does care about, obviously, is that its customers get a good service, similar to what you would expect from UDP.

Current version:

I've implemented such a tool on a single, reliable link, as a first version (simply put, I ported the source referenced here to C++):

 +------\     TCP     /------+
tun   socket <---> socket   tun
 +------/             \------+
  client               server

How this works is simple: the client reads bytes from its tun device and writes them into the socket. Both are simply implemented as performing a Linux read and write on file descriptors. The server does the reverse for receiving. This works full-duplex. Noteworthy here is:

  • The client and server don't care about what the bytes they operate on represent, they just move them over to the other side. I have to assume that the Linux networking stack is doing smart stuff to understand what the bytes it gets mean, even in case of weird fragmentation.
  • The two endpoints implement a very simple protocol, where each chunk of data written into the socket is preceded by the number of bytes that are to be read (again, all this comes from the URL above). This is how the endpoints keep track of things (it's only for improving performance, as far as I can see).
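
The length-prefix scheme from the second bullet can be sketched roughly as follows. This is only an illustration: the 2-byte big-endian length field and the names `frame` and `unframe` are assumptions, not taken from the referenced source.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Prepend a 2-byte big-endian length to the payload (assumed header layout).
// The payload must be at most 65535 bytes long.
Bytes frame(const Bytes& payload) {
    Bytes out;
    out.reserve(payload.size() + 2);
    out.push_back(static_cast<std::uint8_t>((payload.size() >> 8) & 0xFF));
    out.push_back(static_cast<std::uint8_t>(payload.size() & 0xFF));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Try to extract one complete frame from the front of `buffer`.
// On success, fills `payload`, erases the consumed bytes and returns true.
// Returns false (leaving the buffer untouched) if a full frame has not
// arrived yet.
bool unframe(Bytes& buffer, Bytes& payload) {
    if (buffer.size() < 2) return false;
    const std::size_t len = (static_cast<std::size_t>(buffer[0]) << 8) | buffer[1];
    if (buffer.size() < 2 + len) return false;
    const Bytes::iterator first = buffer.begin() + 2;
    const Bytes::iterator last = first + static_cast<std::ptrdiff_t>(len);
    payload.assign(first, last);
    buffer.erase(buffer.begin(), last);
    return true;
}
```

The receiver would append whatever `read` returns to `buffer` and call `unframe` until it returns false. Note that this relies entirely on the length field being correct, which is exactly the weakness discussed further down.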

Next version:

Now, instead of the single stream socket, these endpoints will have to talk to each other using multiple links, like:

  +----------\           /----------+
  |    .-- tun1A <---> tun1B --.    | 
tun0A ----  tapA <---> tapB  ---- tun0B
  |    '--  socA <---> socB  --'    |
  +----------/           \----------+

As there's no easy way of telling whether an endpoint has read a full TCP segment on its tun0 interface, I'm planning to keep regarding everything that comes in on the tun devices simply as a byte stream. I want to send it to the other side using any of the three links, with no regard whatsoever to what the original byte stream represented. This way, a bunch of bytes that originally represented a single TCP segment may be sent to the other side as 4 separate packages. These packages need to be reordered at the receiving end, which is no problem in itself: just add a second field to the one that I already have, a sequence number, right?

But what if the underlying link is unreliable (which it probably is)? If a package simply has a bit flipped in its payload, I assume, for now, that the networking stack at the receiving side is capable of coping with it. But what if the header gets broken? The endpoint might learn a wrong number of bytes that it should read, and the two endpoints go out of sync with no apparent fix.
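
To make the failure mode concrete, here is a sketch of one possible way to get back in sync on a byte-stream link. It is not what any particular network stack does. It assumes a hypothetical 7-byte header (a fixed 2-byte marker, length, sequence number, and a header checksum). When the header at the current position does not validate, the receiver slides forward one byte at a time until it finds one that does. All names and the layout are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical header layout (7 bytes), chosen for illustration only:
//   [0..1] marker 0xA5 0x5A, [2..3] payload length, [4..5] sequence number,
//   [6]    checksum over bytes 0..5.
constexpr std::size_t kHeaderSize = 7;

// Deliberately simple 8-bit checksum; a real design would likely use a CRC.
std::uint8_t header_checksum(const std::uint8_t* h) {
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < 6; ++i)
        sum = static_cast<std::uint8_t>(sum * 31 + h[i]);
    return sum;
}

// Build one frame: header followed by the payload (at most 65535 bytes).
Bytes make_frame(std::uint16_t seq, const Bytes& payload) {
    Bytes out = {0xA5, 0x5A,
                 static_cast<std::uint8_t>(payload.size() >> 8),
                 static_cast<std::uint8_t>(payload.size() & 0xFF),
                 static_cast<std::uint8_t>(seq >> 8),
                 static_cast<std::uint8_t>(seq & 0xFF)};
    out.push_back(header_checksum(out.data()));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Scan `buffer` for the first header that validates. Bytes in front of it
// are discarded as garbage. Returns true and fills seq/payload when a whole
// frame is available; returns false when more data is needed.
bool next_frame(Bytes& buffer, std::uint16_t& seq, Bytes& payload) {
    std::size_t pos = 0;
    while (buffer.size() - pos >= kHeaderSize) {
        const std::uint8_t* h = buffer.data() + pos;
        const bool valid =
            h[0] == 0xA5 && h[1] == 0x5A && h[6] == header_checksum(h);
        if (!valid) { ++pos; continue; }  // out of sync: slide forward one byte
        const std::size_t len = (static_cast<std::size_t>(h[2]) << 8) | h[3];
        if (buffer.size() - pos < kHeaderSize + len) break;  // wait for the rest
        seq = static_cast<std::uint16_t>((h[4] << 8) | h[5]);
        payload.assign(h + kHeaderSize, h + kHeaderSize + len);
        buffer.erase(buffer.begin(),
                     buffer.begin() + static_cast<std::ptrdiff_t>(pos + kHeaderSize + len));
        return true;
    }
    // No complete frame yet: drop the garbage in front, keep the tail.
    buffer.erase(buffer.begin(), buffer.begin() + static_cast<std::ptrdiff_t>(pos));
    return false;
}
```

This only detects a damaged header, not a damaged payload, and a payload that happens to contain a valid-looking header can still fool it. A stronger checksum over the whole frame would reduce that risk at the cost of more overhead.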

I've looked at implementing it using acks and timeouts of sorts, but I've not given up hope on a simpler and more elegant approach (remember that the end result doesn't need to be reliable). Some protocols use a reliable out-of-band channel (like TCP) for managing the payload tunnel, but the goal is to purely use the links. Also, even though Linux offers all kinds of cool networking tools (iproute2, netfilter, etc), all of this should be implemented in C/C++ (small concessions may be made here if a little use of those offers a good solution though).

Right now, I'm short on ideas and I'm hoping anyone could propose some other approaches to this problem! If more info is needed do ask, I'm happy to write more :)

*Originally put this question on networkengineering.stackexchange.

Edit 1

I'm thinking about wrapping every single "inner link" as shown in the image above with its own datagram socket wrapper. This socket would (hopefully) guarantee at least that I receive full datagrams so that I don't have to worry about the link breaking in case of truncated messages. It does add quite some overhead though.

delins
  • What is the transport? If it's UDP you don't have to worry about corrupted data, for example. – user207421 May 07 '16 at 13:17
  • What layer are you referring to here? The inner tunnels may make use of anything they like, preferably udp. What is being transported could be anything as well: the outer tun device should be able to be used as a default gateway. My main concern is that inner tunnels "stall" because of lost blocks – delins May 07 '16 at 15:28
  • It's not very clear what you want to achieve, and I guess it's because you do not really understand the difference between stream-oriented protocols (SOCK_STREAM) and datagram-oriented ones (SOCK_DGRAM). You seem to want to create a stream (byte-oriented) by aggregating multiple underlying channels ("I'm trying to segment a **stream of bytes** that may represent anything"), but at the same time you seem to be exposing TUN devices (tun0A, tun0B), which are (at least usually) not stream-oriented. – ysdx May 19 '16 at 14:30
  • "In regard to UDP, I suppose the major network stacks parse the complete UDP header and make an educated guess on whether it got out of sync". There is no such thing at this level (IP or UDP). Your IP stack expects to receive delimited IP packets, and each packet contains one UDP datagram. This kind of out-of-sync condition is managed at layer 2 (MAC sublayer), and this does not involve any parsing or knowledge of IP or UDP. – ysdx May 19 '16 at 14:37
  • I'm familiar with the difference, just not so much in how both do what they do. Thanks for pointing out that layer 2 takes care of delimiting. Out of curiosity, does this also apply to transporting IP packets read from one tun device, over a socket connection, out to another tun device? Now it's not up to layer 2 but up to the application doing the socket connection to make sure the packets are delimited, correct? – delins May 22 '16 at 09:23

1 Answer


It is difficult to understand your idea of out-of-sync because the network stack deals with that. In any case there are plenty of ways of solving that.

  • In TCP you have Data offset field that tells you where payload begins.
  • The size of the whole TCP segment is obtained from the field "Total Length" in the IP datagram header.
  • TCP has a checksum field providing some level of certainty that an error will be detected. IP has a header checksum. In UDP you can enable or disable that checksum.
  • At layers 1 and 2 there are additional technologies and methodologies for ensuring the received frame is correct: synchronization, CRC, etc.

Your application needs to implement its own application layer protocol. For dealing with messages arriving in any order on different channels, you can use different techniques:

  • First, have a way of determining the length of the received application layer message. With TCP sockets you have probably experienced that you might not have read a complete message when recv or read returns. To deal with this you can implement a TLV format (Type, Length, Value).
  • For dealing with out of order and lost messages you can take a look at how IPv4 implements fragmentation. It's exactly the same problem. https://tools.ietf.org/html/rfc791
  • You can also look at the way TCP deals with lost segments while not discarding subsequent received segments
  • Sequence numbers and acknowledgements are essential for a reliable, connection-oriented virtual tunnel.
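
As a rough illustration of the sequence-number idea, here is a minimal receive-side reordering sketch. It assumes UDP-like semantics: a missing segment is never retransmitted and is simply skipped once too many later segments are waiting. The class name `ReorderBuffer`, the `max_pending` threshold, and the skip policy are assumptions made for this example. Sequence number wrap-around is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

class ReorderBuffer {
public:
    explicit ReorderBuffer(std::size_t max_pending) : max_pending_(max_pending) {}

    // Accept a segment that arrived on any link and return every payload
    // that can now be delivered in order.
    std::vector<Bytes> push(std::uint32_t seq, Bytes payload) {
        std::vector<Bytes> ready;
        if (seq < next_) return ready;  // duplicate or arrived too late: drop
        pending_.emplace(seq, std::move(payload));  // no-op if seq is already pending

        // Too many segments waiting: treat the gap as a loss and skip ahead
        // to the oldest segment we actually have.
        if (pending_.size() > max_pending_) next_ = pending_.begin()->first;

        // Deliver the contiguous run that starts at next_.
        for (auto it = pending_.begin();
             it != pending_.end() && it->first == next_;
             it = pending_.erase(it)) {
            ready.push_back(std::move(it->second));
            ++next_;
        }
        return ready;
    }

private:
    std::size_t max_pending_;
    std::uint32_t next_ = 0;                  // next sequence number to deliver
    std::map<std::uint32_t, Bytes> pending_;  // out-of-order segments, sorted by seq
};
```

A timeout could replace or complement the `max_pending` threshold so that a lost segment on a quiet tunnel does not stall delivery indefinitely.
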
rodolk
  • Thanks for the elaborate answer to my admittedly vague question. I seemed to be thinking I should create my own transport layer protocol, which is nonsense in my case. What seems to be working pretty well now is using sockets on top of the inner connections. Both TCP and UDP work fine. In the end I don't care about reordering at the receiving end anymore, as the applications using this tunneling system should be able to handle it to the degree necessary (either they use TCP or some application layer protocol on top of UDP). My TCP connections are working with some form of TLV format. – delins May 22 '16 at 08:56
  • @user3499027, I'm happy you're progressing on this. Now just 2 recommendations: 1) Your idea of having multiple links is very interesting and necessary, if you have multiple links you will need to reorder messages at the application layer. 2) If you work with TCP please take into account TCP is stream oriented, there is a typical problem for socket programmers that I explain here (it's with Java but it's valid for any programming language): http://stackoverflow.com/questions/19839172/how-to-read-all-of-inputstream-in-server-socket-java/19863726#19863726 – rodolk May 22 '16 at 13:07
  • Reordering is probably indeed done if necessary, but not by my code :) I had read before about the inherent problem of read/write returning early. I first read 3 bytes (in a loop), which consist of 2 bytes giving the size of the "message" and 1 byte giving the message type (data or control (control isn't used yet, so, meh)). Then I loop again to get all the bytes that the "size" field tells me to get. The very crude PoC (which it will stay for a long time, I'm afraid) can be found here: https://github.com/delins/multitun. No readme yet, but the tool has a "help" option :) – delins May 24 '16 at 21:34
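
The two-step read described in the last comment could look roughly like the sketch below. It is not the code from the linked repository: the names `read_exact`, `read_message`, and `Message` are made up for illustration, and big-endian byte order for the size field is an assumption.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <unistd.h>

// Read exactly `count` bytes from `fd`, looping because read() may return
// fewer bytes than requested. Returns false on EOF or on an error.
bool read_exact(int fd, std::uint8_t* dst, std::size_t count) {
    std::size_t done = 0;
    while (done < count) {
        const ssize_t n = ::read(fd, dst + done, count - done);
        if (n > 0) { done += static_cast<std::size_t>(n); continue; }
        if (n < 0 && errno == EINTR) continue;  // interrupted by a signal: retry
        return false;                           // EOF (n == 0) or a real error
    }
    return true;
}

struct Message {
    std::uint8_t type;               // e.g. data or control
    std::vector<std::uint8_t> body;  // exactly `size` bytes
};

// Header layout as described in the comment: 2 bytes of size, 1 byte of type.
// Big-endian byte order for the size is an assumption.
bool read_message(int fd, Message& msg) {
    std::uint8_t header[3];
    if (!read_exact(fd, header, sizeof header)) return false;
    const std::size_t size = (static_cast<std::size_t>(header[0]) << 8) | header[1];
    msg.type = header[2];
    msg.body.resize(size);
    return read_exact(fd, msg.body.data(), size);
}
```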