
I'm writing a program where a client tries to download a file from the server.

The following is the code for the client side.

FILE * file_to_download;
file_to_download = fopen("CLIENT_file_downloaded", "ab");  

long int total_bytes = 0;
int bytes_received = 0;
char received_buffer[2048];

printf(">>> Downloading file...\n");
while ((bytes_received = read(connector, received_buffer, sizeof received_buffer)) > 0) { // keep receiving until data stops sending
    printf("=");
    total_bytes += bytes_received;
    fwrite(received_buffer, 1, bytes_received, file_to_download);
}

if (bytes_received < 0) {
    printf("\n>>> Read error\n");
    exit(1);
}

printf("\n>>> File downloaded (Bytes received: %ld)\n\n", total_bytes);

This works perfectly when I close the socket connection immediately afterwards. However, if I leave it open for some other functionality (say, sending a message to the server), the program halts at ">>> Downloading file...", even though I can see the downloaded file in the folder. Also, once I terminate the server-side program, it prints out everything else.

From other SO threads, I think this has something to do with the socket getting "blocked".

  1. But then why is my file downloaded (I can see it in the folder)? The fwrite function is responsible for this which is inside the while loop, but if it is going inside the while loop, why doesn't it print anything?
  2. How can I have the server tell the client that it's done sending it data and how can I have the client side program move forward?
Govind Parmar
Shreya
  • `recv` in its default state will block forever waiting for more data to arrive. It has no clue that the sender stopped sending. For all it knows it's still waiting for a packet routed through a switch on Pluto that will arrive some time tomorrow. The only way it knows you are done is if you close the socket (which will return 0) or an error occurs (returns less than zero). – user4581301 Nov 25 '20 at 17:54
  • I don't think the buffer knows when to stop waiting. You would be better off asking for the file size and then reading exactly that many bytes, I think. – WARhead Nov 25 '20 at 17:55
  • 'fwrite(received_buffer, 1, 2048, file_to_download);'.... why are you writing 2048 bytes when 'bytes_received' is returned and used in the previous two lines? Why did you do that, it makes no sense? – Martin James Nov 25 '20 at 17:56
  • @MartinJames, you're right. I must've done that while trying to figure out where the issue is. Edited. Thanks! – Shreya Nov 25 '20 at 17:59
  • @WARhead that is exactly it. The only protocol the OP has is connection open/close so, without that, the recv() call will never return 0 and the while loop will never exit :( – Martin James Nov 25 '20 at 17:59
  • @WARhead, hm, is there some way I could add a timeout, maybe? Without ever having to close the socket, that is. – Shreya Nov 25 '20 at 18:01
  • 'How can I have the server tell the client that it's done sending it data and how can I have the client side program move forward?' close the socket. If you want/need to keep the connection open for further exchanges, you will need a better protocol than open/close:) – Martin James Nov 25 '20 at 18:04
  • @MartinJames, I understand. Can you give me a direction? What protocols should I be looking at? – Shreya Nov 25 '20 at 18:07
  • WARhead's suggestion of sending the length of the file first is probably the easiest way to handle the problem. Make sure you send the length in a fixed-width integer with known endianness, because if you just use `int` you never know if you have a 16-bit, 32-bit, 64-bit, or whatever size `int` at the receiver, or which byte order is being used. – user4581301 Nov 25 '20 at 18:07
  • Sending the file size first is one such protocol, but bear in mind that it is not as trivial as it might seem at first glance. Your whole TCP system should be designed/written so that it will still work correctly even if every recv() call that reads data reads only one byte and so returns 1. – Martin James Nov 25 '20 at 18:08
  • @Shreya regardless the better option might be to establish a transfer protocol. like first a N byte value describing size of data ( type of your choice) then the data. So here you can read exactly however much you want and then continue in client code, without closing the socket – WARhead Nov 25 '20 at 18:08
  • ..and you should have some way of isolating, framing and validating the file size header. – Martin James Nov 25 '20 at 18:11
  • @MartinJames when it's all said and done you have just re-invented TCP or at the very least UDP ;) – WARhead Nov 25 '20 at 18:12
  • I prefer to send the header with an ASCII numerical size, with a checksum, so eliminating endianness etc. issues, and analysing it with a byte-by-byte state machine to be absolutely sure of correctness/sanity. – Martin James Nov 25 '20 at 18:15
  • How does the client know when to stop reading? – user253751 Nov 25 '20 at 18:16
  • @Shreya You should study existing file transfer protocols like FTP and HTTP. Then, if you choose not to use an existing protocol, I strongly advise you to *document* the protocol you're going to implement, even if it's only a few paragraphs. – David Schwartz Nov 25 '20 at 18:16
  • @WARhead lol, but how many, (hundreds), of examples have we seen on SO of clients/servers that read the file size with one recv() and assume it is complete and correct:) – Martin James Nov 25 '20 at 18:18
  • @Shreya if the server side is also done by you and it is just a small project / sample or something, you can make do with probably just a header with size and maybe a footer with some kind of sanity check, and maybe a return code to the server – WARhead Nov 25 '20 at 18:19
  • @Shreya if the code is going anywhere near a deployable application, please look at existing protocols before DIYing it. – WARhead Nov 25 '20 at 18:21
  • @WARhead, yeah, that is what it is right now (an assignment for university) so I'll probably do that, but I think I'll eventually end up trying something more complex after reading this discussion, haha! Very cool! – Shreya Nov 25 '20 at 18:21
  • *why doesn't it print anything?* [**Why does printf not flush after the call unless a newline is in the format string?**](https://stackoverflow.com/questions/1716296/why-does-printf-not-flush-after-the-call-unless-a-newline-is-in-the-format-strin) I'm not sure if this is a dupe - I'll let others decide... – Andrew Henle Nov 25 '20 at 18:23
  • @AndrewHenle, ah yes, that's what it is! Thank you! – Shreya Nov 25 '20 at 18:27

1 Answer


You've already gotten a lot of comments, but I'll try to summarize at least a few of the bits and pieces into one place.

First of all, the problem: as you're doing things right now, there are basically three ways your call to read can return:

  1. It returns a strictly positive value (i.e., at least 1) that tells you how many bytes you read.
  2. It returns 0 to indicate that the peer closed the connection.
  3. It returns a negative value to indicate an error.

But there's not a defined way for it to return and tell you: "the socket's still open, there was no error, but there's no data waiting to be read."

So, as others have said, if you're going to transfer a file (or some other defined chunk of data) you generally need to define some application-level protocol on top of TCP to support that. The most obvious starting point is that you send the size of the file first (typically as a single fixed-size chunk, such as 4 or 8 bytes), followed by that many bytes of data.

If you do just that, you can define something that at least can work. There are all sorts of possible errors it can miss, but at least if things all work well, it can be fine.
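As a rough sketch of what the client side of such a protocol could look like: the code below assumes the server first sends the file size as an 8-byte big-endian integer and then exactly that many bytes. The header layout and the names `read_exact` and `receive_file` are illustrative choices of mine, not anything the question's server does today. Note that both the header and the body are read in loops, so the code still works if every read returns a single byte.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Read exactly len bytes from fd, looping over short reads.
   Returns 0 on success, -1 on error or if the peer closed early. */
static int read_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return -1;              /* closed before len bytes arrived */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Assumed framing (hypothetical): 8-byte big-endian size, then the data.
   Writes the payload to out and stores the announced size in *total. */
static int receive_file(int fd, FILE *out, uint64_t *total)
{
    unsigned char hdr[8];
    if (read_exact(fd, hdr, sizeof hdr) != 0)
        return -1;

    uint64_t remaining = 0;
    for (size_t i = 0; i < sizeof hdr; i++)
        remaining = (remaining << 8) | hdr[i];  /* decode big-endian */
    *total = remaining;

    unsigned char buf[2048];
    while (remaining > 0) {
        /* Never ask for more than the file still owes us, so bytes that
           belong to the next message stay in the socket. */
        size_t want = remaining < sizeof buf ? (size_t)remaining : sizeof buf;
        ssize_t n = read(fd, buf, want);
        if (n == 0)
            return -1;              /* peer closed mid-file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            return -1;
        remaining -= (uint64_t)n;
    }
    return 0;
}
```

With something like this, the loop ends once the announced number of bytes has arrived, so the client can carry on and send its next message on the same socket without the server having to close it.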

The next step beyond that is typically to add something like some sort of checksum/CRC so when you think a transfer is complete, you can verify the data to get at least a reasonable assurance that it worked (i.e., that the data you received matches what was sent).
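One possible shape for that check, purely as an illustration: a plain bitwise CRC-32 (the common reflected polynomial 0xEDB88320, as used by zlib and Ethernet) that can be fed one buffer at a time as data arrives. The function name `crc32_update` and the idea of sending the CRC after the file data are my assumptions; any checksum both sides agree on would serve.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold len bytes into a running CRC-32. Start with crc = 0 and pass the
   previous return value back in for each further chunk. */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    crc = ~crc;                                   /* undo the final inversion */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

The receiver would call this on each chunk inside its read loop and compare the final value with the one the sender transmitted. A table-driven version is faster, but this form is easier to check by eye.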

Another general direction to consider is how you're doing your reading. There are a couple of choices here. One is to avoid calling read until you're sure it can/will succeed. If you're dealing with only one (or a few) sockets at a time, you can call select, which will tell you when your socket is ready to read, so a read issued at that point should return promptly. It might read less than you asked for, but it will return rather than waiting indefinitely for data. If you have to deal with a lot of sockets, you might prefer to look up epoll (Linux-specific), which does roughly the same thing, but reduces overhead when you have to deal with many handles.

Another possible way to deal with this problem is to set the O_NONBLOCK flag on your socket. In this case, attempting to read when no data is available will be treated as an error, so it'll return immediately with an error of EAGAIN or EWOULDBLOCK (you have to be prepared for either). This gives you a fairly easy way to at least proceed when you have no more data available, but does nothing about any of the other difficulties in transferring data effectively.
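Roughly, that could look like the following. The helper names `set_nonblocking` and `read_nowait`, and the -2 return code for "nothing available right now", are conventions invented for this sketch; only fcntl, read, and the errno values are the real API.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the descriptor's existing flags.
   Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns >0: bytes read, 0: peer closed the connection,
   -1: real error, -2: no data available at the moment. */
static ssize_t read_nowait(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                       /* interrupted: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -2;                      /* would have blocked */
        return -1;
    }
}
```

A -2 result only says the socket is empty at this instant. More of the file may still be in flight, which is why this does not replace a length prefix or some other end-of-data marker.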

As others have noted, there are quite a few existing protocols for doing things like this and reinventing it may not be the best use of your time. On the other hand, some protocols can be somewhat painful (e.g., ftp's normal mode requires that you open/use two separate sockets). Others are complex enough that you probably don't want to try to implement them on your own, but libraries to support them well can be difficult to find.

Personally, I've found that websockets work pretty reasonably for quite a few tasks like this. They include framing (so what was sent as a single websocket message will be received as a single websocket message). They do not add a checksum of their own, though: error checking is left to TCP's checksum (plus TLS if you use wss://), so you may still want an application-level check on the data. Even so, for quite a few cases like this, it'll take care of most of the details more or less automatically. It also includes (and in most cases uses) what they call a ping/pong protocol to detect loss of connection much faster than TCP normally does on its own.

But as noted above, there are lots of alternatives, some of them designed much more specifically for transferring files (so what you receive isn't just the content of the file, but things like the name and other metadata attached to that content).

Jerry Coffin