
I am writing a program in C++ that is sending 1500 bytes of data from machine A to machine B.

Assume the following:

char* tx_data = (char*)operator new(1500);
for (int i = 0; i < 1500; i++) {
  tx_data[i] = (char)((int) 65); // ASCII 65 = A;
}
send_tx_data();

So the buffer (tx_data) is filled with the letter 'A' 1500 times. The receiving host grabs incoming data, dumps it into a buffer (rx_data), and records the length of the received data (rxLength):

rxLength = recvfrom(sock, rx_data, 1500, 0, NULL, NULL);
           //             ^ the buffer we are putting into

If rxLength == 1500, we have received 1500 bytes of data, but we need to be sure they are the same 1500 bytes we sent (and not 1500 bytes from elsewhere, flying around the network)! Typically, when comparing a small amount of user input, for example, strncmp() might be used. I'm not suggesting that strncmp() is appropriate here; what I am saying is that I don't think it's a good idea to keep a buffer on the receiving end (called expected_data, say) that contains 1500 x 'A', and loop through comparing the two, the way strncmp() does.

How can I evaluate the 1500 bytes of data received in an efficient manner? [By efficient I mean quickly. This will be happening thousands of times a second, so the code needs to be fairly optimal!]. I had an idea about a checksum of some sort, as we know what we should be receiving, but I'm not sure of a good way to do this. Can anyone suggest a good checksum method? Alternatively, if that is a silly idea, could you explain why and perhaps recommend something else?

tshepang
jwbensley
  • Forgot to mention, I'm quite the c++ noob if that isn't obvious from the question, and you're reading this thinking 'man, who doesn't know the answer to that question!' :) – jwbensley May 21 '12 at 19:40
  • 3
    this is what [hashes](http://en.wikipedia.org/wiki/List_of_hash_functions) are made for. But, if you know what you will receive, why do you send anything? – moooeeeep May 21 '12 at 19:41
  • 1
    The most efficient thing would be not to do any validation, since there are checksums done for you by networking stack. Any data that gets corrupted on its way across the network will be dropped, so user-level checksumming is redundant, unless you are paranoid enough not to trust the OS's networking implementation to do the right thing. – Jeremy Friesner May 21 '12 at 19:43
  • @moooeeeep Well, I am aware of networking protocols and how they work; that's what gave me the idea of a checksum, because I was thinking of CRCs, but I'm not sure how I would implement it here – jwbensley May 21 '12 at 19:43
  • @JeremyFriesner This is being sent over a raw layer 2 connection without FCS or CRC :) – jwbensley May 21 '12 at 19:44
  • 1
    You could just xor the `size_t` results of std::hash for each char. – 111111 May 21 '12 at 19:45
  • 1
    There are all sorts of checksum/crc/error-detecting/message authentication/error-correcting schemes. They all have tradeoffs in terms of how well they detect errors (and the types of errors they detect or don't detect) and the resources they require. So you might want to add some information about the system that's being targeted (a small embedded system might not have the resources to do more than a simple CRC16, for example). – Michael Burr May 21 '12 at 19:47
  • CRCs have been around forever and used for just this purpose, you should be able to find a high quality implementation somewhere. – Mark Ransom May 21 '12 at 19:47
  • 1
    Well, if you really want to do your own checking, you'll need to have the sender compute a checksum of the data and include that checksum with the data. Then the receiver can recompute the checksum after it receives the data, compare the checksum it computed against the one that was included with the data, and verify that they match. As for which checksum algorithm to use, pretty much any of them will work, as checksumming doesn't take much CPU. – Jeremy Friesner May 21 '12 at 19:47
  • P.S. The [Wikipedia article on CRC](http://en.wikipedia.org/wiki/Cyclic_redundancy_check) looks pretty complete. – Mark Ransom May 21 '12 at 19:50
  • 1
    In addition to my previous comment: a char has only 256 possible values (8 bits of entropy), and std::hash for a char typically just returns the char's value again, so XORing the results will leave your hash with only 8 bits of entropy. If you want a hash with high entropy, pack your chars into a size_t and then hash. – 111111 May 21 '12 at 20:01
  • You can remove the comment "ASCII 65 = A" by using a `'A'` instead; much more readable. – Thomas Matthews Jun 10 '12 at 16:43

2 Answers

2

First of all, consider your specific process for sending 1500 bytes of data from machine A to machine B, and check whether any step in it might introduce loss or corruption. It may be that none does. If you are sending the data over TCP/IP, for instance, the TCP/IP stack already checksums each segment and retransmits lost ones, so data that arrives at all is very likely to arrive intact and in order.

If, on the other hand, some steps can introduce data loss, corruption, reordering, etc., you should consider CRCs, given how much you value performance. You can find a thorough explanation and a source code example of using CRCs here: http://www.barrgroup.com/Embedded-Systems/How-To/CRC-Calculation-C-Code
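To give a rough idea of what this looks like in C++, here is a minimal sketch of a table-driven CRC-32 (the reflected polynomial 0xEDB88320 used by Ethernet and zlib). It is not the code from the linked article. The names crc32_ieee, append_crc and verify_crc are hypothetical helpers made up for this example, and the choice to append the CRC as four little-endian bytes after the payload is an assumption, not something the question specifies.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the 256-entry lookup table for the reflected CRC-32 polynomial.
static std::array<std::uint32_t, 256> make_crc32_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Hypothetical helper: CRC-32 over len bytes, one table lookup per byte.
std::uint32_t crc32_ieee(const unsigned char* data, std::size_t len) {
    static const std::array<std::uint32_t, 256> table = make_crc32_table();
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical sender side: copy the payload and append its CRC
// as 4 bytes, least significant byte first (an assumed layout).
std::vector<unsigned char> append_crc(const unsigned char* payload,
                                      std::size_t len) {
    std::vector<unsigned char> frame(payload, payload + len);
    const std::uint32_t crc = crc32_ieee(payload, len);
    for (int shift = 0; shift < 32; shift += 8) {
        frame.push_back(static_cast<unsigned char>((crc >> shift) & 0xFFu));
    }
    return frame;
}

// Hypothetical receiver side: recompute the CRC over the payload part
// and compare it with the 4 trailing bytes.
bool verify_crc(const unsigned char* frame, std::size_t frame_len) {
    if (frame_len < 4) {
        return false;  // too short to contain a CRC at all
    }
    const std::size_t len = frame_len - 4;
    std::uint32_t received = 0;
    for (int i = 0; i < 4; ++i) {
        received |= static_cast<std::uint32_t>(frame[len + i]) << (8 * i);
    }
    return crc32_ieee(frame, len) == received;
}
```

With this layout, a 1500-byte frame carries 1496 bytes of payload plus the 4-byte CRC. The receiver would call verify_crc(rx_data, rxLength) after recvfrom() and drop the frame if it returns false.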

Hakan Serce
  • CRCs are indeed the correct method: they're highly likely to catch any _random_ corruption, and do run at network speeds. "Secure hashes" are needed only to guard against _intentional_ corruption. – MSalters May 22 '12 at 07:49
0

After multiple suggestions of hash functions and CRCs, I used code from this answer to another question to make a simple hash function.
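The linked code is not reproduced here. Purely as an illustration of what a simple hash function can look like, the sketch below uses 32-bit FNV-1a; that choice is an assumption and not necessarily the function from that answer. The names fnv1a_32 and matches_expected are hypothetical, as is the idea of precomputing the hash of the expected 1500 x 'A' buffer once at start-up.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 32-bit FNV-1a hash over len bytes.
std::uint32_t fnv1a_32(const unsigned char* data, std::size_t len) {
    std::uint32_t hash = 2166136261u;      // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= data[i];                   // mix in the next byte
        hash *= 16777619u;                 // FNV prime (wraps mod 2^32)
    }
    return hash;
}

// Hypothetical receiver-side check: the length must match, and the hash of
// the received bytes must equal a hash computed once from the expected data.
bool matches_expected(const unsigned char* rx_data, std::size_t rx_len,
                      std::size_t expected_len, std::uint32_t expected_hash) {
    return rx_len == expected_len &&
           fnv1a_32(rx_data, rx_len) == expected_hash;
}
```

Note that hashing still reads every received byte, so when the receiver already knows the exact expected contents this mainly saves keeping a second 1500-byte buffer around. A non-cryptographic hash like this only guards against accidental corruption, as the comment on the other answer points out.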

Community
jwbensley