19

I'm wondering whether a CRC32 checksum, and CRC32C in particular, can ever be 0 for non-empty input. The simple answer would be "yes", given a large enough data set. However, I was wondering whether there is any provision in the CRC32C standard that explicitly prevents this from happening.

The use case is that I need to check whether a remote file is empty, and all I have is its CRC32C checksum. In other words, can I infer that if the CRC32C is 0, the file is guaranteed to be empty?

If possible, please provide any reference to a standard where this is defined.

dtoux
  • Can you use your own checksum? In that case, define zero to be only used for the empty file. If zero happens to be produced by the hash function, just set it to 1. – usr Aug 29 '14 at 18:02
  • You know the CRC32 value but not the length of the file? Huh? – kay Aug 29 '14 at 18:05
  • @usr The CRC32C algorithm is highly optimized for speed and is implemented in hardware on Intel CPUs. I need this for calculations at wire speed, so a custom implementation is not an option. – dtoux Aug 29 '14 at 20:06
  • @Kay This is just an example. The actual use case is more complicated than that. – dtoux Aug 29 '14 at 20:14
  • @dtoux you only need to append: `if (crcValue == 0) crcValue = 1;`. That's all. – usr Aug 29 '14 at 22:29
  • @usr that is a neat idea, thanks – dtoux Aug 29 '14 at 22:56
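
The remapping suggested in these comments could look roughly like the sketch below. It is only an illustration: `empty_safe_tag` is a hypothetical name, the bitwise `crc32c` stands in for whatever hardware-accelerated routine is actually used, and the approach assumes you control the side that produces the tag.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
    initial value and final XOR both 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def empty_safe_tag(data: bytes) -> int:
    """Hypothetical wrapper: the value 0 is reserved for empty input."""
    if not data:
        return 0
    crc = crc32c(data)
    # A genuine CRC of 0 on non-empty data is remapped to 1.
    return 1 if crc == 0 else crc
```

The cost is that inputs whose CRC is 0 and inputs whose CRC is 1 now share the tag 1, which slightly raises the collision rate for that one value.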

3 Answers

27

@Yanek is almost completely correct.

Just for fun, here is a five-character sequence that gives a CRC-32C of zero: DYB|O. Here is a four-byte sequence in hex that gives zero: ab 9b e0 9b. In fact, that is the only four-byte sequence that can do so. There are no three-byte or shorter sequences that will give you zero. That is where @Yanek is not exactly right, in that for three-byte or shorter sequences, zero is not just as likely. The probability of getting a zero is zero in those cases.
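
These values can be checked with a small table-driven sketch. It assumes the usual CRC-32C parameters (reflected polynomial 0x82F63B78, initial value and final XOR of 0xFFFFFFFF); the helper `zero_crc_inputs` is a hypothetical name, and its brute force is only practical for one- and two-byte inputs in pure Python.

```python
from itertools import product

POLY = 0x82F63B78  # CRC-32C (Castagnoli) polynomial, bit-reflected


def _make_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def zero_crc_inputs(length: int):
    """Brute-force all inputs of the given length whose CRC-32C is 0."""
    return [bytes(p) for p in product(range(256), repeat=length)
            if crc32c(bytes(p)) == 0]


print(hex(crc32c(b"123456789")))          # standard check value: 0xe3069283
print(crc32c(b""))                        # 0 (empty input)
print(crc32c(bytes.fromhex("ab9be09b")))  # 0
print(crc32c(b"DYB|O"))                   # 0
print(zero_crc_inputs(1), zero_crc_inputs(2))  # [] []
```

So a CRC-32C of 0 does not by itself prove the input is empty, although it does rule out inputs of one to three bytes.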

Mark Adler
  • For 3 byte inputs, there are about 256 outputs that have probability zero. There is nothing special about the zero output as far as I can tell. – usr Aug 29 '14 at 22:30
  • There has to be a _lot_ more than that. There are only 2^24 possible 3-byte inputs, so there has to be 2^32-2^24 == 4,278,190,080 outputs with probability zero. The rest have probability 2^-24. – Mark Adler Aug 29 '14 at 22:35
  • Right, I mistakenly divided the numbers instead of subtracting. – usr Aug 29 '14 at 22:37
  • @MarkAdler Thanks Mark, that is very useful. – dtoux Aug 29 '14 at 22:58
21

A zero is as likely as any other value of a CRC32 checksum. A CRC is essentially the remainder of dividing the entire input (taken as one large binary number) by a pre-selected value. If the input happens to be divisible by that value, the remainder, and thus the CRC, is zero.

Yanek Martinson
  • That is my current understanding but I'm still hoping that someone will prove me wrong :-) – dtoux Aug 29 '14 at 20:13
0

How about this worked example of a non-empty message whose CRC is zero? It uses a small 3-bit CRC, not a 32-bit one, though:

1011 | 110011001010.000
       1011
       ----
        1111
        1011
        ----
         1001
         1011
         ----
           1000
           1011
           ----
             1110
             1011
             ----
              1011
              1011
              ----
                  0000 (...)
                  1011
                  ----
                  1011
                  1011
                  ----
                  0000

Or:

1100 | 11001010.000
       1100
       ----
           1010
           1100
           ----
            1100
            1100
            ----
            (...) 0
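
Both divisions above can be reproduced with a short sketch of binary long division over GF(2), where subtraction is XOR. It assumes that `.000` in the diagrams denotes three zero bits appended to the message; `mod2_remainder` is a hypothetical helper name, and no initial value or final XOR is applied.

```python
def mod2_remainder(dividend: str, divisor: str) -> str:
    """Remainder of binary long division over GF(2) (XOR, no borrows)."""
    bits = [int(b) for b in dividend]
    div = [int(b) for b in divisor]
    for i in range(len(bits) - len(div) + 1):
        if bits[i]:  # leading bit set: XOR the divisor in at this position
            for j, d in enumerate(div):
                bits[i + j] ^= d
    # The remainder is the last len(divisor) - 1 bits.
    return "".join(map(str, bits[-(len(div) - 1):]))


print(mod2_remainder("110011001010" + "000", "1011"))  # 000
print(mod2_remainder("11001010" + "000", "1100"))      # 000
print(mod2_remainder("1" + "000", "1011"))             # 011 (nonzero, for contrast)
```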
Sparky