117

Maybe I'm just not seeing it, but CRC32 seems either needlessly complicated, or insufficiently explained anywhere I could find on the web.

I understand that it is the remainder from a non-carry-based arithmetic division of the message value, divided by the (generator) polynomial, but the actual implementation of it escapes me.

I've read A Painless Guide To CRC Error Detection Algorithms, and I must say it was not painless. It goes over the theory rather well, but the author never gets to a simple "this is it." He does say what the parameters are for the standard CRC32 algorithm, but he neglects to lay out clearly how you get to it.

The part that gets me is when he says "this is it" and then adds on, "oh by the way, it can be reversed or started with different initial conditions," and doesn't give a clear answer of what the final way of calculating a CRC32 checksum given all of the changes he just added.

  • Is there a simpler explanation of how CRC32 is calculated?

I attempted to code in C how the table is formed:

for (i = 0; i < 256; i++)
{
    temp = i;

    for (j = 0; j < 8; j++)
    {
        if (temp & 1)
        {
            temp >>= 1;
            temp ^= 0xEDB88320;
        }
        else {temp >>= 1;}
    }
    testcrc[i] = temp;
}

but this seems to generate values inconsistent with values I have found elsewhere on the Internet. I could use the values I found online, but I want to understand how they were created.

Any help in clearing up these incredibly confusing numbers would be very appreciated.

Oreo
  • 438
  • 3
  • 13
aquanar
  • 1,301
  • 2
  • 9
  • 8
  • 9
    Your code for generating the CRC32 table appears to be correct. Your lsbit-first (*reversed*) CRC32 polynomial of `0xEDB88320` can also be written msbit-first (*normal*) as `0x04C11DB7`. Were the table values you found elsewhere generated using the same CRC polynomial? – jschmier Jan 27 '11 at 20:23
  • 1
    @jschmier hi , I feel like Im a step behind this guy asking the questions ? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners – bluejayke Jun 03 '20 at 08:35
  • If anyone else is curious to read "A Painless Guide To CRC Error Detection Algorithms" linked to above, that original URL is hosed but Google easily found several copies, including this one: https://zlib.net/crc_v3.txt – Stéphane Jul 12 '20 at 07:00

7 Answers7

132

The polynomial for CRC32 is:

x32 + x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x + 1

Or in hex and binary:

0x 01 04 C1 1D B7
1 0000 0100 1100 0001 0001 1101 1011 0111

The highest term (x32) is usually not explicitly written, so it can instead be represented in hex just as

0x 04 C1 1D B7

Feel free to count the 1s and 0s, but you'll find they match up with the polynomial, where 1 is bit 0 (or the first bit) and x is bit 1 (or the second bit).

Why this polynomial? Because there needs to be a standard given polynomial and the standard was set by IEEE 802.3. Also it is extremely difficult to find a polynomial that detects different bit errors effectively.

You can think of the CRC-32 as a series of "Binary Arithmetic with No Carries", or basically "XOR and shift operations". This is technically called Polynomial Arithmetic.

To better understand it, think of this multiplication:

(x^3 + x^2 + x^0)(x^3 + x^1 + x^0)
= (x^6 + x^4 + x^3
 + x^5 + x^3 + x^2
 + x^3 + x^1 + x^0)
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0

If we assume x is base 2 then we get:

x^7 + x^3 + x^2 + x^1 + x^0

Why? Because 3x^3 is 11x^11 (but we need only 1 or 0 pre digit) so we carry over:

=1x^110 + 1x^101 + 1x^100          + 11x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^100 + 1x^100 + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^101 + 1x^101          + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^110 + 1x^110                   + 1x^11 + 1x^10 + 1x^1 + x^0
=1x^111                            + 1x^11 + 1x^10 + 1x^1 + x^0

But mathematicians changed the rules so that it is mod 2. So basically any binary polynomial mod 2 is just addition without carry or XORs. So our original equation looks like:

=( 1x^110 + 1x^101 + 1x^100 + 11x^11 + 1x^10 + 1x^1 + x^0 ) MOD 2
=( 1x^110 + 1x^101 + 1x^100 +  1x^11 + 1x^10 + 1x^1 + x^0 )
= x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0 (or that original number we had)

I know this is a leap of faith but this is beyond my capability as a line-programmer. If you are a hard-core CS-student or engineer I challenge to break this down. Everyone will benefit from this analysis.

So to work out a full example:

   Original message                : 1101011011
   Polynomial of (W)idth 4         :      10011
   Message after appending W zeros : 11010110110000

Now we divide the augmented Message by the Poly using CRC arithmetic. This is the same division as before:

            1100001010 = Quotient (nobody cares about the quotient)
       _______________
10011 ) 11010110110000 = Augmented message (1101011011 + 0000)
=Poly   10011,,.,,....
        -----,,.,,....
         10011,.,,....
         10011,.,,....
         -----,.,,....
          00001.,,....
          00000.,,....
          -----.,,....
           00010,,....
           00000,,....
           -----,,....
            00101,....
            00000,....
            -----,....
             01011....
             00000....
             -----....
              10110...
              10011...
              -----...
               01010..
               00000..
               -----..
                10100.
                10011.
                -----.
                 01110
                 00000
                 -----
                  1110 = Remainder = THE CHECKSUM!!!!

The division yields a quotient, which we throw away, and a remainder, which is the calculated checksum. This ends the calculation. Usually, the checksum is then appended to the message and the result transmitted. In this case the transmission would be: 11010110111110.

Only use a 32-bit number as your divisor and use your entire stream as your dividend. Throw out the quotient and keep the remainder. Tack the remainder on the end of your message and you have a CRC32.

Average guy review:

         QUOTIENT
        ----------
DIVISOR ) DIVIDEND
                 = REMAINDER
  1. Take the first 32 bits.
  2. Shift bits
  3. If 32 bits are less than DIVISOR, go to step 2.
  4. XOR 32 bits by DIVISOR. Go to step 2.

(Note that the stream has to be dividable by 32 bits or it should be padded. For example, an 8-bit ANSI stream would have to be padded. Also at the end of the stream, the division is halted.)

ilkkachu
  • 5,613
  • 11
  • 25
  • 16
    +1 for the "Average Guy Review" at the end - maybe consider moving this right to the top - a sort of TL; DR :P – aaronsnoswell Nov 29 '13 at 05:56
  • why XOR is being used instead of subtraction, in long division part? can you please explain – abstractnature Sep 17 '15 at 10:35
  • 4
    @abstractnature Remember that we're dividing polynomials, not just binary numbers. We can't do "normal" subtraction because we can't "borrow" $x^n$ from $x^{n+1}$; they're different kinds of things. Also, since the bits are only 0 or 1, what would -1 even be? Really, we're working in the ring of polynomials with coefficients in the field $Z/2Z$, which only has two elements, 0 and 1, and where $1+1=0$. By putting the cofficients be in a field, then the polynomials form what is called a Euclidean Domain, which basically just allows what we're trying to do to be well-defined in the first place. – calavicci Nov 10 '15 at 23:11
  • 6
    Just to clarify the actual polynomial is 100000100110000010001110110110111 = 0x104C11DB7. The MSB is implicit, but still should be taken into account in an implementation. Since it will always be set because the polynomial needs to be 33 bits long (so the remainder can be 32 bits long) some people omit the MSB. – Felipe T. Jun 07 '16 at 09:22
  • Why did you append four 0s though? the polynomial is 5 bits, and there are originally 10 bits, so it seems like it would be done with just 2 passes, so no need to append anything? Also, how do you compute a CRC that's larger than the word size of the CPU? if it's 64 bits, the the poly is 65, that won't fit in a variable... – MarcusJ Aug 28 '16 at 07:32
  • 2
    `x^6 + x^5 + x^4 + 3*x^3 + x^2 + x^1 + x^0 ... If we assume x is base 2 then we get: x^7 + x^3 + x^2 + x^1 + x^0`. This isn't how the math works. The coefficients to the polynomial are mod(2) or GF(2), the x's are left alone, resulting in x^6 + x^5 + x^4 + x^3 + x^2 + x^1 + x^0 (since 3 mod(2) = 1). `Tack the remainder on the end of your message` - technically the remainder is subtracted from the 0 bits that were appended to the message, but since this is mod(2) math, both add and subtract are the same as XOR, and the zero bits XOR'ed with the remainder is the same as the remainder. – rcgldr May 29 '19 at 04:21
  • 2
    @MarcusJ - `Why did you append four 0s though?` - the software algorithms to calculate crc effectively append the 0s, even though it's not apparent. If showing the CRC calculation using long hand division, then then 0s need to be appended for the division example to appear correctly. – rcgldr May 29 '19 at 04:25
  • 1
    what is x exactly? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners – bluejayke Jun 03 '20 at 08:34
13

For IEEE802.3, CRC-32. Think of the entire message as a serial bit stream, append 32 zeros to the end of the message. Next, you MUST reverse the bits of EVERY byte of the message and do a 1's complement the first 32 bits. Now divide by the CRC-32 polynomial, 0x104C11DB7. Finally, you must 1's complement the 32-bit remainder of this division bit-reverse each of the 4 bytes of the remainder. This becomes the 32-bit CRC that is appended to the end of the message.

The reason for this strange procedure is that the first Ethernet implementations would serialize the message one byte at a time and transmit the least significant bit of every byte first. The serial bit stream then went through a serial CRC-32 shift register computation, which was simply complemented and sent out on the wire after the message was completed. The reason for complementing the first 32 bits of the message is so that you don't get an all zero CRC even if the message was all zeros.

Pavlo Bobrek
  • 131
  • 1
  • 2
  • 2
    This is the best answer here so far, although I would replace 'bit-reverse each of the 4 bytes', with 'bit-reverse the 4 bytes, treating them as one entity' e.g. 'abcdefgh ijklmnop qrstuvwx yzABCDEF' to 'FEDCBAzy xwvutsrq ponmlkji hgfedcba'. See also: [CRC-32 hash tutorial - AutoHotkey Community](https://autohotkey.com/boards/viewtopic.php?f=7&t=35671). – vafylec Aug 09 '17 at 23:28
  • 1
    hi, what "message" exact;y do you reverse by? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners – bluejayke Jun 03 '20 at 08:37
11

A CRC is pretty simple; you take a polynomial represented as bits and the data, and divide the polynomial into the data (or you represent the data as a polynomial and do the same thing). The remainder, which is between 0 and the polynomial is the CRC. Your code is a bit hard to understand, partly because it's incomplete: temp and testcrc are not declared, so it's unclear what's being indexed, and how much data is running through the algorithm.

The way to understand CRCs is to try to compute a few using a short piece of data (16 bits or so) with a short polynomial -- 4 bits, perhaps. If you practice this way, you'll really understand how you might go about coding it.

If you're doing it frequently, a CRC is quite slow to compute in software. Hardware computation is much more efficient, and requires just a few gates.

WhirlWind
  • 13,246
  • 2
  • 39
  • 41
  • 1
    For CRC32 or CRC32b , do we get hash collision meaning for two different strings do we get same CRC – indianwebdevil May 18 '20 at 02:42
  • 1
    hi, Im a bit confused what you mean by "divifde the polynomials into the data"? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners what is X in the polynomial represnted by? Do I use the oter bytes from the chunk? – bluejayke Jun 03 '20 at 08:38
11

I published a tutorial on CRC-32 hashes, here: CRC-32 hash tutorial - AutoHotkey Community

In this example from it, I demonstrate how to calculate the CRC-32 hash for the 'ANSI' (1 byte per character) string 'abc':

calculate the CRC-32 hash for the 'ANSI' string 'abc':

inputs:
dividend: binary for 'abc': 0b011000010110001001100011 = 0x616263
polynomial: 0b100000100110000010001110110110111 = 0x104C11DB7

start with the 3 bytes 'abc':
61 62 63 (as hex)
01100001 01100010 01100011 (as bin)

reverse the bits in each byte:
10000110 01000110 11000110

append 32 0 bits:
10000110010001101100011000000000000000000000000000000000

XOR (exclusive or) the first 4 bytes with 0xFFFFFFFF:
(i.e. flip the first 32 bits:)
01111001101110010011100111111111000000000000000000000000

next we will perform 'CRC division':

a simple description of 'CRC division':
we put a 33-bit box around the start of a binary number,
start of process:
if the first bit is 1, we XOR the number with the polynomial,
if the first bit is 0, we do nothing,
we then move the 33-bit box right by 1 bit,
if we have reached the end of the number,
then the 33-bit box contains the 'remainder',
otherwise we go back to 'start of process'

note: every time we perform a XOR, the number begins with a 1 bit,
and the polynomial always begins with a 1 bit,
1 XORed with 1 gives 0, so the resulting number will always begin with a 0 bit

'CRC division':
'divide' by the polynomial 0x104C11DB7:
01111001101110010011100111111111000000000000000000000000
 100000100110000010001110110110111
 ---------------------------------
  111000100010010111111010010010110
  100000100110000010001110110110111
  ---------------------------------
   110000001000101011101001001000010
   100000100110000010001110110110111
   ---------------------------------
    100001011101010011001111111101010
    100000100110000010001110110110111
    ---------------------------------
         111101101000100000100101110100000
         100000100110000010001110110110111
         ---------------------------------
          111010011101000101010110000101110
          100000100110000010001110110110111
          ---------------------------------
           110101110110001110110001100110010
           100000100110000010001110110110111
           ---------------------------------
            101010100000011001111110100001010
            100000100110000010001110110110111
            ---------------------------------
              101000011001101111000001011110100
              100000100110000010001110110110111
              ---------------------------------
                100011111110110100111110100001100
                100000100110000010001110110110111
                ---------------------------------
                    110110001101101100000101110110000
                    100000100110000010001110110110111
                    ---------------------------------
                     101101010111011100010110000001110
                     100000100110000010001110110110111
                     ---------------------------------
                       110111000101111001100011011100100
                       100000100110000010001110110110111
                       ---------------------------------
                        10111100011111011101101101010011

we obtain the 32-bit remainder:
0b10111100011111011101101101010011 = 0xBC7DDB53

note: the remainder is a 32-bit number, it may start with a 1 bit or a 0 bit

XOR the remainder with 0xFFFFFFFF:
(i.e. flip the 32 bits:)
0b01000011100000100010010010101100 = 0x438224AC

reverse bits:
bit-reverse the 4 bytes (32 bits), treating them as one entity:
(e.g. 'abcdefgh ijklmnop qrstuvwx yzABCDEF'
to 'FEDCBAzy xwvutsrq ponmlkji hgfedcba':)
0b00110101001001000100000111000010 = 0x352441C2

thus the CRC-32 hash for the 'ANSI' string 'abc' is: 0x352441C2
vafylec
  • 895
  • 1
  • 6
  • 22
  • 1
    If you want more speed there was a method worked out by some engineers at Intel about 2006 using typically 4 or 8 bytes of the machine's data bus width simultaneously. Academic paper: https://static.aminer.org/pdf/PDF/000/432/446/a_systematic_approach_to_building_high_performance_software_based_crc.pdf Project on Sourceforge: http://sourceforge.net/projects/slicing-by-8/ General crc page: https://create.stephan-brumme.com/crc32/ – Alan Corey Nov 03 '18 at 04:12
  • 1
    Hi thanks looks great, but how exactly do you get the polynomial value? what does X represent exactly? And when it says x^32, is that x to the power of 32, or the bitwise operator `^`? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners – bluejayke Jun 03 '20 at 08:36
7

In addition to the Wikipedia Cyclic redundancy check and Computation of CRC articles, I found a paper entitled Reversing CRC - Theory and Practice* to be a good reference.

There are essentially three approaches for computing a CRC: an algebraic approach, a bit-oriented approach, and a table-driven approach. In Reversing CRC - Theory and Practice*, each of these three algorithms/approaches is explained in theory accompanied in the APPENDIX by an implementation for the CRC32 in the C programming language.

* PDF Link
Reversing CRC – Theory and Practice.
HU Berlin Public Report
SAR-PR-2006-05
May 2006
Authors:
Martin Stigge, Henryk Plötz, Wolf Müller, Jens-Peter Redlich

jschmier
  • 14,414
  • 6
  • 51
  • 71
2

Then there is always Rosetta Code, which shows crc32 implemented in dozens of computer languages. https://rosettacode.org/wiki/CRC-32 and has links to many explanations and implementations.

don bright
  • 1,013
  • 10
  • 13
  • 1
    can you elaborate a bit? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners – bluejayke Jun 03 '20 at 08:35
1

In order to reduce crc32 to taking the reminder you need to:

  1. Invert bits on each byte
  2. xor first four bytes with 0xFF (this is to avoid errors on the leading 0s)
  3. Add padding at the end (this is to make the last 4 bytes take part in the hash)
  4. Compute the reminder
  5. Reverse the bits again
  6. xor the result again.

In code this is:


func CRC32 (file []byte) uint32 {
    for i , v := range(file) {
        file[i] = bits.Reverse8(v)
    }
    for i := 0; i < 4; i++ {
        file[i] ^= 0xFF
    }

    // Add padding
    file = append(file, []byte{0, 0, 0, 0}...)
    newReminder := bits.Reverse32(reminderIEEE(file))

    return newReminder ^ 0xFFFFFFFF
}

where reminderIEEE is the pure reminder on GF(2)[x]

Gabriel Furstenheim
  • 1,941
  • 19
  • 18
  • 1
    im having a bit (pun intended) of trouble understanding this? https://stackoverflow.com/questions/62168128/understanding-cyclic-redundancy-code-algorithm-for-beginners – bluejayke Jun 03 '20 at 08:35
  • 1
    hey @bluejayke, check this library https://github.com/furstenheim/sparse_crc32/blob/master/main.go it implements the crc32 for sparse files, you can see there the nitty gritty details on the computation. It is not optimized so it is easier to follow than normal implementations. It can be that what you don't understand is the GF(2)[x] part.Basically x^3 + x means 1010, x ^4 + x + 1 means 10011. Then you need to perform division, for example x^3 + x is x * (x^2 + 1). so the reminder of x^3 +x over x is 0, but over x ^2 it would be x ^ 2 * x + x, that is, the reminder would be x. – Gabriel Furstenheim Jun 03 '20 at 18:09
  • 1
    @bluejayke and reminderIEEE means reminder against a well known polynomial, the IEEE polynomial – Gabriel Furstenheim Jun 03 '20 at 18:10
  • hi again, thanks for your response. I'm just trying to understand (for javascript purposes) what the "x" represnets in the polynomial. Is "x" some kind of code-word for something I'm missing here? There's a llot of terms that confuse me here, I've never heard of CRC32 before, and even after searching I couldn't find it actually explained. For a PNG for example it says I need to take the "CRC for each chunk", does that mean "for all the data in the chunk"? But how do I "plug it in" to the polynomial? What does "x" represent? Also when it sayss x^32, is that like Math.pow(x, 32) or the bitwise ^ – bluejayke Jun 04 '20 at 06:58
  • 1
    Hi @bluejayke, x is an abstraction to make computations easy. It is not expected to be substituted by anything. x^2 I mean x * x, as a formal multiplication. Here http://chrisballance.com/wp-content/uploads/2015/10/CRC-Primer.html you can find a nice explanation of that division. What I tried with my answer was to fill the gap between the division (in that link) and the actual computation – Gabriel Furstenheim Jun 04 '20 at 15:51