
This post suggests (see Anirudh Ramanathan's answer) that a Base64-encoded string can end with up to 3 equal signs. Elsewhere on the web I see that it can end with either one or two equal signs. Theoretically speaking, since we have to make the length a multiple of 4, Anirudh's answer appears correct. Which of these pieces of information is wrong?

Using brute force, I have tried looking for a string whose Base64 encoding ends with 3 equal signs, but I haven't found any among strings of up to 10,000 characters in length.

Or am I missing the obvious here?

dotNET
  • That was not correct. He based that guess on the requirement that the string must have a length that is a multiple of 4, and so assumed that up to 3 padding characters are required. What he did not realize is that Base64 always produces at least two characters for any leftover input bytes before padding, so it needs at most 2 padding characters. Do beware that FromBase64String [has a bug](http://stackoverflow.com/a/21203467/17034). – Hans Passant Feb 05 '17 at 13:33
  • the linked answer before I edited it is here http://stackoverflow.com/revisions/6309439/2 – Slai Feb 05 '17 at 13:40

2 Answers


No, it cannot end with 3 "=" signs. Every 4 characters of a Base64-encoded string represent exactly 3 bytes: a byte contains 8 bits (2^8 possible values), and each Base64 character carries 6 bits (64 = 2^6). So 4 Base64 characters can represent 2^6 * 2^6 * 2^6 * 2^6 = 2^24 distinct values (24 bits), which is exactly 2^8 * 2^8 * 2^8, i.e. 3 bytes. Because 2^8 > 2^6, you need at least two Base64 characters to encode one byte. From that it follows that a Base64 string cannot contain 3 padding characters: 1 byte is encoded as two characters plus two padding "=" characters. 2 bytes (16 bits) do not fit in two characters (12 bits), so they take three characters plus one padding character. 0 and 3 bytes do not need padding at all.
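The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the counting argument, not a real encoder; the helper name `padding_for` is made up for this example.

```python
def padding_for(n_bytes: int) -> int:
    """Number of '=' characters a padded Base64 encoding of n_bytes needs."""
    # Each Base64 character carries 6 bits, so round 8*n bits up to whole characters.
    data_chars = (n_bytes * 8 + 5) // 6
    # Pad the character count up to the next multiple of 4.
    return (-data_chars) % 4


for n in range(7):
    print(n, "bytes ->", (n * 8 + 5) // 6, "chars +", padding_for(n), "padding")
```

The number of data characters is never 1 modulo 4, which is why the padding count never reaches 3.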

Evk

The following table shows how three input bytes map to four Base64 characters.

        1       2       3                
8-bit:  111111112222222233333333
Base64: 111111222222333333444444 
        1     2     3     4                   

This case, where the input length is a multiple of 3 bytes, is the optimal encoding scenario: no bits are wasted and no padding is required, so every three input bytes produce exactly four output characters.

Now when you want to encode only two input bytes (16 bits), you need three output characters (18 bits, with the last two bits set to zero). This means the output gets padded with one padding character, up to a total of four characters.

Then the minimum non-empty input, a single 8-bit byte, gets encoded into two Base64 characters (12 bits, with the last four bits set to zero). Now two padding characters are required to fill the output string to four characters.

There's no input for which the output is one character, so you'll never have to use three padding characters - as long as you're encoding entire 8-bit bytes.
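A quick way to see the three cases is to encode one, two and three bytes with Python's standard `base64` module, then count the trailing "=" characters over a range of input lengths. This is a minimal sketch; the helper name `padding_counts` is made up for this example, and all-zero bytes are used as input only because the padding depends on the length, not the content.

```python
import base64

# The three cases: 1, 2 and 3 input bytes.
for data in (b"M", b"Ma", b"Man"):
    print(len(data), base64.b64encode(data).decode("ascii"))
# 1 TQ==
# 2 TWE=
# 3 TWFu


def padding_counts(max_len: int) -> set:
    """Collect the distinct numbers of trailing '=' seen for inputs of 0..max_len bytes."""
    counts = set()
    for n in range(max_len + 1):
        encoded = base64.b64encode(bytes(n))  # n zero bytes
        counts.add(len(encoded) - len(encoded.rstrip(b"=")))
    return counts


print(padding_counts(1000))
```

The set only ever contains 0, 1 and 2, matching the table above.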

CodeCaster