
I know there are dozens of questions about this already, in various forms. My question is slightly more direct though.

Using Free Pascal's s := DecodeStringBase64(s); function, is there any way to validate that the string passed as s actually contains proper Base64-encoded data in the first place, so that I avoid decoding garbage?

The best I have managed is to use a regular expression to identify potential Base64 data (from the accepted answer here). I then use mod to check whether its length is divisible by 4, and if it is, I pass it to DecodeStringBase64. However, I am still getting lots of false positives: data that "decodes" but was clearly not Base64 in the first place, despite matching the regular expression. For example, "WindowsXP=" matches the expression but is not Base64-encoded data.
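The false positive on "WindowsXP=" is easy to reproduce. The sketch below assumes FPC's RegExpr unit (TRegExpr) and uses the pattern the asker quotes in a later comment; FirstBase64Candidate is a hypothetical helper name. Because TRegExpr.Exec looks for a match anywhere in the input, the pattern picks out the substring "ndowsXP=", which is 8 characters long and therefore also passes a mod 4 length check.

```pascal
program RegexFalsePositive;

{$mode objfpc}{$H+}

uses
  RegExpr;

// Pattern quoted from the asker's comment later in this thread
const
  Base64Pattern = '([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)';

// Hypothetical helper: returns the first substring of S matched by the
// pattern, or '' if there is no match
function FirstBase64Candidate(const S: string): string;
var
  Re: TRegExpr;
begin
  Result := '';
  Re := TRegExpr.Create;
  try
    Re.Expression := Base64Pattern;
    if Re.Exec(S) then
      Result := Re.Match[0];
  finally
    Re.Free;
  end;
end;

begin
  // The pattern is not anchored, so it matches 'ndowsXP=' inside 'WindowsXP='
  WriteLn(FirstBase64Candidate('WindowsXP='));
end.
```

Anchoring the pattern with ^ and $ would reject this particular input, but no pattern can tell real encoded data from ordinary text that happens to fit the Base64 alphabet.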

Equally, the name 'Ted' encodes as VGVk, which doesn't even have the usual '=' padding (which can help to flag the end of an encoded block), but it is still a potential Base64 fragment that I'd like to find and decode.
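Whether padding appears depends only on the length of the original data: Base64 encodes 3-byte groups into 4 characters, and = is added only when the last group is short. A minimal sketch using EncodeStringBase64 from FPC's base64 unit (the encoding counterpart of DecodeStringBase64) shows this:

```pascal
program PaddingDemo;

{$mode objfpc}{$H+}

uses
  base64;

begin
  // 3 input bytes -> 4 output chars, no padding needed
  WriteLn(EncodeStringBase64('Ted'));   // VGVk
  // 4 input bytes -> the last group holds 1 byte, so '==' is appended
  WriteLn(EncodeStringBase64('Teds'));  // VGVkcw==
  // 2 input bytes -> the only group holds 2 bytes, so '=' is appended
  WriteLn(EncodeStringBase64('Te'));    // VGU=
end.
```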

In PHP, base64_decode() accepts a second parameter, $strict, which can be set to true to help with validation.

AFAIK, Free Pascal's DecodeStringBase64 has no equivalent option, so I need some other way of validating the input.

For readers looking for it, as I was yesterday, other useful replies on the subject of decoding and encoding are here.

Gizmo_the_Great

2 Answers


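The short answer is no: there is no 100% reliable way to validate Base64-encoded strings. Any string that uses only the Base64 alphabet and has a suitable length decodes to some bytes, so a check can only tell you that a string could be Base64, not that it was meant to be.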

The = sign in a Base64-encoded string carries no data. It is only padding, and it appears only when the length of the original data is not a multiple of 3 (a padded encoded string always has a length that is a multiple of 4). All you can do is check that the length is a multiple of 4, check that the characters come from the Base64 alphabet (see Page 5, Table 1), and verify that there are no more than two = padding characters, all at the end of the input string. The following function verifies whether the passed string can be a valid Base64-encoded string (which is all you can do anyway):

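```pascal
// Returns True if AValue *could* be Base64-encoded data: its length is a
// non-zero multiple of 4, it ends with at most two '=' padding characters,
// and every other character belongs to the Base64 alphabet.
function CanBeValidBase64EncodedString(const AValue: string): Boolean;
const
  Base64Alphabet = ['A'..'Z', 'a'..'z', '0'..'9', '+', '/'];
var
  I: Integer;
  ValLen: Integer;
begin
  ValLen := Length(AValue);
  Result := (ValLen > 0) and (ValLen mod 4 = 0);
  if Result then
  begin
    // Skip at most two trailing '=' padding characters
    while (ValLen > Length(AValue) - 2) and (AValue[ValLen] = '=') do
      Dec(ValLen);
    // Every remaining character (including any stray '=') must be in the alphabet
    for I := ValLen downto 1 do
      if not (AValue[I] in Base64Alphabet) then
      begin
        Result := False;
        Break;
      end;
  end;
end;
```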
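One further heuristic, not part of this answer, is a round-trip check: decode with DecodeStringBase64, re-encode with EncodeStringBase64, and compare with the input. A minimal sketch follows; RoundTripsAsBase64 is a hypothetical helper name, and it wraps the decode in try/except in case the decoder raises on malformed input. It rejects non-canonical input (wrong length, missing padding, characters outside the alphabet), but it still cannot tell real data from text like VGVk that merely looks encoded.

```pascal
program RoundTripCheck;

{$mode objfpc}{$H+}

uses
  SysUtils, base64;

// Hypothetical helper: True when decoding S and re-encoding the result gives
// back exactly S. This is a plausibility check only, not proof of intent.
function RoundTripsAsBase64(const S: string): Boolean;
begin
  Result := False;
  if S = '' then
    Exit;
  try
    Result := EncodeStringBase64(DecodeStringBase64(S)) = S;
  except
    // Treat any decoding error as "not Base64"
    Result := False;
  end;
end;

begin
  WriteLn(RoundTripsAsBase64('VGVk'));  // decodes to 'Ted'
  WriteLn(RoundTripsAsBase64('d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms='));
  // Encoder output length is always a multiple of 4, so 10 chars never match
  WriteLn(RoundTripsAsBase64('WindowsXP='));
end.
```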
TLama
  • Afaik there are also requirements about the = chars at the end. – Marco van de Voort Oct 18 '12 at 08:03
  • @Marco, if there are more than two `=` chars at the end of the encoded string, you should ignore them, but I guess `DecodeStringBase64` doesn't do that... I'll also add a check that the number of `=` chars at the end doesn't exceed two. Thanks for pointing this out! Anyway, when an `=` char is found before the end of the string, it should also be treated as invalid... – TLama Oct 18 '12 at 08:14
  • My regular expression is: Base64StringPattern.Expression := '([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)'; so the strings passed to the function you wrote, TLama, will in theory only have one or two = signs at the end anyway. I take the point, though, that for an implementation in FPC (as Marco writes below), a check throughout the string will be necessary. Many thanks for your help. – Gizmo_the_Great Oct 18 '12 at 10:08
  • I'll update the code to reflect also the `=` sign rules. There can't be more than two of them at the end of the string and there can't be any inside of the string. I'll be right back... – TLama Oct 18 '12 at 10:18
  • I've added the validation for more than two `=` signs at the end of the input string and removed the `=` sign from the Base64 alphabet. – TLama Oct 18 '12 at 11:46
  • TLama, I just gave your improved code a whirl and it rejects a Base64 value that it previously decoded (i.e. my program doesn't process it, so the function must be returning False). The value is d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms= and it should decode to 'wireshark-students:network' (or similar). As I don't need the new check thanks to my regular expression, I have just taken that bit out for now, but I thought I'd let you know. Maybe it's something I'm doing wrong, but all I did was add those two lines and the var, and now it doesn't process the value. – Gizmo_the_Great Oct 18 '12 at 21:22
  • I've just tried `d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms=` and it gives me True as a result. Don't you have, e.g., a line break (#13#10) in that string? – TLama Oct 18 '12 at 21:32
  • My source data is fed into a StringList, and I pass each line to the regular expression using for i := 0 to SL.Count - 1 do. So I assume the answer is yes, because string lists have end-of-line markers, I think? Do you think that is the problem? – Gizmo_the_Great Oct 18 '12 at 21:40
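TLama's guess is easy to check: a single trailing control character such as #13 makes the 36-character value 37 characters long, which is no longer a multiple of 4. A minimal sketch of a fix, assuming any stray characters are whitespace or control characters at the ends of the line, is to pass each line through SysUtils.Trim before validating it (for example Trim(SL[i]) inside the loop):

```pascal
program TrimBeforeValidating;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  Encoded = 'd2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms=';

var
  Line: string;
begin
  // Simulate a line that still carries a trailing carriage return
  Line := Encoded + #13;
  WriteLn(Length(Line) mod 4);        // 1: fails the multiple-of-4 test
  // Trim strips leading/trailing spaces and control characters
  WriteLn(Length(Trim(Line)) mod 4);  // 0
  WriteLn(Trim(Line) = Encoded);      // TRUE
end.
```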

In the next version (2.6.2), DecodeStringBase64 will have an extra Boolean parameter that enables strict mode (which was already available in the stream-based version).

If validation fails, an exception is raised.
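A minimal sketch of using that strict mode, assuming FPC 2.6.2 or later where DecodeStringBase64 takes the extra Boolean argument; TryDecodeBase64Strict is a hypothetical wrapper name, and it catches any exception rather than naming a specific exception class from the base64 unit:

```pascal
program StrictDecode;

{$mode objfpc}{$H+}

uses
  SysUtils, base64;

// Hypothetical wrapper: returns True and the decoded text in Decoded when
// strict decoding succeeds, or False if the decoder raises an exception.
function TryDecodeBase64Strict(const S: string; out Decoded: string): Boolean;
begin
  try
    // Second argument enables strict mode (FPC 2.6.2 and later, per the answer)
    Decoded := DecodeStringBase64(S, True);
    Result := True;
  except
    // Any validation error raised by the decoder means "not valid Base64"
    Decoded := '';
    Result := False;
  end;
end;

var
  Plain: string;
begin
  if TryDecodeBase64Strict('VGVk', Plain) then
    WriteLn('Decoded: ', Plain)
  else
    WriteLn('Rejected');
end.
```

As with the other checks, a successful strict decode only shows the input is well-formed Base64, not that it was meant as encoded data.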

Marco van de Voort
  • Thank you, Marco. I'm always relieved when a question I ask turns out to be appropriate and helpful to the overall development of FPC/Lazarus. – Gizmo_the_Great Oct 18 '12 at 10:16