I need to validate that a base64 string was encoded from a PDF file. The string must:

  • Start with "JVBER" (to verify the PDF MIME type)
  • Match "^[a-zA-Z0-9+/]*={0,3}$" and have a length that is a multiple of 4 (to verify that it is a valid base64 string)

Can anyone help me combine those conditions into a single regex?

Thanks.

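// Requires: using System; using System.Text.RegularExpressions;

public static bool HasPdfMimeType(string str)
{
    // "JVBER" is the Base64 prefix produced by the PDF signature "%PDF".
    // StartsWith already returns false for strings shorter than the prefix.
    return !string.IsNullOrEmpty(str) && str.StartsWith("JVBER", StringComparison.Ordinal);
}

public static bool IsBase64string(string str)
{
    if (string.IsNullOrEmpty(str)) return false;

    str = str.Trim();
    // Length must be a multiple of 4; only Base64 characters, then optional '=' padding.
    return (str.Length % 4 == 0) && Regex.IsMatch(str, @"^[a-zA-Z0-9\+/]*={0,3}$", RegexOptions.None);
}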
  • Perhaps this page can be helpful https://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data – The fourth bird Aug 15 '19 at 06:20
  • Why do you need Regex. Just use string method StartsWith("JVBER"). – jdweng Aug 15 '19 at 06:23
  • @jdweng because I need to validate a valid base64 string – Truong Anh Aug 15 '19 at 07:49
  • A string, is a string, is a string. A base 64 string is a string. You do not need Regex to check characters in a string. – jdweng Aug 15 '19 at 08:58
  • Note that the PDF spec is very lax and allows the header to start anywhere in the first 1024 bytes so simply check the first bytes will return false for many real PDF files – phuclv Aug 15 '19 at 09:52
  • @jdweng as I mentioned above, I need to check If a string is a valid base64 string, and also it's encoded from pdf file. So it's not simple to check if it starts with "JVBER". – Truong Anh Aug 23 '19 at 04:34
  • What is a "valid base64 string". How can you check if it is valid? Rather than check if Regex which will never 100% determine if the base64 string is valid, just unpack and see if you get an error. – jdweng Aug 23 '19 at 08:37
  • Please refer the link provided above by Thefourthbird to know how to check a valid base64 string, thanks. @jdweng – Truong Anh Aug 26 '19 at 10:34
  • Checking the characters in the base 64 string does not mean the string is valid. There is a size and CRC which needs to be checked which REGEX does not do. The link in my opinion does not validate the base 64 string. It does just a very small part of the validation. – jdweng Aug 26 '19 at 10:44
  • Does this answer your question? [RegEx to parse or validate Base64 data](https://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data) – dustytrash Jan 27 '20 at 18:31
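
The decode-and-check approach suggested in the comments can be sketched as follows. This is only an illustration, not code from the thread: the class and method names are hypothetical, and it assumes the PDF signature sits at offset 0, which (as one comment notes) is not guaranteed for every real PDF.

```csharp
using System;

public static class PdfBase64Decode
{
    // Hypothetical helper: returns true when the input decodes as Base64
    // and the decoded bytes begin with the ASCII signature "%PDF".
    public static bool DecodesToPdf(string str)
    {
        if (string.IsNullOrWhiteSpace(str)) return false;

        try
        {
            // Throws FormatException for illegal characters, bad length, or bad padding.
            byte[] bytes = Convert.FromBase64String(str.Trim());

            // 0x25 0x50 0x44 0x46 is "%PDF" in ASCII.
            return bytes.Length >= 4
                && bytes[0] == 0x25
                && bytes[1] == 0x50
                && bytes[2] == 0x44
                && bytes[3] == 0x46;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```

Decoding is stricter than the character-class pattern in the question: standard Base64 never produces more than two "=" padding characters, so a string ending in "===" passes "={0,3}" but fails to decode. The trade-off is that the whole payload is decoded into memory just to inspect its first four bytes.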

1 Answer

It's a little unusual to use regex to assert string length and start-of-string characters when there are ordinary string methods built for that, but if you'd like to do it, this will work:

(?=^(?:.{4})*$)^JVBER[a-zA-Z0-9\+\/]*={0,3}$

Getting the string to be an exact multiple of 4 was the tricky part.

BREAKDOWN

This first portion of the regex asserts that the string's length is an exact multiple of 4.
By making a group of four characters, repeating it as many times as necessary, and anchoring
it between the beginning and end of the string, the regex is forced to accept only strings
whose length is a multiple of 4.

(?=^(?:.{4})*$)

(?=           )    positive lookahead - make sure this is true before continuing
   ^         $     between the start and the end of the string...
    (?:    )*      ...get as many...
       .{4}        ...groupings of exactly 4 characters (any will do) as possible.
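
To see the length check in isolation, here is a minimal C# sketch that runs only the lookahead against a few sample strings. The class name and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LengthLookaheadDemo
{
    public static void Main()
    {
        // Only the lookahead: succeeds when the whole string is made of 4-character groups.
        var multipleOfFour = new Regex(@"(?=^(?:.{4})*$)");

        foreach (var s in new[] { "", "abcd", "abcde", "abcdefgh" })
        {
            // Lengths 0, 4, and 8 match; length 5 does not.
            Console.WriteLine($"\"{s}\" (length {s.Length}): {multipleOfFour.IsMatch(s)}");
        }
    }
}
```

The empty string passes this part on its own (zero groups of four), so it is the JVBER portion of the full pattern that rules it out.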


The second portion asserts the string starts with JVBER (the Base64 prefix that the PDF
signature %PDF encodes to), and then asserts any number of legal Base64 characters follow.
The end allows between zero and three equal signs for padding.

^JVBER[a-zA-Z0-9\+\/]*={0,3}$

^                           $    anchor between start and end of the string
 JVBER                           match "JVBER" literally
      [a-zA-Z0-9\+\/]*           match as many valid Base64 characters as needed
                      ={0,3}     match between 0 and 3 = symbols 

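Note that the + and / symbols are escaped. Neither escape is required inside a character class, so you can drop them; if you're working in C#, put the pattern in a verbatim string (@"...") so the backslashes are not treated as string escapes.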
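
For reference, a minimal sketch of how the combined pattern might be used from C#. The class and method names are hypothetical, and the escapes on + and / are dropped here since they are not needed inside a character class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PdfBase64Check
{
    // Combined pattern: length is a multiple of 4, starts with JVBER,
    // then Base64 characters, then up to three '=' as in the question.
    private static readonly Regex PdfBase64 =
        new Regex(@"(?=^(?:.{4})*$)^JVBER[a-zA-Z0-9+/]*={0,3}$", RegexOptions.CultureInvariant);

    public static bool LooksLikeBase64Pdf(string str)
    {
        return !string.IsNullOrEmpty(str) && PdfBase64.IsMatch(str);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeBase64Pdf("JVBERi0xLjQ=")); // True: "%PDF-1.4" encoded, length 12
        Console.WriteLine(LooksLikeBase64Pdf("JVBERi0xLjQ"));  // False: length 11 is not a multiple of 4
        Console.WriteLine(LooksLikeBase64Pdf("SGVsbG8="));     // False: valid Base64 but no JVBER prefix
    }
}
```

Keeping the Regex in a static readonly field avoids re-parsing the pattern on every call, which matters if many strings are validated.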

Nick Reed