
Is there a way in C# to see if a string is Base64 encoded, other than just trying to convert it and seeing whether there is an error? I have code like this:

// Convert base64-encoded hash value into a byte array.
byte[] HashBytes = Convert.FromBase64String(Value);

I want to avoid the "Invalid character in a Base-64 string" exception that is thrown if the value is not a valid Base64 string. I expect that sometimes this value will not be a Base64 string, so I want to check it and return false instead of handling an exception. Is there some way to check before calling Convert.FromBase64String?


Update:
Thanks for all of your answers. Here is an extension method you can all use. So far it seems to ensure that your string will pass Convert.FromBase64String without an exception. .NET seems to ignore leading and trailing whitespace when converting from Base64, so "1234" is valid and so is " 1234 ".

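using System.Text.RegularExpressions;

public static bool IsBase64String(this string s)
{
    if (s == null)
        return false;

    s = s.Trim();
    // Length must be a multiple of 4; Base64 padding is at most two '=' characters.
    return (s.Length % 4 == 0) && Regex.IsMatch(s, @"^[a-zA-Z0-9\+/]*={0,2}$", RegexOptions.None);
}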
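The following is a minimal sketch for checking the whitespace behavior described in the update. It relies only on `System.Convert`. The class and method names are hypothetical. The method returns the decoded byte count, or -1 when `Convert.FromBase64String` throws:

```csharp
using System;

public static class WhitespaceDemo
{
    // Hypothetical helper: number of decoded bytes, or -1 if the parser rejects the input.
    public static int DecodedLength(string s)
    {
        try
        {
            return Convert.FromBase64String(s).Length;
        }
        catch (FormatException)
        {
            return -1;
        }
    }
}
```

With this helper, `DecodedLength("1234")` and `DecodedLength(" 1234 ")` both return 3. So does `DecodedLength("12 34")`, because the parser skips whitespace anywhere in the input, not just at the ends. `DecodedLength("123")` returns -1, because three significant characters is not a valid length.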

For those wondering about the performance of testing versus catching an exception: in most cases, for this Base64 check, it is faster to test first than to catch the exception, up to a certain string length. The shorter the string, the bigger the advantage of testing first.

In my very unscientific testing, for 10,000 iterations with strings of 100,000 to 110,000 characters, it was 2.7 times faster to test first.

For 1,000 iterations with strings of 1 to 16 characters (16,000 tests in total), it was 10.9 times faster.

I am sure there is a point where the exception-based method becomes the better choice; I just don't know where that point is.

abatishchev
Chris Mullins
  • It depends on how "thorough" you want the check to be. You can use some pre-validation using a regex as others have answered, but that isn't the only indicator. Base64 encoding requires padding in some cases using the `=` sign. If the padding is wrong, it will give an error even though the input matches an expression. – vcsjones Jun 10 '11 at 16:41
  • Your condition does not match Base64 strings exclusively. Consider the string `\n\fLE16`: your method would yield a false positive for it. For anyone reading and looking for a foolproof method, I would recommend catching the FormatException or using a spec-compliant regex; see http://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data. – nullable May 18 '17 at 20:21
  • if the method above returns false, how can I pad the string to the correct length? – Paul Alexander Jul 07 '17 at 09:55
  • I believe that the RegEx should be `@"^[a-zA-Z0-9\+/]*={0,2}$"` – 4Z4T4R Jan 04 '18 at 22:05
  • This solution is not reliable. It fails if you pass a string of four identical characters. – Bettimms Sep 11 '19 at 04:29
  • The title and first sentence ask different questions. You can determine whether a given string is a valid Base64-encoded string, but you cannot determine whether a string is Base64 encoded or not. – Jussi Palo Dec 04 '19 at 13:05

19 Answers


Use Convert.TryFromBase64String, which is available in .NET Core 2.1 and later (it takes a Span<byte>, which requires C# 7.2):

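using System;

public static bool IsBase64String(string base64)
{
   if (base64 == null)
      return false;

   // The decoded data is never longer than the input, so this buffer is always big enough.
   Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
   return Convert.TryFromBase64String(base64, buffer, out int bytesWritten);
}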
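If you need the decoded bytes anyway, the same approach can validate and decode in a single call, so the string is not parsed twice. The sketch below assumes .NET Core 2.1 or later, where `Convert.TryFromBase64String` exists. `Base64Helpers.TryDecodeBase64` is a hypothetical name:

```csharp
using System;

public static class Base64Helpers
{
    // Hypothetical helper: validates and decodes in one pass.
    public static bool TryDecodeBase64(string text, out byte[] bytes)
    {
        bytes = Array.Empty<byte>();
        if (text == null)
        {
            return false;
        }

        // The decoded data is never longer than 3/4 of the input, so this buffer always fits.
        byte[] buffer = new byte[(text.Length / 4 + 1) * 3];
        if (!Convert.TryFromBase64String(text, buffer, out int written))
        {
            return false;
        }

        // Copy only the bytes that were actually written.
        bytes = buffer.AsSpan(0, written).ToArray();
        return true;
    }
}
```

For example, `TryDecodeBase64("SGVsbG8=", out var data)` returns true, with `data` holding the five bytes of "Hello". `TryDecodeBase64("not base64!", out _)` returns false.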
Bakudan
Tomas Kubes

I know you said you didn't want to catch an exception. But, because catching an exception is more reliable, I will go ahead and post this answer.

using System;

public static bool IsBase64(this string base64String) {
     // Credit: oybek https://stackoverflow.com/users/794764/oybek
     if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
        || base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
        return false;

     try{
         Convert.FromBase64String(base64String);
         return true;
     }
     catch(FormatException){
         // Not valid Base64 (invalid character, bad padding, etc.)
     }
     return false;
}

Update: I've updated the condition thanks to oybek to further improve reliability.

Tomas Kubes
harsimranb
  • Calling `base64String.Contains` multiple times may result in poor performance in case `base64String` is a large string. – NucS Jul 23 '15 at 11:46
  • @NucS You are right, we can use a compiled regex here. – harsimranb Jan 14 '16 at 19:21
  • You can check for `base64String== null || base64String.Length == 0` with `string.IsNullOrEmpty(base64String)` – Daniël Tulp Apr 13 '17 at 13:54
  • Note that a Base64 can contain whitespace (e.g. line breaks) without issue. They're ignored by the parser. – Timothy Aug 01 '18 at 22:31
  • I think doing base64String.Contains("\t") etc actually escapes the \t, so it might not be recognized, might have to do base64String.Contains("\\t") but I'm not sure – bluejayke Mar 22 '19 at 07:32
  • Since we have access to the .NET source code now, we can see that the FromBase64String() function does all these checks: https://referencesource.microsoft.com/#mscorlib/system/convert.cs,08c34f52087ba624. If it's a valid Base64 string, then you are checking it twice. It may be cheaper to just try/catch the exception. – iheartcsharp Jun 19 '19 at 14:57
  • I have both plain and Base64-encoded encrypted values. But when the plain text is 1234, the method returns true, so the program proceeds to decode it and ends with the exception `Length of the data to decrypt is invalid.` – Luiey Jul 30 '20 at 01:42

Update: For newer versions of C#, there's a much better alternative, please refer to the answer by Tomas below.


It's pretty easy to recognize a Base64 string. It is composed only of the characters 'A'..'Z', 'a'..'z', '0'..'9', '+' and '/', and it is often padded at the end with up to two '=' characters to make the length a multiple of 4. But instead of checking these rules yourself, you'd be better off catching the exception if it occurs.
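The sketch below implements the rule described above as a plain loop. The names are hypothetical. It only checks the shape of the string, so, as the comments below point out, a plain word such as `test` still passes:

```csharp
using System;

public static class Base64Shape
{
    // Hypothetical helper: only A-Z, a-z, 0-9, '+', '/', length a multiple of 4,
    // and at most two '=' characters, all at the very end.
    public static bool LooksLikeBase64(string s)
    {
        if (s == null || s.Length % 4 != 0)
        {
            return false;
        }

        // Count trailing padding (at most two characters are skipped).
        int padding = 0;
        if (s.EndsWith("==", StringComparison.Ordinal)) padding = 2;
        else if (s.EndsWith("=", StringComparison.Ordinal)) padding = 1;

        // Every character before the padding must come from the Base64 alphabet;
        // a third '=' ends up here and is rejected.
        for (int i = 0; i < s.Length - padding; i++)
        {
            char c = s[i];
            bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                   || (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (!ok)
            {
                return false;
            }
        }
        return true;
    }
}
```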

Anirudh Ramanathan
  • I think you are on the right track. I did some testing and it seems it is multiples of 4 instead of 3. – Chris Mullins Jun 10 '11 at 17:24
  • Its length needs to be a multiple of 3, at the time of encoding, for successful encoding! Sorry about that... and yeah, you're right... The encoded string has a length which is a multiple of 4. That's why we'd pad up to 3 '='. – Anirudh Ramanathan Jun 10 '11 at 17:27
  • Marked correct because you were the first to mention the multiple thing. I updated my question with an implementation of the solution; let me know if you see any problems with it. – Chris Mullins Jun 10 '11 at 18:05
  • This method does not work! I found it after several years. examine it with simple value `test` – Homayoun Behzadian Dec 09 '20 at 18:42
  • The padding is up to 2 '='. That's because converting only one byte (8bit) will end up in 2 base64 characters and 2 '=' paddings. Try to find an example with 3 '=' at the end if you don't believe me. – Zoltan Tirinda Feb 12 '21 at 20:57

I believe the regex should be:

    Regex.IsMatch(s, @"^[a-zA-Z0-9\+/]*={0,2}$")

Only matching one or two trailing '=' signs, not three.

s should be the string that will be checked. Regex is part of the System.Text.RegularExpressions namespace.
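The sketch below combines the regex with the length rule. It makes two assumptions beyond this answer. First, the `Regex` is created once with `RegexOptions.Compiled`. Second, the pattern ends with `\z` instead of `$`, because in .NET `$` also matches just before a trailing `\n`, which would let a string like `"SGV\n"` through. The class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class Base64Regex
{
    // Built once and reused; \z anchors at the true end of the string.
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z0-9\+/]*={0,2}\z", RegexOptions.Compiled);

    public static bool IsBase64String(string s)
    {
        // The regex does not enforce the length rule, so check it separately.
        return s != null && s.Length % 4 == 0 && Pattern.IsMatch(s);
    }
}
```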

S. ten Brinke
JD Brennan

Just for the sake of completeness, I want to provide an implementation. Generally speaking, Regex is an expensive approach, especially if the string is large (which happens when transferring large files). The following approach tries the fastest ways of detection first.

using System;
using System.Linq; // needed for Base64Chars.Contains(...) on the char array

public static class HelperExtensions {
    // Characters that are used in base64 strings.
    private static Char[] Base64Chars = new[] { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/' };
    /// <summary>
    /// Extension method to test whether the value is a base64 string
    /// </summary>
    /// <param name="value">Value to test</param>
    /// <returns>Boolean value, true if the string is base64, otherwise false</returns>
    public static Boolean IsBase64String(this String value) {

        // The quickest test: if the value is null or empty, it is not Base64.
        // A Base64 string's length is always divisible by four (4, 8, 12, 16, ...).
        // If it is not, you can return false. Quite effective.
        // Further, if it meets the above criteria, test for whitespace.
        // If it contains whitespace, treat it as not Base64.
        if (value == null || value.Length == 0 || value.Length % 4 != 0
            || value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
            return false;

        // Most non-Base64 values have been rejected by this point.
        var index = value.Length - 1;

        // if there is padding step back
        if (value[index] == '=')
            index--;

        // if there are two padding chars step back a second time
        if (value[index] == '=')
            index--;

        // Now traverse over characters
        // You should note that I'm not creating any copy of the existing strings, 
        // assuming that they may be quite large
        for (var i = 0; i <= index; i++) 
            // If any of the character is not from the allowed list
            if (!Base64Chars.Contains(value[i]))
                // return false
                return false;

        // If we got here, then the value is a valid base64 string
        return true;
    }
}

EDIT

As suggested by Sam, you can also change the source code slightly. He provides a better performing approach for the last step of tests. The routine

    private static Boolean IsInvalid(char value) {
        var intValue = (Int32)value;

        // 0 - 9
        if (intValue >= 48 && intValue <= 57) 
            return false;

        // A - Z
        if (intValue >= 65 && intValue <= 90) 
            return false;

        // a - z
        if (intValue >= 97 && intValue <= 122) 
            return false;

        // + or /
        return intValue != 43 && intValue != 47;
    } 

can be used to replace the line `if (!Base64Chars.Contains(value[i]))` with `if (IsInvalid(value[i]))`.

The complete source code with Sam's enhancements looks like this (comments removed for clarity):

using System;
using System.Linq; // on .NET Framework, string.Contains(char) below comes from LINQ

public static class HelperExtensions {
    public static Boolean IsBase64String(this String value) {
        if (value == null || value.Length == 0 || value.Length % 4 != 0
            || value.Contains(' ') || value.Contains('\t') || value.Contains('\r') || value.Contains('\n'))
            return false;
        var index = value.Length - 1;
        if (value[index] == '=')
            index--;
        if (value[index] == '=')
            index--;
        for (var i = 0; i <= index; i++)
            if (IsInvalid(value[i]))
                return false;
        return true;
    }
    // Private, because the name makes no sense for an outside caller
    private static Boolean IsInvalid(char value) {
        var intValue = (Int32)value;
        if (intValue >= 48 && intValue <= 57)
            return false;
        if (intValue >= 65 && intValue <= 90)
            return false;
        if (intValue >= 97 && intValue <= 122)
            return false;
        return intValue != 43 && intValue != 47;
    }
}
Community
Oybek

Why not just catch the exception, and return False?

This avoids additional overhead in the common case.
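Here is a minimal sketch of that approach. The method name is hypothetical. Catching only `FormatException` keeps unrelated failures from being swallowed. `Convert.FromBase64String` throws `ArgumentNullException` for null, so the null case is handled up front:

```csharp
using System;

public static class Base64Try
{
    // Hypothetical helper: let the parser decide; FormatException means "not Base64".
    public static bool IsBase64ByParsing(string s)
    {
        if (s == null)
        {
            return false;
        }

        try
        {
            Convert.FromBase64String(s);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}
```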

Tyler Eaves
  • This is an unusual case, I guess: where I am going to use it, the value is more likely not to be Base64, so I would rather avoid the overhead of the exception. It is much faster to check first. I am trying to convert an old system I inherited from clear-text passwords to hashed values. – Chris Mullins Jun 10 '11 at 17:22
  • Regular expressions are never faster than what Tyler is suggesting. – Vincent Koeman Jun 10 '11 at 17:33
  • See the comment at the bottom of my post. I think depending upon the length of the strings you are working with it can be faster to test first, especially for small strings like hashed passwords. The string has to be a multiple of 4 to even get to the regex, and then regex on a small string is faster than on a very large string. – Chris Mullins Jun 10 '11 at 18:25
  • In a perfect world, one should not write code whose business logic is designed or is known to throw exceptions. Exception try/catch block is too expensive to be used as a decision block. – Ismail Hawayel Feb 27 '18 at 15:46

The answer must depend on how the string is used. Many strings may be "valid Base64" according to the syntax suggested by several posters, yet "correctly" decode, without an exception, to junk. Example: the 8-character string Portland is valid Base64. What is the point of stating that this is valid Base64? I guess that at some point you'd want to know whether this string should or should not be Base64 decoded.

In my case, I have Oracle connection strings that may be in plain text like:

Data source=mydb/DBNAME;User Id=Roland;Password=.....

or in base64 like

VXNlciBJZD1sa.....................................==

I just have to check for the presence of a semicolon, because a semicolon proves that the string is NOT Base64. That is, of course, faster than any of the methods above.
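The sketch below shows that domain-specific check. It assumes the encoded form holds UTF-8 text. `ConnectionStrings.Normalize` is a hypothetical name. Since `;` is not in the Base64 alphabet, its presence alone is enough to skip decoding:

```csharp
using System;
using System.Text;

public static class ConnectionStrings
{
    // Hypothetical helper: returns the plain-text connection string either way.
    public static string Normalize(string value)
    {
        // ';' can never appear in Base64, so the value must already be plain text.
        if (value.IndexOf(';') >= 0)
        {
            return value;
        }

        // Otherwise assume it is Base64-encoded UTF-8 text; this throws FormatException if not.
        return Encoding.UTF8.GetString(Convert.FromBase64String(value));
    }
}
```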

Roland
  • Agree, case specifics also impose certain additional fast checks. Just like plaintext connectionstring vs base64 encoded. – Oybek Nov 24 '14 at 05:11

I prefer this usage:

    using System;

    public static class StringExtensions
    {
        /// <summary>
        /// Check if string is Base64
        /// </summary>
        /// <param name="base64">The string to test.</param>
        /// <returns>true if the string decodes as Base64; otherwise false.</returns>
        public static bool IsBase64String(this string base64)
        {
            //https://stackoverflow.com/questions/6309379/how-to-check-for-a-valid-base64-encoded-string
            if (base64 == null)
                return false;
            Span<byte> buffer = new Span<byte>(new byte[base64.Length]);
            return Convert.TryFromBase64String(base64, buffer, out int _);
        }
    }

Then use it like this:

if(myStr.IsBase64String()){

    ...

}
Scholtz
  • And that's the best way to do it. People don't remember extension methods; you gave them a great lesson. – Kamil Jan 22 '21 at 11:58

Knibb High football rules!

This should be relatively fast and accurate, but I admit I didn't put it through a thorough test, just a few.

It avoids expensive exceptions, regex, and looping through a character set, using ASCII ranges for validation instead.

public static bool IsBase64String(string s)
{
    if (s == null)
    {
        return false;
    }
    s = s.Trim();
    int mod4 = s.Length % 4;
    if (mod4 != 0)
    {
        return false;
    }
    bool checkPadding = false;
    int paddingCount = 1; // only applies once the first '=' is encountered
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (checkPadding)
        {
            if (c != '=')
            {
                return false;
            }
            paddingCount++;
            if (paddingCount > 2) // Base64 never has more than two '='
            {
                return false;
            }
            continue;
        }
        // Check the ranges separately: 'A'..'z' would also accept [ \ ] ^ _ `
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        {
            continue;
        }
        switch (c)
        {
            case '+':
            case '/':
                continue;
            case '=':
                checkPadding = true;
                continue;
        }
        return false;
    }
    // if here
    // , length was correct
    // , there were no invalid characters
    // , padding was correct
    return true;
}
Jason K
public static bool IsBase64String1(string value)
        {
            if (string.IsNullOrEmpty(value))
            {
                return false;
            }
            try
            {
                Convert.FromBase64String(value);
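                // Heuristic: only values ending in '=' padding are accepted, so valid
                // unpadded Base64 (encoded data whose length is a multiple of 3 bytes) returns false.
                if (value.EndsWith("="))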
                {
                    value = value.Trim();
                    int mod4 = value.Length % 4;
                    if (mod4 != 0)
                    {
                        return false;
                    }
                    return true;
                }
                else
                {

                    return false;
                }
            }
            catch (FormatException)
            {
                return false;
            }
        }
Dev
  • Why do you first try to convert and then check the other things? – Snr Mar 29 '18 at 14:42
  • @Snr you are right. I think this is what he need to change : if (value.EndsWith("=")) { value = value.Trim(); int mod4 = value.Length % 4; if (mod4 != 0) { return false; } Convert.FromBase64String(value); return true; } else { return false; } – Wajid khan Feb 10 '19 at 08:56

I use it like this, so that I don't need to call the convert method again: the decoded bytes come back through the out parameter.

public static bool IsBase64(this string base64String, out byte[] bytes)
{
    bytes = null;
    // Credit: oybek http://stackoverflow.com/users/794764/oybek
    if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
       || base64String.Contains(" ") || base64String.Contains("\t") || base64String.Contains("\r") || base64String.Contains("\n"))
        return false;

    try
    {
        bytes = Convert.FromBase64String(base64String);
        return true;
    }
    catch (FormatException)
    {
        // Not valid Base64; bytes stays null
    }

    return false;
}
Yaseer Arafat

Decode, re-encode, and compare the result to the original string:

public static Boolean IsBase64(this String str)
{
    if (str == null || (str.Length % 4) != 0)
    {
        return false;
    }

    //decode - encode and compare
    try
    {
        // Round-trip the raw bytes: going through a UTF-8 string would corrupt
        // binary data (such as hashes) that is not valid UTF-8 text.
        byte[] decoded = System.Convert.FromBase64String(str);
        string encoded = System.Convert.ToBase64String(decoded);
        // Base64 is case-sensitive, so compare ordinally.
        if (str.Equals(encoded, StringComparison.Ordinal))
        {
            return true;
        }
    }
    catch (FormatException) { }
    return false;
}
PKOS

IMHO this is not really possible. All posted solutions fail for strings like "test" and so on. If a string's length is divisible by 4, it is not null or empty, and it consists only of valid Base64 characters, it will pass all the tests. That can be many strings...

So there is no real solution other than knowing that this is a base 64 encoded string. What I've come up with is this:

if (base64DecodedString.StartsWith("<xml>"))
{
    // This was really a base64 encoded string I was expecting. Yippie!
}
else
{
    // This is gibberish.
}

I expect that the decoded string begins with a certain structure, so I check for that.
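The same idea can be written as a sketch that decodes and then checks the expected prefix. It assumes .NET Core 2.1+ for `Convert.TryFromBase64String` and that the payload is UTF-8 text. The method name and the prefix parameter are hypothetical:

```csharp
using System;
using System.Text;

public static class ExpectedPayload
{
    // Hypothetical helper: treat the value as Base64 only if it decodes cleanly AND
    // the decoded text starts with the marker we expect (e.g. "<xml>").
    public static bool IsExpectedBase64Payload(string value, string expectedPrefix)
    {
        if (value == null)
        {
            return false;
        }

        // The decoded data is never longer than the input.
        byte[] buffer = new byte[value.Length];
        if (!Convert.TryFromBase64String(value, buffer, out int written))
        {
            return false;
        }

        string decoded = Encoding.UTF8.GetString(buffer, 0, written);
        return decoded.StartsWith(expectedPrefix, StringComparison.Ordinal);
    }
}
```

With this check, "test" is rejected even though it is syntactically valid Base64, because it decodes to bytes that do not start with the expected marker.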

testing

I have combined all the answers into one function that aims to make the result as accurate as possible.


1) Use the function as below:

    string encoded = "WW91ckJhc2U2NHN0cmluZw==";
    Console.WriteLine("Is string base64=" + IsBase64(encoded));

2) Below is the function:

using System;
using System.Text;
using System.Text.RegularExpressions;

public bool IsBase64(string base64String)
{
    // Cheap checks first: null/empty, length, and whitespace.
    if (string.IsNullOrEmpty(base64String) || base64String.Length % 4 != 0
        || base64String.Contains(" ") || base64String.Contains("\t")
        || base64String.Contains("\r") || base64String.Contains("\n"))
    {
        return false;
    }

    try
    {
        // Reject only if the UTF-8 round trip differs AND the character-set regex fails.
        string roundTripped = Convert.ToBase64String(Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(Convert.FromBase64String(base64String))));
        if (!base64String.Equals(roundTripped, StringComparison.InvariantCultureIgnoreCase)
            && !Regex.IsMatch(base64String, @"^[a-zA-Z0-9\+/]*={0,2}$"))
        {
            return false;
        }

        return true;
    }
    catch (FormatException)
    {
        return false;
    }
}

Sorry IwontTell

Yes, since Base64 encodes binary data into ASCII strings using a limited set of characters, you can simply check it with this regular expression:

/^[A-Za-z0-9\=\+\/\s\n]+$/s

which will assure the string only contains A-Z, a-z, 0-9, '+', '/', '=', and whitespace.

Rob Raisch
  • That isn't always a sure fire way to tell. Base64 does some padding for you using the `=` character at the end. If that padding is invalid, it's not a correct base64 encoding, even though it matches your regex. You can demo this by finding a base 64 string with 1 or 2 `=` at the end, removing them, and trying to decode it. – vcsjones Jun 10 '11 at 16:40
  • I believe the OP asked to trap for illegal characters, not if the str was legal Base64. If the latter, you are correct, though padding errors in Base64 are easier to trap using exceptions. – Rob Raisch Jun 10 '11 at 16:42
  • Not true, at least the .Net version of base64 parser ignores padding completely. – Jay Jun 10 '11 at 16:44

I have just had a very similar requirement where I am letting the user do some image manipulation in a <canvas> element and then sending the resulting image retrieved with .toDataURL() to the backend. I wanted to do some server validation before saving the image and have implemented a ValidationAttribute using some of the code from other answers:

using System;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;

[AttributeUsage(AttributeTargets.Property, AllowMultiple = false, Inherited = false)]
public class Base64PngImageAttribute : ValidationAttribute
{
    public override bool IsValid(object value)
    {
        if (value == null || string.IsNullOrWhiteSpace(value as string))
            return true; // not concerned with whether or not this field is required
        var base64string = (value as string).Trim();

        // we are expecting a URL type string
        if (!base64string.StartsWith("data:image/png;base64,"))
            return false;

        base64string = base64string.Substring("data:image/png;base64,".Length);

        // match length and regular expression
        if (base64string.Length % 4 != 0 || !Regex.IsMatch(base64string, @"^[a-zA-Z0-9\+/]*={0,2}$", RegexOptions.None))
            return false;

        // finally, try to convert it to a byte array and catch exceptions
        try
        {
            Convert.FromBase64String(base64string);
            return true;
        }
        catch (FormatException)
        {
            return false;
        }
    }
}

As you can see I am expecting an image/png type string, which is the default returned by <canvas> when using .toDataURL().
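If you want to go one step further than the shape check, one option (not part of this answer) is to decode the data and confirm that the bytes really begin with the 8-byte PNG file signature. The sketch below assumes .NET Core 2.1+. `PngDataUrl.IsPngDataUrl` is a hypothetical name:

```csharp
using System;

public static class PngDataUrl
{
    private const string Prefix = "data:image/png;base64,";

    // Every PNG file starts with these 8 bytes.
    private static readonly byte[] PngSignature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Hypothetical helper: true only for a PNG data URL whose payload decodes to PNG bytes.
    public static bool IsPngDataUrl(string value)
    {
        if (value == null || !value.StartsWith(Prefix, StringComparison.Ordinal))
        {
            return false;
        }

        string base64 = value.Substring(Prefix.Length);
        byte[] buffer = new byte[base64.Length];
        if (!Convert.TryFromBase64String(base64, buffer, out int written) || written < PngSignature.Length)
        {
            return false;
        }

        // Compare the first 8 decoded bytes with the PNG signature.
        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (buffer[i] != PngSignature[i])
            {
                return false;
            }
        }
        return true;
    }
}
```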

germankiwi

Check Base64 or normal string

public bool IsBase64Encoded(String str)
{
    try
    {
        // If no exception is caught, then it is possibly a base64 encoded string
        byte[] data = Convert.FromBase64String(str);
        // The part that checks if the string was properly padded to the
        // correct length was borrowed from d@anish's solution
        return (str.Replace(" ","").Length % 4 == 0);
    }
    catch
    {
        // If exception is caught, then it is not a base64 encoded string
        return false;
    }
}

Navdeep Kapil

I would suggest creating a regex to do the job. You'll have to check for something like this: [a-zA-Z0-9+/=]. You'll also have to check the length of the string. I'm not sure about this one, but I'm pretty sure that if the string is trimmed by anything other than its "=" padding, the conversion will blow up.

Or better yet check out this stackoverflow question

Community
Jay

Sure. Just make sure each character is within a-z, A-Z, 0-9, /, or +, and that the string ends with at most two '=' padding characters. (At least, that's the most common Base64 implementation. You might find some implementations that use characters other than / and + for the last two characters.)
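One such variant is the URL-safe "base64url" alphabet from RFC 4648, which uses '-' and '_' in place of '+' and '/' and often omits padding. The sketch below assumes that alphabet and .NET Core 2.1+. It maps the input back to the standard alphabet, restores the padding, and then lets the framework validate. `Base64Url.TryDecodeBase64Url` is a hypothetical name:

```csharp
using System;

public static class Base64Url
{
    // Hypothetical helper: accepts base64url input (with or without padding).
    public static bool TryDecodeBase64Url(string value, out byte[] bytes)
    {
        bytes = Array.Empty<byte>();
        if (value == null)
        {
            return false;
        }

        // Map the URL-safe characters back to the standard alphabet.
        string standard = value.Replace('-', '+').Replace('_', '/');

        // Restore padding so the length is a multiple of 4.
        switch (standard.Length % 4)
        {
            case 2: standard += "=="; break;
            case 3: standard += "="; break;
            case 1: return false; // no valid Base64 has this length
        }

        byte[] buffer = new byte[standard.Length];
        if (!Convert.TryFromBase64String(standard, buffer, out int written))
        {
            return false;
        }

        bytes = buffer.AsSpan(0, written).ToArray();
        return true;
    }
}
```

For example, the URL-safe string "-_8" maps to the standard "+/8=" and decodes to the two bytes 0xFB 0xFF.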