35

Is there some way to detect if a string has been base64_encoded() in PHP?

We're converting some storage from plain text to base64 and part of it lives in a cookie that needs to be updated. I'd like to reset their cookie if the text has not yet been encoded, otherwise leave it alone.

Ian McIntyre Silber

12 Answers

25
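function is_base64_encoded($data)
{
    // Accept only Base64 alphabet characters, optionally followed by up to two '=' padding characters.
    // Note: this checks the character set only, not whether the data decodes to anything meaningful.
    return preg_match('%^[a-zA-Z0-9/+]*={0,2}$%', $data) === 1;
}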

is_base64_encoded("iash21iawhdj98UH3"); // true
is_base64_encoded("#iu3498r"); // false
is_base64_encoded("asiudfh9w=8uihf"); // false
is_base64_encoded("a398UIhnj43f/1!+sadfh3w84hduihhjw=="); // false

http://php.net/manual/en/function.base64-decode.php#81425
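
The pattern above checks the character set only. A slightly stricter sketch, assuming the value was produced by base64_encode() (which always pads its output to a multiple of four characters), also checks the group structure. The function name looks_like_padded_base64 is hypothetical, and an empty string still passes, so this remains a structural check rather than proof of encoding:

```php
<?php
// Hypothetical helper: structural check for padded Base64 as produced by base64_encode().
function looks_like_padded_base64(string $data): bool
{
    // Zero or more full 4-character groups, optionally followed by one padded group.
    // The D modifier makes $ match only at the very end of the string.
    $pattern = '%^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$%D';
    return preg_match($pattern, $data) === 1;
}

// 17 characters, not a multiple of 4, so the structural check rejects it.
var_dump(looks_like_padded_base64('iash21iawhdj98UH3'));
// Output of base64_encode() is always accepted.
var_dump(looks_like_padded_base64(base64_encode('hello')));
```

Plain words whose length happens to be a multiple of four (such as "node") still pass, so this narrows the false positives without removing them.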

Tofandel
alex
  • This is very useful, but your fourth example `is_base64_encoded("a398UIhnj43f/1!+sadfh3w84hduihhjw=="); // true` returns FALSE in my tests. – Dylan Sep 12 '17 at 01:16
  • 3
    @Dylan that is because that is not a valid base64. He just commented it wrong. – Digital Human Jun 06 '18 at 10:24
  • This just matches a string of any length that may or may not end with =. It cannot tell the difference between a normal string and a base64 encoded one. – renatoaraujoc Sep 03 '20 at 14:08
  • base64_decode returns false if it fails to parse a base64 encoded string, so you'd just need to do: return base64_decode($str) !== false. – renatoaraujoc Sep 03 '20 at 14:09
25

Apologies for a late response to an already-answered question, but I don't think base64_decode($x,true) is a good enough solution for this problem. In fact, there may not be a very good solution that works against any given input. For example, I can put lots of bad values into $x and not get a false return value.

var_dump(base64_decode('wtf mate',true));
string(5) "���j�"

var_dump(base64_decode('This is definitely not base64 encoded',true));
string(24) "N���^~)��r��[jǺ��ܡם"

I think that in addition to the strict return value check, you'd also need to do post-decode validation. The most reliable way is if you could decode and then check against a known set of possible values.
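
A minimal sketch of that post-decode validation, assuming the cookie can only hold one of a small set of values. The list KNOWN_VALUES and the function name decode_if_known are made up for illustration and are not part of the original question:

```php
<?php
// Hypothetical whitelist: replace with the values your application actually stores.
const KNOWN_VALUES = ['light', 'dark', 'system'];

// Returns the decoded value if $cookie is strict Base64 for a known value, otherwise null.
function decode_if_known(string $cookie): ?string
{
    $decoded = base64_decode($cookie, true);
    if ($decoded !== false && in_array($decoded, KNOWN_VALUES, true)) {
        return $decoded;
    }
    return null;
}

var_dump(decode_if_known(base64_encode('dark'))); // already encoded: decodes to a known value
var_dump(decode_if_known('dark'));                // plain text: decodes to bytes that are not in the list
```

A null result would then mean the cookie is still in the old plain-text format (or holds an unexpected value) and should be reset.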

A more general solution with less than 100% accuracy (more accurate for longer strings, inaccurate for short ones) is to check whether many of the decoded bytes fall outside the normal range of UTF-8 (or whatever encoding you use) characters.

See this example:

<?php
$english = array();
foreach (str_split('az019AZ~~~!@#$%^*()_+|}?><": Iñtërnâtiônàlizætiøn') as $char) {
  echo ord($char) . "\n";
  $english[] = ord($char);
}
echo "Max value english = " . max($english) . "\n";

$nonsense = array();
echo "\n\nbase64:\n";
foreach (str_split(base64_decode('Not base64 encoded',true)) as $char) {
  echo ord($char) . "\n";
  $nonsense[] = ord($char);
}

echo "Max nonsense = " . max($nonsense) . "\n";

?>

Results:

Max value english = 195
Max nonsense = 233

So you may do something like this:

if ($maxDecodedValue > 200) {
    // decoded string is garbage: the original string was not base64 encoded
} else {
    // decoded string is useful: it was base64 encoded
}

You should probably use the mean of the decoded values instead of max(). I used max() in this example only because PHP sadly has no built-in mean() function, although array_sum() divided by count() gives the same result. Which measure you use (mean, max, etc.) and against what threshold (e.g. 200) depends on your estimated usage profile.
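
A rough sketch of the mean-based variant. The helper names and the default threshold of 110 are illustrative guesses, not measured values, so tune the threshold against your own data:

```php
<?php
// Mean of the byte values in a string (0.0 for an empty string).
function mean_byte_value(string $bytes): float
{
    if ($bytes === '') {
        return 0.0;
    }
    $values = unpack('C*', $bytes); // one unsigned byte per array element
    return array_sum($values) / count($values);
}

// Hypothetical heuristic: strict-decode, then treat a low mean byte value as "probably text".
// The threshold is an assumption; random bytes average around 127.5.
function decoded_looks_like_text(string $candidate, float $threshold = 110.0): bool
{
    $decoded = base64_decode($candidate, true);
    if ($decoded === false || $decoded === '') {
        return false;
    }
    return mean_byte_value($decoded) < $threshold;
}

var_dump(mean_byte_value('AB'));                                  // (65 + 66) / 2
var_dump(decoded_looks_like_text(base64_encode('hello world')));
```

Like the max() version, this is a statistical guess and will misclassify some short inputs.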

In conclusion, the only winning move is not to play. I'd try to avoid having to discern base64 in the first place.
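
One way to avoid guessing, sketched under the assumption that you control how the new cookie value is written: tag the new format with an explicit marker. The prefix "b64:" and both function names are hypothetical. The colon is not part of the Base64 alphabet, so a raw Base64 string can never start with the marker:

```php
<?php
// Hypothetical marker that identifies values written in the new format.
const COOKIE_PREFIX = 'b64:';

function encode_cookie_value(string $plain): string
{
    return COOKIE_PREFIX . base64_encode($plain);
}

// Returns the plain value for both new-format and legacy plain-text cookies.
function decode_cookie_value(string $stored): string
{
    if (strncmp($stored, COOKIE_PREFIX, strlen(COOKIE_PREFIX)) === 0) {
        $decoded = base64_decode(substr($stored, strlen(COOKIE_PREFIX)), true);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $stored; // no marker: treat as legacy plain text
}

var_dump(decode_cookie_value(encode_cookie_value('hello world')));
var_dump(decode_cookie_value('hello world'));
```

The remaining caveat is a legacy plain-text value that itself begins with the marker, so pick a prefix that cannot occur in your old data.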

chrishiestand
22

I had the same problem and ended up with this solution:

if ( base64_encode(base64_decode($data)) === $data){
    echo '$data is valid';
} else {
    echo '$data is NOT valid';
}
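
A small sketch of how the round-trip check behaves, wrapped in a hypothetical function round_trips(). It confirms that the input is canonical Base64, but short plain words made only of Base64 characters, with a length that is a multiple of four, also survive the round trip:

```php
<?php
// Hypothetical wrapper around the round-trip comparison shown above.
function round_trips(string $data): bool
{
    return base64_encode(base64_decode($data)) === $data;
}

var_dump(round_trips(base64_encode('hello world'))); // genuinely encoded
var_dump(round_trips('hello world'));                // contains a space, so re-encoding cannot reproduce it
var_dump(round_trips('node'));                       // plain word, yet 4 valid Base64 characters
```

So the check is a good filter for text containing spaces or punctuation, but it cannot rule out every plain-text value.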
Amir
11

Better late than never: You could maybe use mb_detect_encoding() to find out whether the decoded data appears to be some kind of text:

function is_base64_string($s) {
  // first check if we're dealing with an actual valid base64 encoded string
  if (($b = base64_decode($s, TRUE)) === FALSE) {
    return FALSE;
  }

  // now check whether the decoded data could be actual text
  $e = mb_detect_encoding($b);
  return in_array($e, array('UTF-8', 'ASCII'), TRUE); // YMMV
}
Marki
10

We can combine three checks into one function to test whether a given string is valid base64 encoded data.

function validBase64($string)
{
    // Check that the string contains no invalid characters
    if (!preg_match('/^[a-zA-Z0-9\/\r\n+]*={0,2}$/', $string)) return false;

    // Decode the string in strict mode; compare against false explicitly,
    // because a valid decoded value such as "0" is falsy
    $decoded = base64_decode($string, true);
    if ($decoded === false) return false;

    // Encode again and compare with the original
    if (base64_encode($decoded) !== $string) return false;

    return true;
}
Abhinav bhardwaj
5

I was about to build a base64 toggle in PHP. This is what I did:

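function base64Toggle($str) {
    if (!preg_match('~[^0-9a-zA-Z+/=]~', $str)) {
        $decoded = base64_decode($str);
        $length = strlen($decoded);
        if ($length > 0) { // avoid dividing by zero for empty input
            $x = 0;
            foreach (str_split($decoded) as $char) if (ord($char) > 126) $x++;
            // treat the input as base64 if fewer than 30% of the decoded bytes are above 126
            if ($x / $length * 100 < 30) return $decoded;
        }
    }
    return base64_encode($str);
}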

It works perfectly for me. Here are my complete thoughts on it: http://www.albertmartin.de/blog/code.php/19/base64-detection

And here you can try it: http://www.albertmartin.de/tools

Albert
  • I personally love this solution because it gets closest (`return false` in place of `return base64_encode($str)` and `return true` in place of `return base64_decode($str)` and you get a near perfect solution to OP). I appreciate how you explained it on your blog. – Fr0zenFyr Jul 05 '19 at 09:09
  • I think you should also check out Marki's solution (https://stackoverflow.com/a/51877882/1369473). It's more flexible and less prone to errors – Fr0zenFyr Jul 05 '19 at 09:21
3

By default, base64_decode() will not return FALSE if the input is not valid base64 encoded data. Use imap_base64() instead; it returns FALSE if $text contains characters outside the Base64 alphabet. See the imap_base64() reference.

Sivaguru
3

Here's my solution:

if(empty(htmlspecialchars(base64_decode($string, true)))) { return false; }

It returns false if the decoded $string is not valid text, for example for inputs such as "node", "123", " ", etc.
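
The one-liner relies on htmlspecialchars() returning an empty string when its input is not valid UTF-8. That depends on the flags in effect: with ENT_SUBSTITUTE or ENT_IGNORE the invalid bytes are replaced or dropped instead, and as far as I know PHP 8.1 added ENT_SUBSTITUTE to the default flags. A sketch that makes the UTF-8 check explicit, using the hypothetical name is_base64_utf8_text() and the /u regex modifier (which makes preg_match() fail on invalid UTF-8):

```php
<?php
// Hypothetical helper: strict-decode, then require the result to be non-empty, valid UTF-8.
function is_base64_utf8_text(string $candidate): bool
{
    $decoded = base64_decode($candidate, true);
    if ($decoded === false || $decoded === '') {
        return false;
    }
    // With the /u modifier, preg_match() returns false for subjects that are not valid UTF-8.
    return preg_match('//u', $decoded) === 1;
}

var_dump(is_base64_utf8_text(base64_encode('node'))); // encoded text
var_dump(is_base64_utf8_text('node'));                // plain word: decodes to bytes that are not valid UTF-8
```

This only helps when the original data is known to be UTF-8 text; binary payloads would be rejected.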

Special K.
2
$is_base64 = function(string $string) : bool {
    $zero_one = ['MA==', 'MQ=='];
    if (in_array($string, $zero_one)) return TRUE;

    if (empty(htmlspecialchars(base64_decode($string, TRUE))))
        return FALSE;

    return TRUE;
};

var_dump('*** These yield false ***');
var_dump($is_base64(''));
var_dump($is_base64('This is definitely not base64 encoded'));
var_dump($is_base64('node'));
var_dump($is_base64('node '));
var_dump($is_base64('123'));
var_dump($is_base64(0));
var_dump($is_base64(1));
var_dump($is_base64(123));
var_dump($is_base64(1.23));

var_dump('*** These yield true ***');
var_dump($is_base64(base64_encode('This is definitely base64 encoded')));
var_dump($is_base64(base64_encode('node')));
var_dump($is_base64(base64_encode('123')));
var_dump($is_base64(base64_encode(0)));
var_dump($is_base64(base64_encode(1)));
var_dump($is_base64(base64_encode(123)));
var_dump($is_base64(base64_encode(1.23)));
var_dump($is_base64(base64_encode(TRUE)));

var_dump('*** Should these yield true? Might be edge cases ***');
var_dump($is_base64(base64_encode('')));
var_dump($is_base64(base64_encode(FALSE)));
var_dump($is_base64(base64_encode(NULL)));
Francisco Luz
0

Text in base64 usually has no spaces.

I used this function, which worked fine for me. It tests whether the proportion of spaces in the string is less than 1 in 20.

i.e. fewer than 1 space for each 20 chars: ( spaces / strlen ) < 0.05

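function normalizaBase64($data){
    // Nothing to decode in an empty string (this also avoids dividing by zero)
    if ($data === '') {
        return $data;
    }
    $spaces = substr_count($data, " ");
    if (($spaces / strlen($data)) < 0.05)
    {
        return base64_decode($data);
    }
    return $data;
}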
Szabolcs Páll
0

Maybe it's not exactly what you asked for, but I hope it'll be useful for somebody.

In my case the solution was to encode all data with json_encode and then base64_encode.

$encoded=base64_encode(json_encode($data));

This value can be stored or used however you need. Then, to check that a value is your encoded data and not just a text string, you simply use

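function isData($test_string){
   $decoded = base64_decode($test_string, true);
   if ($decoded === false) {
      return false;
   }
   // Check the JSON error state instead of the decoded value,
   // because valid JSON such as 0, false or null is falsy
   json_decode($decoded);
   return json_last_error() === JSON_ERROR_NONE;
}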

or alternatively

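function isNotData($test_string){
   $decoded = base64_decode($test_string, true);
   if ($decoded === false) {
      return true;
   }
   // Check the JSON error state instead of the decoded value,
   // because valid JSON such as 0, false or null is falsy
   json_decode($decoded);
   return json_last_error() !== JSON_ERROR_NONE;
}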

Thanks to the authors of all the previous answers in this thread :)

Mikhail.root
0

Your best option is:

$base64_test = mb_substr(trim($some_base64_data), 0, 76);
return base64_decode($base64_test, true) !== FALSE;
Digital Human