13

I need to detect strings with the form @base64 (e.g. @VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw==) in my application.

The @ has to be at the beginning, and the charset for base64 encoded strings is a-z, A-Z, 0-9, +, / and =. What would be the appropriate regular expression to detect them?

Thanks

federico-t
  • 1
    possible duplicate of [RegEx to parse or validate Base64 data](http://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data) – Regexident Nov 12 '11 at 17:22

3 Answers

13

Something like this should do (does not check for proper length!):

^@[a-zA-Z0-9+/]+={0,2}$

The length of any base64 encoded string must be a multiple of 4, hence the additional "=" padding characters at the end.
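As a quick illustration of what the short pattern does and does not catch, here is a minimal Python sketch. Python is only my choice for illustration, and the names `LOOSE` and `looks_like_base64` are hypothetical. The pattern is written with the explicit `{0,2}` quantifier, since not every regex engine accepts an omitted lower bound.

```python
import re

# Short pattern: "@", one or more base64 characters, then up to two "=".
LOOSE = re.compile(r"^@[a-zA-Z0-9+/]+={0,2}$")

def looks_like_base64(s: str) -> bool:
    """Checks character set and padding position only; length is not verified."""
    return LOOSE.match(s) is not None

print(looks_like_base64("@VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="))  # True
print(looks_like_base64("@abc"))    # True, although 3 is not a multiple of 4
print(looks_like_base64("@ab=cd"))  # False, "=" is only allowed at the end
print(looks_like_base64("abcd"))    # False, the leading "@" is missing
```

The second call shows the limitation mentioned above: a payload of the wrong length still passes.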

See [RegEx to parse or validate Base64 data](http://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data) for a solution that also checks for proper length.

A quick explanation of the regex from the linked answer:

^@ #match "@" at beginning of string
(?:[A-Za-z0-9+/]{4})* #match any number of 4-letter blocks of the base64 char set
(?:
    [A-Za-z0-9+/]{2}== #match 2-letter block of the base64 char set followed by "==", together forming a 4-letter block
| # or
    [A-Za-z0-9+/]{3}= #match 3-letter block of the base64 char set followed by "=", together forming a 4-letter block
)?
$ #match end of string
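The commented pattern above can be tried out more or less as written in an engine that supports a verbose/extended mode. Below is a sketch using Python's `re.VERBOSE` flag; the names `STRICT` and `is_tagged_base64` are made up for this example.

```python
import re

# Verbose mode ignores whitespace in the pattern and treats "#" as a comment.
STRICT = re.compile(
    r"""
    ^@                          # "@" at the beginning of the string
    (?:[A-Za-z0-9+/]{4})*       # any number of full 4-character blocks
    (?:
        [A-Za-z0-9+/]{2}==      # 2 characters plus "==" padding
      |
        [A-Za-z0-9+/]{3}=       # 3 characters plus "=" padding
    )?
    $                           # end of the string
    """,
    re.VERBOSE,
)

def is_tagged_base64(s: str) -> bool:
    """True if s is "@" followed by a correctly padded base64 payload."""
    return STRICT.match(s) is not None

print(is_tagged_base64("@VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw=="))  # True
print(is_tagged_base64("@abc="))  # True, 3 characters plus one "="
print(is_tagged_base64("@abc"))   # False, length is not a multiple of 4
print(is_tagged_base64("@"))      # True, the empty payload is accepted
```

Note the last line: every part after the "@" is optional, so a bare "@" also matches.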
evandrix
Regexident
  • Something I forgot to mention is that base64 encoded strings have "=" characters only at the end, and have 2 at most. Is it possible to check for this? – federico-t Nov 12 '11 at 17:16
  • ^@(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$ would be correct then? – federico-t Nov 12 '11 at 17:24
  • 2
    Yes and no, if you have confidence on the source with anything that starts with @ symbol then yes that should be good enough. Although I'm assuming you are trying to detect it because it might not be a valid source in which case even something like @HeyThisIsMyTweeterHandle might be detected as base64. Those are things you should consider. If you have control of both ends of the communications I would restructure it a bit. It might also help to simply do a - if first char @ then if base64_decode($str, true) !== false then base64_decode. No reg ex required. – JRomero Nov 12 '11 at 17:26
  • Well, if you basically just want to check for character set correctness and some basic prefix/suffix checking, then my short one would suffice. The longer one however also checks against proper length. – Regexident Nov 12 '11 at 17:28
  • That would be a nice solution; the problem is that I'm trying to extract the base64 from a context (in the middle of a text the user submits, for example). And yes, @HeyThisIsMyTweeterHandle would validate as well, but that's not a problem for me, as long as it is valid base64 (with proper length as well) – federico-t Nov 12 '11 at 17:35
  • +1 for J.Romero's suggestion to just use a native php base64 function. – Regexident Nov 12 '11 at 17:35
  • Good, the only thing is that this also matches nothing at all. – Alix Axel Sep 25 '12 at 00:07
  • @AlixAxel: Works fine for me. Mind to share what it doesn't work with? – Regexident Sep 25 '12 at 11:54
  • @Regexident: What I meant is `preg_match('~^@(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$~', '@'); // 1`, get it? – Alix Axel Sep 25 '12 at 13:08
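The points raised in these comments (extracting a token from the middle of a text, letting a native decoder do the final check, and not matching a bare "@") can be combined in one sketch. The version below is in Python rather than PHP, purely for illustration; `TOKEN` and `extract_base64` are hypothetical names, and the boundary rules are assumptions of this sketch: a token must not directly follow a word character, must not be followed by another base64 character, and must have a non-empty payload.

```python
import base64
import binascii
import re

# Unanchored variant of the strict pattern. The last chunk is mandatory
# (a full block, or a padded 2- or 3-character block), so "@" alone fails.
TOKEN = re.compile(
    r"(?<![A-Za-z0-9_])@"
    r"((?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=))"
    r"(?![A-Za-z0-9+/=])"
)

def extract_base64(text: str) -> list[bytes]:
    """Return the decoded payload of every @base64 token found in text."""
    decoded = []
    for m in TOKEN.finditer(text):
        try:
            # validate=True rejects characters outside the base64 alphabet.
            decoded.append(base64.b64decode(m.group(1), validate=True))
        except binascii.Error:
            continue  # skip anything the decoder refuses
    return decoded

text = "see @VGhpcyBpcyBhbiBlbmNvZGVkIHN0cmluZw== and @ alone, or a@b.cd"
print(extract_base64(text))  # [b'This is an encoded string']
```

As noted in the comments, a handle-like token whose length happens to be a multiple of 4 would still be picked up; the regex and the decoder can only check form, not intent.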
4

try with:

^@(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

=> RegEx to parse or validate Base64 data

Community
  • @PierrOz probably extracted from http://stackoverflow.com/questions/475074/regex-to-parse-or-validate-base64-data, but I'm still having a hard time seeing what's going on there – federico-t Nov 12 '11 at 17:21
  • 1
    @Federico-Quagliotto how about linking to Gumbo's answer instead of blatantly stealing it without giving credit where credit is due? – Regexident Nov 12 '11 at 17:21
  • 2
    I didn't steal it; I simply checked my archive of useful regexes. I use base64 for many things, that's all. I can see that the regex is pretty much the same; sorry for not having checked Stack Overflow first. – Federico Quagliotto Nov 12 '11 at 17:26
  • @PierrOz: see my answer for an explanation of the regex. – Regexident Nov 12 '11 at 17:32
  • @FedericoQuagliotto: Sorry about the accusation then. Was the first result to show up and looked like a blatant steal. – Regexident Nov 12 '11 at 17:34
0

Here's an alternative regular expression:

^@(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$

It satisfies the following conditions:

  • The string length after the @ sign must be a multiple of four - (?=(.{4})*$)
  • The content must be alphanumeric characters or + or / - [A-Za-z0-9+/]*
  • It can have up to two padding (=) characters on the end - ={0,2}
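To get a feel for how the lookahead version behaves next to the block-based pattern from the other answers, here is a small Python comparison over a handful of hand-picked inputs. The names `BLOCKS`, `LOOKAHEAD` and `samples` are made up for this sketch, and agreement on these samples is an illustration, not a proof of equivalence.

```python
import re

# Block-based pattern (from the other answers) and the lookahead alternative.
BLOCKS = re.compile(
    r"^@(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)
LOOKAHEAD = re.compile(r"^@(?=(.{4})*$)[A-Za-z0-9+/]*={0,2}$")

samples = ["@", "@ab==", "@abc=", "@abcd", "@abc",
           "@a===", "@ab=c", "@abcdef==", "abcd"]

for s in samples:
    b = bool(BLOCKS.match(s))
    look = bool(LOOKAHEAD.match(s))
    print(f"{s!r:14} blocks={b} lookahead={look}")
```

The lookahead only constrains the total length; the character class and `={0,2}` then take care of where the padding may appear.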
Paul