
I've seen many RegEx answers on how to check for Base64, but I can't find one specifically for representations of 256-bit numbers.

I'm brand new to Base64, byte conversions, and RegEx. This answer seems to be the best for checking Base64, but I can't tell from the details if it can be specifically applied to a representation of a 256-bit number.

^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

I need to make sure of the validity of these strings because I'm using them as encodings of Ed25519 keys, and my en/decoder seems to accept non-ASCII characters.

I don't really understand if that can specifically be applied to a representation of a 256-bit number.

How can RegEx validate a Base64 encoded 256 bit number?

Community
  • 2
    I don't think there's anything special about base64 when the original was a number. If you want to validate it, decode it first. – Barmar Jan 16 '14 at 01:01
  • @Barmar Thank you for looking Barmar! This is for Ed25519 keys, so I need to be very sure that what the user is giving is a true Base64 encoded 256bit number. The Base64 de/encoder I'm using seems to accept non-ASCII characters which I don't think will work with Ed25519 when decoded. –  Jan 16 '14 at 01:05

2 Answers

4

As portforwardpodcast pointed out, a 256-bit number is encoded as 43 characters with one = at the end as padding.

The first 42 characters carry 252 bits, so only the first four bits of the 6-bit value that the 43rd character represents are used. The 43rd character can therefore only be one whose value has its last two bits set to zero.
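
To see where that restriction leads, here is a minimal Python sketch that lists the characters whose 6-bit value ends in two zero bits. It assumes the standard Base64 alphabet from RFC 4648; the names `ALPHABET`, `leftover_bits`, and `allowed` are only illustrative.

```python
import string

# Standard Base64 alphabet (RFC 4648): the index of each character is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# 42 characters carry 42 * 6 = 252 bits, so 4 bits are left for the 43rd character.
leftover_bits = 256 - 42 * 6

# The 43rd character holds 4 data bits followed by 2 zero bits,
# so its value must be a multiple of 4.
allowed = "".join(ch for value, ch in enumerate(ALPHABET) if value % 4 == 0)

print(leftover_bits)  # 4
print(allowed)        # AEIMQUYcgkosw048
```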

Because you know exactly how long the string should be, you can validate it with a simpler regular expression than the one needed for arbitrary Base64 strings:

^[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]=$
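
As a rough illustration of how this pattern could be applied, here is a sketch in Python (the question does not say which language is in use, so that is an assumption, and the function name `looks_like_256_bit_base64` is hypothetical). It uses `re.fullmatch` instead of `^...$`, because in Python `$` also matches just before a trailing newline.

```python
import re

# Same pattern as above, without the anchors: fullmatch() requires the
# whole string to match, so a trailing newline is rejected as well.
KEY_RE = re.compile(r"[A-Za-z0-9+/]{42}[AEIMQUYcgkosw048]=")

def looks_like_256_bit_base64(text: str) -> bool:
    """Return True if text has the shape of a Base64-encoded 256-bit value."""
    return KEY_RE.fullmatch(text) is not None

sample = "ampq" * 10 + "amo="  # the 44-character example from the other answer
print(looks_like_256_bit_base64(sample))           # True
print(looks_like_256_bit_base64(sample[:-2] + "p="))  # False: 'p' is not an allowed 43rd character
```

The character classes list only ASCII characters, so any non-ASCII input fails the match.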
Guffa
  • Thank you so very much Guffa! I'm so new to regex that even though I think it does, I still have to ask: does that check for a trailing `=`? Also, for a 256 bit number, doesn't the first character before the `=` have fewer possibilities? Thank you so much in advance! –  Jan 16 '14 at 01:13
  • 1
    @Gracchus: Yes, you are right about the last character, and I was already looking into that. The `=` before the `$` in the regular expression verifies that the last character is `=`. – Guffa Jan 16 '14 at 01:18
  • Thank you so very, very much Guffa for the very thorough answer! –  Jan 16 '14 at 01:20
0

I would take the following steps:

  • Run the existing RegEx to decide if it's valid base64 or not
  • If it is, decode it from base64 and check that the result is 256 bits (32 bytes).
    • I believe this is a 256-bit number: ampqampqampqampqampqampqampqampqampqampqamo= All 256-bit numbers have the same length when encoded in base64: 43 characters followed by one =, for a total of 44 characters. You should be able to use this as a shortcut to determine whether a base64-encoded string represents 256 bits.
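
The steps above can be sketched as follows, assuming Python and its standard `base64` module; the function name `decode_256_bit_key` is hypothetical. Note that `base64.b64decode` silently discards characters outside the alphabet unless `validate=True` is passed, which may be the kind of leniency the question describes. The final re-encode comparison is an extra check added here to reject strings whose unused trailing bits are not zero.

```python
import base64

def decode_256_bit_key(text: str) -> bytes:
    """Decode text as strict Base64 and require exactly 32 bytes (256 bits)."""
    try:
        # validate=True rejects characters outside the Base64 alphabet;
        # non-ASCII str input also raises ValueError.
        raw = base64.b64decode(text, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("not valid Base64") from exc
    if len(raw) != 32:
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    # Re-encoding must reproduce the input exactly; otherwise the 43rd
    # character carried non-zero padding bits.
    if base64.b64encode(raw).decode("ascii") != text:
        raise ValueError("non-canonical Base64 encoding")
    return raw

key = decode_256_bit_key("ampq" * 10 + "amo=")
print(len(key))  # 32
```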
benathon
  • Thank you for answering portforwardpodcast! I'm using this for digital signature keys, so I need to make absolutely sure. I can use that regex, make sure the string is 44 long, ends in only one `=`, but I need to be 100% certain. –  Jan 16 '14 at 01:09
  • If you need to be 100% certain, then generate 1000 random examples of keys, base64 encode them, and check what they look like. According to Wikipedia, "Base64 encoding converts three octets into four encoded characters." One octet is 8 bits, so your key has 32 octets. (32 / 3) * 4 ≈ 42.67, so 43 characters carry data, and Base64 adds one = of padding at the end to complete the final group of four. http://en.wikipedia.org/wiki/Base64 – benathon Jan 16 '14 at 01:13
  • Thank you again portforwardpodcast! I'm almost positive I read somewhere that a Base64 encoded 256-bit number will have fewer possibilities for the character before the `=`. Do you know what they are and how to check for that? –  Jan 16 '14 at 01:16
  • 1
    If you are using a 256-bit key, each bit can be any value, meaning there is no restriction on its format. Once you convert to Base64, your key has the potential to look like anything (as long as it is valid Base64). So you should stack the regular expression that you have, and then do another check for length and final character. That should be everything you need. – benathon Jan 16 '14 at 01:18
  • Ah, so that's why it "works" when I use non-ASCII or violate the limit for the character before the `=`? Thank you for your help! –  Jan 16 '14 at 01:19