I am trying to match a base64-encoded block using a regex in PHP. The block is preceded by 'Content-Transfer-Encoding: base64', so I was hoping I could just match the content after that header, but my regex below isn't working. Please help me fix it so that it matches the base64 block. A base64 block appears twice in the e-mail body; I presume the first is the plain-text version and the second is the HTML version. I would like to match both base64 blocks, which is why I am using preg_match_all, but not the text between them.

This is the code I have so far

$regex = '/Content-Transfer-Encoding:\sbase64\s\n(.*?)/';

preg_match_all($regex, $message, $matches); 

This is the message:

Content-Transfer-Encoding: base64

DQoNCg0KDQoNCg0KDQoNCg0KDQpbQiZRXTxodHRwOi8vd3d3LmRpeS5jb20+DQoNCg0KDQoNCg0K
W2h0dHA6Ly9raW5nZmlzaGVyLnNjZW5lNy5jb20vaXMvaW1hZ2UvS2luZ2Zpc2hlci9pY29uX3N0
b3JlX2xvY2F0b3I/d2lkPTM2JmhlaT0zNiZxbHQ9MTAwXTxodHRwOi8vd3d3LmRpeS5jb20vZmlu
ZC1hLXN0b3JlPg0KDQoNCg0KRmluZCBhIHN0b3JlPGh0dHA6Ly93d3cuZGl5LmNvbS9maW5kLWEt
c3RvcmU+DQoNCg0KDQoNCkN1c3RvbWVyIFNlcnZpY2VzDQoNCjAzMzMgMDE0IDMzNTcNCg0KDQoN
Cg0KDQoNCg0KDQoNCkluc3BpcmF0aW9uPGh0dHA6Ly93d3cuZGl5LmNvbS9pbnNwaXJhdGlvbi8w
Lmlyb290Pg0KDQpQcm9qZWN0czxodHRwOi8vd3d3LmRpeS5jb20vcHJvamVjdHMvMi5wcm9vdD4N
Cg0KU2hvcDxodHRwOi8vd3d3LmRpeS5jb20vc2hvcC8+DQoNCkhlbHAgJiBBZHZpY2U8aHR0cDov
L3d3dy5kaXkuY29tL2hlbHAtYWR2aWNlLzEuaHJvb3Q+DQoNCk15IGFjY291bnQ8aHR0cDovL3d3
dy5kaXkuY29tL2N1c3RvbWVyL215X2FjY291bnQvPg0KDQoNCg0KDQoNCg0KDQoNCg0KRGVhciBC
ZW4gUGF0b24NCg0KDQoNCg0KVGhhbmsgeW91IGZvciB5b3VyIG9yZGVyDQoNCg0KDQoNCg0KT3Jk
ZXIgbnVtYmVyOg0KDQowMDYzMTA5MDU1DQoNCg0KDQpUb3RhbCBDb3N0Og0KDQrCozMuMjcNCg0K
DQoNClRoYW5rIHlvdSBmb3Igb3JkZXJpbmcgZnJvbSBCJlEuIFlvdeKAmWxsIGZpbmQgZGV0YWls
cyBvZiB5b3VyIG9yZGVyIGFuZCBkZWxpdmVyeSBvciBjb2xsZWN0aW9uIGluZm9ybWF0aW9uIGJl
bG93LiBGb3IgaGVscCB3aXRoIHF1ZXN0aW9ucyBhYm91dCBvdXIgc2VydmljZSwgcGxlYXNlIHNl

--_000_D16F6E4A2986D34F9D752E3564EAC46F51043449APP1198ghakfplc_
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiIHhtbG5zOm89InVy
bjpzY2hlbWFzLW1pY3Jvc29mdC1jb206b2ZmaWNlOm9mZmljZSIgeG1sbnM6dz0idXJuOnNjaGVt
YXMtbWljcm9zb2Z0LWNvbTpvZmZpY2U6d29yZCIgeG1sbnM6bT0iaHR0cDovL3NjaGVtYXMubWlj
cm9zb2Z0LmNvbS9vZmZpY2UvMjAwNC8xMi9vbW1sIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcv
rock321987
Ben Paton
  • Use `.*` instead of `.*?`, and add the `s` flag so that `.` also matches newlines, as in https://regex101.com/r/eM4hB3/1 – rock321987 Apr 14 '16 at 18:00
  • Listen to @rock321987, but before that, add some more detail about the overall context. –  Apr 14 '16 at 18:03

2 Answers

This should work

/Content-Transfer-Encoding:\sbase64\s+(.*)(?=Content-Transfer-Encoding: base64|$)/g

Regex Demo

PHP Code

$re = "/Content-Transfer-Encoding:\\sbase64\\s+(.*)(?=Content-Transfer-Encoding: base64|$)/"; 
$str = "Content-Transfer-Encoding: base64\n\nDQoNCg0KDQoNCg0KDQoNCg0KDQpbQiZRXTxodHRwOi8vd3d3LmRpeS5jb20+DQoNCg0KDQoNCg0K W2h0dHA6Ly9raW5nZmlzaGVyLnNjZW5lNy5jb20vaXMvaW1hZ2UvS2luZ2Zpc2hlci9pY29uX3N0 b3JlX2xvY2F0b3I/d2lkPTM2JmhlaT0zNiZxbHQ9MTAwXTxodHRwOi8vd3d3LmRpeS5jb20vZmlu ZC1hLXN0b3JlPg0KDQoNCg0KRmluZCBhIHN0b3JlPGh0dHA6Ly93d3cuZGl5LmNvbS9maW5kLWEt c3RvcmU+DQoNCg0KDQoNCkN1c3RvbWVyIFNlcnZpY2VzDQoNCjAzMzMgMDE0IDMzNTcNCg0KDQoN Cg0KDQoNCg0KDQoNCkluc3BpcmF0aW9uPGh0dHA6Ly93d3cuZGl5LmNvbS9pbnNwaXJhdGlvbi8w Lmlyb290Pg0KDQpQcm9qZWN0czxodHRwOi8vd3d3LmRpeS5jb20vcHJvamVjdHMvMi5wcm9vdD4N Cg0KU2hvcDxodHRwOi8vd3d3LmRpeS5jb20vc2hvcC8+DQoNCkhlbHAgJiBBZHZpY2U8aHR0cDov L3d3dy5kaXkuY29tL2hlbHAtYWR2aWNlLzEuaHJvb3Q+DQoNCk15IGFjY291bnQ8aHR0cDovL3d3 dy5kaXkuY29tL2N1c3RvbWVyL215X2FjY291bnQvPg0KDQoNCg0KDQoNCg0KDQoNCg0KRGVhciBC ZW4gUGF0b24NCg0KDQoNCg0KVGhhbmsgeW91IGZvciB5b3VyIG9yZGVyDQoNCg0KDQoNCg0KT3Jk ZXIgbnVtYmVyOg0KDQowMDYzMTA5MDU1DQoNCg0KDQpUb3RhbCBDb3N0Og0KDQrCozMuMjcNCg0K DQoNClRoYW5rIHlvdSBmb3Igb3JkZXJpbmcgZnJvbSBCJlEuIFlvdeKAmWxsIGZpbmQgZGV0YWls cyBvZiB5b3VyIG9yZGVyIGFuZCBkZWxpdmVyeSBvciBjb2xsZWN0aW9uIGluZm9ybWF0aW9uIGJl bG93LiBGb3IgaGVscCB3aXRoIHF1ZXN0aW9ucyBhYm91dCBvdXIgc2VydmljZSwgcGxlYXNlIHNl--_000_D16F6E4A2986D34F9D752E3564EAC46F51043449APP1198ghakfplc_ Content-Type: text/html; charset=\"utf-8\" Content-Transfer-Encoding: base64\n\nPGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiIHhtbG5zOm89InVy bjpzY2hlbWFzLW1pY3Jvc29mdC1jb206b2ZmaWNlOm9mZmljZSIgeG1sbnM6dz0idXJuOnNjaGVt YXMtbWljcm9zb2Z0LWNvbTpvZmZpY2U6d29yZCIgeG1sbnM6bT0iaHR0cDovL3NjaGVtYXMubWlj cm9zb2Z0LmNvbS9vZmZpY2UvMjAwNC8xMi9vbW1sIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcv\n"; 

preg_match_all($re, $str, $matches);

print_r($matches[1]);

Ideone Demo

$matches is an array of arrays.

$matches[0] contains every full match, including the text matched by Content-Transfer-Encoding:\sbase64\s+.

$matches[1] contains only the captured text that follows Content-Transfer-Encoding:\sbase64\s+.
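When the message keeps its original line breaks, `.` stops at the first newline unless the `s` flag is set. One alternative, shown here only as a rough sketch and not as the method used above, is to capture a run of base64 characters and line breaks, which ends at the MIME boundary because `-` is not a base64 character. The sample `$message`, its `--boundary_` line, and the `~` delimiter are assumptions made for illustration.

```php
<?php
// Hypothetical sample: two short base64 parts separated by a made-up MIME boundary.
$message = "Content-Transfer-Encoding: base64\r\n\r\n"
    . "SGVsbG8s\r\nIHdvcmxk\r\n\r\n"
    . "--boundary_\r\n"
    . "Content-Type: text/html; charset=\"utf-8\"\r\n"
    . "Content-Transfer-Encoding: base64\r\n\r\n"
    . "PGI+SGk8\r\nL2I+\r\n";

// After the header and the blank line, capture only base64 characters and
// line breaks. The capture stops at the first other character (the boundary's '-').
$regex = '~Content-Transfer-Encoding:\s*base64\s+([A-Za-z0-9+/=\r\n]+)~i';

preg_match_all($regex, $message, $matches);

// Remove the line breaks from each captured block before decoding it.
$decoded = [];
foreach ($matches[1] as $block) {
    $decoded[] = base64_decode(preg_replace('~\s+~', '', $block), true);
}

print_r($decoded);
```

Here `$matches[1]` holds one entry per base64 part, and the boundary and Content-Type lines between the parts are left out of the captures.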

rock321987

Try this:

preg_match_all('/Content-Transfer-Encoding: base64\s+(.*?)$/', $subject, $result, PREG_PATTERN_ORDER);
$baseString = $result[1][0];

REGEX EXPLANATION:

Content-Transfer-Encoding: base64\s+(.*?)$

Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Greedy quantifiers

Match the character string “Content-Transfer-Encoding: base64” literally (case sensitive) «Content-Transfer-Encoding: base64»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of the string, or before the line break at the end of the string, if any (line feed) «$»
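The "Dot doesn't match line breaks" option above is what limits `.` to a single line. A small sketch of the difference the `s` modifier makes, using a made-up `$text` value rather than the real message:

```php
<?php
// Hypothetical input: a marker, a blank line, then two lines of data.
$text = "base64\n\nAAAA\nBBBB\n";

// Without the 's' modifier, '.' stops at the first line feed.
preg_match('/base64\s+(.*)/', $text, $withoutS);

// With the 's' modifier, '.' also matches line feeds, so the capture runs to the end.
preg_match('/base64\s+(.*)/s', $text, $withS);

var_dump($withoutS[1]); // "AAAA"
var_dump($withS[1]);    // "AAAA\nBBBB\n"
```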

REGEX DEMO:

https://regex101.com/r/lI8lJ6/1


LIVE PHP DEMO:

http://ideone.com/fK3z3n


UPDATE:

Based on your comments, you can use this regex to capture and validate a base64 string:

^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$

SRC: https://stackoverflow.com/a/475217/797495
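To use this pattern with PHP's preg functions it needs delimiters, and because the character classes contain `/`, wrapping it in `/.../` ends the pattern early and produces an "Unknown modifier" warning. A minimal sketch that uses `~` as the delimiter instead; the helper name `looksLikeBase64` is hypothetical, and stripping whitespace first is an assumption about how a line-wrapped block would be checked:

```php
<?php
// '~' is the delimiter, so the '/' inside the character classes needs no escaping.
$base64Pattern = '~^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$~';

// Hypothetical helper: join the wrapped lines, then test the result against the pattern.
function looksLikeBase64(string $candidate, string $pattern): bool
{
    $joined = preg_replace('~\s+~', '', $candidate);
    return preg_match($pattern, $joined) === 1;
}

var_dump(looksLikeBase64("SGVsbG8s\r\nIHdvcmxk", $base64Pattern)); // bool(true)
var_dump(looksLikeBase64("not base64!", $base64Pattern));          // bool(false)
```

Note that the pattern only checks the shape of the string (alphabet, length, padding); a block that was cut off in the middle of a 4-character group will fail the check even if the untruncated original was valid.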

Community
Pedro Lobito
  • Sorry, I have edited the question to add more context, as the base64 repeats twice. – Ben Paton Apr 14 '16 at 18:37
  • Which base64 do you need, the 1st or the 2nd? The 2nd doesn't seem to be a valid base64 string. – Pedro Lobito Apr 14 '16 at 18:41
  • I need to capture both, stopping after the base64 part in each case, and I need it working in PHP. The 2nd probably isn't valid as it's only a small part of the full block. I don't want to paste the whole thing as it's very long and contains information I shouldn't be sharing on here. – Ben Paton Apr 14 '16 at 18:49
  • You can change the contents of the base64 string to something bogus but with a valid syntax. Without a valid example I cannot help you further. – Pedro Lobito Apr 14 '16 at 18:51
  • Actually, none of your base64 strings are valid. Check my update. – Pedro Lobito Apr 14 '16 at 18:55
  • It is a valid example, I've just cut it off. – Ben Paton Apr 14 '16 at 18:59
  • That regex gives this error message though: Warning: preg_match_all(): Unknown modifier ']' – Ben Paton Apr 14 '16 at 19:14