4

I have this regex:

preg_match_all('/{.*?}/', $html, $matches);

Which returns all strings that are written inside curly braces. The $matches variable contains the { and } characters also. How can I remove them?

I don't want to do:

if ($matches[0][0] == "{variable}")

And I don't want to add ( and ) characters to the regexp because I don't want to use:

preg_match_all('/{(.*?)}/', $html, $matches);
if ($matches[1][0] == "variable")

So is there a simpler way to remove the curly braces from the $matches within the regex?

sleepless_in_seattle

3 Answers

7

In PCRE (the regex engine PHP uses), lookarounds let you make zero-length assertions. A lookbehind, (?<=...), asserts that the expression matches immediately before the current position. A lookahead, (?=...), asserts that the expression matches immediately after the current position. Both can be negated if need be: (?<!...) or (?!...).


This brings us to this expression:

(?<={).*?(?=})

Implement it the same way:

preg_match_all('/(?<={).*?(?=})/', $html, $matches);
// $matches[0][0] === 'variable'
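
As a minimal, runnable sketch of the call above: the sample `$html` string is an assumption made up for illustration, not taken from the question. It shows that `$matches[0]` is an array holding every brace-free match.

```php
<?php
// Hypothetical sample input; the template text is an assumption for illustration.
$html = '<p>Hello {name}, your order {order_id} has shipped.</p>';

// The lookbehind asserts a "{" just before the match and the lookahead asserts
// a "}" just after it. Neither brace is consumed, so neither is in the result.
preg_match_all('/(?<={).*?(?=})/', $html, $matches);

// $matches[0] holds all full-pattern matches: "name" and "order_id".
print_r($matches[0]);
```

Because there is no capture group, `$matches` has only index 0, so each value is read as `$matches[0][$i]`.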

@CasimirEtHippolyte makes a good point. This is a great example of where a lazy dot-match-all is not necessary: with .*? the engine has to retry the lookahead after every character it adds, which can hurt performance. You can replace the .*? with [^}]* to match zero or more non-} characters.
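
A small sketch of the difference, using a made-up input (an assumption) in which one placeholder spans two lines. Without the `s` modifier the dot does not match a newline, while the negated character class does:

```php
<?php
// Hypothetical input; the first placeholder contains a newline.
$html = "Dear {first\nlast}, welcome to {site}.";

// Lazy dot without the s modifier: "." stops at "\n", so the
// first placeholder cannot be matched at all.
preg_match_all('/(?<={).*?(?=})/', $html, $lazy);

// Negated class: [^}]* accepts any character except "}", newlines
// included, and runs straight to the closing brace.
preg_match_all('/(?<={)[^}]*(?=})/', $html, $negated);

print_r($lazy[0]);     // only "site"
print_r($negated[0]);  // "first\nlast" and "site"
```

If the braces can nest, `[^{}]*` is the stricter variant, since it also refuses to run across an opening brace.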

Sam
  • Note, you may want to use the `s` modifier. This will make your `.*?` match newline characters as well. – Sam Sep 22 '14 at 15:41
  • Beautiful! Just beautiful! Worked like a charm. I hope this doesn't have a huge impact on the performance of the regex. I will accept your answer in 5 minutes. – sleepless_in_seattle Sep 22 '14 at 15:45
  • @unska: If you want to improve performance, replace `.*?` with `[^}]+` – Casimir et Hippolyte Sep 22 '14 at 15:46
  • @unska it should have minimal, if any, performance impact compared to `{.*?}`. It works the same way, but only *asserts* the `{` instead of actually matching it. In other words, there is no "extra" logic for the lookarounds. – Sam Sep 22 '14 at 15:46
  • ... Or `[^{}]+` to avoid falsevar, at *"{falsevar{var1} {var2}}"* – Peter Krauss Sep 24 '14 at 17:01
3

Alternatively, use \K to reset the start of the match right after the { and then match any characters that are not }. If the braces are balanced, there is no need to match the closing }.

{\K[^}]*

See example on regex101
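
A hedged sketch of how this might look in PHP. The input strings are assumptions for illustration, and the variant with a trailing `(?=})` is an addition for input where braces may be unbalanced, not part of the pattern above:

```php
<?php
// Hypothetical sample input.
$html = 'Hello {name}, your order {order_id} has shipped.';

// "{" is consumed, then \K discards it from the reported match.
preg_match_all('/{\K[^}]*/', $html, $matches);
print_r($matches[0]); // "name" and "order_id"

// With an unclosed brace the pattern still matches up to the end of input.
preg_match_all('/{\K[^}]*/', 'an {unclosed tag', $unbalanced);
print_r($unbalanced[0]); // "unclosed tag"

// Assumption: adding a lookahead rejects placeholders with no closing brace.
preg_match_all('/{\K[^}]*(?=})/', 'an {unclosed tag', $guarded);
print_r($guarded[0]); // empty array
```

The plain `\K` form is enough when every `{` is known to have a matching `}`.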

Jonny 5
  • Can you provide a reference for `\K`, *SPOT, etc., where these PCRE features are described in detail? The Perl docs are not very understandable. – vks Sep 22 '14 at 16:05
  • @vks I added a [link for \K](http://www.rexegg.com/regex-php.html#K). For the others, do you mean [Special Backtracking Control Verbs](http://perldoc.perl.org/5.12.0/perlre.html#Special-Backtracking-Control-Verbs)? – Jonny 5 Sep 22 '14 at 16:07
  • Got it in the reset link. This is again a better answer :) – vks Sep 22 '14 at 16:07
  • Thanks a lot. Will go through those to learn these as well. :) – vks Sep 22 '14 at 16:09
  • @vks sure you know the [Regex FAQ](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075) here on Stackoverflow already, don't you? Always a good quick-reference I think. – Jonny 5 Sep 22 '14 at 16:13
  • :O Never knew something of that sort. That is just great. Thanks a lot. Happy regexing :) – vks Sep 22 '14 at 16:15
  • Almost put this solution, as it is probably how I would do it. But I feel dirty preaching `\K` when you can use a lookaround, since one is more supported across flavors. Either way, +1 for the change in pace :) – Sam Sep 22 '14 at 16:32
  • @Sam As the question is about "ignore characters" and php tagged, I thought `\K` should at least be mentioned. Yours is a greatly explained and more compatible solution as you said. Already upvoted of course :) – Jonny 5 Sep 22 '14 at 16:41
2
(?<={).*?(?=})

Replace your regex with this one. The lookarounds assert the braces without including them in the match.

vks