I'm working on a personal project in PHP and I'd like to run preg_replace_callback against the following strings:

1. {{hello}}
2. {{hello}{there}{how}{are}{you}}

I'd like to detect hello, there, how, are and you, and send them to a function as $matches[0] through $matches[4] (or however many there may be; the count needs to be variable, from 1 upward).

The above isn't too hard for me, but I'd also like it so that if I pass this string:

3. {{hello}{there}{how}{are}{you}} blabla {{I}{Am}{Fine}{Thanks for asking}}

The function I send the $matches[0-X] to should be run TWICE, as the little {{}} system I designed is opened and shut twice!

The pattern should also ignore {text just on its own like this}, BUT SHOULD run for {{text like this, i.e. just one box}}.

If I could also escape things with a backslash, such as:

4. {{ignore this next closing curly bracket \} as the slash makes it text}

...and it could then also remove that now-unneeded backslash... well... THAT WOULD GET MASSIVE BONUS POINTS!!

All of this is for preg_replace_callback too, so I need the entire {{thing}{here}} to be replaced by whatever the function returns.

Is this simple? Or hard? I'm stuck!

Love learning though so if anyone could help me, it would be more appreciated than you'd ever imagine. Thank you!

EDIT p.s. If it is just too hard to do as I explain above, I'd accept it working for something like:

[{hello}{there}{how}{are}{you}]

Using square brackets as well as curly - But that is much less desirable...

mrmrw
  • Where's the code you've attempted? – l'L'l Oct 02 '14 at 03:13
  • It's only working with one batch of "{{foo}{bar}}" per line of input (when it should run twice for "{{foo1}{bar1}} {{foo2}{bar2}}") and it doesn't do the \ trick for making { or } text... so it's a bit of a useless mess that I didn't post, to avoid confusion. – mrmrw Oct 02 '14 at 03:18

1 Answer

A variable number of capture groups is impossible; however, you can do a global match and collect all of them (though with example #3 it would be nearly impossible to tell whether a match came from the first {{...}} block or the second):

(?:\G(?!\A)|\{)[^}]*?\K\{(.*?)(?<!\\)\}
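For anyone who wants to try the expression from PHP, here is a minimal sketch using preg_match_all. The input is example #3 from the question; the variable names are just for illustration. Note that the regex backslash inside the lookbehind has to be written as four backslashes in a single-quoted PHP string.

```php
<?php
// Pattern from the answer. The regex needs \\ inside the lookbehind,
// which becomes four backslashes in a single-quoted PHP string.
$pattern = '/(?:\G(?!\A)|\{)[^}]*?\K\{(.*?)(?<!\\\\)\}/';

// Example #3 from the question.
$input = '{{hello}{there}{how}{are}{you}} blabla {{I}{Am}{Fine}{Thanks for asking}}';

// Global match: $matches[1] collects capture group 1 of every match.
preg_match_all($pattern, $input, $matches);

// One flat list of the inner texts, with no record of which block each came from.
print_r($matches[1]);
```

The result is a single flat list (hello, there, ..., Thanks for asking), which illustrates the caveat above: the two blocks cannot be told apart from the matches alone.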



P.S. Here is an example expression that shows why variable capture groups are impossible: (a)+bc. A repeated capture group is overwritten on each iteration, so its contents end up equal to the final iteration's match.
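The same behaviour can be seen in a quick PHP sketch. It uses a slightly different pattern from (a)+bc (an assumption made here purely so that each iteration captures different text), but the principle is the same: one group in the pattern means one slot in the result.

```php
<?php
// A single capture group, repeated three times by the + quantifier.
preg_match('/(?:\{(\w+)\})+/', '{x}{y}{z}', $m);

echo $m[0], "\n"; // the whole match: {x}{y}{z}
echo $m[1], "\n"; // only the last iteration is kept: z
```

However many times the group repeats, $m never grows beyond the whole match plus one entry per group written in the pattern.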

Sam
  • Wow, that's a very helpful answer :) – mrmrw Oct 02 '14 at 03:26
  • What if I changed my syntax to [{hello}{there}{how}{are}{you}] Then run a preg_replace_callback to find [text in square brackets] then do ANOTHER preg_replace_callback (within the first!) to find the {curly brackets}? I think I could adapt your code to do this, but is that bad practice? – mrmrw Oct 02 '14 at 03:27
  • If you have control over the syntax, that would make for an easier expression. I almost mentioned that alternative (find all "top-level" curly brackets, then parse the contents of each set), but it is near-impossible to "count" with regex unless you are using .NET. If you are using square brackets for the top level, this would be another viable option. – Sam Oct 02 '14 at 03:30
  • [`/(?<=\[)[^\]]*(?=\])/g`](http://regex101.com/r/cU6kZ9/7) and [`/(?<=\{).*?(?=(? – Sam Oct 02 '14 at 03:32
  • I hate to ask more of you as you have already been so helpful - but could you perhaps provide the two patterns for that? One for square brackets --- it must not detect [this] or [[this]] but only [{this}] (i.e. a square bracket with a curly bracket inside it) and then the curly bracket one would just need to detect any text between {curly brackets} -- but with the \ rule again ------ EDIT ---- Looks like you may have answered before I even posted this comment – mrmrw Oct 02 '14 at 03:33
  • See the above two. If you need the square brackets to contain a set of curly brackets inside of them (you shouldn't need to, since the second expression wouldn't match anything), you can use this: [`/(?<=\[)(?=[^\]]*?\{[^\]]*?\})[^\]]*(?=\])/g`](http://regex101.com/r/cU6kZ9/10). – Sam Oct 02 '14 at 03:35
  • One thing I think your latest code misses is that the square-bracket pattern would match [this], which is incorrect. It should only match [{this}] or [{this}{this}] etc. (i.e. find all square-bracket containers, but only if they contain both { and } characters inside them) – mrmrw Oct 02 '14 at 03:35
  • There is recursive regex in PCRE, similar to the feature in .NET which solves the bracket balance problem. – nhahtdh Oct 02 '14 at 04:31
  • @nhahtdh I meant to say "nearly-impossible" as I didn't think it was as nicely implemented as .NET -- do you have a link or mind posting an answer? I'm curious. – Sam Oct 02 '14 at 04:32
  • http://stackoverflow.com/q/3746487 Palindrome, which is pretty similar to bracket balancing. Or my answer for a different question here: http://stackoverflow.com/questions/16258143/get-all-nested-curly-braces/16262638#16262638 . I don't understand what OP wants, though, so I won't write an answer here. – nhahtdh Oct 02 '14 at 04:37
  • Nice resources, I'll take a closer look when I'm less tired :) – Sam Oct 02 '14 at 04:40
  • Interesting question @mrmrw and approach Sam +1 [I also played with it](http://regex101.com/r/fM2zH2/1) – Jonny 5 Oct 02 '14 at 10:21
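The two-pass idea discussed in the comments (find each square-bracket container, then parse the curly groups inside it) could look roughly like the sketch below. It is not the lookaround patterns linked above. It is an alternative written under a few assumptions: the [{...}{...}] syntax, a backslash escaping the next character, and two hypothetical function names (render_block and replace_blocks) that stand in for whatever the real callback does.

```php
<?php
// Hypothetical handler: receives the unescaped parts of one [{...}{...}] block
// and returns the replacement text for the whole block.
function render_block(array $parts): string
{
    return '<' . implode('|', $parts) . '>';
}

// Hypothetical wrapper around the two passes.
function replace_blocks(string $input): string
{
    // Contents of one {...} group: ordinary characters, or a backslash plus any character.
    $body  = '(?:[^{}\\\\]|\\\\.)*';
    // Pass 1: a [ ... ] container holding one or more {...} groups and nothing else.
    $outer = '/\[((?:\{' . $body . '\})+)\]/s';
    // Pass 2: each individual {...} group inside a container.
    $inner = '/\{(' . $body . ')\}/s';

    return preg_replace_callback($outer, function (array $m) use ($inner): string {
        preg_match_all($inner, $m[1], $found);
        // Drop the escaping backslashes, so \} becomes a plain }.
        $parts = preg_replace('/\\\\(.)/s', '$1', $found[1]) ?? $found[1];
        return render_block($parts);
    }, $input) ?? $input;
}

echo replace_blocks('[{hello}{there}] blabla [{I}{Am}]'), "\n";
// prints: <hello|there> blabla <I|Am>
```

Because the outer pattern requires a { straight after the [, inputs such as [this], [[this]] and {alone} are left untouched, and the callback runs once per container, so the handler receives a variable-length array each time. Whether nesting one preg call inside another callback is good practice is a matter of taste; it does keep each pattern short.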