Niels, this is an interesting and tricky question because you are looking for overlapping matches. Even with recursion, the task is not trivial.
You asked for "any idea how to achieve this with regex", so it sounds like even if this is not available in MATLAB, you would be interested in seeing an answer that shows how to do it with regex.
This makes sense to me because tools often change the regex libraries they use. For instance Notepad++, which used to have crippled regex, switched to PCRE in version 6. (As it happens, PCRE would work with this solution.)
In Perl and PCRE, you can use this short regex:
(?=(\b\w+\((?:\d+|(?1))\)))
This will match:
cos(t(2))
t(2)
t(51)
For instance, in PHP, you could use this code:
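<?php
// Lookahead + recursion: Group 1 captures every call, including nested ones
$regex = '~(?=(\b\w+\((?:\d+|(?1))\)))~';
$string = "cos(t(2))+t(51)";
$count = preg_match_all($regex, $string, $matches);
// $matches[1] holds the Group 1 captures: cos(t(2)), t(2), t(51)
print_r($matches[1]);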
How does it work?
- To allow overlapping matches, we use a lookahead. That way, after matching cos(t(2)), the engine will position itself NOT after cos(t(2)), but before the o in cos.
- In fact the engine does not actually match cos(t(2)) but merely captures it to Group 1. What it matches is the assertion that, at this position in the string, looking ahead, we can see text that fits the pattern inside the lookahead. Because that assertion consumes no characters, the engine then tries the whole pattern again starting from the next position in the string.
- The expression in the lookahead (which describes what we're looking for) is fairly simple: in (\b\w+\((?:\d+|(?1))\)), after the \d+, the alternation | lets us call subroutine number one with (?1), that is, the pattern of capture Group 1, which is the whole expression we are currently inside. So we don't recurse the entire regex (as (?R) would, lookahead included), but only this subexpression.