
I would like to match a string within parentheses like:

(i, j, k(1))
^^^^^^^^^^^^

The string can contain closed (paired) parentheses too. How can I match it with a regular expression in Java without writing a parser? This is only a small part of my project. Thanks!

Edit:

I want to search a block of text for something like u(i, j, k), u(i, j, k(1)), or just u(<anything within the paired parens>), and replace them with __u%array(i, j, k) and __u%array(i, j, k(1)) for my Fortran translation application.

Li Dong
  • Is there a maximum "depth" of parentheses, or can you have any depth? – radai Jul 20 '13 at 05:26
  • It doesn't sound like you need a very sophisticated parser... – Carl Norum Jul 20 '13 at 05:27
  • You can't do this with regexes, at least not with Java, since regexes are not recursive. PCRE can do this, though. But you should use/write a parser. For instance, you can try [parboiled](https://github.com/sirthias/parboiled). – fge Jul 20 '13 at 05:31
  • @radai I do not have a depth limit, but if that is needed, I can accept it. – Li Dong Jul 20 '13 at 05:31
  • I'm not saying you should, but you *can* do it. The caveat is that for every added nesting depth level you want to support, the regex gets more and more complicated. – acdcjunior Jul 20 '13 at 05:33
  • @LiDong This could be an [XY problem](http://meta.stackexchange.com/q/66377/219205). What do you want, exactly? Tell if the string is well-formed? The data in it? What? – acdcjunior Jul 20 '13 at 05:34
  • @acdcjunior I have added my full target. – Li Dong Jul 20 '13 at 05:41

3 Answers


As I said, contrary to popular belief (don't believe everything people say) matching nested brackets is possible with regex.

The downside is that you can only match up to a fixed level of nesting. And for every additional level you wish to support, your regex gets bigger and bigger.

But don't take my word for it. Let me show you. The regex:

\([^()]*\)

Matches one level. For up to two levels, you'd need:

\(([^()]*|\([^()]*\))*\)

And so on. To add another level, all you have to do is replace the innermost [^()]* (the second one in the two-level regex) with ([^()]*|\([^()]*\))*. As I said, it will get bigger and bigger.
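The substitution rule above can also be applied mechanically. Here is a minimal sketch that builds the regex for a given number of levels; the class name `NestedParenRegex` and the method `build` are made up for illustration, and it uses non-capturing groups `(?:...)` instead of the capturing ones shown above so that group numbering stays stable:

```java
import java.util.regex.Pattern;

class NestedParenRegex {
    // Builds a regex that matches one parenthesized group nested up to `levels` deep.
    static String build(int levels) {
        if (levels < 1) {
            throw new IllegalArgumentException("levels must be >= 1");
        }
        String regex = "\\([^()]*\\)"; // one level: \([^()]*\)
        for (int i = 1; i < levels; i++) {
            // Wrap the previous regex: the body may now contain plain text
            // or a complete group that is one level shallower.
            regex = "\\((?:[^()]*|" + regex + ")*\\)";
        }
        return regex;
    }

    public static void main(String[] args) {
        System.out.println(build(2)); // \((?:[^()]*|\([^()]*\))*\)
        System.out.println(Pattern.matches(build(3), "(i, j, k(m(2)))")); // true
        System.out.println(Pattern.matches(build(2), "(i, j, k(m(2)))")); // false
    }
}
```

The pattern length grows with every level, so this only makes the fixed-depth approach less tedious to write; it does not remove the depth limit.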

Your problem:

For your case, two levels may be enough. So the Java code for it would be:

String fortranCode = "code code u(i, j, k) code code code code u(i, j, k(1)) code code code u(i, j, k(m(2))) should match this last 'u', but it doesnt.";
String regex = "(\\w+)(\\(([^()]*|\\([^()]*\\))*\\))"; // (\w+)(\(([^()]*|\([^()]*\))*\))
System.out.println(fortranCode.replaceAll(regex, "__$1%array$2"));

Input:

code code u(i, j, k) code code code code u(i, j, k(1)) code code code u(i, j, k(m(2))) should match this last 'u', but it doesnt.

Output:

code code __u%array(i, j, k) code code code code __u%array(i, j, k(1)) code code code u(i, j, __k%array(m(2))) should match this last 'u', but it doesnt.

Bottom line:

In the general case, parsers will do a better job - that's why people get so pissy about it. But for simple applications, regexes can pretty much be enough.

Note: Some regex flavors support the recursion construct (?R) (Java doesn't; PCRE, which PHP uses, and Perl do), which lets you match an arbitrary number of nesting levels. With it, you could write: \(([^()]|(?R))*\).
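Since Java has no recursion construct, arbitrary depth needs something other than a regex. One option that stops well short of a full parser is a plain depth counter. The following is only a sketch: `BalancedParens` and `findClose` are hypothetical names, and it assumes the text has no parentheses inside string literals or comments, which real Fortran code may have.

```java
class BalancedParens {
    // Returns the index of the ')' that closes the '(' at openIndex,
    // or -1 if the parentheses are unbalanced.
    static int findClose(String s, int openIndex) {
        if (openIndex < 0 || openIndex >= s.length() || s.charAt(openIndex) != '(') {
            throw new IllegalArgumentException("no '(' at index " + openIndex);
        }
        int depth = 0;
        for (int i = openIndex; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i; // back at depth 0: this closes the opening paren
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "u(i, j, k(m(2))) + v";
        int open = s.indexOf('(');
        int close = findClose(s, open);
        System.out.println(s.substring(open, close + 1)); // (i, j, k(m(2)))
    }
}
```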

acdcjunior
  • @acdjunior Sir, can you please explain why this regex: `\((?:[^()]|(?:\([^()]*\)))*\)` would not work for any depth? – Ahmed Akhtar May 06 '16 at 11:11
  • @AhmedAkhtar Because it matches only two levels. Roughly speaking, after the second bracket is opened, it "ignores" any further opening bracket, so it treats the first closing bracket it finds as closing the second opened bracket (not the last opened one). Example: `aaa 1( aaa 2( aaa 3( aaa 4) aaa 5) aaa 6) aaa..`, in this case, the two-level regex interprets `4)` as closing `2(`, not `3(` as you'd expect. – acdcjunior Apr 19 '17 at 19:37
  • Sir please try to answer to comments earlier, it has been almost a year since I posted this comment and I really don't remember the context in which I asked the question. Thanks anyways btw. – Ahmed Akhtar Apr 20 '17 at 04:39
  • @AhmedAkhtar Yes, of course, sorry about the delay. Sometimes we don't answer right away and end up forgetting it altogether. Will try to be quicker next time, anyway. Cheers! – acdcjunior Apr 22 '17 at 01:08
  • I think this regex does not work for nested patterns like (abc (de) (fg) hi ). What modification can be done to the regex to support this ? – girijanandan nucha Jul 20 '20 at 08:45
  • @girijanandannucha What do you want to match, exactly? One level gets you this: https://regexr.com/58os7 Two levels would match the whole string as the first match. – acdcjunior Jul 20 '20 at 17:58

Separate your job. Have the regex be:

([a-z]+)\((.*)\)

The first group will contain the identifier, the second the parameters. Then proceed as such:

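import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern PATTERN = Pattern.compile("([a-z]+)\\((.*)\\)");

// ...

// Call matcher() on the compiled PATTERN instance, not on the Pattern class
final Matcher m = PATTERN.matcher(input);

if (!m.matches()) {
    // No match! Deal with it.
    return; // or throw, as appropriate
}

// If match, then:

final String identifier = m.group(1);
final String params = m.group(2);

// Test if there is a nested paren in the parameters
final boolean hasNestedParen = params.indexOf('(') != -1;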

Replace [a-z]+ with whatever an identifier can be in Fortran.
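Put together, the split could be used for the rewrite asked about in the question roughly like this. It is only a sketch: `SplitCall` and `rewrite` are made-up names, and because it uses `matches()`, it assumes the input is exactly one call such as `u(i, j, k(1))` with nothing before or after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitCall {
    private static final Pattern PATTERN = Pattern.compile("([a-z]+)\\((.*)\\)");

    // Rewrites "name(params)" to "__name%array(params)"; returns the input unchanged if it does not match.
    static String rewrite(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return input;
        }
        String identifier = m.group(1);
        String params = m.group(2); // greedy .* runs up to the last ')'
        return "__" + identifier + "%array(" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("u(i, j, k(1))")); // __u%array(i, j, k(1))
    }
}
```

Because `.*` is greedy, the second group extends to the last `)` in the input, which is why this only works on a single isolated call.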

fge

Please check this answer, as it covers basically what you are trying to do (in short, it's not really possible with regexps):

Regular Expression to match outer brackets

lpiepiora
  • That is right: regular expressions (at least in Java) are not the right tool for this task. I think I should change my plan to just match `u(` and replace it with `__u%array(`. – Li Dong Jul 20 '13 at 06:05
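For reference, the simpler plan from the last comment (rewrite only the `u(` prefix and leave the arguments alone) could look like the sketch below. `PrefixRewrite` and `rewrite` are hypothetical names. It assumes the array name is known in advance and is written in the same case with no space before the parenthesis, which Fortran source does not guarantee.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixRewrite {
    // Replaces every "name(" that starts at a word boundary with "__name%array(".
    static String rewrite(String code, String name) {
        String regex = "\\b" + Pattern.quote(name) + "\\(";
        String replacement = Matcher.quoteReplacement("__" + name + "%array(");
        return code.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // The nesting depth of the arguments no longer matters; "mu(" is left alone.
        System.out.println(rewrite("x = u(i, j, k(m(2))) + mu(1)", "u"));
        // x = __u%array(i, j, k(m(2))) + mu(1)
    }
}
```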