
I'm writing an application to parse some commands. Commands are given in the form:

A { B }

I just want A and B. A is optional but that's easy enough to handle. The problem I'm having is that both A and B can contain almost any character including whitespace and '{' and '}'. The brackets need not be balanced, either. Is this possible to parse with a regex? If not, what is the simplest thing that you think could be done?

For example, given:

"parseme { foo { "hello" } { "goodbye" } {{{ } { bar { "up" } { "down" } }"

Then:

A = "parseme { foo { "hello" } { "goodbye" } {{{ }" and B = "bar { "up" } { "down" }"

Chad Layton
  • The whitespace and other characters are less of a concern, but if you say the brackets need not be balanced, how would you ever know where B starts? – jdi May 03 '12 at 01:43
  • I can't comprehend how anyone is meant to distinguish B from A! – Asherah May 03 '12 at 03:17
  • Sorry, I should have said the brackets in A need not be balanced. – Chad Layton May 03 '12 at 13:41
  • You can use balancing groups in .NET, as described here: http://www.marcomilani.it/2012/07/english-nested-strings-with-regular-expressions-similar-to-recursive-regex.html?lang=en – Marco Jul 17 '12 at 11:31
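Given the clarification that only A's braces may be unbalanced, one simple approach avoids regex entirely: start at the final `}` and walk left, counting braces, until reaching the `{` that opens B. Everything before that brace is A. Below is a minimal sketch in Python (the question names no language). It assumes the command ends with the `}` that closes B, that B's own braces are balanced, and that braces inside quoted strings in B count like any other brace. The name `split_command` is hypothetical.

```python
def split_command(command: str) -> tuple[str, str]:
    """Split 'A { B }' into (A, B); A may be empty.

    Assumption: B's braces are balanced and the command ends with the '}'
    that closes B. A's braces may be unbalanced, so we scan from the right.
    """
    text = command.rstrip()
    if not text.endswith("}"):
        raise ValueError("command must end with '}'")
    depth = 0
    # Walk backwards from the final '}' to the '{' that opens B.
    for i in range(len(text) - 1, -1, -1):
        ch = text[i]
        if ch == "}":
            depth += 1
        elif ch == "{":
            depth -= 1
            if depth == 0:
                a = text[:i].strip()       # everything left of B's opening brace
                b = text[i + 1 : -1].strip()  # B without its outer braces
                return a, b
    raise ValueError("no matching '{' for the final '}'")


cmd = 'parseme { foo { "hello" } { "goodbye" } {{{ } { bar { "up" } { "down" } }'
print(split_command(cmd))
```

Because the scan stops at the first brace that brings the depth back to zero, whatever A contains to the left of that point is never inspected, which is why A's unbalanced brackets do no harm.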

1 Answer


You can't use a regular expression to parse anything that requires arbitrary nesting, like parentheses. This is a well-established limitation of regular expressions: a finite automaton has no memory with which to count an unbounded nesting depth.
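The limitation is easy to see with Python's built-in `re` module, which (unlike Perl) has no recursive patterns. A pattern for "a brace group whose body contains no braces" can only ever find the innermost groups. This is an illustrative sketch; the name `FLAT_GROUP` is hypothetical.

```python
import re

# A brace group whose body contains no braces. Without recursion, this is
# the best a plain pattern can do unless you hard-code a maximum depth.
FLAT_GROUP = re.compile(r"\{[^{}]*\}")

text = '{ bar { "up" } { "down" } }'
print(FLAT_GROUP.findall(text))
# → ['{ "up" }', '{ "down" }']  (the outer group is never matched)
```

Supporting one more level of nesting means writing a longer pattern, and no finite pattern covers every depth.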

You will need to describe the input with a context-free grammar and parse it with a tool such as ANTLR.

Francis Upton IV
  • +1. Well, you [**can** use](http://stackoverflow.com/a/4234491/1191425) (Perl) regular expressions to parse nested structures. Whether you want to or not is up to *you*... – Li-aung Yip May 03 '12 at 02:44
  • @Li-aungYip: careful, that could scare some people away from programming for life! – Asherah May 03 '12 at 03:17
  • @Len: In this particular case, that was the effect tchrist was going for. ;) Not every string processing problem admits a regex-based solution, and even for the ones that do admit regex solutions, sometimes you're better off with something else anyway. – Li-aung Yip May 03 '12 at 03:47
  • Francis: in .NET there are "balancing groups". – Marco Jul 17 '12 at 11:32