12

For a templating engine, I am using regular expressions to identify content enclosed in brackets in a string. For example, the regex needs to match {key} or <tag> or [element].

Currently my regular expression looks like this:

var rx=/([\[\{<])([\s\S]+?)([\]\}>])/;

The issue is that such a regular expression doesn't force brackets to match. For example in the following string:

[{lastName},{firstName}]

the regular expression will match [{lastName}

Is there a way to define matching brackets? Saying for example that if the opening bracket is a [ then the closing bracket must be a ], not a } or a >

Christophe

4 Answers

26

The best way to do this, especially if different brackets can have different meanings, is to split into 3 regular expressions:

var rx1 = /\[([^\]]+)]/;
var rx2 = /\(([^)]+)\)/;
var rx3 = /{([^}]+)}/;

These will match any text surrounded by [], (), and {} respectively, with the text inside in the first matched group.
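As a rough illustration of how the three patterns might be applied together, here is a minimal sketch. The helper name `findBracketed` and the shape of the result objects are assumptions made for this example, and the `g` flag is added so that every occurrence is collected. Note that with nested brackets both the outer and the inner matches are reported, so the caller still has to decide which ones it wants.

```javascript
// Hypothetical helper: runs each bracket-specific regex over the text
// and records the bracket type, the inner text and the match position.
function findBracketed(text) {
  var patterns = {
    "[": /\[([^\]]+)\]/g,
    "(": /\(([^)]+)\)/g,
    "{": /\{([^}]+)\}/g
  };
  var results = [];
  for (var open in patterns) {
    var m;
    // exec() with the g flag advances through the string on each call.
    while ((m = patterns[open].exec(text)) !== null) {
      results.push({ open: open, inner: m[1], index: m.index });
    }
  }
  return results;
}

var found = findBracketed("[{lastName},{firstName}]");
// found[0] is the outer [...] match; found[1] and found[2] are the {...} matches.
```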

murgatroid99
4

You could use alternation with the pipe character (|), like this: /\[([\s\S]+?)\]|\{([\s\S]+?)\}|<([\s\S]+?)>/, although it gets pretty long.

EDIT: shortened the regex, so it is not that long any more...

StrubT
  • 3
    Instead of `([\s\S]+?)` you could use something like `([^\]]+)`, which would make it clearer that you are matching the longest sequence of characters that does not contain the ending bracket (and shorter) – murgatroid99 Aug 10 '12 at 18:48
4
var rx = /\[[^\]]+\]|\{[^}]+\}|<[^>]+>/;
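A short sketch of how this pattern might be used, under a few assumptions: the `g` flag is added to find every token, and the helper name `extractTokens` and the returned object shape are invented for this example. Because each branch pairs one opening bracket with its own closing bracket, the bracket type can be read from the first character of the match.

```javascript
// Same alternation as above, with the g flag so all tokens are found.
var rx = /\[[^\]]+\]|\{[^}]+\}|<[^>]+>/g;

// Hypothetical helper: returns the opening bracket and the inner text
// of every bracketed token in the input.
function extractTokens(text) {
  var matches = text.match(rx) || []; // match() returns null when nothing is found
  return matches.map(function (token) {
    return { open: token.charAt(0), inner: token.slice(1, -1) };
  });
}

var tokens = extractTokens("Dear {name}, see <b> and [x]");
// tokens holds three entries, for {name}, <b> and [x], in that order.
```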
MikeM
  • right, this is similar to the first answer. I was actually interested in the use of an object you showed in another post: {"[":"]","",...}. The object used in a for...in loop might help make the pattern more generic. – Christophe Jan 17 '13 at 00:53
  • 1
    @Christophe. The answer you have accepted is of poor quality. You would have to go through the matches of the three regexes and compare them: e.g. with "{[abc]}" the first regex will match [abc], which is not what you want. Further, the answer doesn't handle `<>` at all and it handles `()` unnecessarily. You are not obliged to accept the first answer just because it is half-way correct - and the one you have accepted isn't even that! – MikeM Jan 17 '13 at 14:50
  • Well, it sounded like a good idea at the time... I am actually thinking that the solution I posted in http://stackoverflow.com/questions/14334740/missing-parentheses-with-regex/14369329#14369329 might be a good fit here too. – Christophe Jan 17 '13 at 19:08
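The idea of driving the pattern from an object of bracket pairs could look roughly like the sketch below. This is only one possible reading of that comment: the names `pairs`, `escapeRegex` and `buildBracketRegex` are hypothetical, and the generated pattern is simply the alternation shown in the answer above, built in a for...in loop.

```javascript
// Assumed map of opening bracket -> closing bracket.
var pairs = { "[": "]", "{": "}", "<": ">" };

// Escapes characters that have a special meaning in a regular expression.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds one branch per pair: open, one or more non-close characters, close.
function buildBracketRegex(map) {
  var branches = [];
  for (var open in map) {
    var close = escapeRegex(map[open]);
    branches.push(escapeRegex(open) + "([^" + close + "]+)" + close);
  }
  return new RegExp(branches.join("|"), "g");
}

var generic = buildBracketRegex(pairs);
var all = "[a]{b}<c>".match(generic);
```

Adding another bracket style then only requires a new entry in the map.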
1

Is there a way to define matching brackets? Saying for example that if the opening bracket is a [ then the closing bracket must be a ], not a } or a >

Sort-of.

ERE does not provide a way for you to match a closing bracket to an opening bracket the way you describe. (It may be possible using PREG magic, but I'll have to leave that for someone else.) You'll need either to have multiple regular expressions, or multiple atoms within a single regular expression.

If you use a single regex, I gather you'll need to determine the type of bracketed string you're detecting, as well as the content of that string. As was mentioned in comments, you'll need to do this in your programming language, but you can at least get what you need out of the regex.

In the regex below, each style of string is represented as a "branch" in the RE. Branches are separated by or-bars (|). For clarity, I'm assuming all strings are [:alnum:]. You haven't specified content, so you'll need to adjust for your particular requirements.

/(\[)([[:alnum:]]+)\]|(\()([[:alnum:]]+)\)|(\{)([[:alnum:]]+)\}/
 ↑   ↑               ↑                    ↑
 $1  $2           divider              divider

Note that in each branch, the first character is enclosed by round brackets, making it an "atom". You need your code to refer to this atom like a backreference. The second atom is the inner string. Now ... my JavaScript isn't as strong as my, say, baking skill, but this might be a start:

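// JavaScript has no POSIX classes such as [:alnum:], so [A-Za-z0-9] is used instead.
var bracketRe = /(\[)([A-Za-z0-9]+)\]|(\()([A-Za-z0-9]+)\)|(\{)([A-Za-z0-9]+)\}/;

// Returns the opening bracket of the first bracketed string, or null if there is none.
String.prototype.bracketstyle = function() {
  var m = this.match(bracketRe);
  // Only the branch that matched has defined groups: 1/2, 3/4 or 5/6.
  return m ? (m[1] || m[3] || m[5]) : null;
};

// Returns the text inside the first bracketed string, or null if there is none.
String.prototype.innerstring = function() {
  var m = this.match(bracketRe);
  return m ? (m[2] || m[4] || m[6]) : null;
};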

I suspect you could combine these into a single function, or use them differently without making them a function, but you get the idea.
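One way the two could be combined is sketched below. It assumes alphanumeric content written as [A-Za-z0-9], since JavaScript regular expressions do not support POSIX classes like [:alnum:]. The function name `parseBracketed` and the returned object shape are made up for this example.

```javascript
// Hypothetical combined helper: returns both the bracket style and the
// inner string of the first bracketed token, or null if none is found.
function parseBracketed(text) {
  var re = /(\[)([A-Za-z0-9]+)\]|(\()([A-Za-z0-9]+)\)|(\{)([A-Za-z0-9]+)\}/;
  var m = re.exec(text);
  if (!m) {
    return null;
  }
  // Groups come in pairs per branch (1/2, 3/4, 5/6); only the branch
  // that matched has defined groups.
  return {
    style: m[1] || m[3] || m[5],
    inner: m[2] || m[4] || m[6]
  };
}

var parsed = parseBracketed("say (hello)");
// parsed.style is "(" and parsed.inner is "hello"
```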

ghoti