
I'm trying to match a type definition

def euro : t1 -> t2 -> t3 (and this pattern may repeat further in other examples)

I came up with this regex

^def ([^\s]*)\s:\s([^\s]*)(\s->\s[^\s]*)*

But while it matches euro and t1 it

  • then matches -> t2 rather than t2
  • fails to match anything with t3

I can't see what I am doing wrong, and my goal is to capture

euro t1 t2 t3

as four separate items, and what I currently get is

0: "def euro : t1 -> t2 -> t3"
1: "euro"
2: "t1"
3: " -> t3"
Simon H
  • The regex matches the whole string, which does include the substring `t2`, and the substring `-> t2`? Did you want `t2` in its own capture group? – CertainPerformance Oct 18 '18 at 06:12

2 Answers


You can't get every iteration out of a repeated capturing group in a JS regex: each iteration overwrites the previous capture, so all but the last value are "dropped".

When creating a regular expression that needs a capturing group to grab part of the text matched, a common mistake is to repeat the capturing group instead of capturing a repeated group. The difference is that the repeated capturing group will capture only the last iteration, while a group capturing another group that's repeated will capture all iterations.
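To make the distinction concrete, here is a small sketch that runs both forms against the string from the question. The variable names are only illustrative; the two patterns are the question's regex and the same regex with the repetition moved inside a capturing group.

```javascript
// Same input as in the question.
const input = "def euro : t1 -> t2 -> t3";

// Repeated capturing group: the group itself carries the `*` quantifier,
// so each iteration overwrites the previous capture.
const repeatedCapture = /^def (\S*)\s:\s(\S*)(\s->\s\S*)*/;
const lastOnly = input.match(repeatedCapture)[3];

// Capturing a repeated group: the inner group is non-capturing and repeated,
// while the outer group captures everything the repetition consumed.
const captureOfRepeat = /^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/;
const allIterations = input.match(captureOfRepeat)[3];

console.log(JSON.stringify(lastOnly));      // " -> t3"
console.log(JSON.stringify(allIterations)); // " -> t2 -> t3"
```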

One way out is to capture the whole substring and then split it. Here is an example:

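var s = "def euro : t1 -> t2 -> t3";
// Group 3 captures the whole " -> t2 -> t3" tail; the inner group is non-capturing.
var rx = /^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/;
var res = [];
var m = s.match(rx);
if (m) {
  res = [m[1], m[2]];
  // Split the tail on the arrows and drop the empty leading item.
  // The loop variable is named `part` so it does not overwrite `s`.
  for (var part of m[3].split(" -> ").filter(Boolean)) {
    res.push(part);
  }
}
console.log(res);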

Pattern details

  • ^ - start of string
  • def - a literal substring
  • (\S*) - Capturing group 1: 0+ non-whitespace chars
  • \s:\s - a : enclosed with single whitespaces
  • (\S*) - Capturing group 2: 0+ non-whitespace chars
  • ((?:\s->\s\S*)*) - Capturing group 3: 0+ repetitions of the following pattern sequences:
    • \s->\s - whitespace, ->, whitespace
    • \S* - 0+ non-whitespace chars
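As a rough sketch of how the pattern above behaves when the number of arrows varies, the same approach can be wrapped in a function. The name `parseSignature` is hypothetical, and splitting on `/\s->\s/` instead of the literal `" -> "` is an assumption made here so the split accepts the same whitespace the pattern does.

```javascript
// Hypothetical helper; the regex is the one described in the pattern details.
function parseSignature(line) {
  const m = line.match(/^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/);
  if (!m) return null; // the line does not start with "def name : type"
  // m[3] looks like " -> t2 -> t3", or "" when there are no arrows.
  const rest = m[3].split(/\s->\s/).filter(Boolean);
  return [m[1], m[2], ...rest];
}

console.log(parseSignature("def euro : t1"));
console.log(parseSignature("def euro : t1 -> t2 -> t3 -> t4"));
console.log(parseSignature("let x = 1"));
```

With no arrows, group 3 is an empty string, so `filter(Boolean)` leaves nothing to append and only the name and first type are returned.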
Wiktor Stribiżew
  • @SimonH That is a [non-capturing group](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do). It is only used for grouping patterns, hence there is no need of the capturing construct overhead. – Wiktor Stribiżew Oct 18 '18 at 06:43

Details:

  • ?: - creates a non-capturing group
  • $1 - receives the result of the first capturing group, i.e. \w+
  • \s[\:\-\>]+\s - matches one or more of the characters :, - and > enclosed with single whitespaces, e.g. " : " or " -> "
  • \w+ - matches repeating alphanumeric pattern
let str = 'def euro : t1 -> t2 -> t3';
let regex = /(?:def\s|\s[\:\-\>]+\s)(\w+)/g;

let match = str.replace(regex, '$1\n').trim().split('\n');
console.log(match);
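
The same global regex can also be used to collect group 1 directly, without going through replace and split. This is only a sketch of an alternative, not part of the original approach; it assumes an environment where `String.prototype.matchAll` is available (ES2020), and the variable names are illustrative.

```javascript
const signature = "def euro : t1 -> t2 -> t3";
// Same pattern as above; the g flag is required by matchAll.
const tokenRegex = /(?:def\s|\s[\:\-\>]+\s)(\w+)/g;

// Take capturing group 1 from every match, in order.
const names = Array.from(signature.matchAll(tokenRegex), (m) => m[1]);
console.log(names);
```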
vrintle