Spent half a day reading up on SO and elsewhere.

Say I have a string:

"a_b_c_d_e_f_g_1_2_3_4_5"

Is there a single regex that can construct a result based on two matches? For example, a string that concatenates two matches: the first one between the 3rd and 5th underscores, and the second one between the 8th and 10th underscores (regardless of how many other characters are between them)?

The result for the above sample would be:

"d_e_2_3"

Thanks!

Peter Seliger
user2349195

1 Answer

Is it possible to find and extract two substrings with a single regex?

No, it is not, but one can combine a regular expression that uses capture groups with String.prototype.replace.

The regex for the OP's use case might look like this ...

(/^(?:[^_]*_){3}([^_]+_[^_]+)_(?:[^_]+_){3}([^_]+_[^_]+).*/)

... and can be read as follows ...

  1. One wants to start searching from the very beginning of a string ... ^.
  2. Next one wants to find any character sequence that is NOT _ ... [^_].
  • but since it is not guaranteed that a string does not start with _, this sequence is matched optionally ... thus it turns into [^_]*.
  • of course such a sequence should be followed by a _, thus the former term turns into [^_]*_.
  • since this pattern is supposed to repeat 3 times ({3}), it needs to be grouped ((...)), but it should not be captured (?:) ... thus the partial expression turns into ^(?:[^_]*_){3} and already matches 'a_b_c_' from the OP's example of 'a_b_c_d_e_f_g_1_2_3_4_5'.
  3. Now one wants to match a non-_ character sequence, followed by an _, followed by a non-_ character sequence, followed by an _. One wants to capture everything except for the last _. Thus the 2nd part of the regex looks like this ... ([^_]+_[^_]+)_.
  4. The 3rd part is similar to the 1st one, except that each _ is now certain to be preceded by at least one non-_ character. Thus the 3rd part of the regex looks like this ... (?:[^_]+_){3}.
  5. The 4th part is a copy of the 2nd one without the trailing _ ... ([^_]+_[^_]+).
  6. In order to match the string entirely, so that replace substitutes all of it, one consumes the rest with a greedy wildcard ... . matches any character except line terminators ... * in case there is still something left to match.
  7. Since one might also want to support multiline input, one provides both the global (g) and the multiline (m) flag. The m flag lets ^ match at the start of every line, and the g flag makes replace process every match (matchAll requires it).
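To make the steps above easier to check, here is a minimal sketch that assembles the same pattern from its parts. The variable names (skipThree, capturePair, skipNextThree, assembled) are made up for illustration and are not part of the regex shown above.

```javascript
// Hypothetical part names; each string mirrors one part of the explanation.
const skipThree = '(?:[^_]*_){3}';      // 1st part: three segments, each possibly empty
const capturePair = '([^_]+_[^_]+)';    // 2nd and 4th part: two adjacent segments, captured
const skipNextThree = '(?:[^_]+_){3}';  // 3rd part: three non-empty segments

// Assemble the full expression, including the anchor and the trailing wildcard.
const assembled = new RegExp(
  '^' + skipThree + capturePair + '_' + skipNextThree + capturePair + '.*',
  'gm'
);

// The first part alone already consumes the leading three segments.
const partial = /^(?:[^_]*_){3}/.exec('a_b_c_d_e_f_g_1_2_3_4_5');
console.log(partial[0]); // 'a_b_c_'

// The assembled pattern behaves like the literal one.
const result = 'a_b_c_d_e_f_g_1_2_3_4_5'.replace(assembled, '$1_$2');
console.log(result); // 'd_e_2_3'
```

Splitting the pattern like this is only a reading aid; the regex literal does the same job in one line.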

Example code ...

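// skip 3 segments, capture 2, skip 3 more, capture 2, consume the rest of the line.
const regX = (/^(?:[^_]*_){3}([^_]+_[^_]+)_(?:[^_]+_){3}([^_]+_[^_]+).*/gm);

// single-line input ... the entire string is replaced by both captures.
console.log(
  'a_b_c_d_e_f_g_1_2_3_4_5'.replace(regX, '$1_$2')
); // 'd_e_2_3'
console.log(
  '_b_c_d_e_f_g_1_2_3_4_5'.replace(regX, '$1_$2')
); // 'd_e_2_3' ... the leading empty segment counts as the 1st one.
console.log(
  'b_c_d_e_f_g_1_2_3_4_5'.replace(regX, '$1_$2')
); // 'e_f_3_4'
console.log(
  '_c_d_e_f_g_1_2_3_4_5'.replace(regX, '$1_$2')
); // 'e_f_3_4'

// multiline input ... one match per line, collected via `matchAll`.
console.log([...
`a_b_c_d_e_f_g_1_2_3_4_5
_b_c_dd_ee_f_g_1_222_333_4_5
b_c_dd_ee_f_g_1_222_333_4_5
_c_dd_ee_ff_g_1_222_333_444_5
c_dd_ee_ff_g_1_222_333_444_55_66
_dd_ee_ff_gg_1_222_333_444_55_66`
.matchAll(regX)].map(([match, $1, $2]) => ($1 + '_' + $2))
); // ['d_e_2_3', 'dd_ee_222_333', 'ee_f_333_4', 'ee_ff_333_444', 'ff_g_444_55', 'ff_gg_444_55']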
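One thing to keep in mind with the replace approach: when the regex does not match, replace returns the input unchanged, so a too-short string silently passes through. A possible variation, sketched here under the assumption that one wants an explicit "no match" signal, reads the capture groups via RegExp.prototype.exec instead. The names pairRegX and extractPair are hypothetical and not part of the code above.

```javascript
// Same pattern without the trailing `.*` and without flags;
// a non-global regex keeps `exec` stateless between calls.
const pairRegX = /^(?:[^_]*_){3}([^_]+_[^_]+)_(?:[^_]+_){3}([^_]+_[^_]+)/;

// Hypothetical helper: returns the joined captures, or null if there is no match.
function extractPair(str) {
  const match = pairRegX.exec(str);
  return match === null ? null : `${match[1]}_${match[2]}`;
}

console.log(extractPair('a_b_c_d_e_f_g_1_2_3_4_5')); // 'd_e_2_3'
console.log(extractPair('a_b_c'));                   // null

// For comparison, `replace` hands back the unmatched input as is.
const unchanged = 'a_b_c'.replace(pairRegX, '$1_$2');
console.log(unchanged); // 'a_b_c'
```

Whether null or the original string is the better fallback depends on the caller; the sketch only makes the difference visible.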
Peter Seliger