
I have a string with multiple substrings of the format {{******}}, where ****** can be one of a couple of things. I'm trying to split the string so that the resulting array contains the text before and after these substrings, as well as the full substrings themselves.

I've created a regular expression that works here: https://regex101.com/r/I65QQD/1/

When I call str.split(...), I want the resulting array to contain the full matches, as shown in the link above. Right now it returns the captured subgroups instead, so my array looks really weird:

```javascript
let body = "Hello, thanks for your interest in the Melrose Swivel Stool. Although it comes in 2 different wood finishes, there aren't any options for the upholstery fabric. {{youtube:hyYnAioXOqQ}}\n Some similar stools in different finishes are below for your review. I hope this is helpful to you!\n\n{{attachment:2572795}}\n\n{{attachment:2572796}}\n\n{{attachment:2572797}}\n\n{{attachment:2572798}}\n";

let bodyComponents = body.split(/{{attachment:([\d]+)}}|{{(YOUTUBE|VIMEO):([\d\w]+)}}/i);

console.log(bodyComponents);
```
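To see where the weird entries come from, here is a minimal sketch that runs the same regex on a shortened, made-up input (the string `sample` is an assumption, not from the post). The pattern has three capturing groups, and `split()` inserts the value of every group after each match; a group that did not take part in that particular match contributes `undefined`:

```javascript
// Same regex as in the question: three capturing groups spread over two alternatives.
const original = /{{attachment:([\d]+)}}|{{(YOUTUBE|VIMEO):([\d\w]+)}}/i;

// Hypothetical short input with one YouTube token and one attachment token.
const sample = "a {{youtube:abc}} b {{attachment:42}} c";

const parts = sample.split(original);
console.log(parts);
// parts → ["a ", undefined, "youtube", "abc", " b ", "42", undefined, undefined, " c"]
// The YouTube match adds (group1, group2, group3) = (undefined, "youtube", "abc");
// the attachment match adds ("42", undefined, undefined).
```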

Is there any way to make the resulting array contain the full matches instead of the subgroups, so that it looks like this:

[
"Hello, thanks for your interest in the Melrose Swivel Stool. Although it comes in 2 different wood finishes, there aren't any options for the upholstery fabric. ",
"{{youtube:hyYnAioXOqQ}}",
...
]

Thanks

MarksCode

1 Answer


You need to remove the unnecessary capturing parentheses and turn the alternation group you have into a non-capturing one:

/({{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}})/

Note that [\d\w] = \w and [\d] = \d.
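One quick way to check these equivalences is to compare the classes on every BMP code unit, using the same /i flag as the question. This is only an illustrative sketch; it holds because \d is a subset of \w, so adding \d to a class that already contains \w changes nothing:

```javascript
// Each pair should accept exactly the same single characters.
const pairs = [
  [/^[\d\w]$/i, /^\w$/i],
  [/^[\d]$/i, /^\d$/i],
];

let mismatches = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  for (const [a, b] of pairs) {
    // Count any character that one class accepts and the other rejects.
    if (a.test(ch) !== b.test(ch)) mismatches++;
  }
}
console.log(mismatches); // 0
```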

Note that the whole pattern is wrapped in a single capturing group, so split() puts each full match into the resulting array. Inside it, {{attachment:\d+}} no longer has a capturing group around \d+, (?:YOUTUBE|VIMEO) is now a non-capturing group (so its value won't appear as a separate item in the resulting array), and ([\d\w]+) is turned into \w+ (\d is redundant, as \w matches digits, too).

```javascript
let body = "Hello, thanks for your interest in the Melrose Swivel Stool. Although it comes in 2 different wood finishes, there aren't any options for the upholstery fabric. {{youtube:hyYnAioXOqQ}}\n Some similar stools in different finishes are below for your review. I hope this is helpful to you!\n\n{{attachment:2572795}}\n\n{{attachment:2572796}}\n\n{{attachment:2572797}}\n\n{{attachment:2572798}}\n";
let bodyComponents = body.split(/({{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}})/i);
console.log(bodyComponents);
```
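Here is the corrected pattern applied to the same kind of short, made-up input (`input`, `pieces` and `tokens` are illustrative names, not from the post). With exactly one capturing group, the result alternates between surrounding text and full matches, so every full match sits at an odd index. That index pattern holds even when a token starts the string, because split() then returns an empty string first:

```javascript
// The answer's pattern: one outer capturing group, everything inside non-capturing.
const fixed = /({{attachment:\d+}}|{{(?:YOUTUBE|VIMEO):\w+}})/i;

// Hypothetical short input mirroring the structure of `body`.
const input = "a {{youtube:abc}} b {{attachment:42}} c";
const pieces = input.split(fixed);
console.log(pieces);
// pieces → ["a ", "{{youtube:abc}}", " b ", "{{attachment:42}}", " c"]

// Odd indices hold the full matches; even indices hold the text between them.
const tokens = pieces.filter((_, i) => i % 2 === 1);
console.log(tokens);
// tokens → ["{{youtube:abc}}", "{{attachment:42}}"]
```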
Wiktor Stribiżew
  • Welp... that explanation of why the results still appear in the resulting array didn't help much in my understanding. Hopefully I'll wrap my head around it sometime! – MarksCode Sep 06 '17 at 02:06
  • @MarksCode Do you mean you do not understand why *captured group* values are stored in the resulting array after `split`? Because that is how `split()` is specified: when the separator regex contains capturing groups, the captured values are inserted into the resulting array. Thus, when you need to avoid that, replace all capturing groups (`(...)`) with non-capturing ones (`(?:...)`) – Wiktor Stribiżew Sep 06 '17 at 06:41
  • Oh, I see. I thought `split()` always just threw out whatever matched. I guess it works a bit different than I thought. – MarksCode Sep 06 '17 at 17:03
  • Not always, and not in all languages. In Java, for example, it does not work the same way: captured groups are not added to the resulting array. – Wiktor Stribiżew Sep 06 '17 at 17:06
  • This regex is very unintuitive. Why would the `YOUTUBE|VIMEO` have `?:` to start off, but the `attachment` doesn't? Is there a better way to structure this to make it more obvious what this is trying to do? – MarksCode Sep 06 '17 at 17:28
  • I do not understand why it is unintuitive: in regex, if you need to match a sequence of chars, you write the patterns matching them one after another (so, to match `attachment:` and then 1+ digits, you write `a`, `t`, ..., `:`, `\d+` one after another). If you want to match this char sequence or that, you use an alternation group (so, to match either `YOUTUBE` or `VIMEO`, you use `(YOUTUBE|VIMEO)` (capturing, if you need to store the submatch in memory) or `(?:YOUTUBE|VIMEO)` (non-capturing, if the value is not needed)). – Wiktor Stribiżew Sep 06 '17 at 17:41
  • Ok. It makes more sense to me now. I guess the confusing part for me was capturing vs. non-capturing groups. I'd never heard of those until now. – MarksCode Sep 06 '17 at 17:49
  • @MarksCode See [What is a non-capturing group? What does a question mark followed by a colon (?:) mean?](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-a-question-mark-followed-by-a-colon) – Wiktor Stribiżew Sep 06 '17 at 17:51
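To make the capturing vs. non-capturing distinction discussed in these comments concrete, here is a small sketch on made-up inputs (`text` and the `{{vimeo:123}}` token are illustrative, not from the post). In split(), only capturing groups put their values into the result; in a regular match, only capturing groups store a submatch:

```javascript
const text = "a1b2c";

// Capturing group: each captured digit is inserted between the pieces.
const withCapture = text.split(/(\d)/);
// withCapture → ["a", "1", "b", "2", "c"]

// Non-capturing group: the separators are discarded, as with a plain \d.
const withoutCapture = text.split(/(?:\d)/);
// withoutCapture → ["a", "b", "c"]

// In a match, a capturing group stores the submatch...
const m1 = /{{(YOUTUBE|VIMEO):\w+}}/i.exec("{{vimeo:123}}");
// m1[1] → "vimeo"

// ...while a non-capturing group only groups the alternation and stores nothing.
const m2 = /{{(?:YOUTUBE|VIMEO):\w+}}/i.exec("{{vimeo:123}}");
// m2.length → 1 (only the full match, m2[0] === "{{vimeo:123}}")

console.log(withCapture, withoutCapture, m1[1], m2.length);
```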