
Revised Question

Can a regex match an unlimited number of capture groups? If so, how?

Here is an example from path-to-regexp that seems to match an unlimited number of groups:

([\\/.])?(?:(?:\\:(\\w+)(?:\\(((?:\\\\.|[^()])+)\\))?|\\(((?:\\\\.|[^()])+)\\))([+*?])?|(\\*))
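
One way to check what is happening is the minimal JavaScript sketch below (it assumes Node.js 12+ for `String.prototype.matchAll`). The pattern itself always has exactly six capture groups. What repeats is the *matching*: when the pattern is applied with the `g` flag, each match is one parameter token, so a path with any number of parameters produces that many matches. I am assuming the library uses the pattern in a global matching loop like this rather than in a single match; the token field names (`prefix`, `name`, and so on) are my own labels, not the library's.

```javascript
// The pattern from the question, as a JavaScript string (hence the doubled backslashes).
// "\:" is an identity escape, which is allowed because no "u" flag is used.
const PATH_PATTERN =
  '([\\/.])?(?:(?:\\:(\\w+)(?:\\(((?:\\\\.|[^()])+)\\))?|\\(((?:\\\\.|[^()])+)\\))([+*?])?|(\\*))';

// Count capture groups: appending an empty alternative lets the regex match "",
// and exec() then returns one slot per group plus one for the full match.
function countGroups(source) {
  return new RegExp(source + '|').exec('').length - 1;
}

// Apply the same pattern repeatedly with the "g" flag; each match is one token.
// The field names are illustrative labels only (hypothetical, not path-to-regexp's).
function tokens(path) {
  const re = new RegExp(PATH_PATTERN, 'g');
  return [...path.matchAll(re)].map((m) => ({
    match: m[0],
    prefix: m[1],          // group 1: "/" or "." before the token
    name: m[2],            // group 2: the ":name" part
    pattern: m[3] || m[4], // group 3 (after a name) or group 4 (unnamed "(...)")
    modifier: m[5],        // group 5: "+", "*" or "?"
    asterisk: m[6],        // group 6: a lone "*"
  }));
}

console.log(countGroups(PATH_PATTERN)); // 6
const found = tokens('/:foo/:bar(\\d+)?/*');
console.log(found.length); // 3
console.log(found.map((t) => t.match).join(' ')); // /:foo /:bar(\d+)? /*
console.log(tokens('/:a/:b/:c/:d/:e/:f/:g').length); // 7
```

So the number of groups per match is fixed at six; the "unlimited" part comes from running the regex again from where the previous match ended.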

Original Question

How does this regex:

([\\/.])?(?:(?:\\:(\\w+)(?:\\(((?:\\\\.|[^()])+)\\))?|\\(((?:\\\\.|[^()])+)\\))([+*?])?|(\\*))

from path-to-regexp work?

I know it's for parsing path segments out of URIs, but it seems to work for any number of path segments. I didn't think that was possible with regular expressions.

Is there a way to break this down into smaller chunks and explain how it works?
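
As a sketch of one possible breakdown (the constant names below are hypothetical, and this is not necessarily how the library's author wrote it), the same string can be rebuilt from small named fragments and compared with the original. The outline is: an optional prefix, then EITHER (a named `:param` OR an unnamed `(...)`, followed by an optional modifier) OR a lone `*`.

```javascript
// The full pattern from the question, as a JS string (doubled backslashes).
const ORIGINAL =
  '([\\/.])?(?:(?:\\:(\\w+)(?:\\(((?:\\\\.|[^()])+)\\))?|\\(((?:\\\\.|[^()])+)\\))([+*?])?|(\\*))';

// Hypothetical fragment names; group numbers refer to the assembled pattern.
const PREFIX = '([\\/.])?';                   // group 1: optional "/" or "." delimiter
const BODY = '((?:\\\\.|[^()])+)';            // 1+ of: an escaped char, or any non-paren char
const CUSTOM = '\\(' + BODY + '\\)';          // "(...)": a custom sub-pattern in parentheses
const NAMED = '\\:(\\w+)(?:' + CUSTOM + ')?'; // ":name" (group 2) with optional "(...)" (group 3)
const UNNAMED = CUSTOM;                       // bare "(...)" with no name (group 4)
const MODIFIER = '([+*?])?';                  // group 5: optional modifier
const ASTERISK = '(\\*)';                     // group 6: a lone "*"

// prefix, then EITHER( EITHER(named, unnamed) + optional modifier ) OR a lone "*"
const REBUILT =
  PREFIX + '(?:(?:' + NAMED + '|' + UNNAMED + ')' + MODIFIER + '|' + ASTERISK + ')';

console.log(REBUILT === ORIGINAL); // true

// One match of the rebuilt pattern against a single named parameter.
const m = new RegExp(REBUILT).exec('/:id(\\d+)');
console.log(m[1], m[2], m[3]); // / id \d+
```

Each fragment can then be tested on its own, which is roughly what the x-modifier suggestion in the comments achieves in tools that support free-spacing mode (JavaScript's `RegExp` does not).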

Max Heiber
  • First, check this [live explanation of what it does](https://regex101.com/r/gJ9aB7/2). – Thomas Ayoub Jun 15 '16 at 16:42
  • @Wiktor Stribiżew: How is this a duplicate? The page you linked to doesn't answer the question. – Max Heiber Jun 15 '16 at 17:54
  • @ThomasAyoub thanks for the link, which explains what each of the pieces does. What I don't understand, however, is how the pieces come together to make the whole. – Max Heiber Jun 15 '16 at 17:55
  • @Wiktor Stribiżew: The page you linked to is meant to cut down on questions like "What is a regex that solves my problem?" rather than questions like mine, which is "HOW does this regex solve the problem?" While I appreciate your moderation efforts, you pulled the trigger on this one too soon. You marked this as an "exact duplicate," but it's nothing like the post you linked to. – Max Heiber Jun 15 '16 at 18:00
  • No, it has all you need. Just put your regex into [regex101.com](http://regex101.com) and read the explanations, since you are asking for an explanation of a regex pattern. – Wiktor Stribiżew Jun 15 '16 at 18:04
  • @WiktorStribiżew all that does is break down the syntax. I understand regex syntax. The person who came up with that regex didn't just spit out the whole thing at once: there is some human-understandable way to write something like that. That's what I'm looking for. If you run it through regex101.com, do you feel like you understand it well enough to reproduce it (but not from memory)? – Max Heiber Jun 15 '16 at 19:27
  • It helps you understand the pattern very well. Add the x modifier and split the groups onto separate lines to see what each group does. Unless you have a real problem with this regex, I would be glad to help. – Wiktor Stribiżew Jun 15 '16 at 19:41
  • Besides, what the pattern does is described very well [at that GitHub page you linked to](https://github.com/pillarjs/path-to-regexp/blob/master/index.js#L17). *Match Express-style parameters and un-named parameters with a prefix and optional suffixes.* – Wiktor Stribiżew Jun 15 '16 at 20:28
  • It's an extremely simple regex to understand: `opt[/.], EITHER( EITHER( :, \w, opt(block) _OR_ (block) ), opt[+*?]) _OR_ [*] )`. But whoever wrote it is probably a _novice_. One novice mistake can be seen here: `\((?:\\.|[^()])+\)`. In an attempt to allow escaped parentheses, he/she uses `\\.`; however, combined with `[^()]`, that allows unbalanced escapes that let `(\\(\\))` match. The general rule when parsing escaped characters is that the _escape_ itself must be forced to be escaped (that is, a backslash may only be consumed as the first half of an escape pair). Done correctly, it's `\((?:\\.|[^\\()])+\)` or `\([^\\()]*(?:\\.[^\\()]*)*\)`. –  Jun 16 '16 at 00:03
  • `([/.])?(?:(?::(\w+)(?:\(((?:\\.|[^()])+)\))?|\(((?:\\.|[^()])+)\))([+*?])?|(\*))` is the same regex without the string-literal escaping. Get a program like www.regexformat.com that can parse your strings, format them for you, and convert them back. Save yourself some aggravation. –  Jun 16 '16 at 00:10
  • @sln I'd be that novice. Do you have an example where it can be unbalanced? – blakeembrey Jun 16 '16 at 00:58
  • I think I see where you think it'd be unbalanced, though I'm not sure whether that's a feature. My original thought was that it should consistently match from the first open paren. – blakeembrey Jun 16 '16 at 01:27
  • @blakeembrey - Using `\((?:\\.|[^()])+\)` lets a bare escape character exist on its own, which makes the check for `escape + anything` meaningless. Since the engine wants to find a match, even though `escape + anything` comes first, when it sees `\\(` it will backtrack, match the first `\` with `[^()]`, then match `\(` as an escape just to get past it, and continue consuming more characters. It's not an intuitive thing. –  Jun 16 '16 at 02:20
  • But it won't go back unless it can't find the subsequent matching paren, right? The only place I can reproduce the issue is when a closing paren is actually missing; the other cases seem to be handled properly already. It's a great addition, though, thanks. – blakeembrey Jun 16 '16 at 03:32
  • @blakeembrey - I was just stating the condition for the open parenthesis. For the closing one, it's really not what you think. The bottom line is that your regex will match anything up to this closing `)`, _escaped or not_. For example, it matches `(\)` and `(\)\)` and `(\\\\\)\\)\)`, etc. So, front to back, the `escape+anything` is largely useless. If it considers `(\)` a valid open/close, it can't also consider `(\))` a valid open/close, if you get my drift. Ergo, you can't parse `escape+anything` without _forcing_ the _escape_ itself to be escaped. It's a tenet of string parsing: no free escapes. –  Jun 16 '16 at 19:03
  • Are you sure? I haven't seen what you describe. Maybe that's because the escaped characters you're talking about are already being ignored by the line above that regexp string. The opening parens were never an issue, since I had https://github.com/pillarjs/path-to-regexp/blob/master/index.js#L20 – blakeembrey Jun 16 '16 at 20:09
  • For example, here's an existing test checking that functionality: https://github.com/pillarjs/path-to-regexp/blob/master/test.ts#L1714-L1727 – blakeembrey Jun 16 '16 at 20:16
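
To make the escaping disagreement above concrete, here is a small sketch that tests only the parenthesized block in isolation. The `^...$` anchors are my addition so that only whole-string matches count, and inputs are written with `String.raw` so each backslash is a literal character. As noted in the thread, the full library handles escapes elsewhere too, so its real behavior on complete paths may differ from this isolated test.

```javascript
// Block from path-to-regexp: a backslash may also be consumed by [^()].
const loose = /^\((?:\\.|[^()])+\)$/;
// sln's proposed fix: a backslash can only appear as the first half of an escape.
const strict = /^\((?:\\.|[^\\()])+\)$/;
// sln's "unrolled" alternative; unlike the others, it also accepts an empty "()".
const unrolled = /^\([^\\()]*(?:\\.[^\\()]*)*\)$/;

const samples = [
  String.raw`(abc)`,    // plain content: all three accept
  String.raw`(\(\))`,   // escaped parens inside: all three accept
  String.raw`(\)`,      // only an escaped ")": only loose accepts
  String.raw`(\)\)`,    // two escaped ")": only loose accepts
  String.raw`(\\(\\))`, // escaped backslashes around bare parens: only loose accepts
];

for (const s of samples) {
  console.log(s.padEnd(10), loose.test(s), strict.test(s), unrolled.test(s));
}
```

The loose version accepts the last three samples only by treating a backslash as an ordinary character and then pairing the next backslash with a parenthesis, which is the backtracking described in the comments.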

0 Answers