Regex: When is matching a pattern 0 or more times useful?

Question

Could someone give some examples of when matching a pattern 0 times is useful?

My understanding from this question is that the quantifier also matches an empty string, in addition to one or more occurrences of a string (or character). Are there any practical uses that would help me understand the intuition better?

Well, that answer is clear: if you need to match an optional pattern `*` should be used. If there must be at least one occurrence, use `+`. — Wiktor Stribiżew, Jul 12 '18 at 09:01
Indeed, in some dialects more useful than in others. For example, in Oniguruma/Onigmo (Ruby regexp engine), `{0}` can be very useful for declaring "subroutines". I am not aware of any other dialect in which `{0}` is useful at all except to "comment out" a part of the pattern. — Amadan, Jul 12 '18 at 09:03
Could I please ask you for examples? Thank you both for your answers. — ZakS, Jul 12 '18 at 09:09

Sweeper · Accepted Answer · 2018-07-12T09:40:06.277

If you want to see real examples of the * quantifier in use, just look around Stack Overflow! Find someone who answers a lot of regex questions, like Wiktor Stribiżew, and search for * in their answers. Here's an answer of mine that uses *.

A use case of * that is very common is to match spaces which are optional. Oftentimes when you ask for user input, you'd want to be as permissive as possible (users can add as many spaces as they like, or none at all) instead of following a very strict syntax.

Take phone numbers, for example. Where I live, phone numbers have 8 digits, written as two groups of 4, e.g.

1234 5678

To be as permissive as possible, a regex like this can be used:

^\s*(\d{4})\s*(\d{4})\s*$

See the use of *? It allows any number (including 0) of leading spaces, trailing spaces, and spaces in the middle. Even if the user accidentally types two spaces in the middle, the program can still understand the number.

The regex matches all of these:

1234 5678
12345678
   1234 5678   
1234    5678
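
To make the difference concrete, here is a minimal Ruby sketch that runs the pattern above against those four inputs. The variable names and the stricter `\s+` variant are my own additions for contrast, not part of the original pattern.

```ruby
# Pattern from above: * lets every run of whitespace be empty.
phone = /^\s*(\d{4})\s*(\d{4})\s*$/
# Hypothetical stricter variant: + demands at least one space in the middle.
phone_strict = /^\s*(\d{4})\s+(\d{4})\s*$/

inputs = ["1234 5678", "12345678", "   1234 5678   ", "1234    5678"]

inputs.each do |text|
  m = phone.match(text)
  # Rebuild a canonical "NNNN NNNN" form from the two captured groups.
  normalized = m ? "#{m[1]} #{m[2]}" : nil
  strict = phone_strict.match?(text)
  puts "#{text.inspect}: normalized=#{normalized.inspect}, strict_match=#{strict}"
end
```

Every input matches the * version and normalizes to the same string, while the + version rejects the input with no space in the middle. That rejected input is exactly the "0 times" case the question asks about.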

Or I can be even more permissive and allow spaces everywhere:

^(?:\s*\d\s*){8}$
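
A quick way to check what this looser pattern accepts, again as a Ruby sketch (the helper name `normalize_phone` is made up for illustration):

```ruby
# Pattern from above: exactly 8 digits, each with optional whitespace around it.
anywhere = /^(?:\s*\d\s*){8}$/

# Hypothetical helper: validate, then strip whitespace and regroup as "NNNN NNNN".
normalize_phone = lambda do |text|
  return nil unless anywhere.match?(text)
  digits = text.gsub(/\s+/, "")
  "#{digits[0, 4]} #{digits[4, 4]}"
end

["1 2 3 4 5 6 7 8", "12 34 56 78", " 12345678 ", "1234 567", "1234 56789"].each do |text|
  puts "#{text.inspect} -> #{normalize_phone.call(text).inspect}"
end
```

The first three inputs are accepted and come out in the same canonical form. The last two have 7 and 9 digits, so the `{8}` count rejects them however the spaces are arranged.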

Anyway, things you think are not useful will come in handy one day, when you need them. When I was learning how to code, I used to think "There's no way this language feature is useful", but when I was actually writing code to solve problems, very often I started using those features that I thought "were not useful". You just haven't encountered problems where it is suitable to use *.

that's a really great answer, thank you. I got that I wasn't getting it, which is why I needed examples. Your answer is very helpful. — ZakS, Jul 12 '18 at 09:42

score 1 · Answer 2 · answered Jul 12 '18 at 09:47

To expand upon comments about {0} quantifier... Ruby's Onigmo is one of the more feature-rich regexp engines (although it has its downsides as well, particularly in Unicode compliance, AFAIK). One very interesting bit is that it allows you to make subroutines - basically, recursively match the defined named groups. This, in turn, almost allows you to make a parser (though I'd still rather suggest Treetop or some other "real" parser library for when you actually need a parser).

Here's a toy example (a translation of the Tiny C grammar from https://tomassetti.me/ebnf/). {0} is used so that each group is not matched where it is defined, but only when explicitly invoked by the \g<name> construct.

tiny_c_re = %r{
  (?<program> \g<statement>+){0}
  (?<statement>
    if \g<paren_expr> \g<statement> (?:else \g<statement>)? |
    while \g<paren_expr> \g<statement> |
    do \g<statement> while \g<paren_expr> ; |
    { \g<statement>* } |
    \g<expr> ; |
    ;
  ){0}
  (?<paren_expr> \( \g<expr> \) ){0}
  (?<expr>
    \g<test> |
    \g<id> = \g<expr>
  ){0}
  (?<test> \g<sum> (?: < \g<sum> )? ){0}
  (?<sum> \g<term> (?: [+-] \g<term> )? ){0}
  (?<term> \g<id> | \g<integer> | \g<paren_expr> ){0}
  (?<id> \g<string> ){0}
  (?<integer> \g<int> ){0}
  (?<string> [a-z]+ ){0}
  (?<int> [0-9]+ ){0}

  ^ \g<program> $
}x

good = <<EOF
  count = 5;
  sum = 0;
  while (0 < count) {
    sum = sum + count;
    count = count - 1;
  }
EOF
puts good.gsub(/[ \r\n\t]+/, '') =~ tiny_c_re ? "Correct" : "Syntax error"
# => Correct

bad = <<EOF
  count = 5;
  sum = 0;
  while (0 < count) {
    sum = = sum + count;
    count = count - 1;
  }
EOF
puts bad.gsub(/[ \r\n\t]+/, '') =~ tiny_c_re ? "Correct" : "Syntax error"
# => Syntax error
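
If the grammar above is too much to take in at once, the same {0} idiom can be sketched at a much smaller scale. The date format and the group names below are purely illustrative assumptions, not something taken from the grammar:

```ruby
# Each named group is defined with {0}, so it matches zero times at the point
# of definition and consumes nothing. The groups only do work when they are
# called through \g<name> on the last line of the pattern.
date_re = /
  (?<year>  \d{4} ){0}
  (?<month> \d{2} ){0}
  (?<day>   \d{2} ){0}
  \A \g<year> - \g<month> - \g<day> \z
/x

["2018-07-12", "2018-7-12", "20180712"].each do |text|
  puts "#{text}: #{date_re.match?(text) ? 'match' : 'no match'}"
end
```

Only the first string matches. The definitions at the top contribute nothing on their own, which is the whole point of {0}: it turns a group into a named building block rather than something that has to appear at that position in the input.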
