Why is the special character not captured in the regex group

Question

I have the following regular expression for capturing positive & negative time offsets.

\b(?<sign>[\-\+]?)(?<hours>2[1-3]|[01][0-9]|[1-9]):(?<minutes>[0-5]\d)\b

It matches fine, but the leading sign doesn't appear in the capture group. Am I formatting it wrong? You can see the effect here: https://regex101.com/r/CQxL8q/1/

score 1 · Accepted Answer · answered Sep 28 '17 at 10:22

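That is because of the first \b. A \b word boundary does not match between the start of the string (or a newline) and a - or +, since those are non-word chars. The engine therefore skips the sign and starts the match at the first digit, leaving the optional sign group empty.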

You need to move the word boundary after the optional sign group:

(?<sign>[-+]?)\b(?<hours>2[1-3]|[01][0-9]|[1-9]):(?<minutes>[0-5][0-9])\b
              ^^

See the regex demo.

Now, since the char following the word boundary is a digit (a word char), the word boundary works correctly and fails every match where the digit is preceded by another word char.
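To see the difference outside regex101, here is a minimal Python sketch. It assumes Python's `re` module, which spells named groups `(?P<name>...)` instead of `(?<name>...)`; the names `ORIGINAL`, `MOVED_BOUNDARY` and `describe` are made up for the illustration.

```python
import re

# Same patterns as above; only the named-group syntax is adapted to Python.
ORIGINAL = re.compile(
    r"\b(?P<sign>[\-\+]?)(?P<hours>2[1-3]|[01][0-9]|[1-9]):(?P<minutes>[0-5]\d)\b"
)
MOVED_BOUNDARY = re.compile(
    r"(?P<sign>[-+]?)\b(?P<hours>2[1-3]|[01][0-9]|[1-9]):(?P<minutes>[0-5][0-9])\b"
)

def describe(pattern, text):
    """Return (whole match, sign group) for the first match, or None."""
    m = pattern.search(text)
    return (m.group(0), m.group("sign")) if m else None

print(describe(ORIGINAL, "-13:21"))        # ('13:21', '')   sign is skipped
print(describe(MOVED_BOUNDARY, "-13:21"))  # ('-13:21', '-') sign is captured
print(describe(MOVED_BOUNDARY, "x13:21"))  # None: digit preceded by a word char
```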

axiac · Answer 2 · 2017-09-28T10:52:38.000

The word boundary anchor (\b) matches the transition between a word character (letter, digit or underscore) and a non-word character, or vice versa. There is no such transition before the - in -13:21: the sign is at the start of the string and is itself a non-word character.

The word boundary anchor can stay between the sign and the hours to avoid matching inside expressions that merely look similar to a time (65401:23), but it cannot prevent a match inside 654:01:23 or 654-01:23.
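A quick check of those three inputs, as a sketch in Python (named groups are written `(?P<name>...)` there; `OFFSET` and `first_match` are hypothetical names, and the pattern is the one with \b placed between the sign and the hours):

```python
import re

# The pattern with \b between the sign and the hours, in Python syntax.
OFFSET = re.compile(
    r"(?P<sign>[-+]?)\b(?P<hours>2[1-3]|[01][0-9]|[1-9]):(?P<minutes>[0-5][0-9])\b"
)

def first_match(text):
    """Return the text of the first match, or None when nothing matches."""
    m = OFFSET.search(text)
    return m.group(0) if m else None

print(first_match("65401:23"))   # None: no boundary inside the run of digits
print(first_match("654:01:23"))  # '01:23': the first ':' provides a boundary
print(first_match("654-01:23"))  # '-01:23': the '-' is taken as the sign
```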

As a side note, [\-\+] is just a convoluted way to write [-+]. + does not have any special meaning inside a character class, so there is no need to escape it. - is a special character inside a character class, but not when it is the first or the last character (i.e. [- or -]).
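A small Python sketch of that point (the sample string and variable names are invented for the illustration): the escaped and unescaped classes find the same characters, while a - placed between two characters defines a range.

```python
import re

text = "a-b+c,d"

escaped = re.findall(r"[\-\+]", text)  # escapes are allowed but unnecessary
plain = re.findall(r"[-+]", text)      # leading '-' is a literal hyphen
ranged = re.findall(r"[+-/]", text)    # '-' in the middle: range from '+' to '/'

print(escaped)  # ['-', '+']
print(plain)    # ['-', '+']
print(ranged)   # ['-', '+', ','] because ',' lies between '+' and '/'
```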

Another remark: you use both [0-9] and \d in your regex. They denote the same thing¹ but, for readability, it's recommended to stick to only one convention. Since other character classes that contain only digits are used, I would use [0-9] and not \d.

There are also some bugs in the regex fragment for hours: 2[1-3]|[01][0-9]|[1-9] does not match 0 (although it matches 00), and it does not match 20.
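The gaps are easy to confirm by testing the hours fragment on its own. The sketch below uses Python's `re.fullmatch`, and compares the original fragment with the 2[0-3]|[01][0-9]|[0-9] fragment this answer ends up with; `ORIGINAL_HOURS`, `FIXED_HOURS` and `accepts` are names chosen for the example.

```python
import re

# The alternations are wrapped in a non-capturing group so each reads as one unit.
ORIGINAL_HOURS = re.compile(r"(?:2[1-3]|[01][0-9]|[1-9])")
FIXED_HOURS = re.compile(r"(?:2[0-3]|[01][0-9]|[0-9])")

def accepts(pattern, text):
    """True when the whole string is a valid hours value for this pattern."""
    return pattern.fullmatch(text) is not None

for hours in ("0", "00", "9", "20", "23", "24"):
    print(hours, accepts(ORIGINAL_HOURS, hours), accepts(FIXED_HOURS, hours))
# 0  False True
# 00 True  True
# 9  True  True
# 20 False True
# 23 True  True
# 24 False False
```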

Given all the above corrections and improvements, the regex should be:

(?<sign>[-+]?)\b(?<hours>2[0-3]|[01][0-9]|[0-9]):(?<minutes>[0-5][0-9])\b
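As a usage sketch, here is that regex in Python. Two assumptions: Python needs the `(?P<name>...)` spelling for named groups, and `FINAL` and `parse` are hypothetical names, not part of the question.

```python
import re

FINAL = re.compile(
    r"(?P<sign>[-+]?)\b(?P<hours>2[0-3]|[01][0-9]|[0-9]):(?P<minutes>[0-5][0-9])\b"
)

def parse(text):
    """Return the (sign, hours, minutes) groups of the first match, or None."""
    m = FINAL.search(text)
    return m.group("sign", "hours", "minutes") if m else None

print(parse("+0:05"))   # ('+', '0', '05')
print(parse("-20:30"))  # ('-', '20', '30')
print(parse("13:21"))   # ('', '13', '21')  the sign group is empty, not missing
print(parse("24:00"))   # None
```

Note that an absent sign yields an empty string rather than `None`, because the `?` sits inside the group.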

¹ \d is the same as [0-9] when the Unicode flag is not set. When Unicode is enabled, \d also matches the decimal digits of other scripts (for example Arabic-Indic or Devanagari digits).

@paddyb: Just FYI: [`\d` is not always matching the same chars as `[0-9]`](https://stackoverflow.com/a/16621778/3832970). It is also true if we use the pattern in Python 3, or with a unicode modifier in PHP, Java, Python 2. — Wiktor Stribiżew, Sep 28 '17 at 10:46
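For Python 3 specifically, the difference can be checked with a short sketch (the sample string is an arbitrary choice; `re.ASCII` is the standard flag that restricts \d to [0-9]):

```python
import re

arabic_indic = "\u0661\u0662\u0663"  # the digits 1, 2, 3 in Arabic-Indic form

unicode_d = bool(re.fullmatch(r"\d+", arabic_indic))          # str patterns are Unicode-aware
ascii_d = bool(re.fullmatch(r"\d+", arabic_indic, re.ASCII))  # \d limited to [0-9]
explicit = bool(re.fullmatch(r"[0-9]+", arabic_indic))        # explicit ASCII range

print(unicode_d, ascii_d, explicit)  # True False False
```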
