
I found and tested a regex to validate a time string such as 11:30 AM:

^(1[0-2]|0?[1-9]):([0-5][0-9])(\s[A|P]M)\)?$

I understand most of it except the beginning:

(1[0-2]|0?[1-9])

Can someone explain what is going on? Does 1[0-2] mean there is a first digit that can be between 0 and 2? And then I don't understand |0?[1-9].

Asked by Aaron; edited by Mischa
  • @Xuflx, what is your rationale for suggesting this may be a dup of the question you referenced? That question seems to just deal with regular expressions in a general way. – Cary Swoveland Oct 26 '15 at 03:11
  • @Mischa is correct. This matches `"11:30 |M"`. – Cary Swoveland Oct 26 '15 at 03:16
  • @CarySwoveland: That is the question used for closing all these "explain me this regex" questions as duplicates. All these questions are useless; just take a look at the reference questions, or use an online tester. – nhahtdh Oct 26 '15 at 03:30
  • Readers: this question was previously closed (by Xufix and nhahtdh) as a dup (of the same question now referenced by Jerry) and then reopened. If you do not regard it as a dup, please vote to re-open. (I cannot--the SO software says I've already voted to reopen, which I find a bit odd.) – Cary Swoveland Oct 26 '15 at 11:19
  • @nhahtdh from that "Reference" question: "regex is suffering from give me ze code type of questions and poor answers with no explanation. This reference is meant to provide links to quality Q&A." Neither "give me ze code" nor "poor answers" are the case here. It's a normal question with good answers, so why the need to close? – Mischa Oct 26 '15 at 12:00
  • (context: this popped in my review queue, and I abstained) Yes, this question is well formed, and a definitive answer exists. However, I believe such questions are unlikely to be helpful in general: the question is hard to search for (who would search for `"1[0-2]"`) and the right answer would be very specific (something along the lines of the output of https://regex101.com/, as suggested in one of the comments in the "reference" question). – RandomSeed Oct 26 '15 at 12:30
  • @RandomSeed, it's not hard to search for. You'll find it by searching for "regex to validate a time", which I can imagine to be a quite common query. The reference question however will not be in the search results for that query. – Mischa Oct 26 '15 at 13:26
  • Aaron, I suggest you remove "(1[0-2]|0?[1-9])" from the title. It adds nothing, is a distraction and may suggest to some that your question probably does not have merit. – Cary Swoveland Oct 27 '15 at 05:22

2 Answers

(1[0-2]|0?[1-9])

The | separates the group into two alternatives, where

1[0-2]

matches 10, 11 or 12, and

0?[1-9]

matches 1 to 9, with an optional leading 0.
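To see the two alternatives in isolation, here is a small Ruby sketch that keeps only the hour group and anchors it with \A and \z (Ruby's whole-string anchors; ^ and $ anchor at line boundaries) so the entire sample must match. The sample strings are made up for illustration.

```ruby
# Only the hour part of the regex, anchored so the whole string must match.
HOUR = /\A(1[0-2]|0?[1-9])\z/

samples = %w[1 01 9 09 10 11 12 0 00 13 010]

# partition keeps the original order within each group.
matched, rejected = samples.partition { |s| HOUR.match?(s) }

puts "matched:  #{matched.join(' ')}"   # matched:  1 01 9 09 10 11 12
puts "rejected: #{rejected.join(' ')}"  # rejected: 0 00 13 010
```

"13" is rejected because 1[0-2] fails on the 3, and 0?[1-9] can only consume the 1, leaving the 3 unmatched before \z.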

Yu Hao

I will explain by writing the regex in extended mode, which permits comments:

r = /
    ^     # match the beginning of the string
    (     # begin capture group 1
    1     # match 1
    [0-2] # match one of the characters 0,1,2
    |     # or
    0?    # optionally match a zero
    [1-9] # match one of the characters between 1 and 9
    )     # end capture group 1
    :     # match a colon
    (     # begin capture group 2
    [0-5] # match one of the characters between 0 and 5
    [0-9] # match one of the characters between 0 and 9
    )     # end capture group 2
    (     # begin capture group 3
    \s    # match one whitespace character
    [A|P] # match one of the characters A, | or P
    M     # match M
    )     # end capture group 3
    \)?   # optionally match a right parenthesis
    $     # match the end of the string
    /x    # extended mode

As noticed by @Mischa, [A|P] is incorrect. It should be [AP]. That's because "|" is just an ordinary character when it's within a character class.
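A quick way to confirm this in Ruby is to run the question's regex and the [AP] version side by side. This is a sketch; the sample times are made up, except "11:30 |M", which is the string pointed out in a comment on the question.

```ruby
# Regex from the question, with the [A|P] character class.
original  = /^(1[0-2]|0?[1-9]):([0-5][0-9])(\s[A|P]M)\)?$/
# Same regex with only the character class changed to [AP].
corrected = /^(1[0-2]|0?[1-9]):([0-5][0-9])(\s[AP]M)\)?$/

["11:30 AM", "11:30 PM", "11:30 |M"].each do |t|
  puts "#{t.inspect}: original=#{original.match?(t)} corrected=#{corrected.match?(t)}"
end
# "11:30 AM": original=true corrected=true
# "11:30 PM": original=true corrected=true
# "11:30 |M": original=true corrected=false
```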

Also, I think the regex would be improved by moving \s out of capture group 3. We therefore might write:

r = /^(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?$/

It could be used thusly:

result = "11:39 PM" =~ r
if result
  puts "It's #{$2} minutes past #{$1}, #{ $3=='AM' ? 'anti' : 'post' } meridiem."
else
  # raise exception?
end
  #=> It's 39 minutes past 11, post meridiem.
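Moving \s out of group 3 changes what that group captures, which is what lets a direct comparison like $3=='AM' in the snippet above work; with \s inside the group, $3 would be " AM" and never equal 'AM'. A minimal sketch, keeping [AP] in both versions so that only the position of \s differs:

```ruby
# \s inside capture group 3 (as in the question).
inside  = /^(1[0-2]|0?[1-9]):([0-5][0-9])(\s[AP]M)\)?$/
# \s outside capture group 3 (the revised regex).
outside = /^(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?$/

p inside.match("11:39 PM")[3]   # prints " PM" (leading space captured)
p outside.match("11:39 PM")[3]  # prints "PM"
```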

In words, the revised regex reads as follows:

  • match the beginning of the string.
  • match "10", "11", "12", or one of the digits "1" to "9", optionally preceded by a zero, and save the match to capture group 1.
  • match a colon.
  • match a digit between "0" and "5", then a digit between "0" and "9", and save the two digits to capture group 2.
  • match a whitespace character.
  • match "A" or "P" followed by "M", and save the two characters to capture group 3.
  • optionally match a right parenthesis.
  • match the end of the string.
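To check each rule in the list, here is a hedged Ruby sketch that runs the revised regex against a handful of made-up inputs, each exercising one rule. Note that the optional \)? means a trailing right parenthesis is accepted, which may or may not be what the original author intended.

```ruby
r = /^(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?$/

# Hypothetical sample inputs mapped to whether r should accept them.
CASES = {
  "9:05 AM"   => true,   # single-digit hour
  "09:05 AM"  => true,   # hour with optional leading zero
  "12:59 PM"  => true,   # largest hour and minute
  "11:30 PM)" => true,   # the optional right parenthesis
  "13:00 PM"  => false,  # hour out of range
  "11:60 AM"  => false,  # first minute digit must be 0-5
  "11:30AM"   => false,  # whitespace is required
  "11:30 am"  => false,  # lowercase meridiem is not matched
}

CASES.each do |input, expected|
  actual = r.match?(input)
  puts "#{input.inspect.ljust(12)} #{actual} (expected #{expected})"
end
```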
Cary Swoveland
  • `[A|P]` simply means `A` or `P`, right? Not "match one of the characters A, | or P" – Mischa Oct 26 '15 at 02:37
  • @Mischa, in a character class, no. It is just the character `|`. Only outside a character class does it mean "or". – Cary Swoveland Oct 26 '15 at 02:39
  • So this regex is wrong, because I assume they want `A` or `P`. How would you write that correctly? Simply `[AP]`? – Mischa Oct 26 '15 at 02:59
  • @Mischa, if you are correct, yes, that would be just `[AP]`, but I don't see how you can draw that conclusion. – Cary Swoveland Oct 26 '15 at 03:01
  • Are you kidding? If they are trying to match a time, why would they want to match something like `10:30 |M`? – Mischa Oct 26 '15 at 03:12
  • @Mischa, ha! I didn't look at the context until after I had completed my answer. You are correct, of course! I finally clued in and left a comment on the question just before reading your comment. – Cary Swoveland Oct 26 '15 at 03:19