
Wishing to put some order into my knowledge of regular expressions, I decided to work through a book about them, Introducing Regular Expressions. I know it's silly, but one of the introductory examples doesn't make sense to me.

(\d)\d\1

Sample text:

123-456-7890

(should capture the first number, 123)

Can anyone explain what is going on here?

As far as I can figure out, the first \d captures the number 123. The \1 backreferences (marks) the group for later use. The parentheses limit the scope of the group. But what does the second \d do?

Simple explanations, as if to a small child or a golden retriever, are preferred.

Rook
  • To truly grok regex, go read: [Mastering Regular Expressions (3rd Edition)](http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124 "By Jeffrey Friedl. Best book on Regex - ever!") – ridgerunner Feb 19 '14 at 16:33
  • @ridgerunner - I think that one is a little too advanced for me at this stage. Maybe, in the days to come :) – Rook Feb 19 '14 at 16:40
  • Actually, MRE is written in a tutorial style and starts from the ground up starting with the very basics. Hands down, the most useful book I've ever read. – ridgerunner Feb 19 '14 at 17:51
  • This question has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Groups". – aliteralmind Apr 10 '14 at 00:24

2 Answers


\d matches just one digit.

This regular expression doesn't match anything in the "123-456-7890" string, but it would match "323" (which could be part of a larger string, for example "323-456-7890"):

 (\d) : first digit ("3")
 \d   : another digit ("2")
 \1   : first group (which was "3")

Now, if your book claims that (\d)\d\1 should capture "123" in "123-456-7890", then it might contain an error...
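
This is easy to check for yourself. Below is a minimal sketch in Python (my choice of language, the question doesn't name one; `first_match` is just a helper name made up for the illustration). It runs the pattern against both strings discussed above.

```python
import re

# The pattern from the question: one captured digit, then any digit,
# then the same digit that group 1 captured.
PATTERN = re.compile(r"(\d)\d\1")

def first_match(text):
    """Return the first substring matched by PATTERN, or None if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(first_match("123-456-7890"))  # None: no three-digit run starts and ends with the same digit
print(first_match("323-456-7890"))  # 323
```

In the second case group 1 holds "3", which is why the trailing \1 has to be "3" as well.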

Denys Séguret
  • If I understood you right: the first \d is just one digit (one character), inside parentheses, which form a group. The second \d is just another digit. The \1 refers back to the last parenthesized group, i.e. group no. 1. Could I have just put \2, if I wanted to backreference it that way later on? – Rook Feb 19 '14 at 14:14
  • (\d) matched 3, \d matched 2, and \1 matched 3 again since that's what was matched / referenced from the first group? – Rook Feb 19 '14 at 14:16
  • @ldigas yes to all, except that I haven't understood your question regarding \2. – Denys Séguret Feb 19 '14 at 14:51
  • Could I have named it ref. group "2" ... \2 instead of \1, or does \1 stand for 1st? – Rook Feb 19 '14 at 16:32
  • No: \1 always refers to the first group. The second group is referenced as \2, and so on. – Denys Séguret Feb 19 '14 at 16:52
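
To illustrate the numbering discussed in these comments, here is a small sketch, assuming Python's `re` module (the pattern with two groups and the `compiles` helper are made up for the example). Groups are numbered by the order of their opening parentheses, so \2 only makes sense once a second group exists.

```python
import re

# Two capturing groups: group 1 is the first digit, group 2 the second.
# \2 then repeats the second digit and \1 repeats the first.
m = re.fullmatch(r"(\d)(\d)\2\1", "1221")
groups = m.groups()
print(groups)  # ('1', '2')

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(\d)\d\1"))  # True: group 1 exists
print(compiles(r"(\d)\d\2"))  # False: there is no second group to refer to
```

Other regex engines may treat a reference to a missing group differently, so the second result should be read as Python's behavior only.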

(\d)\d\1 step by step:

  1. The first \d matches one digit
  2. And the parentheses () mark this as a capturing group - this is the first one, so the digit is remembered as "group 1"
  3. The second \d says there is another digit
  4. \1 says "here is the value from our previous group 1" - that is the digit that was matched in step 1.

So, as dystroy already said, the regex matches a sequence of three digits in which the first and the third are equal.
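
The steps above can be tried out with a short sketch, here in Python (an assumption on my part; the sample string and the `equal_ends_triples` name are invented for the example). It collects every three-digit run whose first and last digits are the same.

```python
import re

def equal_ends_triples(text):
    """Return every substring matched by (\\d)\\d\\1, scanning left to right."""
    # finditer is used instead of findall because findall would return
    # only the contents of group 1 rather than the whole match.
    return [m.group(0) for m in re.finditer(r"(\d)\d\1", text)]

print(equal_ends_triples("121 345 787 990"))  # ['121', '787']
print(equal_ends_triples("123-456-7890"))     # []
```

The sample text from the question yields an empty list, which agrees with the conclusion that the pattern cannot capture "123".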

piet.t