Regexp: What's the difference between these 4 expressions

Question

I'm trying to learn regular expressions but got confused by the syntax. How do these expressions differ?

([A-Z]){3}
([A-Z]{3})
[A-Z]{3}
[A-Z]\3 (edit: I meant ([A-Z])\3)

score 2 · Answer 1 · answered Mar 10 '16 at 07:08

([A-Z]){3} - matches 3 uppercase letters; there is only one group, repeated 3 times, so it keeps only the last letter
([A-Z]{3}) - matches 3 uppercase letters into one group
[A-Z]{3} - matches 3 uppercase letters, no grouping
[A-Z]\3 - should be an invalid regex in most languages (it matches one uppercase letter followed by a backreference to group 3, which does not exist). A valid use would be ([A-Z])([A-Z])([A-Z])\3, which matches 2 uppercase letters followed by another uppercase letter that occurs twice, e.g. ABCC
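To see these differences concretely, here is a minimal Java sketch using the standard java.util.regex classes. The class name GroupDemo and the helper describe are made up for illustration; the helper prints the first match of a pattern plus the contents of every capture group, and the last two calls try the valid backreference example from the final item.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: reports the first match of pattern in input and every capture group.
    static String describe(String pattern, String input) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return pattern + " -> no match";
        }
        StringBuilder sb = new StringBuilder(pattern + " -> " + m.group()
                + ", groupCount=" + m.groupCount());
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append(", group ").append(i).append('=').append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One group, repeated: it only remembers the last letter.
        System.out.println(describe("([A-Z]){3}", "ABCDEFG"));
        // One group around the whole quantified class: it holds all three letters.
        System.out.println(describe("([A-Z]{3})", "ABCDEFG"));
        // No parentheses, so no groups at all.
        System.out.println(describe("[A-Z]{3}", "ABCDEFG"));
        // \3 repeats whatever group 3 matched, so ABCC matches but ABCD does not.
        System.out.println(describe("([A-Z])([A-Z])([A-Z])\\3", "ABCC"));
        System.out.println(describe("([A-Z])([A-Z])([A-Z])\\3", "ABCD"));
    }
}
```

All three of the first patterns match ABC, but the first reports groupCount=1 with group 1=C, the second groupCount=1 with group 1=ABC, and the third groupCount=0.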

Sebastian Proske

Mistyped the last one. I meant ([A-Z])\3 – r.r Mar 10 '16 at 16:41
@r.r this won't change much, as it is still an invalid backreference – Sebastian Proske Mar 10 '16 at 17:04

Tim Biegeleisen · Answer 2 · 2016-03-10T07:21:03.780

([A-Z]){3} - This matches three letters from A-Z with a single capture group that is repeated three times; after the match, the group contains only the last letter

([A-Z]{3}) - This matches the same text as above, but the single capture group encloses all three letters

[A-Z]{3} - This matches letters from A-Z three times, with no capture group

[A-Z]\3 - This matches a single character from A-Z followed by \3 (at least in Java)

You might be wondering what a "capture group" is. It is a parenthesized part of the pattern whose matched text is remembered, so you can retrieve it after the match (or refer back to it later in the same pattern with a backreference such as \1). For example, consider your first regex:

([A-Z]){3}

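which matches the same text as

([A-Z])([A-Z])([A-Z])

The difference lies in the groups. If you evaluated the second of these in Java, for example, you could access each of the three matched letters as $1, $2 and $3 in a replacement string, or with a Matcher's group(1) through group(3). The first has only one group: because it is repeated, it keeps just the last letter it matched, so only $1 exists and it holds the third letter.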
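A minimal Java sketch of that difference, using the standard String.replaceAll and java.util.regex.Matcher. The class name CaptureGroups and the methods reverseThree and lastOfThree are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroups {
    // Three separate groups: $1, $2 and $3 each hold one letter.
    static String reverseThree(String input) {
        return input.replaceAll("([A-Z])([A-Z])([A-Z])", "$3$2$1");
    }

    // One group repeated three times: $1 only holds the last letter it matched.
    static String lastOfThree(String input) {
        return input.replaceAll("([A-Z]){3}", "[$1]");
    }

    public static void main(String[] args) {
        System.out.println(reverseThree("ABC")); // CBA
        System.out.println(lastOfThree("ABC"));  // [C]

        // The same groups read back through a Matcher instead of a replacement string.
        Matcher m = Pattern.compile("([A-Z])([A-Z])([A-Z])").matcher("XYZ");
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3)); // X Y Z
        }
    }
}
```

Only $ and \ are special in a Java replacement string, so the square brackets in "[$1]" are copied literally.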

edited Mar 10 '16 at 07:21

answered Mar 10 '16 at 07:10

Tim Biegeleisen

And usually the best approach is to see it yourself: [one](https://regex101.com/r/aC0rS0/1) matches `C` in `ABCDEFG`, [two](https://regex101.com/r/aC0rS0/2) will match `ABC` in `ABCDEFGH` and [three](https://regex101.com/r/aC0rS0/3) matches the same as **two** but does not provide any capture groups. – Jan Mar 10 '16 at 07:13

The fourth explanation is not always correct: it matches one character in `A-Z` and then tries to find a backreference to the third subpattern (which does not exist, so the expression fails). – Jan Mar 10 '16 at 07:16
@Jan I tested the fourth expression in Java, and it _did_ match the sequence `A\3`. So, at least in Java, my explanation is correct. – Tim Biegeleisen Mar 10 '16 at 07:27
