I'm trying to learn regular expressions but got confused about the syntax. How do these expressions differ?

([A-Z]){3}
([A-Z]{3})
[A-Z]{3}
[A-Z]\3 (edit: I meant ([A-Z])\3)
r .r

2 Answers

  • ([A-Z]){3} - matches 3 uppercase letters; there is a single capture group, which is applied three times and ends up holding only the last letter
  • ([A-Z]{3}) - matches 3 uppercase letters into one group
  • [A-Z]{3} - matches 3 uppercase letters, no grouping
  • [A-Z]\3 - should be an invalid regex in most languages (it matches one uppercase letter followed by a backreference to group 3, which does not exist here). For comparison, ([A-Z])([A-Z])([A-Z])\3 would match two uppercase letters followed by a third uppercase letter that occurs twice in a row
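
To see the backreference behaviour for yourself, here is a small sketch using Python's `re` module. Python is my choice for illustration only; other flavors may treat a reference to a missing group differently. The `compiles` helper is a made-up name for this example.

```python
import re

# Three separate groups, then a backreference to whatever group 3 captured.
backref = re.compile(r"([A-Z])([A-Z])([A-Z])\3")

hit = backref.fullmatch("ABCC")   # third letter repeated: matches
miss = backref.fullmatch("ABCD")  # third letter not repeated: no match

print(hit.groups())  # ('A', 'B', 'C')
print(miss)          # None


def compiles(pattern):
    """Hypothetical helper: report whether Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"[A-Z]\3"))    # False: there is no group 3
print(compiles(r"([A-Z])\3"))  # False: only group 1 exists
print(compiles(r"([A-Z])\1"))  # True: group 1 exists
```

In this flavor, both `[A-Z]\3` and the edited `([A-Z])\3` are rejected at compile time because fewer than three groups are defined.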
Sebastian Proske

([A-Z]){3} - This matches three letters from A-Z in a row; the single capture group is applied three times and keeps only the last letter

([A-Z]{3}) - This matches the same text as above, but it encloses all three letters in a single capture group

[A-Z]{3} - This matches letters from A-Z three times, with no capture group

[A-Z]\3 - This matches a single character from A-Z followed by \3 (at least in Java)

You might be wondering what a "capture group" is. It is a way of keeping track of things which matched during the course of evaluating your regular expression. For example, consider your first regex:

([A-Z]){3}

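which matches the same text as

([A-Z])([A-Z])([A-Z])

The two differ in what they capture, though. If you evaluated the expanded regex, then, in Java for example, you would be able to access each of the three matched letters as groups 1, 2, and 3 (written $1, $2, and $3 in a replacement string). The quantified form has only one group, $1, which holds the last of the three letters.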
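
If you want to check what each form actually captures, here is a minimal sketch in Python (chosen only for illustration; the sample string and variable names are my own). All four patterns match the same text, and only the groups differ.

```python
import re

text = "ABCDEFG"

repeated = re.match(r"([A-Z]){3}", text)             # one group, applied three times
single = re.match(r"([A-Z]{3})", text)               # one group around all three letters
ungrouped = re.match(r"[A-Z]{3}", text)              # no capture group at all
expanded = re.match(r"([A-Z])([A-Z])([A-Z])", text)  # three separate groups

for name, m in [("repeated", repeated), ("single", single),
                ("ungrouped", ungrouped), ("expanded", expanded)]:
    # group(0) is the whole match; groups() lists the capture groups.
    print(name, m.group(0), m.groups())

# repeated ABC ('C',)
# single ABC ('ABC',)
# ungrouped ABC ()
# expanded ABC ('A', 'B', 'C')
```

Note that the repeated group keeps only the letter from its last iteration, which is why it reports `C` and not all three letters.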

Tim Biegeleisen
  • And usually the best approach is to see it yourself: [one](https://regex101.com/r/aC0rS0/1) matches `C` in `ABCDEFG`, [two](https://regex101.com/r/aC0rS0/2) will match `ABC` in `ABCDEFGH` and [three](https://regex101.com/r/aC0rS0/3) matches the same as **two** but does not provide any capture groups. – Jan Mar 10 '16 at 07:13
  • The fourth explanation is not always correct - it will match `A-Z` and then try to find a backreference to the third subpattern (which is not there, so the expression will fail). – Jan Mar 10 '16 at 07:16
  • @Jan I tested the fourth expression in Java, and it _did_ match the sequence `A\3`. So, at least in Java, my explanation is correct. – Tim Biegeleisen Mar 10 '16 at 07:27