
I have been studying regex for the last 5 days and still don't get it. It seems impossible to understand.

If `(\w)` means any single word character (alphanumeric or underscore), captured in a group, `\1` means group number 1, and `{4,}` means "match it 4 or more times", how does the regex `(\w)\1{4,}` match "ccccc" in the string "aa bbbb abcdefg ccccc 111121111 999999999"? Can someone explain it to me? To me, with `(\w)\1` the answer should be just the second "a". I can't find the answer anywhere.

Wiktor Stribiżew
Lukas
  • The pattern `(\w)\1{4,}` matches one word character and then requires the value captured in group 1 to repeat 4 or more times. The 4 is the quantifier's minimum, so the pattern cannot match the `aa ` at the start. – The fourth bird Sep 07 '20 at 21:58
  • so in my opinion it should be 'aaaa' and it is not – Lukas Sep 07 '20 at 22:00
  • There is no `aaaaa` in the string. See https://regex101.com/r/tjUwn1/1 Also see https://stackoverflow.com/questions/21880127/have-trouble-understanding-capturing-groups-and-back-references and for example https://stackoverflow.com/questions/17032914/what-do-comma-separated-numbers-in-curly-braces-at-the-end-of-a-regex-mean – The fourth bird Sep 07 '20 at 22:01
  • You literally just (correctly) explained what the regex pattern does. How do you expect it to match `aa`? You need at least `aaaaa`. Yes, `(\w)\1` should match `aa`, but your pattern is _**not**_ `(\w)\1`. It's `(\w)\1{4,}`. – 41686d6564 Sep 07 '20 at 22:04
  • I'm using regex101, but I can't understand what it is saying. That's why I wrote here. – Lukas Sep 07 '20 at 22:05

1 Answer


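This regex means "match a word character (`(\w)`), then match that same character again (`\1`) 4 or more times (`{4,}`)", or in other words "match the same word character 5 or more times in a row, but capture only one character". For instance, `ccccc` matches because `(\w)` first matches and captures the first "c", then `\1{4,}` matches 4 or more repetitions of the captured text, here the four remaining "c"s. By contrast, "aa" and "bbbb" are too short: after the first character they have only 1 and 3 repetitions, fewer than the required 4.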
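To see this for yourself, here is a minimal sketch in Python (the question does not name a language, so Python's `re` module is an assumption; the variable names are made up for illustration). It runs both `(\w)\1{4,}` and the shorter `(\w)\1` against the string from the question:

```python
import re

text = "aa bbbb abcdefg ccccc 111121111 999999999"

# Full pattern: one captured word character, then that same character 4+ more times.
pattern = re.compile(r"(\w)\1{4,}")
matches = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]
print(matches)  # each tuple is (whole match, text captured by group 1)

# Shorter pattern for contrast: one captured character, then the same character once.
pair_pattern = re.compile(r"(\w)\1")
pair_matches = [m.group(0) for m in pair_pattern.finditer(text)]
print(pair_matches[0])  # first match of the shorter pattern
```

The first list contains `('ccccc', 'c')` and `('999999999', '9')`: the whole match is the full run, while group 1 holds only one character. The run "111121111" does not match because the "2" splits it into two runs of four. The shorter pattern does match "aa", which is the behavior the question expected.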

SuperStormer