
I have two regular expressions that I am having a hard time understanding. Below are the two expressions, each followed by what I believe it does. I am also not sure whether these expressions are basic regular expressions (RE) or extended ones (ERE).

^\([a-z]\)\1

Search at the beginning of the line for any lowercase letter from a to z, matching one occurrence.

^.*\([a-z]*\).*\1.*\1

Search at the beginning of the line for any single character, followed by zero or more lowercase letters a through z, followed by any single character, then followed by any two additional "any" characters or no characters.

Cœur

2 Answers


^\([a-z])\1

  • ^ match begin of input
  • \(...\) capture everything inside
  • [a-z] this character class matches any (lowercase) character from a-z
  • \1 back reference to the first capture group

The opening ( of the capture group is escaped, but the closing ) is not, so this regex is syntactically incorrect.

^.*\([a-z]*\).*\1.*\1

  • ^.* match everything from the beginning of input
  • [a-z]* match zero or more lowercase characters from a-z
  • .* match everything
  • \1 back reference to the first capture group (any number of lowercase letters)

Two comments:

  1. Here, both ( and ) are properly escaped
  2. This basically matches the following three times, which doesn't make much sense:
    • everything
    • any number of lower case characters

To learn more about why it doesn't make much sense, I'd suggest reading about greediness in regular expressions.
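To make the greediness point concrete, here is a minimal sketch in Python. Note the assumptions: Python's `re` module is not the tool from the question, and it writes groups as `( ... )` where the question's pattern writes `\( ... \)`, so the pattern below is a translation. The names `loose`, `captured` and `whole` are made up for illustration.

```python
import re

# Translation of ^.*\([a-z]*\).*\1.*\1 into Python syntax (unescaped parentheses).
loose = re.compile(r"^.*([a-z]*).*\1.*\1")

m = loose.search("abcabcabc")

# The leading greedy .* swallows the whole line first, so the group is left
# with nothing to capture and both back references repeat that empty string.
captured = m.group(1)
whole = m.group(0)

print(repr(captured))  # ''
print(repr(whole))     # 'abcabcabc'
```

So even on a line that really does contain `abc` three times, the group ends up empty, which is why the pattern says very little about the line it matched.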

steffen
  • `\1` is supported in Vim, at least it is in my version – JGNI Oct 02 '18 at 08:33
  • @JGNI What's your version? In my vim 8.0 searching with `/` doesn't seem to support back references. – steffen Oct 02 '18 at 08:36
  • My version is 8.0.707. The test data is `bbdf`; with the regex `/^\([a-z]\)\1` it matches `bb`. Note that vim uses V5 Regexes, not Perl style, so you have to escape the `()`, i.e. write `\(\)`, to create a capture buffer – JGNI Oct 02 '18 at 08:40
  • I stand corrected :-) I didn't have the correct data in my buffer. – steffen Oct 02 '18 at 08:47

Your first regex is invalid as it has an unmatched \(. Assuming that you meant ^\([a-z]\)\1, you have the following:

^ Match at start of line
\([a-z]\) Match a lower case letter and put it into capture buffer 1
\1 Match what is in capture buffer 1

More simply: match any line that starts with the same lower case letter twice in a row.
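As a quick check of that reading, here is a small sketch in Python. It is an illustration under assumptions, not the tool from the question: Python's `re` module writes the group as `( ... )` rather than `\( ... \)`, and the helper `leading_double` and the sample lines are hypothetical.

```python
import re

# Python spelling of ^\([a-z]\)\1 (parentheses are not escaped in Python's re).
pattern = re.compile(r"^([a-z])\1")

def leading_double(line):
    """Return the doubled letter pair at the start of the line, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None

samples = ["bbdf", "abdf", "xbbdf", "BBdf"]
results = {s: leading_double(s) for s in samples}
print(results)
```

Only `bbdf` matches: `abdf` starts with two different letters, `xbbdf` has the doubled letter away from the start of the line, and `BBdf` starts with upper case letters.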

For ^.*\([a-z]*\).*\1.*\1

^ Match at start of line
.* Match 0 or more characters
\([a-z]*\) Match a lower case letter 0 or more times and place in capture buffer 1
.* Match 0 or more characters
\1 Match what is in capture buffer 1
.* Match 0 or more characters
\1 Match what is in capture buffer 1

I suspect that this is trying to match a line in which some sequence of lower case letters occurs three or more times. However, it is badly written and will match any line: \([a-z]*\) can match an empty string, and both \1 back references then match that empty string as well, so the whole regex can succeed at the start of the line, before the first character. To fix this, change \([a-z]*\) to \([a-z][a-z]*\), i.e. make sure that you capture at least one lower case letter.
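The difference between the two versions can be sketched in Python. As assumptions: the patterns are translated to Python's `re` syntax (unescaped parentheses), and the names `original`, `fixed` and the sample `lines` are invented for this illustration.

```python
import re

# Python spellings of the original pattern and of the suggested fix.
original = re.compile(r"^.*([a-z]*).*\1.*\1")
fixed = re.compile(r"^.*([a-z][a-z]*).*\1.*\1")

lines = ["", "123", "abc", "aab", "aaa", "ab-ab-ab"]

# The original accepts every line, because the group may capture nothing.
original_hits = [bool(original.search(s)) for s in lines]

# The fix needs some non-empty lower case sequence to occur three times.
fixed_hits = [bool(fixed.search(s)) for s in lines]

for s, o, f in zip(lines, original_hits, fixed_hits):
    print(repr(s), o, f)
```

With the fix, only `aaa` and `ab-ab-ab` match; `aab` fails because no letter sequence occurs a third time.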

JGNI