Regular expression - Complete match

Question

r'^a$' is used as a complete match.

As I read it, the pattern says that a string should start with the letter a and end with the letter a.

What stops this pattern (r'^a$') from matching the string 'anna'?

"string should start with letter a and end with letter a" and that's it. So only "a" will match. You may want to change to `^a.*a$` to include middle characters. — karakfa, Jul 10 '18 at 19:15
@karakfa `'anna'` starts with letter `'a'` and end with letter `'a'` — overexchange, Jul 10 '18 at 19:16
Yes, but you see it's the same `a` that's at the beginning and at the end. See the regex in my earlier comment to achieve what you want. — karakfa, Jul 10 '18 at 19:17

score 3 · Answer 1 · answered Jul 10 '18 at 19:18


a string should start with letter a and end with letter a

That's not the only thing the regex says: it also requires the string to have no other characters in between the initial and final letter, meaning that the only string matched by this expression is a single-character string a.

In order to fix this, add .*? to match "the middle" of the string:

^a.*?a$

Note that this expression no longer matches the single-character string a; it requires at least two a's to be present.
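To make the difference concrete, here is a small sketch that runs both patterns against a few sample strings. The sample strings and the variable names (`strict`, `spanning`, `results`) are illustrative choices, not part of the original answer.

```python
import re

strict = re.compile(r'^a$')        # start, exactly one 'a', end
spanning = re.compile(r'^a.*?a$')  # start, 'a', anything (possibly nothing), 'a', end

# Map each sample string to (matches ^a$, matches ^a.*?a$).
results = {
    s: (bool(strict.search(s)), bool(spanning.search(s)))
    for s in ['a', 'aa', 'anna', 'ab']
}

for s, (one_a, two_a) in results.items():
    print(f'{s!r}: ^a$ -> {one_a}, ^a.*?a$ -> {two_a}')
```

Only 'a' satisfies the first pattern, while 'aa' and 'anna' satisfy the second; 'ab' satisfies neither.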


answered Jul 10 '18 at 19:18

Sergey Kalinichenko


If they still want to match 'a' you could use `^(?:a.*?a|a)$` – Jacob Boertjes Jul 10 '18 at 19:22
@JacobBoertjes Right, special-casing the single-character match is a very easy and effective approach to adding a single-character match. Thank you! – Sergey Kalinichenko Jul 10 '18 at 19:23
But... that meaning(*no other characters*) is hidden... unless you test it... Meaning is not visible just by seeing the pattern.. i use this pattern for complete match... just by remembering that it works this way... This is the problem – overexchange Jul 10 '18 at 19:23
@overexchange Well, a regex with no asterisks, pluses, curly braces, or question marks gives you a hint at a fixed-length of the match it is going to produce. – Sergey Kalinichenko Jul 10 '18 at 19:26
None of the remaining meta characters are used(`. * + ? { } [ ] \ | ( )`)....you are right. But the idea is... both `re.search(r'^a', 'a')` & `re.search(r'^a', 'aa')` matches as well as `re.search(r'a$', 'a')` & `re.search(r'a$', 'aaaa')` matches... How do i deduce for a hint of fixed-length? – overexchange Jul 10 '18 at 19:29
@overexchange Since you were talking about complete match, I assumed the `^` and `$` anchors would be there. Without both anchors the match itself would still be fixed-length, but of course the length of the whole string on which you have the match would be unlimited. – Sergey Kalinichenko Jul 10 '18 at 19:51
Without both anchors `re.search(r'a', 'aaaaa')` matches.... yes as you said length of the whole string on which you have the match is unlimited.... But how `^` and `$` would make you think complete match... No idea – overexchange Jul 10 '18 at 19:55
@overexchange The reason that this example matches is because it is only matching with a single a. It would also match 'bbbbbbbbabbbbbb'. It looks for only a single 'a'. The `^` indicates the beginning of the string, then `$` indicates the end, this means the only way you can match is if the *entire* string matches, because you have to be including the start and end in the match. – Jacob Boertjes Jul 10 '18 at 19:57
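The point about anchors in the comments above can be checked directly. This is a minimal sketch using the example string from the last comment; the variable names are made up for illustration. The trailing-newline caveat is a Python-specific detail that the thread does not mention.

```python
import re

text = 'bbbbbbbbabbbbbb'

# Without anchors, the pattern finds the lone 'a' wherever it sits.
unanchored = re.search(r'a', text)
# With both anchors, the whole string would have to be just 'a'.
anchored = re.search(r'^a$', text)

print(unanchored.span())  # position of the one-character match
print(anchored)           # None

# Caveat: in Python, $ also matches just before a trailing newline,
# so re.fullmatch (or \Z in place of $) is the stricter whole-string check.
newline_search = re.search(r'^a$', 'a\n')
newline_full = re.fullmatch(r'a', 'a\n')
```

In both cases the match itself is one character long; the anchors only restrict where in the string that one character is allowed to sit.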

Barmar · Accepted Answer · 2018-07-10T20:08:20.680

1

You're not interpreting it correctly.

A regular expression is processed left-to-right, matching parts of the input as it goes along.

^a$

means that the match starts at the beginning of the string, then has to match a right after, then has to match the end of the string immediately after that.

It's no different from

abc

meaning that b has to follow a immediately, and c has to follow b immediately.
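A short sketch of this "each element must follow immediately" reading, using Python's `re` module since the question uses `re.search`. The test strings are assumptions chosen for illustration.

```python
import re

# 'abc': b must come right after a, and c right after b.
abc_hit = re.search(r'abc', 'xabcx')   # matches the 'abc' in the middle
abc_miss = re.search(r'abc', 'ab-c')   # None: '-' sits between b and c

# '^a$': start of string, then 'a', then end of string, nothing in between.
single = re.search(r'^a$', 'a')        # matches; the match covers the whole string
anna = re.search(r'^a$', 'anna')       # None: 'n' follows the first 'a', not the end

print(abc_hit.span(), abc_miss)
print(single.span(), anna)
```

`^` and `$` consume no characters; they are position checks that sit in the same left-to-right sequence as the literal `a`.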

edited Jul 10 '18 at 20:08

answered Jul 10 '18 at 19:55

Barmar


`^a` means the match starts at the beginning of the string, then has to match `a`. The resulting object points to that `a`. `a$` means the match starts at the end of the string, then has to match `a`. The resulting object points to that `a` – overexchange Jul 10 '18 at 20:01

score 0 · Answer 3 · answered Jul 10 '18 at 19:22

You're interpreting the meaning of the regular expression wrong.

r'^a$' describes a string that starts with the letter "a" and ends with that same letter "a". The single "a" in the expression must be both the first and the last character of the string.

To extract strings that start and end with DIFFERENT a's, you can use r'^a.*a$'. But this requires that the two a's be different characters in the string. To get any string that starts with "a" and ends with "a", you can OR these two together:

r'^a$|^a.*a$'
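As a quick check, the sketch below runs this alternation next to the grouped variant `^(?:a.*?a|a)$` suggested in the comments on the first answer. The sample strings and variable names are illustrative assumptions.

```python
import re

either = re.compile(r'^a$|^a.*a$')       # the alternation from this answer
grouped = re.compile(r'^(?:a.*?a|a)$')   # the variant from the comment thread

samples = ['a', 'aa', 'anna', 'ab', 'banana']

# Record whether each pattern accepts each sample string.
either_hits = {s: bool(either.search(s)) for s in samples}
grouped_hits = {s: bool(grouped.search(s)) for s in samples}

for s in samples:
    print(f'{s!r}: {either_hits[s]} {grouped_hits[s]}')
```

On these samples the two patterns agree: 'a', 'aa', and 'anna' are accepted, while 'ab' and 'banana' are rejected.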
