
I want to write a program that finds palindromes (words that read the same forwards and backwards, like anna).

It should also work across multiple words (car a rac) and inside longer strings (asdcar a racbnm).

I wrote a regular expression to find the span of the start of a palindrome:

([a-z])(\s*)[a-z]?(\2)(\1)

It finds a letter, then optional spaces, then an optional second letter, then the same spaces again, and finally the first letter again.

It works fine, but for the string xxxxx it behaves strangely:

import re

# Raw string so that \s, \2 and \1 reach the regex engine unchanged.
p = re.compile(r'([a-z])(\s*)[a-z]?(\2)(\1)')
finds = p.finditer('xxxxx')
for m in finds:
    print(m.span())

Output:

(0, 3)
(3, 5)

It doesn't find the one I'm searching for: (1, 4)

What's wrong with my regular expression?

Edit: it should just find the start of the palindrome. The algorithm will do the rest.

Stefan van den Akker
  • you 'solved ' a problem using a RegEx - now you have two problems! – Mitch Wheat Apr 27 '14 at 09:22
  • I'm inclined to say that regular expressions aren't a good way to search for palindromes as palindromes aren't a regular language. It doesn't mean it can't be done (backreferences allow recognizing of many non-regular languages) but it could be done with less hassle and probably more efficiently as well using other tools. – kviiri Apr 27 '14 at 09:29
  • You are matching ` - - - - `. That first matches at position 0, but regular expressions never match overlapping regions. – Martijn Pieters Apr 27 '14 at 09:30
  • thanks Martjin i think this is the solution – Florian Groetzner Apr 27 '14 at 09:33

1 Answer


Your regular expression cannot match overlapping regions (you'd need to play with look-arounds with capturing groups to do that).
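As a rough sketch of the look-around approach mentioned above: wrapping the whole pattern in a zero-width lookahead with a capturing group lets `finditer` try every start position, because the match itself consumes no characters. The name `overlapping_spans` is made up for this illustration, and the group numbers shift by one because of the extra outer group.

```python
import re

# Group 1 is the whole candidate, group 2 the first letter, group 3 the spaces.
# The lookahead (?=...) consumes nothing, so the next search starts one
# character later instead of after the end of the previous candidate.
overlapping = re.compile(r'(?=(([a-z])(\s*)[a-z]?\3\2))')

def overlapping_spans(text):
    """Return the span of group 1 for every start position that matches."""
    return [m.span(1) for m in overlapping.finditer(text)]

print(overlapping_spans('xxxxx'))
# [(0, 3), (1, 4), (2, 5), (3, 5)]
```

The result includes the (1, 4) span that the original pattern skips.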

The expression matches the first three x characters first; it matches:

  • one character (group 1), zero spaces (group 2), an optional character (the ? is greedy), the zero spaces from group 2, the one character from group 1.

The second match then has to start after that, at position 3; the remaining two x characters match because the [a-z]? part is optional and matches nothing there.
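To see this behaviour directly, the following small sketch prints the span and the matched text for each match of the question's pattern. The helper name `describe_matches` is only for illustration.

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r'([a-z])(\s*)[a-z]?(\2)(\1)')

def describe_matches(text):
    """Return (span, matched text) for each non-overlapping match."""
    return [(m.span(), m.group(0)) for m in pattern.finditer(text)]

print(describe_matches('xxxxx'))
# [((0, 3), 'xxx'), ((3, 5), 'xx')]
```

The first match consumes positions 0 to 2, so the search resumes at position 3 and a match starting at position 1 is never attempted.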

You cannot create a regular expression to match palindromes in general (at least not with the Python re engine), as there is no facility to match an arbitrary-width previous group in reverse.
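If a regular expression is not required, one alternative (not the approach from the question, just a sketch) is to expand around every possible centre and compare letters directly. The function name `maximal_palindromes` and the `min_letters` threshold are assumptions made for this example; whitespace is skipped when comparing, and spans refer to positions in the original string.

```python
def maximal_palindromes(text, min_letters=3):
    """Return (start, end) spans of the widest palindrome around each centre."""
    # Indices of the non-space characters; spaces are ignored when comparing.
    idx = [i for i, ch in enumerate(text) if not ch.isspace()]
    letters = [text[i] for i in idx]
    n = len(letters)
    spans = []
    # 2n - 1 centres: every letter (odd length) and every gap (even length).
    for centre in range(2 * n - 1):
        lo = centre // 2
        hi = lo + centre % 2
        while lo >= 0 and hi < n and letters[lo] == letters[hi]:
            lo -= 1
            hi += 1
        # The loop overshoots by one step on each side.
        lo += 1
        hi -= 1
        if hi - lo + 1 >= min_letters:
            spans.append((idx[lo], idx[hi] + 1))
    return spans

text = 'asdcar a racbnm'
print([text[a:b] for a, b in maximal_palindromes(text)])
# ['ar a', 'car a rac', 'a ra']
```

This runs in quadratic time in the worst case, which is usually acceptable for short strings.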

Martijn Pieters
  • "Regular" expressions *can* match overlapping regions (or at least find overlapping matches): http://stackoverflow.com/q/11326284/20670, and can even find palindromes if the engine is sufficiently sophisticated (Python's isn't, though). For example, PHP can use `/^(?:(.)(?=.*(\1(?(2)\2|))$))*.?\2?$/` to detect palindromes: http://stackoverflow.com/q/3746487/20670 – Tim Pietzcker Apr 27 '14 at 10:09
  • @TimPietzcker: Right, I was not familiar with any engine that could; the PHP engine looks sufficiently painful.. – Martijn Pieters Apr 27 '14 at 10:11
  • @TimPietzcker: I had not yet learned about the look-around-with-capturing-group tricks; added to my regex playbook. Updated the answer with your references, thanks. – Martijn Pieters Apr 27 '14 at 10:19