2

I have the following code:

import re

pattern = re.compile('^(.+)\'s (.+)$', re.I)

string = "bill's uncle's pony"

matchObjects = pattern.finditer(string)

for i in matchObjects:
    if i:
        print(i.group(1))
        print(i.group(2))

This produces the output:

bill's uncle
pony

When I hoped it would produce the output

bill's uncle
pony
bill
uncle's pony

That is, I want all the ways in which the string matches. This code only gives me one of them. Any ideas much appreciated.

  • The `+` quantifier is **greedy**, meaning it takes as much text as possible. That's why you got `bill's uncle` instead of `bill`. You'll have to experiment with non-greedy quantifiers to get what you want, but note that those will be *different* regular expressions. You can't have just one RE for multiple match possibilities. – chrisaycock Jul 09 '14 at 16:46
  • This link may or may not help: http://stackoverflow.com/questions/1667528/regular-expression-listing-all-possibilities – Zhouster Jul 09 '14 at 16:55

2 Answers

1

"That is, I want all the ways in which the string matches. This code only gives me one of them."

That's not true. It gives you the only way in which your regex matches your input. That's what the greedy + quantifier does: it takes as much text as it can and gives characters back only as far as needed for the rest of the pattern to match. The first group therefore runs up to the last 's in the string.

So you'll never get just bill in the first group with your current regex.

Perhaps you could do this twice, first with your current greedy regex

^(.+)\'s (.+)$

[Image: regular expression visualization (Debuggex Demo)]

and then again with the reluctant version:

^(.+?)\'s (.+)$

[Image: regular expression visualization (Debuggex Demo)]

Greedy matches bill's uncle and pony. Reluctant matches bill and uncle's pony.
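A minimal sketch of that two-pass idea, assuming Python's standard `re` module and the example string from the question. The variable names are only for illustration.

```python
import re

text = "bill's uncle's pony"

# Greedy: the first group takes as much text as it can.
greedy = re.match(r"^(.+)'s (.+)$", text)

# Reluctant: the first group takes as little text as it can.
reluctant = re.match(r"^(.+?)'s (.+)$", text)

print(greedy.groups())     # ("bill's uncle", 'pony')
print(reluctant.groups())  # ('bill', "uncle's pony")
```

Note that two passes only cover the two extremes. With three or more `'s ` separators, the splits in the middle would still be missed.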

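But "finding all matches" is not something a single regex search does. For a given pattern and starting position, the engine reports exactly one match: the first one its greedy or reluctant rules lead it to, even when other splits of the string would also satisfy the pattern.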
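If every possible split is really what is wanted, one workaround is to step outside the single-pattern approach: find each `'s ` separator and split the string there yourself. The following is only a sketch of that idea, not something from the original code; `all_possessive_splits` is a hypothetical helper name, and it assumes single-line input (unlike `.`, slicing does not stop at newlines).

```python
import re

def all_possessive_splits(text):
    """Return every (owner, owned) split of text at an "'s " separator."""
    splits = []
    for m in re.finditer(r"'s ", text):
        owner, owned = text[:m.start()], text[m.end():]
        # Mirror the original pattern's .+ groups, which need at least one character.
        if owner and owned:
            splits.append((owner, owned))
    return splits

for owner, owned in all_possessive_splits("bill's uncle's pony"):
    print(owner)
    print(owned)
```

For the example string this yields the `bill` / `uncle's pony` split first and the `bill's uncle` / `pony` split second, because the separators are visited left to right.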

(I'm surprised the debuggex images are exactly the same.)


Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference.

aliteralmind
-1

I previously suggested DOTALL, but that was no good. Try using the non-greedy modifier "?" instead:

pattern = re.compile('^(.+?)\'s (.+?)$',re.I)

Still does not give exactly the desired output, but maybe it'll help put us on the right track.

output:

bill
uncle's pony
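As a quick check of why the second `?` makes no difference here: the `$` anchor forces the last group to extend to the end of the string whether it is greedy or not. A small sketch, assuming the same example string (variable names are illustrative):

```python
import re

text = "bill's uncle's pony"

# Only the first group is non-greedy.
lazy_first = re.match(r"^(.+?)'s (.+)$", text)

# Both groups are non-greedy; $ still forces the second group to reach the end.
lazy_both = re.match(r"^(.+?)'s (.+?)$", text)

print(lazy_first.groups())  # ('bill', "uncle's pony")
print(lazy_both.groups())   # ('bill', "uncle's pony")
```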
beiller