
In Python, suppose I want to search the string

"123" 

for occurrences of the pattern

"abc|1.*|def|.23" .

I would currently do this as follows:

import re
re.match("abc|1.*|def|.23", "123")

The above returns a match object from which I can retrieve the starting and ending indices of the match in the string, which in this case would be 0 and 3.
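For reference, a minimal sketch of reading those indices from the match object (the variable name `m` is only for illustration):

```python
import re

m = re.match("abc|1.*|def|.23", "123")

# re.match returns None when nothing matches, so guard before using it.
if m is not None:
    print(m.start(), m.end())  # 0 3
    print(m.group(0))          # 123
```

Note that the match object reports where the match is, but not which alternative of the pattern produced it.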

My question is: How can I retrieve the particular word(s) in the regular expression which matched with

"123" ?

In other words: I would like to get "1.*" and ".23". Is this possible?

  • If your pattern is always in the form `pattern1|pattern2|pattern3|...`, you can manually split it and test each one individually using `re.match`. – Aziz Aug 01 '20 at 00:10
  • You say you want those which "matched". That's only `1.*`. The engine stopped checking when that one matched. – superb rain Aug 01 '20 at 00:20
  • Getting all the words which matched would be a "nice to have" for me. I really need a way to get at least one of them. – James Hungerford Aug 01 '20 at 00:28

2 Answers

Given that your pattern always uses a common separator (in this case "|"), you can try:

import re

pattern = "abc|1.*|def|.23"

# Test each alternative on its own and keep the ones that match "123".
matches = [s for s in pattern.split("|") if re.match(s, "123")]
print(matches)

output:

['1.*', '.23']
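The split-and-test idea can be wrapped in a small helper. This is only a sketch: the function name `matching_alternatives` and the `whole_string` flag are made up for illustration, and it assumes the pattern is a flat alternation, with no "|" inside groups, character classes, or escapes (a plain `split("|")` would cut those in the wrong place).

```python
import re

def matching_alternatives(pattern, text, whole_string=False):
    """Return the alternatives of `pattern` that match `text`.

    Assumption: `pattern` is a flat alternation such as "a|b|c".
    """
    # re.match anchors only at the start; re.fullmatch requires the
    # alternative to consume the entire string.
    matcher = re.fullmatch if whole_string else re.match
    return [alt for alt in pattern.split("|") if matcher(alt, text)]

print(matching_alternatives("abc|1.*|def|.23", "123"))
# ['1.*', '.23']
print(matching_alternatives("abc|1.*|def|.23", "1234"))
# ['1.*', '.23']
print(matching_alternatives("abc|1.*|def|.23", "1234", whole_string=True))
# ['1.*']
```

The second call shows why the choice of matcher matters: `re.match` accepts ".23" for "1234" because it only needs a match at the start of the string.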
JaySabir

Another approach would be to create one capture group for each token in the alternation:

import re

s = 'def'
rgx = r'\b(?:(abc)|(1.*)|(def)|(.23))\b'

m = re.match(rgx, s)
print(m.group(0)) #=> def
print(m.group(1)) #=> None
print(m.group(2)) #=> None
print(m.group(3)) #=> def
print(m.group(4)) #=> None

This example shows that the match is 'def' and that it was matched by the 3rd capture group, (def).
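Rather than checking each group by hand, the index of the group that matched can be read from `m.lastindex`. Below is a minimal sketch under a few assumptions: the helper name `first_matching_alternative` is hypothetical, the alternatives contain no capture groups of their own (otherwise the group numbers would shift), and the `\b` anchors from the pattern above are left out for simplicity.

```python
import re

def first_matching_alternative(alternatives, text):
    """Return the first alternative that matches at the start of `text`,
    or None if none does, using a single call to re.match.
    """
    # Wrap every alternative in its own capture group: (abc)|(1.*)|...
    combined = "|".join(f"({alt})" for alt in alternatives)
    m = re.match(combined, text)
    if m is None:
        return None
    # Exactly one of the groups took part in the match; lastindex is its
    # 1-based number.
    return alternatives[m.lastindex - 1]

alts = ["abc", "1.*", "def", ".23"]
print(first_matching_alternative(alts, "123"))  # 1.*
print(first_matching_alternative(alts, "def"))  # def
print(first_matching_alternative(alts, "x23"))  # .23
print(first_matching_alternative(alts, "xyz"))  # None
```

As with the original pattern, only the first alternative that succeeds is reported, so "123" yields `1.*` and never `.23`.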


Cary Swoveland
  • I just noticed that the question asks that all elements of the alternation that match a string be identified (though a comment on the question puts that in doubt). My solution will only identify the first match, of course. Obviously, one must iterate through the elements of the alternation to identify all that match. – Cary Swoveland Aug 01 '20 at 02:08
  • Bingo! Thanks, Cary. This is what I was looking for: an approach that only requires you to call the match method once. It's a shame that you can't see more than one match, but I suppose that is just for optimization. I'll live. – James Hungerford Aug 01 '20 at 13:03