
I am trying to match text with a regular expression.

The following code behaves a bit strangely: it seems to return part of the result twice.

import re

regex = r"((\w+\s*){1,3} from)"
test_str = "text text this is Alex Smith from text text"
print(re.findall(regex, test_str))

Can someone point out what's wrong here?

If you are curious, my final goal is to match any 2- or 3-word names like 'Alex Smith' or 'Mr. Alex Smith' between (or on one side of) specific text. For instance,

1. this is Alex Smith from Japan (2/3 words after 'this is' or before 'from')
2. this is Mr. Alex Smith from japan (optional 'Mr.')
3. Mr. Alex Smith from Tokyo (2/3 words before 'from')
4. this is Alex Smith text text

So basically it should trigger on 'this is' or 'from'. Any suggestions? text text Alex Smith from Japan I am Alex Smith text text

Droid-Bird
  • Capturing groups make `re.findall` return a list of tuples in this case. Convert to non-capturing those parts you do not need to return. – Wiktor Stribiżew Apr 17 '18 at 10:37
  • Thank you for your answer. What is confusing me is that it's returning <>. Why is it matching Smith separately again? – Droid-Bird Apr 17 '18 at 10:51
  • `Smith` is the value of Group 2, that is why it is output. See [Repeating a Capturing Group vs. Capturing a Repeated Group](https://www.regular-expressions.info/captureall.html), and [Capturing repeating subpatterns in Python regex](https://stackoverflow.com/questions/9764930/capturing-repeating-subpatterns-in-python-regex). – Wiktor Stribiżew Apr 17 '18 at 10:52
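
To illustrate the point made in the comments, here is a minimal sketch comparing the original pattern with a variant whose groups are non-capturing. The variable names (`capturing`, `non_capturing`, `with_groups`, `without_groups`) are chosen here for illustration only and do not come from the question. It only shows why the tuple appears; it does not attempt the full name-matching goal.

```python
import re

test_str = "text text this is Alex Smith from text text"

# Original pattern: it contains two capturing groups, so re.findall
# returns one tuple per match, with one element per group.
# Group 2 sits inside a repetition, so it keeps only its last iteration.
capturing = r"((\w+\s*){1,3} from)"
with_groups = re.findall(capturing, test_str)
print(with_groups)     # [('is Alex Smith from', 'Smith')]

# Same pattern with a non-capturing group (?:...) and no outer group:
# with no capturing groups, re.findall returns the whole match as a string.
non_capturing = r"(?:\w+\s*){1,3} from"
without_groups = re.findall(non_capturing, test_str)
print(without_groups)  # ['is Alex Smith from']
```

If the groups are needed elsewhere, `re.finditer` together with `match.group(0)` is another way to get the whole match regardless of how many groups the pattern contains.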
