I can't figure out why my regex works on regex101 but not programmatically on the command line

>>> import re
>>> s = 'City of Arvada\n\nRevenue Division\nsome address\nArvada, CO 80001\n other words'
>>> regex = r'Arvada( [a-zA-Z]+){0,4}[,\.] [A-Z]{2}'
>>> result = re.findall(regex, s)
>>> print(result)
['']

I've narrowed it down to the group ( [a-zA-Z]+){0,4}, because the regex r'Arvada[,\.] [A-Z]{2}' works from the command line. I feel like I've seen Python regexes behave strangely with groups (...) before, too.
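A minimal sketch that may help isolate the behavior, using the same string and pattern as above: it compares the full match reported by re.search with what re.findall returns. The variable names full_match and group_only are only illustrative. When a pattern contains exactly one capturing group, re.findall returns that group's text rather than the whole match, and a group that matched zero times comes back as an empty string.

```python
import re

s = 'City of Arvada\n\nRevenue Division\nsome address\nArvada, CO 80001\n other words'
regex = r'Arvada( [a-zA-Z]+){0,4}[,\.] [A-Z]{2}'

# re.search returns a match object; group(0) is the full matched text.
match = re.search(regex, s)
full_match = match.group(0) if match else None

# With exactly one capturing group, re.findall returns that group's text
# for each match instead of the whole match.
group_only = re.findall(regex, s)

print(full_match)   # Arvada, CO
print(group_only)   # ['']
```

So the pattern does appear to match here; the empty string is the (unused) group, not a failed match.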

What I need is to match the first word, then optionally match 0-4 words after it, then a comma and a two-letter state abbreviation. If there is a better way to match that, I am all ears.
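One possible way to express that requirement, as a sketch rather than a definitive pattern: make the repeated group non-capturing with (?:...), so that re.findall returns whole matches. The second input string below is a made-up sample (not from the original data) added only to exercise the optional words.

```python
import re

# (?: ... ) groups without capturing, so findall returns the entire match.
pattern = re.compile(r'Arvada(?: [a-zA-Z]+){0,4}[,.] [A-Z]{2}')

s = 'City of Arvada\n\nRevenue Division\nsome address\nArvada, CO 80001\n other words'
matches = pattern.findall(s)

# Hypothetical sample with extra words between the city name and the comma.
extra = pattern.findall('Arvada West Side, CO 80001')

print(matches)  # ['Arvada, CO']
print(extra)    # ['Arvada West Side, CO']
```

If the capturing group is needed for other reasons, iterating with re.finditer and reading group(0) from each match object is another option.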

I'm guessing it's a Python-specific regex difference, but I just don't know enough about Python to figure it out. I am trying this with Python 3.6 (though 2.7 yields the same results).

I know there are similar SO posts, but I can't find one specific to this issue; perhaps there are too many to search through.

Jeremy
