I can't figure out why my regex works on regex101 but not when I run it in Python from the command line:
>>> import re
>>> s = 'City of Arvada\n\nRevenue Division\nsome address\nArvada, CO 80001\n other words'
>>> regex = r'Arvada( [a-zA-Z]+){0,4}[,\.] [A-Z]{2}'
>>> result = re.findall(regex, s)
>>> print(result)
['']
I've narrowed it down to the group ( [a-zA-Z]+){0,4}, because the regex r'Arvada[,\.] [A-Z]{2}' (the same pattern without that group) works from the command line. I feel like I've seen Python regexes behave strangely with groups (...) before, too.
What I need is to match the first word, then optionally match 0-4 words after it, then a comma or period and a two-letter state abbreviation. So if there is a better way to match that, I am all ears.
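For reference, here is a minimal sketch of one way that requirement could be written. It assumes the empty string comes from re.findall returning the contents of the capturing group instead of the whole match, which is its documented behavior when a pattern contains exactly one group. The sketch swaps in a non-capturing group (?:...); the names pattern and matches are only illustrative.

```python
import re

s = 'City of Arvada\n\nRevenue Division\nsome address\nArvada, CO 80001\n other words'

# (?: ...) groups the optional words without capturing them, so
# re.findall returns the full match rather than the group's contents.
pattern = r'Arvada(?: [a-zA-Z]+){0,4}[,.] [A-Z]{2}'

matches = re.findall(pattern, s)
print(matches)  # ['Arvada, CO']
```

If the capturing group has to stay, iterating with re.finditer and reading m.group(0) from each match object should give the whole matched text as well.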
I'm guessing it's a Python-specific regex difference, but I just don't know enough about Python to figure it out. I am trying this with Python 3.6 (though 2.7 yields the same results).
I know there are similar SO posts, but I can't find one specific to this issue. Perhaps there are too many to search through.