I am using a regex to parse zip codes out of address strings stored in a pandas column. The zip codes are 5 digits, but some building/unit numbers are also 5 digits, so I'd like the last match in each string rather than the first.

Here's my code:

# Function to search Zipcode from Address
import re

def zipregex(address):
    # First 5-digit run, optionally followed by a separator and 4 more digits
    zipre = re.search(r'(\d{5})([- ])?(\d{4})?', address)
    if zipre:
        print(address, zipre.groups())

# Function call
df['Zip'] = df.apply(lambda x: zipregex(x['Address']), axis=1)

This prints:

642b N 17th Ave, Phoenix, AZ 85007, USA ('85007', None, None)
38956-38962 N New River Rd, Peoria, AZ 85383, USA ('38956', '-', '3896')

In the 2nd case, I need it to return 85383 and not 38956-38962.

kms
  • If your goal is to just get the zipcode I'm not understanding your current regex. Why not something along the lines of `\s(\d{5})\D*$`. – JvdV Jul 04 '20 at 17:41
  • Matching last occurrence is a known and solved problem, `x(?!.*x)`. So, you may just use `\d{5}(?!.*\d{5})`. Or, if there can be no digits after the five digits, you may use JvdV's suggestion. – Wiktor Stribiżew Jul 04 '20 at 17:42
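
For reference, here is a minimal sketch of the two patterns suggested in the comments, run against the two sample addresses from the question. The names `LAST_FIVE_DIGITS`, `FIVE_DIGITS_AT_END`, `last_zip` and `zip_at_end` are made up for illustration, and the sketch assumes every address is a plain string with no line breaks.

```python
import re

# Pattern from Wiktor Stribiżew's comment: a run of five digits that is
# not followed, anywhere later in the string, by another run of five digits.
LAST_FIVE_DIGITS = re.compile(r'\d{5}(?!.*\d{5})')

# Pattern from JvdV's comment: whitespace, five digits, then only
# non-digit characters up to the end of the string.
FIVE_DIGITS_AT_END = re.compile(r'\s(\d{5})\D*$')


def last_zip(address):
    """Return the last 5-digit run in the address, or None (hypothetical helper)."""
    match = LAST_FIVE_DIGITS.search(address)
    return match.group(0) if match else None


def zip_at_end(address):
    """Return a 5-digit run followed only by non-digits, or None (hypothetical helper)."""
    match = FIVE_DIGITS_AT_END.search(address)
    return match.group(1) if match else None


# The two sample addresses from the question
addresses = [
    '642b N 17th Ave, Phoenix, AZ 85007, USA',
    '38956-38962 N New River Rd, Peoria, AZ 85383, USA',
]

zips_lookahead = [last_zip(a) for a in addresses]
zips_anchored = [zip_at_end(a) for a in addresses]
print(zips_lookahead)
print(zips_anchored)
```

Unlike the function in the question, these helpers return the match instead of printing it, so something like `df['Zip'] = df['Address'].apply(last_zip)` should fill the column. Note that the anchored pattern finds nothing when digits follow the zip code (for example a ZIP+4 suffix), while the lookahead pattern would still return the five-digit part.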
