
I am trying to extract a possible postcode from anywhere in a string using PRXCHANGE.

PRXCHANGE ( "s/^.*([A-Z][A-Z]?[0-9][0-9A-Z]?\s?[0-9][A-Z][A-Z]).*$/$1/" , -1 , a.address )

The regular expression correctly identifies postcodes, but the replacement always removes the first letter of the postcode, so that AZ12 3ZA becomes Z12 3ZA.

I tried several variations of the first two character classes, such as [A-Z]{1,2}, but always had the same issue.

I added spaces at the start and end of the postcode regex as follows:

PRXCHANGE ( "s/^.*( [A-Z][A-Z]?[0-9][0-9A-Z]?\s?[0-9][A-Z][A-Z] ).*$/$1/" , -1 , a.address )

This seems to fix the problem and correctly returns possible postcodes. I don't understand why the space after the open bracket has worked. Can anyone explain why the first doesn't work but the second does?

Thanks

  • Replace the first `.*` with `.*?`. To see what your regex does, use http://regex101.com – Wiktor Stribiżew Aug 16 '17 at 10:14
  • In your first example, the expression between parentheses looks for something starting with one or two letters followed by a digit. Both 'AZ12 3ZA' and 'Z12 3ZA' therefore match that criterion. In the former case, the starting 'A' is consumed by the greedy `.*`. If you add an explicit search for a space, 'Z12 3ZA' no longer matches the expression because SAS explicitly looks for a space followed by one or two letters and then a digit. Only ' AZ12 3ZA' matches. – user2877959 Aug 16 '17 at 10:31
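
To make the two comments above concrete, here is a minimal sketch that runs the original greedy pattern and the suggested lazy `.*?` variant side by side. The address value and the variable names `greedy` and `lazy` are made up for illustration and are not from the question. Because the second letter is optional (`[A-Z]?`), the greedy `.*` can swallow the leading 'A' and still leave a valid match, whereas the lazy form stops at the first position where the whole pattern can match.

```sas
data _null_;
  length address greedy lazy $ 60;

  /* Hypothetical sample address, used only for this sketch */
  address = "1 High Street, Leeds AZ12 3ZA";

  /* Original pattern: the greedy .* takes as much as it can, */
  /* so the capture group starts as late as possible          */
  greedy = prxchange("s/^.*([A-Z][A-Z]?[0-9][0-9A-Z]?\s?[0-9][A-Z][A-Z]).*$/$1/", -1, address);

  /* Suggested fix: the lazy .*? takes as little as it can,   */
  /* so the capture group starts as early as possible         */
  lazy = prxchange("s/^.*?([A-Z][A-Z]?[0-9][0-9A-Z]?\s?[0-9][A-Z][A-Z]).*$/$1/", -1, address);

  /* Write both results to the log for comparison */
  put greedy= lazy=;
run;
```

With this sample value, the log should show Z12 3ZA for `greedy` and AZ12 3ZA for `lazy`. The lazy form also avoids the leading and trailing spaces that the space-padded workaround captures inside the group.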
