I'm wondering if there's a best practice for validating the Irish Eircode format. My best attempt so far, using a regex in JavaScript, is the following, based on the official spec found on page 11 here.

(Page 11 based on the page numbers in the document, or page 12 if you include the cover)

/^[A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y]{1}[0-9]{1}[0-9,W]{1}[\ \-]?[0-9,A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y]{4}$/

I didn't find any Eircode related questions on here so I thought I'd open up this one and see what other people thought, and to see what better/shorter/more efficient patterns anyone could come up with.

Edit: Removed commas as per @Asunez's answer.

/^[ACDEFHKNPRTVWXY]{1}[0-9]{1}[0-9W]{1}[\ \-]?[0-9ACDEFHKNPRTVWXY]{4}$/
ConorLuddy

4 Answers

Since @Manwal's answer doesn't exactly do what it should, here is my attempt at shortening the regex for OP:

^(?:[AC-FHKNPRTV-Y][0-9]{2}|D6W)[ -]?[0-9AC-FHKNPRTV-Y]{4}$

Updated version supporting the A65 B2CD postcodes - ^(?:[AC-FHKNPRTV-Y][0-9]{2}|D6W)[ -]?[0-9AC-FHKNPRTV-Y]{4}$

This is basically what your Regex is, with a few changes:

  • Removed commas. You do not need commas to list items inside [] brackets.
  • Added ranges where possible and where it would save some space (C-F, V-Y). Elsewhere it's not beneficial to add ranges, as it won't make regex shorter.
  • You do not need to escape a space. " " in regex is literal.
  • You also do not need to escape the dash if it's the last character in character class (square brackets)
  • The first part of the regex is now in a non-capturing group to allow ORing it with the only possible letter for 3rd position, the "D6W" case.
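
The effect of these changes can be checked with a short JavaScript sketch. The names `withCommas` and `shortened` and the sample inputs are illustrative and not part of the original answer; the `^` is written outside the group here on the assumption that both alternatives should be anchored at the start of the string.

```javascript
// Original pattern from the question: commas inside [] are literal characters.
const withCommas = /^[A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y]{1}[0-9]{1}[0-9,W]{1}[\ \-]?[0-9,A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y]{4}$/;

// Shortened pattern; ^ sits outside the group so it anchors both alternatives (assumption).
const shortened = /^(?:[AC-FHKNPRTV-Y][0-9]{2}|D6W)[ -]?[0-9AC-FHKNPRTV-Y]{4}$/;

// Illustrative inputs: two valid codes, the D6W case, a bad routing key, and commas only.
const samples = ["A65 F4E2", "A65F4E2", "D6W-F4E2", "A5W F4E2", ",1, ,,,,"];
for (const s of samples) {
  console.log(JSON.stringify(s), withCommas.test(s), shortened.test(s));
}
```

The last two samples are the interesting ones: the comma version accepts both, while the shortened version rejects the `A5W` routing key and the comma-only string.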

It is also possible to deal with D6W exclusively with lookbehind, but this is more of an art than regex.

See Regex Demo: here

You can also invert the character class to not include given characters, and while it doesn't make the regex shorter, it's also worth noting. However, you need to make sure that other characters (like dots, commas) are not included too. I do it by adding the \W token.
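
A minimal sketch of that idea in JavaScript. The class below is an assumption rather than the pattern from the linked demo: it excludes the letters Eircode does not use, relies on `\W` to exclude punctuation and whitespace, and also lists `_` and `a-z`, which `\W` alone would let through.

```javascript
// Negated class: matches anything that is NOT a non-word character, an underscore,
// a lowercase letter, or one of the letters unused in Eircodes.
const allowedChar = /^[^\W_a-zBGIJLMOQSUZ]$/;

// Positive class used elsewhere in this thread, for comparison.
const positive = /^[0-9AC-FHKNPRTV-Y]$/;

// Count printable ASCII characters on which the two classes disagree.
let mismatches = 0;
for (let code = 32; code < 127; code++) {
  const ch = String.fromCharCode(code);
  if (allowedChar.test(ch) !== positive.test(ch)) mismatches++;
}
console.log(mismatches); // 0
```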

You can try it here

Andrew
Asunez
  • You can shorten it even further by `^[AC-FHKNPRTV-Y]\d[0-9W][ -]?[0-9AC-FHKNPRTV-Y]{4}$` – Alexey Shein Oct 29 '15 at 10:36
  • @AlexeyShein Not necessarily - there can be a difference between `\d` and `[0-9]`: in some regex engines (.NET, for example) `\d` also matches digits from other scripts, such as Arabic-Indic digits, while `[0-9]` matches only 0-9. In JavaScript the two are equivalent. – Asunez Oct 29 '15 at 11:29
  • @Asunez I think this difference can almost always be neglected, but, of course, you should know your dataset. – Alexey Shein Oct 29 '15 at 11:32
  • @ConorLuddy I edited my answer with an example of using a negated character set - you may want to take a look at this. – Asunez Oct 29 '15 at 11:43
  • Interesting technique @Asunez, I never thought of that - thanks! – ConorLuddy Oct 29 '15 at 12:01
  • Your regex has two issues: it allows a dash (not defined in the Eircode spec) and it accepts invalid routing keys that end with W, like A1W or C2W. Big plus for the optional space - not defined, but shown in Eircode examples and following the same pattern as British postcodes. – iaforek Sep 19 '16 at 13:03
  • @iaforek That's true - as I mentioned in my answer, _It is also possible to deal with D6W exclusively with lookbehind_. However, to my knowledge, it would make the regex much longer and less legible. My answer is the best I could think of without using lookarounds. – Asunez Sep 21 '16 at 09:19

According to chapter 1.5.4 of the Product Guide, the allowed characters are:

-----------------------------------------------------------------------
|     Component     | Position | Allowed characters                   |
-----------------------------------------------------------------------
| Routing Keys      |    1     | A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y        |
-----------------------------------------------------------------------
| Routing Keys      |    2     | 0-9                                  |
-----------------------------------------------------------------------
| Routing Keys      |    3     | 0-9 with the exception of W for D6W  |
-----------------------------------------------------------------------
| Unique Identifier |    4     | 0-9, A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y   | 
-----------------------------------------------------------------------
| Unique Identifier |    5     | 0-9, A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y   | 
-----------------------------------------------------------------------
| Unique Identifier |    6     | 0-9, A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y   | 
-----------------------------------------------------------------------
| Unique Identifier |    7     | 0-9, A,C,D,E,F,H,K,N,P,R,T,V,W,X,Y   | 
-----------------------------------------------------------------------

Every routing key must consist of a letter followed by two digits, with ONE specific exception: the D6W code.

So codes beginning with A5W, C6W or V0W are invalid.

According to chapter 1.5.1 Recommendations for Storage and Presentation

  • An Eircode should always be stored as a single string of seven upper case characters in IT systems, i.e. A65F4E2.
  • An Eircode should always be presented in upper case as two parts separated by a space, on stationary, mail items, computer forms, etc. i.e. A65 F4E2 and never A65F4E2.

So codes stored in a database shouldn't contain a space or a dash; they should be split into two parts only for display, and only by a space.

Given that, the correct regex should look like:

/([AC-FHKNPRTV-Y]\d{2}|D6W)[0-9AC-FHKNPRTV-Y]{4}/
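
One way to reconcile the storage and presentation recommendations with what users type is to normalise first and then validate the stored form. The sketch below is an illustration only: `normaliseEircode` and `formatEircode` are hypothetical helper names, and the `^` and `$` anchors are added on the assumption that the whole string must be an Eircode.

```javascript
// Stored form: seven upper-case characters, anchored at both ends (assumption).
const STORED_EIRCODE = /^(?:[AC-FHKNPRTV-Y][0-9]{2}|D6W)[0-9AC-FHKNPRTV-Y]{4}$/;

// Hypothetical helper: strip whitespace, upper-case, then validate.
// Returns the 7-character stored form, or null if the input is not valid.
function normaliseEircode(input) {
  const compact = String(input).replace(/\s+/g, "").toUpperCase();
  return STORED_EIRCODE.test(compact) ? compact : null;
}

// Hypothetical helper: presentation form (routing key, space, unique identifier).
function formatEircode(stored) {
  return stored.slice(0, 3) + " " + stored.slice(3);
}

const stored = normaliseEircode(" a65 f4e2 ");
console.log(stored);                        // A65F4E2
console.log(formatEircode(stored));         // A65 F4E2
console.log(normaliseEircode("A5W F4E2"));  // null
```

Whether lower-case input should be accepted and upper-cased, or rejected outright, is a design choice; this sketch accepts it.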

Regex online tester

Eircode guide

hywak
  • hi kmike, your regex is good for storing Eircodes ("Eircode should always be stored as a single string of seven upper case characters") but not great for validating user-entered Eircodes, which are likely to contain spaces. – michaelmcandrew Jan 08 '19 at 22:47
  • Also, you need to wrap it in ^ and $, otherwise it is going to also validate things like 'not an A65TA33 eircode' – michaelmcandrew Jan 09 '19 at 13:08
  • @michaelmcandrew If the specification says that it is a single string, then it shouldn't be possible to pass an Eircode with spaces. The front end should filter or validate input too, the same as the back end does. I've put `A65TA33` into the regex tester linked in my answer and it's marked as valid. – hywak Jan 09 '19 at 14:26
  • Given that "Eircode should always be presented in upper case as two parts separated by a space", that's how people will type it in. I'm looking for a regex that will treat that user input as valid. I suspect lots of others will do too. – michaelmcandrew Jan 10 '19 at 16:07
  • To clarify my second comment, without the ^ and $, your regex also marks XA65TA33X as valid, but it is not a valid Eircode. – michaelmcandrew Jan 10 '19 at 16:08
  • Just to weigh in on this: I appreciate that a bit of preformatting can be done to strip spaces or change character case, but far more often than not, people _do_ add a space between the routing key and the unique identifier. Eircodes are often presented this way in print and online—notably, even the Eircode lookup site itself always includes this space. Adding `\s*` after the routing key might make this regex more robust for validation. – Darragh Enright Feb 23 '19 at 02:06
  • Eircode's official spec linked in the question recommends that a space be used—"Our recommendation for displaying an Eircode on screens or printed correspondence is to use the three plus four format – i.e. Routing Key, space, Unique Identifier". – Darragh Enright Feb 23 '19 at 02:08

Updated this answer avoiding char B. You can try this:

/^[AC-Y]{1}[0-9]{1}[0-9W]{1}[ \-]?[0-9AC-Y]{4}$/

Description:

^ assert position at start of the string
[AC-Y]{1} match a single character present in the list below
    Quantifier: {1} Exactly 1 time (meaningless quantifier)
    A the literal character A (case sensitive)
    C-Y a single character in the range between C and Y (case sensitive)
[0-9]{1} match a single character present in the list below
    Quantifier: {1} Exactly 1 time (meaningless quantifier)
    0-9 a single character in the range between 0 and 9
[0-9W]{1} match a single character present in the list below
    Quantifier: {1} Exactly 1 time (meaningless quantifier)
    0-9 a single character in the range between 0 and 9
    W the literal character W (case sensitive)
[ \-]? match a single character present in the list below
    Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    " " the literal space character
    \- matches the character - literally
[0-9AC-Y]{4} match a single character present in the list below
    Quantifier: {4} Exactly 4 times
    0-9 a single character in the range between 0 and 9
    A the literal character A (case sensitive)
    C-Y a single character in the range between C and Y (case sensitive)
$ assert position at end of the string
Manwal
  • Well, this isn't exactly what OP's regex does. Note that yours will accept codes with B at the beginning, for example. – Asunez Oct 28 '15 at 15:36
  • No problem, but it's not only `B` that's not present in Eircode. Also your regex (even after edit) will accept `B`s as your range is *from* `B` *to* `Y` inclusive. – Asunez Oct 29 '15 at 07:50
  • Ohhhhhh God completely missed. Updated answer thanks again. @Asunez – Manwal Oct 29 '15 at 10:30
  • Yeah it's not just the letter B that should be excluded though. Check out page 11/12 of this for details - https://www.eircode.ie/docs/default-source/Common/prepareyourbusinessforeircode-edition3published.pdf?sfvrsn=2 – ConorLuddy Oct 29 '15 at 11:27
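
The objection raised in these comments can be checked with a short sketch. It lists the upper-case letters that the range `C-Y` admits but the allowed set quoted in the question does not; the variable names are illustrative.

```javascript
// Character class from the answer above versus the allowed letters from the question.
const rangeClass = /^[AC-Y]$/;
const specClass = /^[ACDEFHKNPRTVWXY]$/;

// Collect letters accepted by the range but absent from the allowed set.
const extra = [];
for (let code = 65; code <= 90; code++) { // "A" to "Z"
  const ch = String.fromCharCode(code);
  if (rangeClass.test(ch) && !specClass.test(ch)) extra.push(ch);
}
console.log(extra.join("")); // GIJLMOQSU
```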

Starting from hywak's answer and following the suggestions in the other comments, this is my PHP regex:

/^([AC-FHKNPRTV-Y]\d{2}|D6W)\s[0-9AC-FHKNPRTV-Y]{4}$/

I added ^ and $ to mark the start and end of the string, and \s to require the space, so the format XXX XXXX is accepted. Note that \s matches any whitespace character, not only a space.

Reference regarding format letter/numbers and letters to avoid: https://en.wikipedia.org/wiki/List_of_postal_codes

Regex tester

Here is the explanation of the last codes, which do not pass the test:

  • D14 N2Fz -> Last letter lowercase
  • a65 f4e2 -> All characters are lowercase
  • D6W FNTO -> Letter O is not allowed
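
The same pattern can be exercised in JavaScript; the answer uses PHP, but the syntax used here is common to both. This is a sketch, with sample inputs taken from the list above plus a valid code and an unspaced code for contrast:

```javascript
const eircode = /^([AC-FHKNPRTV-Y]\d{2}|D6W)\s[0-9AC-FHKNPRTV-Y]{4}$/;

const cases = [
  "A65 F4E2", // valid, expected to match
  "D14 N2Fz", // last letter lowercase
  "a65 f4e2", // all characters lowercase
  "D6W FNTO", // letter O is not allowed
  "A65F4E2",  // no separator: \s is mandatory in this pattern
];
for (const c of cases) {
  console.log(c, eircode.test(c));
}
```

Only the first case matches. Because `\s` is required rather than optional, the unspaced stored form is rejected as well.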
alula